Provides teams, organizations, and aspiring DevOps engineers with a comprehensive, modern guide to DevOps best practices, tools, and workflows for building, securing, and deploying applications efficiently.
Note: These checklists are opinionated and based on industry experience with modern DevOps practices and DORA principles. They represent common patterns but are not universal truth. You should adapt them to your specific needs and context. Contributions, discussions, and improvements are more than welcome!
🚧 This repository is continuously evolving with DevOps best practices. Contributions and real-world insights are encouraged!
```
devops-checklist/
├── README.md        # Main comprehensive checklist (single source of truth)
├── CONTRIBUTING.md  # How to contribute + style guide
├── LICENSE          # Apache 2.0
├── credits.md       # Logo and attribution details
├── .github/         # Issue & PR templates
└── images/          # Header + technology logos
```
- Team & Culture, Git, CI/CD Tooling
- Docker & Artifact Management
- DevSecOps & Supply Chain Security
- Infrastructure as Code (Terraform)
- Cloud Platform (AWS)
- Kubernetes Orchestration & GitOps
- Observability (Metrics, Logs, Traces)
- Governance & Policy as Code
- FinOps & Cloud Cost Optimization
| Tool Category | Recommended Tool(s) | Why? |
|---|---|---|
| Version Control | Git (Trunk-Based) | Enables continuous, high-frequency delivery |
| CI/CD Orchestration | GitHub Actions / GitLab CI | Reduced operational overhead; native Git integration |
| Infrastructure as Code | Terraform | Multi-cloud capability; mature module ecosystem |
| Policy as Code | OPA Gatekeeper | Declarative governance for Kubernetes & infra |
| Observability (MLT) | Prometheus + Loki + Tempo | Unified open-source stack for Metrics, Logs, Tracing |
| Checklist Section | Primary DORA Metric Impacted |
|---|---|
| CI/CD Tooling | Deployment Frequency, Lead Time |
| Version Control - Git | Lead Time for Changes |
| DevSecOps | Change Failure Rate (shift-left reduces defects) |
| Observability | MTTR (faster detection & recovery) |
| Kubernetes / GitOps | Deployment Frequency, Change Failure Rate |
Guidance maps to improved deployment frequency, shorter lead times, lower MTTR, and reduced change failure rate.
This is an aspirational DevOps maturity checklist designed to help teams assess and improve their practices. Think of it as a scorecard for your DevOps journey.
Mark each item based on your current state:
- ✅ Achieved - Fully implemented and working well
- 🔄 In Progress - Partially implemented or being worked on
- ⏳ Not Yet - Not started or planned for future
- ❌ Not Applicable - Doesn't fit your context
Using the checklist:
- Assess: Go through sections relevant to your team and mark current state
- Prioritize: Identify high-impact items to work on next (focus on ⚠️ REQUIRED and ⭐ PREFERRED items first)
- Track: Revisit quarterly to measure progress
- Adapt: Not everything applies to every organization - skip what doesn't make sense for you
Calculate your DevOps maturity score per section:
- Score = (Achieved items / Total applicable items) × 100
- 0-30%: Beginning - Focus on foundations (Git, CI/CD basics, basic monitoring)
- 31-60%: Developing - Expand capabilities (security scanning, IaC, advanced monitoring)
- 61-85%: Mature - Optimize and scale (GitOps, service mesh, FinOps, policy as code)
- 86-100%: Leading - Innovation and continuous improvement
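As a quick illustration, the score formula above can be computed with a small shell function (`maturity_score` is an illustrative name, not part of the checklist):

```shell
#!/bin/sh
# Maturity score = (achieved items / total applicable items) * 100.
# Integer arithmetic is good enough for a scorecard.
maturity_score() {
  achieved=$1
  applicable=$2
  echo $(( achieved * 100 / applicable ))
}

maturity_score 7 12   # 7 of 12 applicable items achieved -> 58 (Developing)
```

Run it per section and compare against the bands above to decide where to focus next.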
Use this as a learning roadmap and career development guide:
- Check off items as you learn and gain hands-on experience
- Focus on one section at a time following the 3-month roadmap
- Build portfolio projects demonstrating key practices
- Track your progress toward DevOps engineer roles
- Start with Team Section
- Go through each technology section
- Mark what you already have in place
- Identify gaps and prioritize improvements
- Create an implementation roadmap
- Review technologies you work with
- Check off best practices you're following
- Learn from sections where you have gaps
- Apply improvements to your workflow
- Share knowledge with your team
- Start Here: For Aspiring DevOps Engineers
- Follow: The 3-month learning roadmap
- Build: Portfolio projects from the checklist
- Practice: Set up tools locally
- Track: Check off skills as you learn
- Focus on Git, Docker, and CI/CD sections
- Understand how your code reaches production
- Learn security best practices (SAST/DAST)
- Explore AWS basics
- Practice with Docker locally
Week 1 – Version Control
- Review Git workflows and enable branch protection
- Add pull request templates and required reviews
- Configure client-side hooks for linting or secrets scanning
Week 2 – CI/CD Foundations
- Stand up Jenkins or GitHub Actions
- Create an automated build-and-test pipeline
- Add notifications to Slack/Teams and start tracking build time
Week 3 – Code Quality & Security
- Integrate SonarQube (or an equivalent) into pipelines
- Enforce quality gates and remediate critical issues
- Layer in SAST/SCA scans and document remediation workflows
Week 4 – Containers & Deployment
- Harden Dockerfiles and enable image scanning
- Publish images to your chosen registry (ECR/ACR/GCR/Artifactory)
- Deploy to AWS ECS, Kubernetes, or a serverless target
| Section | What You'll Learn | Time to Read |
|---|---|---|
| Team | Roles, skills, culture, goals | 10 min |
| Production & Deployment | Release strategies, change management | 10 min |
| Git | Branching, workflows, security | 10 min |
| CI/CD Tooling | Jenkins JCasC, GitHub Actions, GitLab CI | 15 min |
| SonarQube | Code quality, coverage, quality gates | 5 min |
| Docker | Containers, registries, security | 10 min |
| Artifact Management | Artifactory, Nexus, cloud registries | 10 min |
| DevSecOps | SAST, DAST, SCA, supply chain | 15 min |
| Terraform (IaC) | Remote state, modules, automation | 15 min |
| Cloud Platform (AWS) | IAM, networking, cost | 20 min |
| Kubernetes Orchestration | EKS/GKE/AKS, Helm, GitOps | 20 min |
| Observability (MLT) | Metrics, logs, traces, SLOs | 15 min |
| Governance & Policy as Code | OPA, Sentinel, compliance | 10 min |
| FinOps & Cloud Cost | Tagging, budgets, optimization | 10 min |
Total reading time: ~3 hours (refer back often!)
# Minimal essentials
- Git
- Docker Desktop
- VS Code or preferred editor

# Full local lab
- Git
- Docker Desktop
- Jenkins (local or containerized)
- AWS CLI
- Terraform
- kubectl and Helm (if using Kubernetes)

# Cloud-first approach
- GitHub/GitLab for version control
- GitHub Actions / GitLab CI
- AWS Free Tier or preferred cloud
- Terraform Cloud (free tier)

- Create a sample application in your preferred language
- Push to GitHub/GitLab with branch protections enabled
- Containerize it with a secure Dockerfile
- Build a pipeline (Jenkins, GitHub Actions, or GitLab CI)
- Add unit tests, linting, and security scans
- Deploy to AWS ECS, Kubernetes, or a serverless target
- Capture metrics (build time, deployment duration, failure rate)
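For the last item, capturing build time can start as simply as timestamping the step in your pipeline script. A sketch (`sleep 1` stands in for your real build command):

```shell
#!/bin/sh
# Record how long a pipeline step takes, in whole seconds.
start=$(date +%s)
sleep 1                      # stand-in for the real step, e.g. `mvn package`
end=$(date +%s)
duration=$((end - start))
echo "build duration: ${duration}s"
```

Emit the duration to your CI logs or a metrics backend to start a baseline you can track over time.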
Personal Tracker Template
```markdown
# My DevOps Journey

## Completed ✅
- [x] Git basics
- [x] Docker fundamentals

## In Progress 🚧
- [ ] Jenkins or GitHub Actions pipelines
- [ ] AWS fundamentals

## Planned 📋
- [ ] Terraform modules
- [ ] Kubernetes & GitOps
```

Individual Milestones
- Build foundational skills (Git, Docker, CI/CD) in the first 3 months
- Ship an automation or infrastructure project by month 6
- Earn a certification or lead a production improvement by month 9
- Track DevOps/SRE readiness milestones every quarter
Team Cadence
- Monthly: Review 2-3 checklist sections together
- Quarterly: Recompute maturity score and update roadmap
- Bi-annually: Revisit architecture, cost, and compliance posture
- Do we need everything in this checklist? Focus on what matches your current maturity and business goals.
- What order should we learn things? Follow the 3-month roadmap and adapt as you grow.
- Is this suitable for beginners? Yes—each section scales from fundamentals to advanced practices.
- What if our tooling is different? Apply the principles; substitute equivalent tools (e.g., GitLab CI for GitHub Actions).
- How do we measure success? Track DORA metrics, SLO attainment, and cost/incident reductions.
Use the Learning Path (3 Months) checklist to structure your first quarter:
- Month 1: Git, Linux, shell scripting, CI/CD fundamentals
- Month 2: Pipelines (Jenkins/GitHub Actions), Docker, publish to a registry, AWS basics
- Month 3: Terraform, Kubernetes/ECS fundamentals, security scanning, observability basics
- Review checklist with the team
- Mark completed items with ✅
- Calculate maturity score (completed items / total items × 100)
- Identify high-impact gaps
- Create a quarterly improvement roadmap
- Reassess each quarter
- Cloud-Native Teams: Kubernetes → Observability → Governance → FinOps
- Security-First Teams: DevSecOps → Governance → Terraform security practices
- Cost-Conscious Teams: FinOps → Observability → Right-sizing & automation
- Startup / SMB: CI/CD → Docker → AWS basics → Monitoring foundations
Monitor these four keys continuously:
- Deployment Frequency – How often you ship production changes
- Lead Time for Changes – Time from code commit to production
- Mean Time to Recovery (MTTR) – Time to restore service after incidents
- Change Failure Rate – % of deployments causing failures/rollbacks
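Most of these can be derived from timestamps your CI/CD system already has. For example, lead time for changes is just deploy time minus commit time; a sketch using epoch seconds (`lead_time_hours` is an illustrative helper):

```shell
#!/bin/sh
# Lead time for changes = deploy timestamp - commit timestamp.
# In CI, commit_epoch could come from `git show -s --format=%ct HEAD`
# and deploy_epoch from `date +%s` at deploy time.
lead_time_hours() {
  commit_epoch=$1
  deploy_epoch=$2
  echo $(( (deploy_epoch - commit_epoch) / 3600 ))
}

lead_time_hours 1700000000 1700086400   # deploy 24 hours after commit -> 24
```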
This checklist intentionally includes expanded coverage beyond traditional DevOps basics:
- Multi-cloud clusters (EKS, GKE, AKS)
- Helm vs Kustomize usage guidance
- RBAC & least-privilege enforcement
- Service mesh (Istio, Linkerd, App Mesh) patterns
- GitOps (ArgoCD, Flux) deployment workflows
- RED/USE metrics patterns, Prometheus exporters
- Loki / ELK structured logging and retention
- OpenTelemetry instrumentation & tracing (Jaeger/Tempo)
- Unified Grafana dashboards and SLO-driven alerting
- OPA / Gatekeeper admission control policies
- HashiCorp Sentinel for Terraform enforcement
- Cloud Config/Azure Policy for continuous compliance
- Pipeline policy checks (conftest, Checkov, tfsec, Kyverno)
- Mandatory tagging and cost allocation hygiene
- Budget alerts & anomaly detection
- Right-sizing + autoscaling strategies
- Reserved Instances & Savings Plans adoption tracking
- Kubernetes cost management (Kubecost)
Tip: Treat these four domains (Kubernetes, Observability, Governance, FinOps) as additive maturity layers—don’t adopt all at once; layer them after strong CI/CD + security foundations.
- Start small: Don't try to implement everything at once
- Quick wins first: Tackle items that provide immediate value with low effort
- Document decisions: Record why you chose certain tools or skipped certain practices
- Share and collaborate: Use this checklist in team discussions and planning sessions
- Team 👥
- Production & Deployment
- Version Control - Git
- CI/CD Tooling
- Code Quality - SonarQube
- Containerization - Docker
- Artifact Management
- Application Security (DevSecOps)
- Infrastructure as Code - Terraform
- Cloud Platform - AWS
- Container Orchestration (ECS & Kubernetes/EKS)
- Kubernetes Orchestration
- Observability (The MLT Stack)
- Governance & Policy as Code
- FinOps & Cloud Cost Optimization
- Monitoring & Observability (The Three Pillars)
- Continuous Improvement
- Getting Started Guide
- Resources
- Contributing
- Credits
- License
Detailed checklist moved to docs/team.md.
➡️ See: Team Checklist
- Define Clear DevOps Responsibilities
- Core Responsibilities:
- CI/CD pipeline development and maintenance
- Infrastructure automation
- Deployment orchestration
- Monitoring and alerting setup
- Security integration (DevSecOps)
- Toolchain management
- Collaborative Responsibilities:
- Working with developers on deployment strategies
- Collaborating with security teams on compliance
- Supporting operations with infrastructure
- Document Everything
- Team responsibilities clearly written
- Runbooks for common operations
- Incident response procedures
- Onboarding documentation
- Tool usage guides
- Version Control (Git)
- Understanding of Git workflows (GitFlow, trunk-based)
- Branching strategies and merge strategies
- Git hooks and automation
- CI/CD Fundamentals
- Pipeline design and implementation
- Build automation
- Deployment automation
- Testing integration
- Scripting & Automation
- Shell scripting (Bash/Zsh)
- Python or another scripting language
- Configuration management basics
- Containerization
- Docker fundamentals
- Container orchestration concepts
- Image optimization
- Cloud Platform Knowledge
- At least one cloud platform (AWS, Azure, GCP)
- IaaS, PaaS, SaaS concepts
- Cloud services for compute, storage, networking
- Infrastructure as Code
- Terraform, CloudFormation, or similar
- Configuration management (Ansible, Chef, Puppet)
- Version control for infrastructure
- Security Awareness
- Secure coding practices
- Secrets management
- Vulnerability scanning
- Compliance basics (SOC2, HIPAA, etc.)
- Programming languages (Go, Python, Java)
- Kubernetes/EKS for container orchestration
- Advanced networking concepts
- Database administration basics
- Prometheus and Grafana for monitoring and observability
- Break Down Silos
- Development and Operations work together
- Shared responsibility for production
- Cross-functional collaboration
- Automation First Mindset
- Automate repetitive tasks
- Infrastructure as Code everywhere
- Self-service capabilities for developers
- Continuous Improvement
- Regular retrospectives
- Postmortem culture (blameless)
- Metrics-driven improvements
- Shift-Left Approach
- Security integrated early (DevSecOps)
- Testing early in the pipeline
- Quality checks from the start
- Define DORA Metrics (The Four Keys)
- Deployment frequency (How often you ship)
- Lead time for changes (Time from commit to production)
- Mean time to recovery (MTTR) (How fast you fix failures)
- Change failure rate (%) (How often deployments fail)
- Service Level Objectives (SLOs)
- Define SLOs for critical services (e.g., 99.9% uptime)
- Define SLIs (Service Level Indicators) to measure SLOs (e.g., latency, error rate)
- Pipeline execution time targets
- Deployment success rates
- Team Maturity Model
- Level 1: Manual Deployments - Moving to automation
- Level 2: Automated CI/CD - Pipelines established
- Level 3: Advanced Automation - Security integrated, monitoring
- Level 4: Full DevSecOps - Shift-left, self-service, optimization
Welcome to DevOps! 👋 This checklist is your roadmap.
- Month 1: Foundations & Version Control
- Master Git essentials (branching, pull requests, rebasing)
- Brush up on Linux fundamentals and shell scripting
- Learn CI/CD concepts and pipeline stages
- Build a personal knowledge base for quick notes
- Month 2: CI/CD, Containers & Cloud Basics
- Set up Jenkins or GitHub Actions locally/in the cloud
- Build a basic pipeline with automated tests
- Learn Docker fundamentals and publish an image to a registry
- Explore AWS core services (IAM, EC2, S3) using the free tier
- Month 3: Infrastructure, Security & Observability
- Provision cloud infrastructure with Terraform (remote state optional for solo learners)
- Learn Kubernetes/ECS fundamentals and deploy a sample app
- Integrate code quality (SonarQube) and security scans (SAST/SCA)
- Set up basic monitoring/logging (Prometheus/Grafana or managed services)
Build these to showcase your skills:
- Project 1: Simple CI/CD Pipeline
- Git repo → GitHub Actions/Jenkins → Build → Test → Deploy to server
- Project 2: Dockerized Application
- Multi-container app with Docker Compose
- Published to registry
- Project 3: AWS Infrastructure with Terraform
- VPC, EC2, RDS provisioned with IaC
- Documented and in version control
- Project 4: Complete DevSecOps Pipeline
- Git → CI Tool → SonarQube/SAST → SCA → Docker Build → DAST → Deploy to ECS/EKS
- Security scans integrated
- Monitoring and alerts set up
- Resume & Portfolio
- GitHub profile with projects
- LinkedIn profile optimized
- Personal blog/documentation
- Certifications (AWS, Docker, etc.)
- Networking
- Join DevOps communities
- Contribute to open source
- Attend meetups/conferences
- Follow industry leaders
Moved to a future docs/production.md (to be created if needed). Current overview retained.
- Define Deployment Strategy
- What triggers deployments? (commits, tags, manual)
- How often do you deploy? (continuous, daily, weekly)
- Who can trigger production deployments?
- Environments
- Development environment
- Testing/QA environment
- Staging environment (production-like)
- Production environment
- Environment parity: Staging closely mirrors production
- Same infrastructure configuration (IaC applied to all)
- Same resource sizes and scaling rules
- Same network topology
- Same security policies
- Same monitoring and logging setup
- Proper isolation between environments (separate AWS accounts/VPCs)
- Ephemeral test environments for feature branches (optional but recommended)
- Automatically provisioned per PR
- Automatically destroyed after merge/close
- Cost-effective testing of infrastructure changes
- Immutable Artifacts & Promotion
- Build once, deploy many times (same artifact across environments)
- Artifacts stored in registry (Docker images, JARs, etc.)
- No rebuilding for different environments (use config injection)
- Artifact versioning and traceability
- Promotion workflow: Dev → QA → Staging → Prod
- Artifact scanning before promotion
- Pipeline Stages
- Source code checkout
- Build & compile
- Unit tests
- Code quality checks (SonarQube)
- Security scans (SAST)
- Dependency scanning (SCA)
- Container image build
- Integration tests
- Security scans (DAST)
- Artifact storage (Nexus)
- Deployment to environments
- Post-deployment tests
- Choose Deployment Strategy
- Blue-Green Deployment: Two identical environments; switch traffic between them; easy rollback.
- Canary Deployment: Gradual rollout to subset of users; monitor metrics before full rollout.
- Rolling Deployment: Update instances one by one; zero downtime.
- Recreate (Not recommended for production)
- Versioning Strategy
- Semantic versioning (MAJOR.MINOR.PATCH)
- Git tags for releases
- Changelog maintained
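A release flow under semantic versioning can be mostly mechanical; a sketch (`bump_patch` is an illustrative helper, not a standard tool):

```shell
#!/bin/sh
# Bump the PATCH component of a MAJOR.MINOR.PATCH version string.
bump_patch() {
  major=${1%%.*}     # text before the first dot
  rest=${1#*.}       # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  echo "${major}.${minor}.$((patch + 1))"
}

next=$(bump_patch 1.4.2)
echo "$next"   # 1.4.3
# Then cut the release as an annotated Git tag:
# git tag -a "v${next}" -m "Release v${next}" && git push origin "v${next}"
```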
- Rollback Capability
- Quick rollback procedure documented
- Automated rollback triggers
- Database migration rollback strategy
- Release Communication
- Release notes published
- Stakeholders notified
- Deployment windows communicated
Full best practices moved to docs/git.md.
➡️ See: Git Checklist
- Organization
- Separate repositories for services (microservices approach)
- Monorepo vs Multi-repo decision made
- Clear naming conventions
- Repository Content
- Application source code
- Infrastructure as Code files
- CI/CD pipeline definitions
- Documentation (README, CONTRIBUTING)
- `.gitignore` properly configured
- Forbidden Content
- NO secrets, passwords, API keys
- NO large binary files (use Git LFS if needed)
- NO compiled artifacts (use artifact repository)
- Choose a Strategy
- GitFlow (for scheduled releases)
- Trunk-Based Development (TBD) (for continuous deployment)
- GitHub Flow (simplified)
- Branch Protection
- `main` branch protected
- Require pull request reviews
- Require status checks to pass
- No force push allowed
- Require signed commits (optional but recommended)
- Commit Practices
- Clear, descriptive commit messages
- Conventional commits (feat:, fix:, docs:, etc.)
- Small, atomic commits
- No "work in progress" commits in main
- Pull Request Process
- Pull request template defined
- Code review required (at least 1-2 reviewers)
- CI checks must pass
- Link to issue/ticket
- Description of changes
- Git Hooks
- Pre-commit hooks for linting/formatting checks
- Pre-commit hooks for secret scanning
- Pre-push hooks to run local tests
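A minimal client-side pre-commit check for leaked credentials could look like the sketch below (the regexes are illustrative; dedicated tools like GitLeaks or TruffleHog are more thorough):

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit check: reject commits whose staged diff
# contains a likely AWS access key or a private-key header.
scan_for_secrets() {
  # Reads text on stdin; returns 1 (fail) if a likely secret is found.
  if grep -Eq 'AKIA[0-9A-Z]{16}|-----BEGIN (RSA|OPENSSH|EC) PRIVATE KEY-----'; then
    return 1
  fi
  return 0
}

# In the real hook, feed the staged diff through the check:
# git diff --cached -U0 | scan_for_secrets \
#   || { echo "Possible secret staged; aborting commit." >&2; exit 1; }
```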
- Access Control
- Least privilege principle
- Role-based access (read, write, admin)
- Regular access audits
- Security Scanning
- Git secrets scanning (detect leaked credentials with tools like GitLeaks or TruffleHog)
- Dependency vulnerability scanning
- Automated security alerts
- Backup & Disaster Recovery
- Git server/platform backups
- Disaster recovery plan documented
Full pipeline & tooling guidance moved to docs/cicd.md.
➡️ See: CI/CD Checklist
Modern CI/CD prioritizes automation, security, and maintainability. While Jenkins remains powerful with Configuration as Code (JCasC), cloud-native alternatives like GitHub Actions and GitLab CI offer reduced operational overhead and tighter integration with modern development workflows.
🎯 Key Recommendation: For new projects, prioritize GitHub Actions or GitLab CI for their simplicity and native cloud integration. Use Jenkins with JCasC for complex enterprise environments requiring extensive customization.
- Installation & Configuration
- Jenkins installed (Docker recommended)
- Jenkins Configuration as Code (JCasC) implemented for reproducible config
- JCasC YAML file version-controlled in Git
- Master-agent architecture for distributed builds
- High availability setup (for production)
- JCasC Best Practices
- All Jenkins configuration defined in `jenkins.yaml`
- Credentials managed via JCasC with external secret managers
- Plugin installation automated via JCasC
- No manual UI configuration required
- Configuration changes tested in staging first
- Backup Strategy
- Jenkins home directory backed up
- JCasC config files in Git (primary source of truth)
- Job definitions stored as code (Jenkinsfile)
- Regular automated backup schedule
- Pipeline Structure
- Declarative pipeline preferred (easier to read)
- Stages clearly defined
- Parallel execution where possible
- Proper error handling
- Build Optimization
- Use Docker agents for consistent builds
- Cache dependencies (Maven, npm, pip)
- Incremental builds where possible
- Pipeline Stages
- Checkout → Build → Unit Tests → Code Quality (SonarQube) → Security Scan (SAST/SCA) → Containerize → Publish (Nexus/Registry) → Deploy → Verify
```groovy
pipeline {
    agent {
        docker {
            image 'maven:3.8.1-jdk-11'
        }
    }
    environment {
        SONAR_TOKEN = credentials('sonar-token')
        DOCKER_REGISTRY = 'your-registry.com'
    }
    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }
        stage('Build') {
            steps {
                sh 'mvn clean compile'
            }
        }
        stage('Test') {
            steps {
                sh 'mvn test'
            }
            post {
                always {
                    junit 'target/surefire-reports/*.xml'
                }
            }
        }
        stage('Code Quality & SAST') {
            steps {
                sh 'mvn sonar:sonar -Dsonar.token=${SONAR_TOKEN}'
                sh 'trivy fs --security-checks vuln .'
            }
        }
        stage('Dependency Scan (SCA)') {
            steps {
                sh 'snyk test --json > snyk_report.json' // Example SCA
            }
        }
        stage('Package') {
            steps {
                sh 'mvn package -DskipTests'
            }
        }
        stage('Docker Build') {
            steps {
                sh 'docker build -t ${DOCKER_REGISTRY}/myapp:${BUILD_NUMBER} .'
            }
        }
        stage('Push to Registry') {
            steps {
                sh 'docker push ${DOCKER_REGISTRY}/myapp:${BUILD_NUMBER}'
            }
        }
        stage('Deploy to Dev') {
            steps {
                // Deployment steps
                sh './deploy.sh dev ${BUILD_NUMBER}'
            }
        }
    }
    post {
        success {
            echo "Build ${BUILD_NUMBER} succeeded" // Replace with slackSend
        }
        failure {
            echo "Build ${BUILD_NUMBER} failed" // Replace with slackSend
        }
        always {
            cleanWs()
        }
    }
}
```
- GitHub Actions ⭐ PREFERRED for GitHub-hosted projects
- Workflows defined in `.github/workflows/*.yml`
- Native integration with GitHub (PRs, Issues, Releases)
- Extensive marketplace of pre-built actions
- Built-in secrets management
- Matrix builds for multi-platform testing
- Self-hosted runners for private infrastructure
- GitLab CI/CD ⭐ PREFERRED for GitLab-hosted projects
- Pipeline defined in `.gitlab-ci.yml`
- Integrated with GitLab's full DevOps platform
- Auto DevOps for zero-config pipelines
- Container registry included
- Built-in security scanning (SAST, DAST, dependency scanning)
- GitLab Runners (shared or self-hosted)
- Modern CI/CD Benefits
- ✅ No infrastructure to maintain (managed runners)
- ✅ Native Git platform integration (better DX)
- ✅ Built-in container support (Docker, Kubernetes)
- ✅ Simplified YAML syntax (easier learning curve)
- ✅ Cost-effective for small to medium teams
- ✅ Cloud-native by design
Example GitHub Actions Workflow:
```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up JDK 11
        uses: actions/setup-java@v3
        with:
          java-version: '11'
          distribution: 'temurin'
          cache: maven
      - name: Build with Maven
        run: mvn clean compile
      - name: Run Tests
        run: mvn test
      - name: SonarQube Scan
        uses: sonarsource/sonarqube-scan-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
      - name: Trivy Security Scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
      - name: Build Docker Image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Push to ECR
        uses: aws-actions/amazon-ecr-login@v1
      # ... push logic
```
- Credentials Management
- Use credentials management systems (Jenkins Credentials, Cloud Secrets Manager, or HashiCorp Vault).
- NO hardcoded secrets in pipeline files.
- Rotate credentials regularly.
- Access Control
- Role-based access control (RBAC)
- Audit logs enabled
- Essential Plugins
- Git, Docker, Pipeline, Credentials, SonarQube Scanner
- Slack/Email notifications
Section condensed. (Consider adding docs/code-quality.md later.)
- Installation
- SonarQube server installed (Docker is common)
- Database configured (PostgreSQL recommended)
- Project Setup
- Projects created for each application
- Quality profiles defined
- Define Quality Gates
- Code coverage threshold (e.g., > 80%)
- Bug and vulnerability limits (zero high/critical)
- Code smell limits
- Duplication percentage limits
- Enforcement
- Quality gate as pipeline stage
- Block deployment if quality gate fails
- Notify team of failures
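One common way to enforce the gate is `sonar.qualitygate.wait`, which makes the scanner poll the server and fail the CI step when the gate fails. A sketch of the project configuration (project key, name, and paths are placeholders):

```properties
# sonar-project.properties (sketch)
sonar.projectKey=my-service
sonar.projectName=My Service
sonar.sources=src
sonar.tests=test
# Fail the CI step if the SonarQube quality gate does not pass
sonar.qualitygate.wait=true
```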
- PR Analysis
- Pull Request decoration enabled (comments on PRs with issues).
- Block merge if quality gate fails.
Detailed Docker practices moved to future docs/docker.md (not yet created).
- Base Images
- Use official, specific tags, not `latest`.
- Use minimal base images (Alpine, distroless) for small size and minimal attack surface.
- Keep base images updated.
- Image Size
- Multi-stage builds for smaller images.
- Remove unnecessary files.
- Use `.dockerignore` file.
- Dockerfile Checklist
- Use multi-stage builds.
- Run as non-root user.
- Combine `RUN` commands to reduce layers.
- Add health checks.
- Use `ENV` for configuration.
- Add labels for metadata.
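Putting the checklist together, a hardened Dockerfile might look like the following sketch (a Node.js app is assumed; image versions, paths, port, and build commands are placeholders to adapt):

```dockerfile
# --- Build stage ---
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# --- Runtime stage: minimal base, non-root, labeled, health-checked ---
FROM node:20-alpine
ENV NODE_ENV=production
WORKDIR /app
COPY --from=build --chown=node:node /app/dist ./dist
COPY --from=build --chown=node:node /app/node_modules ./node_modules
USER node
HEALTHCHECK --interval=30s CMD wget -qO- http://localhost:3000/health || exit 1
LABEL org.opencontainers.image.title="myapp"
CMD ["node", "dist/server.js"]
```

The multi-stage split keeps build tooling out of the runtime image, which shrinks both size and attack surface.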
- Security Scanning
- Scan images for vulnerabilities (e.g., Trivy, Snyk, Clair).
- Block deployment of vulnerable images.
- Runtime Security
- Run containers as non-root.
- Use read-only file systems where possible.
- Limit resources (CPU, memory).
- Secrets Management
- NO secrets in images.
- Mount secrets at runtime using orchestrator (ECS Secrets, Kubernetes Secrets, Vault).
- Container Registry
- Choose registry (ECR, Docker Hub, GCR, Nexus).
- Private registry for internal images.
- Access control configured.
- Registry Operations
- Automated image builds.
- Image promotion across environments.
- Clean up old/unused images.
Full multi-tool artifact guidance moved to future docs/artifacts.md (not yet created).
Modern artifact management requires centralized storage, security, and automation. While Nexus remains popular for self-hosted solutions, JFrog Artifactory offers advanced features, and cloud-native registries (ECR/ACR/GCR) provide seamless cloud integration.
🎯 Key Recommendation: Use cloud-native registries (ECR/ACR/GCR) for container images and cloud workloads. For multi-format artifacts (Maven, npm, PyPI, etc.), consider JFrog Artifactory or Nexus based on your feature requirements.
- Repository Setup
- Hosted: Your internal artifacts
- Proxy: Cache external artifacts (Maven Central, npm, etc.)
- Group: Combine multiple repositories
- Support for Maven, npm, Docker, PyPI, NuGet, etc.
- Nexus Best Practices
- Repository health checks configured
- Blob store strategy defined
- Backup and disaster recovery plan
- Nexus High Availability for production
- Advanced Features
- Universal artifact repository (all package types)
- Xray integration for deep security/license scanning
- Advanced replication (multi-site, edge nodes)
- Build integration with full build metadata tracking
- AQL (Artifactory Query Language) for complex searches
- Artifactory Configuration
- Local, remote, and virtual repositories configured
- Repository layout standards enforced
- Cleanup policies automated
- Access federation for enterprise SSO
- AWS Elastic Container Registry (ECR) ⭐ PREFERRED for AWS workloads
- Private ECR repositories per application/service
- IAM-based access control (no credential management needed)
- Image scanning enabled (vulnerability detection)
- Lifecycle policies for automatic cleanup
- Cross-region replication configured
- Encryption at rest enabled
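As one example of an ECR lifecycle policy, the rule below expires untagged images after 14 days (the window is an arbitrary choice; tune it to your rollback needs):

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    }
  ]
}
```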
- Azure Container Registry (ACR) ⭐ PREFERRED for Azure workloads
- ACR integrated with Azure Active Directory
- Geo-replication for global deployments
- Content trust and signing enabled
- Azure Defender scanning enabled
- Google Container Registry (GCR) / Artifact Registry ⭐ PREFERRED for GCP workloads
- GCR integrated with GCP IAM
- Vulnerability scanning enabled
- Binary Authorization for deployment policy
- Multi-region storage configured
- Cloud Registry Benefits
- ✅ Zero infrastructure management
- ✅ Native cloud IAM integration
- ✅ Built-in vulnerability scanning
- ✅ Highly available by design
- ✅ Pay-per-use pricing model
- ✅ Seamless integration with cloud services (ECS, EKS, AKS, GKE)
- Versioning Strategy
- Snapshot vs Release repositories
- Semantic versioning enforced (MAJOR.MINOR.PATCH)
- Immutable releases (cannot overwrite published artifacts)
- Build metadata tracking (commit SHA, build number)
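Traceable tags can be derived mechanically in CI from metadata you already have; a sketch (`artifact_tag` is an illustrative helper):

```shell
#!/bin/sh
# Build an immutable, traceable artifact tag from the commit SHA and build number.
# In CI: artifact_tag "$(git rev-parse HEAD)" "$BUILD_NUMBER"
artifact_tag() {
  printf '%.7s-b%s\n' "$1" "$2"   # short (7-char) SHA + build number
}

artifact_tag 9fceb02d0ae598e95dc970b74767f19372d61af8 42   # 9fceb02-b42
```

Because the tag encodes the commit, any deployed artifact can be traced straight back to the source revision that produced it.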
- Promotion Strategy
- Automated promotion workflow: Dev → QA → Staging → Prod
- Quality gates at each promotion stage
- Audit trail for all promotions
- Rollback capability maintained
- Cleanup Policies
- Automated removal of old snapshots (e.g., >30 days)
- Retention policy for releases (e.g., keep last 10 versions)
- Unused artifact cleanup based on download activity
- Storage quota monitoring and alerts
- Authentication
- LDAP/Active Directory/SAML integration
- Token-based authentication for CI/CD (no passwords in pipelines)
- Multi-factor authentication (MFA) for admin access
- API keys rotation policy (90 days max)
- Authorization
- Role-based access control (RBAC)
- Read/write/delete permissions per repository
- Least privilege principle enforced
- Regular access audits and reviews
- Security Scanning
- Vulnerability scanning on artifact upload
- License compliance checking
- Malware scanning for binaries
- Quarantine mechanism for vulnerable artifacts
Detailed security workflow moved to docs/devsecops.md.
➡️ See: DevSecOps Checklist
- SAST Implementation
- SAST integrated in CI/CD pipeline.
- Scans run on every commit/PR.
- Results block pipeline if critical issues found.
- What SAST Detects
- SQL injection vulnerabilities.
- Cross-site scripting (XSS).
- Hardcoded secrets/credentials.
- SCA Implementation
- Dependency Scanning integrated in CI/CD pipeline (Snyk, Dependabot, OWASP Dependency-Check).
- Check open-source libraries for known vulnerabilities and license compliance.
- Maintain a software bill of materials (SBOM).
- DAST Implementation
- DAST runs in staging/pre-prod environment.
- Automated scans after deployment.
- OWASP ZAP (open-source) is a common tool choice.
- What DAST Detects
- Authentication/authorization flaws.
- Configuration errors.
- OWASP Top 10 vulnerabilities.
```
1. Code Commit
   ↓
2. SAST Scan (immediate feedback)
   ↓
3. Build & Unit Tests
   ↓
4. Dependency Scan (SCA)
   ↓
5. Container Image Scan (Trivy/Clair)
   ↓
6. Deploy to Staging
   ↓
7. DAST Scan (against running app)
   ↓
8. Deploy to Production
```
- Security Tools Integration
- Secrets Scanning (MANDATORY):
- GitLeaks or TruffleHog configured to scan full Git history
- Pre-commit hooks to prevent secret commits
- CI/CD pipeline blocks on secret detection
- Regular historical scans (weekly/monthly)
- Scan all branches, not just main
- License Compliance: Check license compatibility early
🚨 CRITICAL: Secrets scanning must cover the entire Git history, not just new commits. Historical secrets remain exploitable.
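The pre-commit hook mentioned above can be wired in with the hook the gitleaks project itself ships; a minimal `.pre-commit-config.yaml` sketch (pin `rev` to the release you have actually tested):

```yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0          # pin to a tested release
    hooks:
      - id: gitleaks      # scans staged changes before each commit
```

Developers enable it once with `pre-commit install`; CI can run the same check via `pre-commit run --all-files`.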
- Git History Scanning ⚠️ REQUIRED
- GitLeaks configured to scan entire repository history
- Scan runs on all branches and tags
- Custom regex patterns for organization-specific secrets
- Baseline exceptions documented for known false positives
- Automated scanning in CI/CD on every push
- GitLeaks Configuration Example

```toml
# .gitleaks.toml
title = "Gitleaks Config"

[[rules]]
id          = "aws-access-key"
description = "AWS Access Key"
regex       = '''AKIA[0-9A-Z]{16}'''
tags        = ["aws", "credentials"]

[[rules]]
id          = "private-key"
description = "Private Key"
regex       = '''-----BEGIN (RSA|OPENSSH|DSA|EC) PRIVATE KEY-----'''
tags        = ["private-key"]
```
- Secret Rotation & Remediation Procedure ⚠️ REQUIRED
- Immediate Actions on Secret Detection:
- Revoke/rotate compromised credentials immediately (within 15 minutes)
- Block the commit from being merged
- Alert security team via Slack/PagerDuty
- Create incident ticket with timeline
- Review access logs for unauthorized usage
- Git History Cleanup:
- Use `git-filter-repo` or `BFG Repo-Cleaner` to remove secrets
- Force-push cleaned history (coordinate with team)
- Update all developer clones
- Verify secret removal with a follow-up scan
- Prevention Measures:
- Mandatory pre-commit hooks for all developers
- Secret management training for all team members
- Use secret management tools (AWS Secrets Manager, Vault)
- Regular rotation schedule for all credentials (90 days max)
- Automated rotation where possible (AWS IAM, database passwords)
- Detection & Response:
- Real-time alerting on secret detection
- Automated revocation workflows
- Incident response playbook documented
- Post-incident review process
- Metrics tracking: time-to-detect, time-to-remediate
- Vulnerability Tracking
- Centralized vulnerability dashboard.
- Severity classification (Critical, High, Medium, Low).
- SLA for fixing vulnerabilities defined and enforced.
- Security Policies
- No critical/high vulnerabilities allowed in production.
- Regular security audits and penetration testing.
Detailed Terraform practices moved to docs/terraform.md.
➡️ See: Terraform Checklist
- Directory Layout
- `environments/`: Contains environment-specific configurations (`dev`, `staging`, `production`).
- `modules/`: Contains reusable, well-tested code blocks (e.g., `vpc`, `ecs-cluster`).
- `backend.tf`: Configuration for remote state.
- Code Quality
- Use consistent naming conventions.
- Add descriptions to all variables and outputs.
- Use `terraform fmt` for formatting.
- Run `tflint` for linting.
- Variables & Locals
- Don't hardcode values.
- Use `locals` for computed values.
- Use variable validation to enforce structure.
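Variable validation looks like the following sketch (variable name and allowed values are illustrative):

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```

`terraform plan` fails fast with the error message when an unexpected value is passed, instead of provisioning misnamed resources.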
- Resource Management
- Use `for_each` instead of `count` for flexibility.
- Use `prevent_destroy` for critical resources.
- Tag all resources consistently.
🚨 CRITICAL FOR TEAM ENVIRONMENTS: Remote backend with state locking is MANDATORY for any team working with Terraform. Local state files are only acceptable for individual learning/experimentation.
- Remote Backend (REQUIRED for Production) ⚠️
- S3 + DynamoDB state locking configured (AWS)
- Azure Blob Storage + state locking (Azure)
- GCS + state locking (GCP)
- Terraform Cloud/Enterprise (cross-cloud)
- State locking prevents concurrent modifications
- State versioning enabled for rollback capability
- Encryption at rest enabled (AES-256)
- Encryption in transit enforced (TLS)
- AWS S3 + DynamoDB Backend Configuration ⭐ REQUIRED PATTERN

```hcl
terraform {
  backend "s3" {
    bucket         = "your-terraform-state-bucket"
    key            = "project/environment/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY-ID"
    acl            = "private" # Prevent state file manipulation
  }
}

# Note: versioning is enabled on the S3 bucket itself (it is not a
# valid backend argument) — see aws_s3_bucket_versioning.
```
- DynamoDB Table for State Locking
- Table created with `LockID` as primary key (String)
- On-demand billing mode or minimal provisioned capacity
- Table encryption enabled
- Point-in-time recovery enabled
- State Security & Access
- NEVER commit state files to Git (add `*.tfstate*` to `.gitignore`)
- Restrict S3 bucket access via IAM policies
- Enable S3 bucket versioning for state history
- S3 bucket logging enabled for audit trail
- MFA delete protection for production state buckets
- Cross-region replication for disaster recovery
- State Management Best Practices
- Separate state files per environment (dev/staging/prod)
- Separate state files per logical component/stack
- Use workspaces judiciously (prefer separate state files)
- Regular state file backups verified
- State file disaster recovery procedure documented
- Execution Control
- Implement CI/CD orchestration for terraform operations
- Use Terraform Cloud, Atlantis, or GitHub Actions for safe applies
- Require peer review of `terraform plan` output before apply
- Automated drift detection configured
- Only automated systems can apply changes (no manual applies in prod)
- Module Design
- Keep modules small and focused.
- Version modules with Git tags.
- Document module inputs/outputs.
- CI/CD Integration
- `terraform plan` on pull requests.
- Plan output commented on PRs (use Infracost for cost estimation).
- `terraform apply` triggered on merge to main/protected branches.
Detailed AWS baseline moved to docs/aws.md.
➡️ See: AWS Checklist
- Multi-Account Strategy
- Separate AWS accounts per environment (Dev, Staging, Prod).
- Use AWS Organizations for governance and consolidated billing.
- Implement Service Control Policies (SCPs).
- Account Baseline
- CloudTrail, GuardDuty, and Security Hub enabled in all accounts.
- Enable resource tagging for cost allocation.
- EC2: Use Auto Scaling Groups, Launch Templates, and IMDSv2.
- Lambda: Use environment variables for configuration and enable X-Ray tracing.
- S3: Enable encryption, block public access by default, use lifecycle policies.
- RDS: Use Multi-AZ for production, enable automated backups and encryption.
- IAM
- Enable MFA for all users.
- Use IAM roles, not users, for applications (least privilege principle).
- Rotate access keys regularly.
- Secrets Management
- Use AWS Secrets Manager or Parameter Store for runtime secrets.
- Rotate secrets automatically.
- Cost Management
- Set up billing alerts and use AWS Budgets.
- Use Cost Explorer and Cost Anomaly Detection.
- Strategies
- Right-size instances.
- Use Reserved Instances and Savings Plans for steady workloads.
- Use Spot Instances for non-critical, flexible workloads.
Core EKS/ECS comparison retained; deep Kubernetes content centralized separately.
- When to use ECS (AWS-Native): Simpler needs, lower operational overhead, tighter AWS integration, Fargate preferred for serverless compute.
- When to use EKS (Kubernetes): Multi-cloud/hybrid needs, complex orchestration (Service Mesh), necessity of the Kubernetes ecosystem, team has existing K8s expertise.
- Checklist
- Use Fargate (serverless) or EC2 (for control).
- Set appropriate CPU and memory.
- Use Task Roles for AWS API access (least privilege).
- Define health checks.
- Use secrets from Secrets Manager/Parameter Store.
- Configure logging to CloudWatch Logs.
- Setup
- Deploy to multiple Availability Zones.
- Configure Auto Scaling (CPU, Memory, Request Count).
- Set up Load Balancer integration (ALB/NLB).
- Configure deployment circuit breaker.
- Core Resources
- Understand Pods (smallest deployable unit).
- Understand Deployments (manages desired state of Pods).
- Understand Services (stable internal access/load balancing).
- Understand ConfigMaps and Secrets.
- Deployment Tooling
- Use Helm for package management and templating deployments.
- Use K8s-native CD tools like ArgoCD or Flux (GitOps philosophy).
- Rolling Update (default): Update instances one by one.
- Blue/Green Deployment: Use CodeDeploy (for ECS) or K8s Service Selector switching (for EKS).
- Canary Deployment: Deploy a small subset, monitor, and gradually shift traffic.
Full Kubernetes operational checklist moved to docs/kubernetes.md.
➡️ See: Kubernetes Checklist
Modern cloud-native applications require robust container orchestration. Kubernetes has emerged as the de facto standard for managing containerized workloads at scale across clouds.
- Managed Kubernetes Service Selection
- AWS EKS (Elastic Kubernetes Service) for AWS workloads
- Google GKE (Google Kubernetes Engine) for GCP workloads
- Azure AKS (Azure Kubernetes Service) for Azure workloads
- Control plane managed by cloud provider
- Worker nodes auto-scaling configured
- Multi-AZ deployment for high availability
- Cluster Configuration
- Kubernetes version upgrade strategy defined
- Node groups/pools for different workload types (compute, memory, GPU)
- Cluster autoscaler enabled
- Pod disruption budgets (PDB) configured
- Network policies enforced
- Private cluster endpoints configured
- EKS-Specific Best Practices
- VPC CNI plugin configured for pod networking
- IAM Roles for Service Accounts (IRSA) enabled
- EKS add-ons managed (VPC CNI, CoreDNS, kube-proxy)
- Managed node groups with launch templates
- Fargate profiles for serverless pods (where applicable)
- Helm (Package Manager) ⭐ RECOMMENDED for complex applications
- Helm 3+ installed (no Tiller required)
- Charts stored in version control or chart repositories
- Values files per environment (dev/staging/prod)
- Chart versioning and release management
- Helm hooks for pre/post operations
- Chart testing with `helm test`
- Dependencies managed via `Chart.yaml`
- Kustomize (Native K8s Configuration)
- Base configurations in `base/` directory
- Overlays per environment in `overlays/`
- Patch strategies for environment-specific changes
- No templating - pure YAML transformations
- Integrated with `kubectl apply -k`
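A minimal overlay sketch (paths, deployment name, and replica count are illustrative):

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # reuse the shared base manifests
patches:
  - patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
    target:
      kind: Deployment
      name: my-app      # hypothetical deployment name
```

Applied with `kubectl apply -k overlays/prod`; the base stays untouched and environment differences live only in the overlay.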
- Package Management Best Practices
- Choose Helm for complex, reusable deployments
- Choose Kustomize for simpler, native K8s approach
- Don't mix both in same project (pick one)
- All manifests stored in Git (GitOps ready)
- Role-Based Access Control (RBAC)
- Least privilege principle enforced for all service accounts
- Cluster roles vs namespace roles clearly defined
- Default service account NOT used for applications
- Service accounts per application/microservice
- RoleBindings audited regularly
- No cluster-admin access for regular users
- Pod Security
- Pod Security Standards (PSS) enforced
- Security contexts defined for all pods
- Containers run as non-root user
- Read-only root filesystem where possible
- Privileged containers prohibited (except system)
- Host network/IPC/PID disabled
- Capabilities dropped (e.g., drop ALL, add NET_BIND_SERVICE only if needed)
- Network Security
- Network policies implemented (default deny)
- Ingress/egress rules explicitly defined
- Private container registry access configured
- Service mesh for mTLS (see next section)
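The usual starting point for "default deny" is a namespace-wide policy like this sketch (namespace name is hypothetical); explicit allow rules are then layered on top:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace   # hypothetical namespace
spec:
  podSelector: {}           # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Note that a CNI plugin with NetworkPolicy support (Calico, Cilium, etc.) is required for this to be enforced.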
- Secrets Management
- External secrets operator for cloud secret managers
- Secrets encrypted at rest (KMS)
- Secrets not exposed in environment variables
- Regular secret rotation policy (90 days max)
- Service Mesh Decision
- Istio for feature-rich, enterprise requirements
- Linkerd for simplicity and performance
- AWS App Mesh for AWS-native integration
- Service mesh evaluation based on: observability, traffic management, security needs
- Istio Configuration (if selected)
- Istio control plane installed
- Sidecar injection enabled per namespace
- Mutual TLS (mTLS) enforced cluster-wide
- Traffic management with VirtualServices
- Circuit breakers and retries configured
- Fault injection for chaos engineering
- Observability with Kiali, Jaeger, Prometheus
- Linkerd Configuration (if selected)
- Linkerd control plane installed
- Automatic proxy injection enabled
- Zero-trust mTLS enabled by default
- Traffic split for canary deployments
- Linkerd Viz for built-in observability
- Service profiles for per-route metrics
- Service Mesh Benefits
- ✅ Automatic mTLS between services
- ✅ Advanced traffic routing (canary, blue/green)
- ✅ Circuit breaking and fault tolerance
- ✅ Observability without code changes
- ✅ Consistent security policies
- Horizontal Pod Autoscaler (HPA)
- HPA configured based on CPU/memory metrics
- Custom metrics from Prometheus for business logic scaling
- Min/max replica counts defined
- Scale-up and scale-down policies tuned
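A representative `autoscaling/v2` HPA manifest; target utilization, replica bounds, and the deployment name are illustrative values to tune per workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # damp flapping on scale-down
```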
- Vertical Pod Autoscaler (VPA)
- VPA for automatic resource request/limit adjustments
- Used in "recommendation mode" initially
- Combined with HPA carefully (can conflict)
- Cluster Autoscaler
- Automatically add/remove nodes based on demand
- Node group min/max size configured
- Integrated with cloud provider autoscaling groups
- Resource Management
- Resource requests defined for all containers
- Resource limits set to prevent resource exhaustion
- Quality of Service (QoS) classes understood
- Resource quotas per namespace
- Limit ranges for default constraints
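Requests and limits on a container look like this sketch (image and values are illustrative; size them from measured usage):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0   # hypothetical image
      resources:
        requests:             # what the scheduler reserves
          cpu: 250m
          memory: 256Mi
        limits:               # hard ceiling enforced at runtime
          cpu: "1"
          memory: 512Mi
```

Setting requests equal to limits on all containers yields the Guaranteed QoS class, which is evicted last under node pressure.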
- GitOps Philosophy ⭐ BEST PRACTICE for K8s
- Git as single source of truth
- Declarative infrastructure and applications
- Automated sync from Git to cluster
- Drift detection and auto-remediation
- Audit trail via Git history
- ArgoCD (Pull-based GitOps)
- ArgoCD installed in management cluster
- Applications defined as ArgoCD Application CRDs
- Auto-sync enabled with self-healing
- Multi-cluster management configured
- RBAC integrated with SSO (OIDC/SAML)
- Image updater for automated image updates
- Notification controller for Slack/email alerts
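An ArgoCD Application CRD with auto-sync and self-healing enabled, as a sketch (repo URL, path, and namespaces are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-repo.git
    targetRevision: main
    path: apps/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert manual drift back to the Git state
```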
- Flux (Pull-based GitOps alternative)
- Flux controllers installed via `flux bootstrap`
- GitRepository sources configured
- Kustomization resources for deployments
- Helm releases managed via HelmRelease CRD
- Image automation for automatic updates
- Multi-tenancy with namespace isolation
- GitOps Benefits
- ✅ Declarative - desired state in Git
- ✅ Auditable - all changes tracked in Git
- ✅ Automated - no manual kubectl applies
- ✅ Recoverable - easy rollback via Git revert
- ✅ Secure - no cluster credentials in CI/CD
Full MLT guidance moved to docs/observability.md.
➡️ See: Observability Checklist
Modern observability requires the "MLT" (Metrics, Logging, Tracing) approach. These three pillars work together to provide complete visibility into distributed systems.
🎯 Key Recommendation: Implement all three pillars (MLT) for production systems. Use Prometheus + Loki + Tempo for a unified, open-source stack, or leverage cloud-native solutions.
- Prometheus (Time-Series Metrics) ⭐ INDUSTRY STANDARD
- Prometheus server deployed (HA mode for production)
- Service discovery configured (Kubernetes, EC2, Consul)
- Scrape configs for all services and infrastructure
- Recording rules for precomputed queries
- Long-term storage with Thanos or Cortex
- Retention policy defined (e.g., 15 days local, years in object storage)
- Metrics Collection
- Application metrics exposed via a `/metrics` endpoint
- RED metrics (Rate, Errors, Duration) for services
- USE metrics (Utilization, Saturation, Errors) for infrastructure
- Four Golden Signals: Latency, Traffic, Errors, Saturation
- Business metrics tracked (signups, transactions, revenue)
- Custom metrics via client libraries (Prometheus SDK)
- Prometheus Exporters
- Node exporter for Linux/OS metrics
- Blackbox exporter for endpoint monitoring
- Database exporters (PostgreSQL, MySQL, Redis)
- Cloud-specific exporters (CloudWatch exporter)
- Custom exporters for legacy systems
- Grafana (Visualization) ⭐ PREFERRED UI
- Grafana deployed with persistent storage
- Prometheus configured as data source
- Pre-built dashboards imported (Node Exporter, Kubernetes)
- Custom dashboards per service/team
- Dashboard as code (JSON in version control)
- Template variables for environment selection
- Unified dashboards combining multiple data sources
- Logging Strategy
- Structured logging (JSON format) enforced
- Correlation IDs for request tracing
- Log levels properly used (DEBUG, INFO, WARN, ERROR)
- No PII/secrets logged
- Centralized log aggregation
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Elasticsearch cluster for log storage (3+ nodes for HA)
- Logstash or Fluentd for log processing
- Filebeat/Fluentbit for lightweight log shipping
- Index lifecycle management (ILM) configured
- Kibana for log search and visualization
- Index patterns and saved searches defined
- Retention policy (e.g., 30 days hot, 90 days warm, archive cold)
- Grafana Loki ⭐ COST-EFFECTIVE ALTERNATIVE
- Loki deployed for log aggregation
- Promtail agents on all nodes
- Labels for efficient log indexing (don't over-label)
- Integration with Grafana for unified view
- Object storage backend (S3/GCS) for scalability
- Significantly lower cost than Elasticsearch
- Cloud-Native Logging
- CloudWatch Logs (AWS)
- Cloud Logging (GCP)
- Azure Monitor Logs (Azure)
- Log forwarding from cloud to central system
- Logging Best Practices
- ✅ Structured logs in JSON format
- ✅ Include context: service, environment, version, host
- ✅ Use correlation/trace IDs across services
- ✅ Log to stdout/stderr (not files) in containers
- ✅ Aggregate logs centrally (never rely on local logs)
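The practices above can be sketched with nothing but the standard library; a minimal JSON formatter with a correlation ID (service name and ID value are hypothetical):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record (structured logging)."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation ID lets the aggregator join lines across services.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)

# Log to stdout, as recommended for containers.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")   # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created", extra={"correlation_id": "req-42"})
```

In practice most teams use a library (structlog, python-json-logger, or the framework's equivalent), but the output contract is the same: one JSON object per line on stdout.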
- Distributed Tracing ⚠️ CRITICAL for microservices
- Tracing implemented for all service-to-service calls
- Trace context propagated via HTTP headers (W3C Trace Context)
- Parent-child span relationships maintained
- Critical paths instrumented
- Sampling strategy configured (e.g., 1-10% in production)
- OpenTelemetry ⭐ MODERN STANDARD
- OpenTelemetry SDK integrated in applications
- Auto-instrumentation for frameworks (Spring, Express, Django)
- Custom spans for business logic
- Attributes and events added to spans
- Resource attributes configured (service.name, environment)
- Vendor-neutral implementation (portable across backends)
- Jaeger (Trace Backend)
- Jaeger deployed for trace storage and visualization
- Collector receives traces from applications
- Storage backend configured (Elasticsearch, Cassandra, or Badger)
- Jaeger UI for trace search and analysis
- Service dependency graph visualization
- Retention policy configured
- Alternative Trace Backends
- Grafana Tempo (cost-effective, integrated with Grafana)
- AWS X-Ray (for AWS-centric workloads)
- Google Cloud Trace
- Azure Application Insights
- Commercial solutions: Datadog APM, New Relic, Honeycomb
- Tracing Use Cases
- Identify slow database queries
- Find service bottlenecks
- Root cause analysis for errors
- Understand service dependencies
- Measure end-to-end request latency
- Integration Between MLT Pillars
- Trace ID in logs for correlation
- Jump from metrics → logs → traces in Grafana
- Unified query interface (LogQL, PromQL, TraceQL)
- Single pane of glass dashboard
- Context switching minimized
- Grafana Observability Stack ⭐ RECOMMENDED UNIFIED SOLUTION
- Grafana - Visualization layer
- Prometheus - Metrics
- Loki - Logs
- Tempo - Traces
- Grafana Agent - Unified collection
- All components integrated natively
- Commercial Alternatives
- Datadog (all-in-one, expensive)
- New Relic (APM focused)
- Splunk (log-centric, enterprise)
- Elastic Observability (ELK + APM)
- Dynatrace (AI-driven)
- Alerting Strategy
- Alerts based on symptoms, not causes
- Multi-window, multi-burn-rate alerts (Google SRE approach)
- Alert fatigue prevention (actionable alerts only)
- Severity levels: Critical (page), High (ticket), Low (weekly review)
- Alert grouping and deduplication
- Prometheus Alertmanager
- Alertmanager deployed in HA mode
- Alert routing rules configured
- Notification channels: PagerDuty, Slack, Email, Webhook
- Silencing and inhibition rules
- Alert templates customized
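A representative routing config sketch; receiver names, the Slack webhook, and the PagerDuty key are placeholders:

```yaml
route:
  receiver: default-slack
  group_by: ["alertname", "cluster"]   # group related alerts together
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall       # page only on critical

receivers:
  - name: default-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX   # placeholder
        channel: "#alerts"
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: YOUR-PD-ROUTING-KEY                # placeholder
```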
- Service Level Objectives (SLOs)
- SLOs defined for critical services (e.g., 99.9% uptime)
- Service Level Indicators (SLIs) measured (availability, latency, throughput)
- Error budget calculated and tracked
- SLO dashboards visible to all teams
- Alerts fire when error budget depleted
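A multi-window, multi-burn-rate alert for a 99.9% availability SLO (0.1% error budget), following the Google SRE workbook pattern; the `sli:*` recording rules are assumed to exist with these names:

```yaml
groups:
  - name: slo-burn-rate
    rules:
      - alert: HighErrorBudgetBurn
        # 14.4x burn rate sustained over both 1h and 5m windows
        # would exhaust a 30-day error budget in roughly 2 days.
        expr: |
          sli:request_error_ratio:rate1h > (14.4 * 0.001)
          and
          sli:request_error_ratio:rate5m > (14.4 * 0.001)
        labels:
          severity: critical
```

The short window keeps the alert from firing on stale data; the long window keeps it from firing on brief blips.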
- Incident Response
- On-call rotation defined and automated
- Runbooks for all critical alerts
- Incident management tool (PagerDuty, Opsgenie)
- Blameless postmortems conducted
- Incident timeline and RCA documented
- Logs: CloudWatch Logs, ELK Stack, Splunk.
- Metrics: Prometheus (time series data), CloudWatch Metrics.
- Traces: AWS X-Ray or Jaeger (distributed tracing).
- Visualization Tools
- Use Grafana to visualize data from Prometheus, CloudWatch, or other sources.
- Centralized dashboards for key application and infrastructure health metrics.
- Real-time monitoring enabled.
- Alerting Setup
- Define alerts based on SLOs and critical resource thresholds.
- Integrate with notification channels (SNS, Slack, PagerDuty).
- Use Prometheus Alertmanager for sophisticated grouping and routing.
- Runbooks
- On-call rotation defined.
- Runbooks for common alerts are documented and accessible.
Detailed policy enforcement moved to docs/governance.md.
➡️ See: Governance Checklist
Modern governance requires automated policy enforcement and compliance checks. Policy as Code enables security, compliance, and operational standards to be codified, version-controlled, and automatically enforced.
🎯 Key Recommendation: Implement policy as code early to prevent drift and ensure compliance. Use OPA for Kubernetes and general-purpose policies, Sentinel for Terraform, and cloud-native tools for infrastructure compliance.
- OPA Fundamentals ⭐ RECOMMENDED for Kubernetes
- OPA installed as admission controller in Kubernetes
- Policies written in Rego language
- OPA Gatekeeper for Kubernetes-native CRDs
- Policy library maintained in Git
- Policies tested with unit tests (conftest)
- Kubernetes Policy Enforcement
- Require resource limits on all pods
- Block privileged containers
- Enforce pod security standards (restricted/baseline)
- Require specific labels (owner, environment, cost-center)
- Restrict image registries (only approved registries)
- Network policy requirements
- Ingress hostname uniqueness
- Namespace quotas enforcement
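With the `K8sRequiredLabels` constraint template from the upstream gatekeeper-library installed, requiring an `owner` label on namespaces looks like this sketch:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  enforcementAction: deny        # start with "dryrun" to audit first
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels:
      - key: owner
```

The `dryrun` → `deny` progression mirrors the "audit mode first, then enforce" practice listed below.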
- OPA Use Cases Beyond Kubernetes
- API authorization policies
- Infrastructure policy validation
- Data filtering and masking
- Service mesh authorization
- OPA Best Practices
- Audit mode first, then enforce
- Policies as code in version control
- Policy decision logs for compliance audits
- Regular policy reviews and updates
- Policy violations reported to teams
- Sentinel for Terraform ⭐ TERRAFORM CLOUD/ENTERPRISE
- Sentinel policies integrated in Terraform workflow
- Policy sets organized by compliance framework
- Policies run before apply
- Advisory, soft mandatory, and hard mandatory levels
- Terraform Policy Examples
- Mandatory tags on all resources (environment, owner, project)
- Prevent public S3 buckets
- Require encryption at rest for databases and storage
- Enforce instance size limits (prevent oversized instances)
- Require VPC for all resources (no default VPC)
- MFA delete for S3 buckets in production
- Backup requirements for critical data stores
- Cost controls (estimated cost limits per apply)
- Sentinel Integration
- Policy checks in Terraform Cloud/Enterprise
- Policy failures block terraform apply
- Policy override process documented
- Compliance reports generated
- AWS Config (AWS Compliance)
- AWS Config enabled in all accounts/regions
- Config rules for compliance checks
- CIS AWS Foundations Benchmark rules deployed
- Custom config rules for organization policies
- Automatic remediation with Systems Manager
- Compliance dashboard for leadership
- Non-compliant resources flagged
- AWS Config Rules Examples
- S3 buckets must have encryption enabled
- RDS instances must have backup enabled
- EC2 instances must be in VPC
- Root account MFA enabled
- IAM password policy enforced
- Security groups don't allow 0.0.0.0/0 on port 22/3389
- CloudTrail enabled and logging
- Azure Policy (Azure Compliance)
- Azure Policy definitions assigned
- Built-in policies for regulatory compliance (HIPAA, PCI-DSS)
- Custom policies for organization standards
- Deny effect for critical violations
- Audit effect for advisory policies
- Policy remediation tasks
- Google Cloud Organization Policies
- Organization policy constraints defined
- Resource location restrictions
- Allowed services and API restrictions
- VM instance requirements
- Pre-Deployment Validation
- `conftest` for policy testing in CI pipelines
- Terraform plan validated by Sentinel/OPA
- Kubernetes manifests validated by OPA/Kyverno
- Dockerfile linting with policy checks
- Infrastructure code scanning (Checkov, tfsec)
- CI/CD Integration
- Policy checks as mandatory CI/CD stages
- Policy violations fail the build
- Policy reports attached to PRs
- Override mechanism for exceptions (with approval)
- Policy Testing
- Unit tests for policy rules
- Test cases for both allow and deny scenarios
- Automated policy regression testing
- Policy test coverage measured
- Audit Trails
- All infrastructure changes logged
- API calls tracked (CloudTrail, Azure Activity Log)
- Policy decision logs stored
- Change management records maintained
- Compliance Reporting
- Automated compliance reports generated
- Dashboard showing compliance posture
- Non-compliance items tracked and remediated
- Executive summaries for leadership
- Evidence collection for auditors
- Compliance Frameworks
- SOC 2 controls mapped to policies
- PCI-DSS requirements enforced
- HIPAA compliance validated
- GDPR data protection policies
- ISO 27001 security controls
Detailed cost optimization checklist moved to docs/finops.md.
➡️ See: FinOps Checklist
Cloud costs can spiral out of control without proper governance and optimization. FinOps brings financial accountability to cloud spending through visibility, optimization, and cultural change.
🎯 Key Recommendation: Implement comprehensive tagging strategy first, then use native cloud cost tools combined with third-party solutions for deep analysis and recommendations.
- Tagging Strategy ⚠️ FOUNDATIONAL REQUIREMENT
- Mandatory tags defined and documented:
  - `environment` (dev/staging/prod)
  - `owner` or `team` (responsible team)
  - `project` or `application` (business context)
  - `cost-center` (billing allocation)
  - `managed-by` (terraform/manual)
  - `expiry-date` (for temporary resources)
- Tag policies enforced via:
- AWS Organizations Tag Policies
- Azure Policy for required tags
- Terraform validation rules
- CI/CD pre-deployment checks
- Resources without required tags blocked from creation
- Regular tag compliance audits
- Automated tag remediation where possible
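On AWS, Terraform's provider-level `default_tags` applies the mandatory tags to every taggable resource automatically, so individual resources cannot forget them; a sketch with illustrative values:

```hcl
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      "environment" = "prod"
      "team"        = "platform"
      "project"     = "checkout"     # hypothetical project name
      "cost-center" = "cc-1234"
      "managed-by"  = "terraform"
    }
  }
}
```

Per-resource `tags` blocks then only need resource-specific additions; the provider merges in the defaults.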
- Cost Visibility Tools
- AWS Cost Explorer configured with custom reports
- AWS Cost and Usage Reports (CUR) enabled
- Azure Cost Management dashboards created
- GCP Cost Management configured
- Third-party tools: CloudHealth, Cloudability, or Kubecost
- Cost Allocation
- Cost allocation tags propagated to billing
- Chargeback/showback reports per team/project
- Cost trends analyzed monthly
- Cost anomaly detection alerts configured
- Budget Setup
- AWS Budgets created per account/project
- Budget thresholds at 50%, 80%, 100%, 120%
- Forecasted spend alerts enabled
- Budget alerts to team leads and finance
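Budgets and their threshold notifications can themselves be managed as code; a sketch using the AWS provider's `aws_budgets_budget` resource (amount and email are placeholders):

```hcl
resource "aws_budgets_budget" "monthly" {
  name         = "team-monthly-budget"
  budget_type  = "COST"
  limit_amount = "1000"              # placeholder monthly limit, USD
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80              # alert at 80% of budget
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["team-lead@example.com"]  # placeholder
  }
}
```

Additional `notification` blocks cover the other thresholds (50%, 100%, 120%) and `notification_type = "FORECASTED"` for forecast alerts.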
- Cost Anomaly Detection
- AWS Cost Anomaly Detection enabled
- Azure Cost Management anomaly alerts
- Real-time cost spike notifications
- Automated investigation workflows
- Spend Reviews
- Monthly cost review meetings
- Quarterly business review with finance
- Cost optimization backlog maintained
- ROI tracking for optimization efforts
- Compute Right-Sizing
- AWS Compute Optimizer recommendations reviewed
- Underutilized EC2 instances identified (< 40% CPU/memory)
- Overprovisioned instances downsized
- Instance family optimization (Graviton, AMD instances)
- Idle instances terminated or scheduled
- Auto-scaling configured to match demand
- Database Optimization
- RDS instance right-sizing
- Aurora Serverless for variable workloads
- Read replica evaluation (are they needed?)
- Database storage type optimization (gp3 vs gp2)
- Storage Optimization
- S3 Intelligent-Tiering enabled
- S3 lifecycle policies for archival
- EBS volume optimization (unused volumes deleted)
- Snapshot cleanup for old/unused snapshots
- EBS volume type optimization (gp3 over gp2)
- Kubernetes Cost Optimization
- Kubecost deployed for K8s cost visibility
- Pod resource requests match actual usage
- Cluster autoscaler fine-tuned
- Node rightsizing based on workload
- Spot instances for fault-tolerant workloads
- Commitment Discounts
- Reserved Instances (RIs) for steady-state workloads
- Savings Plans for flexible compute savings (AWS)
- Azure Reserved VM Instances
- GCP Committed Use Discounts
- RI/Savings Plan Strategy
- 1-year vs 3-year commitment analysis
- Standard vs convertible RI decision
- Coverage targets: 70-80% of steady-state compute
- Quarterly RI utilization reviews
- RI exchange/modification as workloads change
- Spot/Preemptible Instances
- Spot instances for batch processing
- Spot instances in Kubernetes (Karpenter, Spot Ocean)
- GCP Preemptible VMs for dev/test
- Graceful handling of spot terminations
- FinOps Culture
- Engineering teams aware of their cloud costs
- Cost metrics in team dashboards
- Cost optimization as sprint work items
- Recognition for cost-saving initiatives
- Cost-Aware Architecture
- Cost considerations in architecture reviews
- Serverless-first approach where appropriate
- Multi-region strategy aligned with business need
- Data transfer costs minimized
- Over-engineering avoided
- Waste Elimination
- Idle resources automatically identified
- Non-production environments shut down off-hours
- Zombie resources (unattached EBS, old snapshots) cleaned
- Unused reserved capacity released
- Duplicate data stores eliminated
- FinOps Metrics
- Cost per customer/transaction tracked
- Cloud cost as % of revenue monitored
- Cost efficiency trends over time
- Engineering cost savings tracked and celebrated
- Metrics & KPIs
- Regularly review DORA metrics and SLOs.
- Track pipeline execution time and cost.
- Monitor deployment success rates and rollback frequency
- Track mean time to detection (MTTD) for issues
- Measure infrastructure drift and compliance violations
- Retrospectives & Learning
- Regular team retrospectives (weekly/bi-weekly).
- Blameless postmortems with documented action items
- Postmortem action items tracked to completion
- Share learnings across teams (internal blog, wiki)
- Track recurring issues and address root causes
- Training & Development
- Regular cross-training sessions.
- Continuous learning culture fostered.
- Dedicated learning time (e.g., 10% of sprint)
- Internal tech talks and knowledge sharing
- Conference attendance and external training budgets
- Disaster Recovery & Resilience Testing ⚠️ CRITICAL
- DR drills conducted regularly (quarterly minimum)
- Backup restore tests automated and scheduled
- Recovery Time Objective (RTO) defined and tested
- Recovery Point Objective (RPO) defined and measured
- DR runbooks tested and updated
- Failover procedures documented and rehearsed
- Multi-region failover tested (if applicable)
- Data backup integrity verified regularly
- Chaos engineering experiments (optional but valuable)
- Fault injection in non-production
- Controlled chaos in production (with safeguards)
- Game days with simulated outages
- Tools: Chaos Monkey, Gremlin, AWS Fault Injection Simulator
- Post-DR test review with improvement actions
- Week 1-2: Foundation (Git, AWS Accounts, IAM/Security Baseline).
- Week 3-4: CI/CD Foundation (Choose and implement CI Tool, Docker basics, first pipeline).
- Week 5-6: Quality & Security (Integrate SonarQube, SAST/SCA, set up Nexus).
- Week 7-8: Infrastructure (Terraform basics, provision VPC/Network).
- Week 9-10: Container Orchestration (Set up ECS/EKS, configure load balancing).
- Week 11-12: Advanced Topics (Implement Observability, Auto-scaling, Blue/Green deployment).
- Month 1: Git + Linux + Bash scripting
- Month 2: CI/CD (Jenkins/Actions) + Docker basics
- Month 3: AWS fundamentals + Terraform
- Month 4: Build end-to-end project
- Month 5-6: Advanced topics (K8s/Prometheus) + portfolio projects
- Git Documentation
- Jenkins Documentation
- Docker Documentation
- Terraform Documentation
- AWS Documentation
- Kubernetes Documentation
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
See credits.md for image and logo attributions.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Remember: DevOps is a journey, not a destination. Start small, automate incrementally, and continuously improve! 🚀


