This repository manages the infrastructure and deployment of a chatbot builder system. It uses Terraform to provision Azure infrastructure, Kubernetes manifests to define cluster resources, and GitHub Actions for CI/CD to automate both infrastructure provisioning and application deployment.
This is a high-level overview of the current system design for the chatbot builder infrastructure:
The project is divided into two main sections: Terraform for infrastructure as code and Kubernetes manifests for application deployment.
Terraform code is organized under the infra/ directory. It provisions the Azure Kubernetes Service (AKS) cluster and related resources.
- modules/resource_group: An Azure resource group.
- modules/aks: Azure Kubernetes Service (AKS) cluster. It includes configurations for the number of nodes, VM size, and DNS prefix.
- modules/public_ip: Public IP addresses for the AKS cluster, one for the staging environment and another for the production environment.
- modules/blob_storage: Azure Blob Storage used by microservices to store files.
Terraform produces the following outputs:
- Kubernetes Configuration: The kubeconfig file for authenticating with the AKS cluster.
- Public IPs: The public IPs for the staging and production environments. They are already associated with the AKS cluster and are later injected into the Kubernetes manifests (Load Balancers).
- Azure Credentials: Before running Terraform, generate Azure credentials and save them in Terraform Cloud as environment variables:
  - Create a service principal:
    az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/<subscription_id>"
  - Set the following environment variables in Terraform Cloud:
    ARM_CLIENT_ID
    ARM_CLIENT_SECRET
    ARM_SUBSCRIPTION_ID
    ARM_TENANT_ID
  - For the Azure Blob Storage account, get the account name and key manually from the Azure Portal. Save them in the common-secrets.yaml manifest before applying it to AKS later:
    AZURE_BLOB_STORAGE_ACCOUNT_NAME
    AZURE_BLOB_STORAGE_ACCOUNT_KEY
    AZURE_BLOB_STORAGE_CONTAINER_NAME (you can use the default container, default-container)
  - The OpenAI Service produces two outputs, which should be saved in the common-secrets.yaml manifest:
    OPENAI_ENDPOINT
    OPENAI_KEY
    You can display this sensitive information using the following command:
    terraform output openai_key
- Terraform Cloud: This project is synced with Terraform Cloud for state management. Ensure you have a Terraform Cloud token saved as a GitHub secret (TFC_TOKEN) so that workflows can authenticate and run Terraform commands. The same token is used to fetch the Terraform outputs for the Kubernetes configuration and public IPs.
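The service principal step above prints a JSON document, and its fields map onto the ARM_* variables that the Azure provider reads. The following is a minimal sketch of that mapping, assuming the default JSON output of az ad sp create-for-rbac (fields appId, password, and tenant). The function name arm_env_from_sp is hypothetical and not part of this repository.

```python
import json


def arm_env_from_sp(sp_json: str, subscription_id: str) -> dict[str, str]:
    """Map the JSON printed by `az ad sp create-for-rbac` to ARM_* variables.

    The subscription ID is not part of that JSON, so it is passed in
    separately (it is the same value used in the --scopes argument).
    """
    sp = json.loads(sp_json)
    return {
        "ARM_CLIENT_ID": sp["appId"],
        "ARM_CLIENT_SECRET": sp["password"],
        "ARM_SUBSCRIPTION_ID": subscription_id,
        "ARM_TENANT_ID": sp["tenant"],
    }


# Usage with placeholder values; real values must be kept secret.
sample = '{"appId": "<app_id>", "password": "<password>", "tenant": "<tenant_id>"}'
for name, value in arm_env_from_sp(sample, "<subscription_id>").items():
    print(f"{name}={value}")
```

The resulting names and values can then be entered as sensitive environment variables in the Terraform Cloud workspace.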
The Kubernetes manifests are located in the manifests/ directory and use Kustomize for managing different environments (staging and production).
- base/: Contains the foundational Kubernetes manifests that define the core components of the application. These manifests are applied sequentially in the following order:
  - pre-deployment/: Includes ConfigMaps and PersistentVolumeClaims (PVCs) required before deployments.
  - deployment/: Includes Deployments, StatefulSets, and DaemonSets.
  - post-deployment/: Includes Services, Ingress, and Network Policies.
- overlays/: Contains environment-specific customizations for staging and production. Based on the deployment type:
  - Staging resources are applied in the staging namespace.
  - Production resources are applied in the production namespace.
- secrets/: Contains templates for Kubernetes secrets.
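The apply order described for base/ can be illustrated with a small sketch that sorts manifest paths by their stage directory. This is only an illustration of the ordering rule: the helper names and the example file names are hypothetical, and the repository's workflows may enforce the order differently.

```python
from pathlib import PurePosixPath

# Stage directories in the order the manifests are applied.
STAGE_ORDER = ("pre-deployment", "deployment", "post-deployment")


def stage_index(path: str) -> int:
    """Return the position of the path's stage directory in STAGE_ORDER."""
    parts = PurePosixPath(path).parts
    for index, stage in enumerate(STAGE_ORDER):
        if stage in parts:  # exact match on a path component
            return index
    return len(STAGE_ORDER)  # paths outside a known stage sort last


def order_manifests(paths: list[str]) -> list[str]:
    """Sort manifest paths by stage, then alphabetically within a stage."""
    return sorted(paths, key=lambda p: (stage_index(p), p))


# Usage with hypothetical file names.
for manifest in order_manifests([
    "manifests/base/post-deployment/service.yaml",
    "manifests/base/deployment/api.yaml",
    "manifests/base/pre-deployment/configmap.yaml",
]):
    print(manifest)
```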
Once the Terraform infrastructure is deployed for the first time, the following resources must be applied manually to the AKS cluster:
- Namespaces: The namespaces.yaml file must be applied to create the required namespaces (staging and production).
- Secrets: Populate the secret templates in the manifests/secrets/ directory with actual values, then apply them manually.
These resources are essential for the Kubernetes environment to function properly and must be set up before automating deployments.
The repository includes two GitHub Actions workflows:
The infrastructure workflow runs on any push that changes the infra/ directory or when triggered manually. It handles the provisioning of Azure resources.
- Terraform Initialization: Initializes Terraform in the infra/ directory.
- Validation: Validates the Terraform configuration.
- Apply Changes: Applies changes automatically if the branch is main.
- Output Values: Saves the Terraform outputs (e.g., public IPs, Kubernetes configuration) for use in deployments.
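The steps above correspond roughly to a fixed sequence of Terraform CLI commands with one branch condition. The sketch below builds that sequence as data rather than running it. It assumes the commands are executed from the infra/ directory and that outputs are read on every run; the actual workflow file may use different flags or dedicated GitHub Actions instead, and the function name terraform_steps is hypothetical.

```python
def terraform_steps(branch: str) -> list[list[str]]:
    """Return the Terraform commands for one workflow run, in order."""
    steps = [
        ["terraform", "init"],      # Terraform Initialization
        ["terraform", "validate"],  # Validation
    ]
    if branch == "main":
        # Apply Changes: only the main branch applies automatically.
        steps.append(["terraform", "apply", "-auto-approve"])
    # Output Values: machine-readable outputs for later deployments.
    steps.append(["terraform", "output", "-json"])
    return steps


# Usage: print the commands a run on main would execute.
for command in terraform_steps("main"):
    print(" ".join(command))
```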
Once the Terraform configuration is applied, it outputs the following critical information:
- Kubernetes Configuration: Used to authenticate with the AKS cluster.
- Public IPs: Staging and production IPs for load balancers.
This information is stored in Terraform Cloud and automatically fetched during the deployment workflow.
The deployment workflow is triggered via a repository dispatch event (deploy_chatbot_staging or deploy_chatbot_production). It deploys the Kubernetes resources to the specified namespace (staging or production).
- Retrieve Terraform Outputs: Fetches the public IPs and Kubernetes configuration using Terraform outputs.
- Update Kubernetes Context: Generates a kubeconfig file from Terraform outputs to authenticate with the AKS cluster.
- Determine Namespace and Overlay: Determines the environment (staging or production) and sets the appropriate overlay.
- Update Resources:
  - Replaces the load balancer IP in the service manifests with the IP retrieved from Terraform outputs.
  - Updates the container image in the deployment manifests with the image name passed in the dispatch event payload.
- Apply Resources: Applies the Kubernetes resources to the AKS cluster.
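The namespace selection and manifest updates above can be sketched as follows. This is an illustration under stated assumptions, not the workflow's actual implementation: it assumes the overlays live at manifests/overlays/<environment>, that Services set the IP through the loadBalancerIP field, and that each manifest has a single image line. All function names are hypothetical.

```python
import re

# Dispatch event types named in this README, mapped to their environment.
EVENT_TO_ENV = {
    "deploy_chatbot_staging": "staging",
    "deploy_chatbot_production": "production",
}


def resolve_target(event_type: str) -> tuple[str, str]:
    """Return (namespace, overlay path) for a repository dispatch event."""
    try:
        env = EVENT_TO_ENV[event_type]
    except KeyError:
        raise ValueError(f"unsupported dispatch event: {event_type}") from None
    # Assumed overlay layout; adjust to the repository's real paths.
    return env, f"manifests/overlays/{env}"


def set_load_balancer_ip(manifest: str, ip: str) -> str:
    """Replace the value of every loadBalancerIP line in a manifest."""
    pattern = r"(?m)^([ \t]*loadBalancerIP:[ \t]*).*$"
    return re.sub(pattern, lambda m: m.group(1) + ip, manifest)


def set_image(manifest: str, image: str) -> str:
    """Replace the value of every image line in a manifest."""
    pattern = r"(?m)^([ \t]*(?:-[ \t]*)?image:[ \t]*).*$"
    return re.sub(pattern, lambda m: m.group(1) + image, manifest)


# Usage with placeholder values.
namespace, overlay = resolve_target("deploy_chatbot_staging")
service = "spec:\n  type: LoadBalancer\n  loadBalancerIP: 0.0.0.0\n"
print(namespace, overlay)
print(set_load_balancer_ip(service, "203.0.113.10"))
```

Plain text substitution keeps the sketch dependency-free. Kustomize's own image and patch features are an alternative when the overlays are structured for them.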