Terraform for basic infrastructure required to run DataRobot on Azure

datarobot-oss/terraform-azurerm-dr-infra

terraform-azurerm-dr-infra

Terraform module to create Azure Cloud infrastructure resources required to run DataRobot.

Usage

module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/azurerm"

  name                 = "datarobot"
  location             = "eastus" # required input; use the Azure region you deploy to
  domain_name          = "yourdomain.com"
  public_ip_allow_list = ["123.123.123.123/32"]

  create_resource_group          = true
  create_network                 = true
  network_address_space          = "10.7.0.0/16"
  create_dns_zones               = false
  existing_public_dns_zone_id    = "/subscriptions/subscription-id/resourceGroups/existing-resource-group-name/providers/Microsoft.Network/dnszones/yourdomain.com"
  create_storage                 = true
  create_container_registry      = false
  existing_container_registry_id = "/subscriptions/subscription-id/resourceGroups/existing-resource-group-name/providers/Microsoft.ContainerRegistry/registries/existing-acr-name"
  create_kubernetes_cluster      = true
  create_app_identity            = true

  ingress_nginx                          = true
  internet_facing_ingress_lb             = true
  cert_manager                           = true
  cert_manager_letsencrypt_email_address = "youremail@yourdomain.com"
  external_dns                           = true
  nvidia_device_plugin                   = true

  tags = {
    application = "datarobot"
    environment = "dev"
    managed-by  = "terraform"
  }
}

Examples

Using an example directly from source

  1. Clone the repo
git clone https://github.com/datarobot-oss/terraform-azurerm-dr-infra.git
  2. Change directories into the example that best suits your needs
cd terraform-azurerm-dr-infra/examples/internal
  3. Modify main.tf as needed
  4. Run the Terraform workflow (run terraform destroy only when you want to remove the infrastructure)
terraform init
terraform plan
terraform apply
terraform destroy

Requirements

Name       Version
terraform  >= 1.3.5
azurerm    >= 4.3.0
helm       >= 2.15.0
kubectl    >= 1.14.0
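The version constraints above can be pinned in the calling configuration with a terraform block. This is a minimal sketch: hashicorp/azurerm and hashicorp/helm are the standard registry sources, but the gavinbunney/kubectl source for the kubectl provider is an assumption. Check which kubectl provider the module itself declares before copying this.

```hcl
terraform {
  # Minimum Terraform CLI version from the Requirements table
  required_version = ">= 1.3.5"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.3.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.15.0"
    }
    # ASSUMPTION: kubectl provider source address; confirm against the module's own provider requirements
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.14.0"
    }
  }
}
```

These constraints are lower bounds only. You may want to tighten them, for example with `~>`, so that a future major provider release is not picked up unintentionally.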

Providers

Name     Version
azurerm  >= 4.3.0
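Starting with azurerm 4.0, the provider needs a subscription ID, supplied either in the provider block or through the ARM_SUBSCRIPTION_ID environment variable. The features block is still mandatory. Below is a minimal sketch of a root-module provider block; the subscription ID is a placeholder.

```hcl
provider "azurerm" {
  # The azurerm provider requires a features block, even when it is empty
  features {}

  # Required from azurerm 4.0 onward unless ARM_SUBSCRIPTION_ID is set.
  # Placeholder value: replace it with your own subscription ID.
  subscription_id = "00000000-0000-0000-0000-000000000000"
}
```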

Modules

Name                  Source                          Version
app_identity          ./modules/app-identity          n/a
cert_manager          ./modules/cert-manager          n/a
container_registry    ./modules/container-registry    n/a
dns                   ./modules/dns                   n/a
external_dns          ./modules/external-dns          n/a
ingress_nginx         ./modules/ingress-nginx         n/a
kubernetes            ./modules/kubernetes            n/a
naming                Azure/naming/azurerm            ~> 0.4
network               ./modules/network               n/a
nvidia_device_plugin  ./modules/nvidia-device-plugin  n/a
storage               ./modules/storage               n/a

Resources

Name                          Type
azurerm_resource_group.this   resource
azurerm_subscription.current  data source

Inputs

Name Description Type Default Required
cert_manager Install the cert-manager helm chart. All other cert_manager variables are ignored if this variable is false. bool true no
cert_manager_letsencrypt_clusterissuers Whether to create letsencrypt-prod and letsencrypt-staging ClusterIssuers bool true no
cert_manager_letsencrypt_email_address Email address for the certificate owner. Let's Encrypt will use this to contact you about expiring certificates, and issues related to your account. Only required if cert_manager_letsencrypt_clusterissuers is true. string "user@example.com" no
cert_manager_values Path to a templatefile containing custom values for the cert-manager helm chart string "" no
cert_manager_variables Variables passed to the cert_manager_values templatefile map(string) {} no
create_app_identity Create a new user assigned identity for the DataRobot application bool true no
create_container_registry Create a new Azure Container Registry. Ignored if existing_container_registry_id is specified. bool true no
create_dns_zones Create DNS zones for domain_name. Ignored if existing_public_dns_zone_id and existing_private_dns_zone_id are specified. bool true no
create_kubernetes_cluster Create a new Azure Kubernetes Service cluster. All kubernetes and helm chart variables are ignored if this variable is false. bool true no
create_network Create a new Azure Virtual Network. Ignored if existing_vnet_id is specified. bool true no
create_resource_group Create a new Azure resource group. Ignored if existing_resource_group_name is specified. bool true no
create_storage Create a new Azure Storage account and container. Ignored if existing_storage_account_id is specified. bool true no
datarobot_namespace Kubernetes namespace in which the DataRobot application will be installed string "dr-app" no
datarobot_service_accounts Names of the Kubernetes service accounts used by the DataRobot application set(string) ["dr", "build-service", "build-service-image-builder", "buzok-account", "dr-lrs-operator", "dynamic-worker", "internal-api-sa", "nbx-notebook-revisions-account", "prediction-server-sa", "tileservergl-sa"] no
domain_name Name of the domain to use for the DataRobot application. If create_dns_zones is true then zones will be created for this domain. It is also used by the cert-manager helm chart for DNS validation and as a domain filter by the external-dns helm chart. string "" no
existing_container_registry_id ID of an existing container registry to use string "" no
existing_kubernetes_nodes_subnet_id ID of an existing subnet to use for the AKS node pools. Required when existing_vnet_id is specified. Ignored if create_network is true and no existing_vnet_id is specified. string "" no
existing_private_dns_zone_id ID of an existing private DNS zone to use for private DNS records created by external-dns. Required when create_dns_zones is false and ingress_nginx is true with internet_facing_ingress_lb false. string "" no
existing_public_dns_zone_id ID of an existing public DNS zone to use for public DNS records created by external-dns and for public Let's Encrypt certificate validation by cert-manager. Required when create_dns_zones is false and either ingress_nginx and internet_facing_ingress_lb are both true, or cert_manager and cert_manager_letsencrypt_clusterissuers are both true. string "" no
existing_resource_group_name Name of an existing resource group to use string "" no
existing_storage_account_id ID of an existing Azure Storage account to use for DataRobot file storage. When specified, all other storage variables are ignored. string "" no
existing_vnet_id ID of an existing VNet to use. When specified, other network variables are ignored. string "" no
external_dns Install the external-dns helm chart to create DNS records for ingress resources matching the domain_name variable. All other external_dns variables are ignored if this variable is false. bool true no
external_dns_values Path to a templatefile containing custom values for the external-dns helm chart string "" no
external_dns_variables Variables passed to the external_dns_values templatefile map(string) {} no
ingress_nginx Install the ingress-nginx helm chart to use as the ingress controller for the AKS cluster. All other ingress_nginx variables are ignored if this variable is false. bool true no
ingress_nginx_values Path to a templatefile containing custom values for the ingress-nginx helm chart string "" no
ingress_nginx_variables Variables passed to the ingress_nginx_values templatefile map(string) {} no
internet_facing_ingress_lb Determines the type of Standard Load Balancer created for AKS ingress. If true, a public Standard Load Balancer will be created. If false, an internal Standard Load Balancer will be created. bool true no
kubernetes_cluster_endpoint_public_access Whether or not the Kubernetes API endpoint should be exposed to the public internet. When false, the cluster endpoint is only available internally to the virtual network. bool true no
kubernetes_cluster_version AKS cluster version string null no
kubernetes_dns_service_ip IP address within the Kubernetes service address range that will be used by cluster service discovery (kube-dns) string null no
kubernetes_gpu_nodepool_labels A map of Kubernetes labels to apply to the GPU node pool map(string) {"datarobot.com/node-capability": "gpu"} no
kubernetes_gpu_nodepool_max_count Maximum number of nodes in the GPU node pool number 10 no
kubernetes_gpu_nodepool_min_count Minimum number of nodes in the GPU node pool number 0 no
kubernetes_gpu_nodepool_name Name of the GPU node pool string "gpu" no
kubernetes_gpu_nodepool_node_count Node count of the GPU node pool number 0 no
kubernetes_gpu_nodepool_taints A list of Kubernetes taints to apply to the GPU node pool list(string) ["nvidia.com/gpu:NoSchedule"] no
kubernetes_gpu_nodepool_vm_size VM size used for the GPU node pool string "Standard_NC4as_T4_v3" no
kubernetes_nodepool_availability_zones Availability zones to use for the AKS node pools set(string) ["1", "2", "3"] no
kubernetes_pod_cidr The CIDR to use for Kubernetes pod IP addresses string null no
kubernetes_primary_nodepool_labels A map of Kubernetes labels to apply to the primary node pool map(string) {} no
kubernetes_primary_nodepool_max_count Maximum number of nodes in the primary node pool number 10 no
kubernetes_primary_nodepool_min_count Minimum number of nodes in the primary node pool number 3 no
kubernetes_primary_nodepool_name Name of the primary node pool string "primary" no
kubernetes_primary_nodepool_node_count Node count of the primary node pool number 6 no
kubernetes_primary_nodepool_taints A list of Kubernetes taints to apply to the primary node pool list(string) [] no
kubernetes_primary_nodepool_vm_size VM size used for the primary node pool string "Standard_D32s_v4" no
kubernetes_service_cidr The CIDR to use for Kubernetes service IP addresses string null no
location Azure location to create resources in string n/a yes
name Name to use as a prefix for created resources string n/a yes
network_address_space CIDR block to be used for the new VNet. It should not overlap with the kubernetes_service_cidr or kubernetes_pod_cidr variables; by default, AKS uses 10.0.0.0/16 for services and 10.244.0.0/16 for pods. string "10.1.0.0/16" no
nvidia_device_plugin Install the nvidia-device-plugin helm chart to expose node GPU resources to the AKS cluster. All other nvidia_device_plugin variables are ignored if this variable is false. bool true no
nvidia_device_plugin_values Path to a templatefile containing custom values for the nvidia-device-plugin helm chart string "" no
nvidia_device_plugin_variables Variables passed to the nvidia_device_plugin_values templatefile map(string) {} no
public_ip_allow_list List of public IP ranges in CIDR format that are allowed to access the Kubernetes cluster API endpoint, storage account, and container registry. By default, the storage account and container registry can only be accessed from within the VNet via Private Link. When kubernetes_cluster_endpoint_public_access is true, this list restricts API endpoint access to the specified IP ranges. When kubernetes_cluster_endpoint_public_access is false, this list is not applied to the Kubernetes cluster API endpoint, which can then only be reached from within the VNet. Required when creating a storage account, container registry, or Kubernetes cluster over the public internet. list(string) [] no
storage_account_replication_type Storage account data replication type as described in https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy string "ZRS" no
tags A map of tags to add to all created resources map(string) {"managed-by": "terraform"} no
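The input descriptions above combine into an internal deployment that reuses an existing resource group, VNet, and private DNS zone. The following is a sketch assembled from those descriptions, not a tested configuration. All IDs and names are placeholders, and "eastus" is an arbitrary region. It also assumes Terraform runs from a host that can reach the VNet, because the Kubernetes API endpoint is not public here and the Helm charts are installed through that endpoint.

```hcl
module "datarobot_infra" {
  source = "datarobot-oss/dr-infra/azurerm"

  name        = "datarobot"
  location    = "eastus" # placeholder region
  domain_name = "yourdomain.com"

  # Reuse an existing resource group (create_resource_group is then ignored)
  existing_resource_group_name = "existing-resource-group-name"

  # Reuse an existing VNet; the node subnet ID is required in this case
  existing_vnet_id                    = "/subscriptions/subscription-id/resourceGroups/existing-resource-group-name/providers/Microsoft.Network/virtualNetworks/existing-vnet-name"
  existing_kubernetes_nodes_subnet_id = "/subscriptions/subscription-id/resourceGroups/existing-resource-group-name/providers/Microsoft.Network/virtualNetworks/existing-vnet-name/subnets/existing-subnet-name"

  # An internal ingress load balancer needs a private DNS zone when create_dns_zones is false
  create_dns_zones             = false
  existing_private_dns_zone_id = "/subscriptions/subscription-id/resourceGroups/existing-resource-group-name/providers/Microsoft.Network/privateDnsZones/yourdomain.com"

  # Keep both the ingress and the Kubernetes API endpoint off the public internet
  internet_facing_ingress_lb                = false
  kubernetes_cluster_endpoint_public_access = false

  # Without a public DNS zone, skip the Let's Encrypt ClusterIssuers,
  # which would otherwise require existing_public_dns_zone_id
  cert_manager_letsencrypt_clusterissuers = false
}
```

With the Let's Encrypt ClusterIssuers turned off, certificates for the ingress would have to come from an issuer you configure separately.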

Outputs

Name Description
aks_cluster_id ID of the Azure Kubernetes Service cluster
container_registry_admin_password Admin password of the container registry
container_registry_admin_username Admin username of the container registry
container_registry_id ID of the container registry
container_registry_login_server The URL that can be used to log into the container registry
private_zone_id ID of the private DNS zone
public_zone_id ID of the public DNS zone
resource_group_id The ID of the Resource Group
storage_access_key The primary access key for the storage account
storage_account_name Name of the storage account
storage_container_name Name of the storage container
user_assigned_identity_client_id Client ID of the user assigned identity
user_assigned_identity_id ID of the user assigned identity
user_assigned_identity_name Name of the user assigned identity
user_assigned_identity_principal_id Principal ID of the user assigned identity
user_assigned_identity_tenant_id Tenant ID of the user assigned identity
vnet_id The ID of the VNet
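The outputs above can be re-exported from the calling configuration, for example to feed a later DataRobot installation step. This is a minimal sketch that assumes the module block is named datarobot_infra, as in the Usage section. Whether the module already marks the credential outputs as sensitive is not stated here. Setting sensitive = true on them is safe either way, and Terraform requires it if the module value is sensitive.

```hcl
# Re-export selected module outputs from the root configuration
output "aks_cluster_id" {
  value = module.datarobot_infra.aks_cluster_id
}

output "user_assigned_identity_client_id" {
  value = module.datarobot_infra.user_assigned_identity_client_id
}

output "container_registry_login_server" {
  value = module.datarobot_infra.container_registry_login_server
}

# Credentials: sensitive = true keeps them out of plan and apply output
output "container_registry_admin_password" {
  value     = module.datarobot_infra.container_registry_admin_password
  sensitive = true
}

output "storage_access_key" {
  value     = module.datarobot_infra.storage_access_key
  sensitive = true
}
```

Sensitive values are hidden in normal CLI output but can still be read explicitly, for example with terraform output -raw storage_access_key.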
