spotinst/terraform-spotinst-ocean-spark

terraform-spotinst-ocean-spark

A Terraform module to install the Ocean for Apache Spark data platform.

Introduction

This module imports an existing Ocean cluster into Ocean Spark.

Prerequisites

  • Existing EKS/GKE/AKS Cluster
  • EKS/GKE/AKS cluster integrated with Spot Ocean

Usage

provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
}

provider "spotinst" {
  token   = var.spotinst_token
  account = var.spotinst_account
}

data "aws_eks_cluster_auth" "this" {
  name = "cluster-name"
}

data "aws_eks_cluster" "this" {
  name = "cluster-name"
}

module "ocean-spark" {
  source  = "spotinst/ocean-spark/spotinst"
  version = "~> 3.0.0"

  ocean_cluster_id = var.ocean_cluster_id

  cluster_config = {
    cluster_name               = "cluster-name"
    certificate_authority_data = data.aws_eks_cluster.this.certificate_authority[0].data
    server_endpoint            = data.aws_eks_cluster.this.endpoint
    token                      = data.aws_eks_cluster_auth.this.token
  }
}
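
The usage example above reads several values from `var.*` without declaring them. A minimal sketch of matching declarations follows; the descriptions are assumptions added for illustration and are not taken from the module itself.

```hcl
# Hypothetical variables.tf for the root module that calls ocean-spark.
# Only the variable names come from the usage example; descriptions are illustrative.

variable "aws_region" {
  type        = string
  description = "AWS region of the EKS cluster"
}

variable "aws_profile" {
  type        = string
  description = "Named AWS CLI profile used for authentication"
}

variable "spotinst_token" {
  type        = string
  description = "Spot API token"
  sensitive   = true # requires Terraform 0.14 or later; remove on 0.13.x
}

variable "spotinst_account" {
  type        = string
  description = "Spot account ID"
}

variable "ocean_cluster_id" {
  type        = string
  description = "ID of the existing Ocean cluster to import into Ocean Spark"
}
```

Values can then be supplied through a `terraform.tfvars` file or `TF_VAR_*` environment variables, which keeps the token out of version control.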

Upgrade guides

Examples

This module can be combined with other Terraform modules to support a number of installation methods for Ocean Spark:

  1. Create an Ocean Spark cluster from scratch in your AWS account
  2. Create an Ocean Spark cluster from scratch in your AWS account with AWS Private Link support
  3. Create an Ocean Spark cluster from scratch in your GCP account
  4. Create an Ocean Spark cluster from scratch in your Azure account
  5. Import an existing EKS cluster into Ocean Spark
  6. Import an existing GKE cluster into Ocean Spark
  7. Import an existing AKS cluster into Ocean Spark
  8. Import an existing Ocean cluster into Ocean Spark

⚠️ Before running terraform destroy ⚠️

If your cluster was created with v1 of the module, or you set deployer_namespace = "spot-system", follow these steps:

  1. Switch your kubectl context to the targeted cluster.
  2. Run the scripts/ofas-uninstall.sh script to safely remove the Ocean Spark components.
  3. Once the script completes successfully, run terraform destroy.

Terraform module documentation

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13.1 |
| kubernetes | ~> 2.0 |
| spotinst | >= 1.115.0, < 2.0.0 |
| validation | 1.0.0 |
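
As a sketch of how a calling configuration might pin versions compatible with the constraints above: the `spotinst/spotinst` and `hashicorp/kubernetes` source addresses are the public registry addresses for those providers. The `validation` provider is left out because its source address is not stated in this document; the module declares it internally.

```hcl
# Hypothetical versions.tf for the root module.
terraform {
  required_version = ">= 0.13.1"

  required_providers {
    spotinst = {
      source  = "spotinst/spotinst"
      version = ">= 1.115.0, < 2.0.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}
```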

Providers

| Name | Version |
|------|---------|
| null | n/a |
| spotinst | >= 1.115.0, < 2.0.0 |
| validation | 1.0.0 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| null_resource.apply_kubernetes_manifest | resource |
| spotinst_ocean_spark.cluster | resource |
| spotinst_ocean_spark_virtual_node_group.this | resource |
| validation_warning.log_collection_collect_driver_logs | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| attach_dedicated_virtual_node_groups | List of virtual node group IDs to attach to the cluster | list(string) | [] | no |
| cluster_config | Configuration for the Ocean Kubernetes cluster | object({ cluster_name = string, certificate_authority_data = string, server_endpoint = string, token = optional(string), client_certificate = optional(string), client_key = optional(string) }) | n/a | yes |
| compute_create_vngs | Controls whether dedicated Ocean Spark VNGs will be created by the cluster creation process | bool | true | no |
| compute_use_taints | Controls whether the Ocean Spark cluster will use taints to schedule workloads | bool | true | no |
| create_cluster | Controls whether the Ocean for Apache Spark cluster should be created (it affects all resources) | bool | true | no |
| deployer_namespace | The namespace Ocean Spark deployer jobs will run in (must be either 'spot-system' or 'kube-system'). The deployer jobs are used to manage Ocean Spark cluster components. | string | "kube-system" | no |
| enable_custom_endpoint | Controls whether the Ocean for Apache Spark control plane addresses the cluster using a custom endpoint | bool | false | no |
| enable_private_link | Controls whether the Ocean for Apache Spark control plane addresses the cluster via an AWS Private Link | bool | false | no |
| ingress_custom_endpoint_address | The address the Ocean for Apache Spark control plane will use to reach the cluster when the custom endpoint is enabled | string | null | no |
| ingress_load_balancer_service_annotations | Annotations that will be added to the load balancer service, allowing for customization of the load balancer | map(string) | {} | no |
| ingress_load_balancer_target_group_arn | The ARN of a target group that the Ocean for Apache Spark ingress controller will be bound to | string | null | no |
| ingress_managed_controller | Controls whether an ingress controller managed by Ocean for Apache Spark will be installed on the cluster | bool | true | no |
| ingress_managed_load_balancer | Controls whether a load balancer managed by Ocean for Apache Spark will be provisioned for the cluster | bool | true | no |
| ingress_private_link_endpoint_service_address | The name of the VPC Endpoint Service the Ocean for Apache Spark control plane should bind to when Private Link is enabled | string | null | no |
| log_collection_collect_app_logs | Controls whether the Ocean Spark cluster will collect Spark driver/executor logs | bool | true | no |
| log_collection_collect_driver_logs | Controls whether the Ocean Spark cluster will collect Spark driver logs (Deprecated: use log_collection_collect_app_logs instead) | bool | null | no |
| ocean_cluster_id | Specifies the Ocean cluster identifier | string | n/a | yes |
| spark_additional_app_namespaces | List of Kubernetes namespaces that should be configured to run Spark applications, in addition to the default 'spark-apps' namespace | list(string) | [] | no |
| webhook_host_network_ports | List of ports on the host network to assign to the Ocean Spark system pods | list(number) | [] | no |
| webhook_use_host_network | Controls whether Ocean Spark system pods that expose webhooks will use the host network | bool | false | no |
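
To illustrate how a few of the optional inputs fit together, here is a hedged sketch that extends the usage example. The namespace names and the virtual node group ID are placeholders invented for illustration, and the `data.aws_eks_cluster*` references assume the data sources from the usage section are present.

```hcl
module "ocean-spark" {
  source  = "spotinst/ocean-spark/spotinst"
  version = "~> 3.0.0"

  ocean_cluster_id = var.ocean_cluster_id

  cluster_config = {
    cluster_name               = "cluster-name"
    certificate_authority_data = data.aws_eks_cluster.this.certificate_authority[0].data
    server_endpoint            = data.aws_eks_cluster.this.endpoint
    token                      = data.aws_eks_cluster_auth.this.token
  }

  # Allow Spark applications in two extra namespaces besides 'spark-apps'.
  # "team-a" and "team-b" are placeholder names.
  spark_additional_app_namespaces = ["team-a", "team-b"]

  # Attach an existing virtual node group; the ID below is a placeholder.
  attach_dedicated_virtual_node_groups = ["REPLACE_WITH_VNG_ID"]

  # Use the current log collection flag rather than the deprecated
  # log_collection_collect_driver_logs input.
  log_collection_collect_app_logs = true
}
```

Any input not set here keeps the default listed in the table.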

Outputs

| Name | Description |
|------|-------------|
| ocean_spark_id | The Ocean Spark cluster ID |
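
The module output can be re-exported from the calling configuration. A minimal sketch, assuming the module block is named `ocean-spark` as in the usage example; the output name `ocean_spark_cluster_id` is an arbitrary choice:

```hcl
# Expose the Ocean Spark cluster ID from the root module.
output "ocean_spark_cluster_id" {
  description = "ID of the Ocean Spark cluster managed by the ocean-spark module"
  value       = module.ocean-spark.ocean_spark_id
}
```

After `terraform apply`, the value can be read with `terraform output ocean_spark_cluster_id`.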