subcategory
Compute

databricks_cluster_policy Resource

This resource creates a cluster policy, which limits the ability to create clusters based on a set of rules. The policy rules limit the attributes or attribute values available for cluster creation. Cluster policies have ACLs that limit their use to specific users and groups. Only admin users can create, edit, and delete policies. Admin users also have access to all policies.

Cluster policies let you:

  • Limit users to creating clusters with prescribed settings.
  • Simplify the user interface and enable more users to create their own clusters (by fixing and hiding some values).
  • Control cost by limiting the maximum cost per cluster (by setting limits on attributes whose values contribute to the hourly price).

Cluster policy permissions limit which policies a user can select in the Policy drop-down when the user creates a cluster:

  • If no policies have been created in the workspace, the Policy drop-down does not display.
  • A user who has cluster create permission can select the Free form policy and create fully configurable clusters.
  • A user who has both cluster create permission and access to cluster policies can select the Free form policy and policies they have access to.
  • A user who has access only to cluster policies can select the policies they have access to.

Example Usage

Consider how you might manage two teams: Marketing and Data Engineering. In the following scenario, the marketing team should have a fast query experience, so Delta cache is enabled for them. The data engineering team needs to run bigger clusters, so the number of DBUs per hour they can spend is increased. This strategy lets both teams use Databricks in a self-service manner while giving each a different experience with regard to security and performance. If you later need to add more global settings, you can propagate them through the "base cluster policy".

modules/base-cluster-policy/main.tf could look like:

variable "team" {
  description = "Team that performs the work"
}

variable "policy_overrides" {
  description = "Cluster policy overrides"
}

locals {
  default_policy = {
    "dbus_per_hour" : {
      "type" : "range",
      "maxValue" : 10
    },
    "autotermination_minutes" : {
      "type" : "fixed",
      "value" : 20,
      "hidden" : true
    },
    "custom_tags.Team" : {
      "type" : "fixed",
      "value" : var.team
    }
  }
}

resource "databricks_cluster_policy" "fair_use" {
  name       = "${var.team} cluster policy"
  definition = jsonencode(merge(local.default_policy, var.policy_overrides))

  libraries {
    pypi {
      package = "databricks-sdk==0.12.0"
      // repo can also be specified here
    }
  }
}

resource "databricks_permissions" "can_use_cluster_policy" {
  cluster_policy_id = databricks_cluster_policy.fair_use.id
  access_control {
    group_name       = var.team
    permission_level = "CAN_USE"
  }
}

Instances of that base policy module for the marketing and data engineering teams could then look like:

module "marketing_compute_policy" {
  source = "../modules/base-cluster-policy"
  team   = "marketing"
  policy_overrides = {
    // only the marketing team will benefit from delta cache this way
    "spark_conf.spark.databricks.io.cache.enabled" : {
      "type" : "fixed",
      "value" : "true"
    },
  }
}

module "engineering_compute_policy" {
  source = "../modules/base-cluster-policy"
  team   = "engineering"
  policy_overrides = {
    "dbus_per_hour" : {
      "type" : "range",
      // only the engineering team is allowed to spin up big clusters
      "maxValue" : 50
    },
  }
}
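Terraform's merge function is shallow, and values from later arguments take precedence, so an override replaces the whole rule object for a key rather than combining individual fields. As a rough illustration, assuming the defaults and overrides shown above, the definition produced for the engineering team should be equivalent to encoding the following. The local name engineering_effective_policy is hypothetical and exists only for this sketch:

```hcl
locals {
  # Hypothetical name, for illustration only: the expected result of
  # merge(local.default_policy, var.policy_overrides) for team "engineering".
  engineering_effective_policy = {
    # Replaced wholesale by the override (maxValue 10 becomes 50).
    "dbus_per_hour" : {
      "type" : "range",
      "maxValue" : 50
    },
    # Inherited unchanged from the base policy defaults.
    "autotermination_minutes" : {
      "type" : "fixed",
      "value" : 20,
      "hidden" : true
    },
    # Inherited from the defaults, with var.team resolved.
    "custom_tags.Team" : {
      "type" : "fixed",
      "value" : "engineering"
    }
  }
}
```

Because the merge is not recursive, an override for an existing key needs to restate every field of that rule it wants to keep, such as "type".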

Overriding the built-in cluster policies

You can override built-in cluster policies by creating a databricks_cluster_policy resource with the following attributes:

  • name - the name of the built-in cluster policy.
  • policy_family_id - the ID of the cluster policy family used for built-in cluster policy.
  • policy_family_definition_overrides - settings to override in the built-in cluster policy.

You can obtain the list of defined cluster policy families using the databricks policy-families list command of the new Databricks CLI, or via the list policy families REST API.

locals {
  personal_vm_override = {
    "autotermination_minutes" : {
      "type" : "fixed",
      "value" : 220,
      "hidden" : true
    },
    "custom_tags.Team" : {
      "type" : "fixed",
      "value" : var.team
    }
  }
}

resource "databricks_cluster_policy" "personal_vm" {
  policy_family_id                   = "personal-vm"
  policy_family_definition_overrides = jsonencode(local.personal_vm_override)
  name                               = "Personal Compute"
}

Argument Reference

The following arguments are supported:

  • name - (Required) Cluster policy name. This must be unique. Length must be between 1 and 100 characters.
  • description - (Optional) Additional human-readable description of the cluster policy.
  • definition - (Optional) Policy definition: JSON document expressed in Databricks Policy Definition Language. Cannot be used with policy_family_id.
  • max_clusters_per_user - (Optional, integer) Maximum number of clusters allowed per user. When omitted, there is no limit. If specified, value must be greater than zero.
  • policy_family_definition_overrides - (Optional) Policy definition JSON document expressed in Databricks Policy Definition Language. The JSON document must be passed as a string and cannot be embedded in the requests. You can use this to customize the policy definition inherited from the policy family. Policy rules specified here are merged into the inherited policy definition.
  • policy_family_id - (Optional) ID of the policy family. The cluster policy's policy definition inherits the policy family's policy definition. Cannot be used with definition. Use policy_family_definition_overrides instead to customize the policy definition.
  • libraries - (Optional) Blocks defining individual libraries that will be installed on the cluster that uses a given cluster policy. See databricks_cluster for more details about supported library types.
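As a minimal sketch of how the optional arguments above fit together, the following policy sets description and max_clusters_per_user alongside a small definition. The resource label, name, description text, and limit values are hypothetical and chosen only for illustration:

```hcl
resource "databricks_cluster_policy" "limited" {
  # Hypothetical name; must be unique and between 1 and 100 characters.
  name        = "Limited personal clusters"
  description = "Small, auto-terminating clusters for ad hoc analysis"

  # Each user may run at most two clusters under this policy
  # (the value must be greater than zero).
  max_clusters_per_user = 2

  # The definition is passed as a JSON string, hence jsonencode.
  definition = jsonencode({
    "autotermination_minutes" : {
      "type" : "fixed",
      "value" : 30,
      "hidden" : true
    }
  })
}
```

This variant uses definition, so policy_family_id and policy_family_definition_overrides are left out.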

Attribute Reference

In addition to all arguments above, the following attributes are exported:

  • id - Canonical unique identifier for the cluster policy. This is equal to policy_id.
  • policy_id - Canonical unique identifier for the cluster policy.
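Because id is exported, the base module from the example usage could expose it to its callers. A minimal sketch, assuming the block is added to modules/base-cluster-policy; the output name cluster_policy_id is hypothetical:

```hcl
# Hypothetical output, assumed to live in the base policy module.
output "cluster_policy_id" {
  description = "ID of the team's cluster policy"
  value       = databricks_cluster_policy.fair_use.id
}
```

A caller could then reference module.marketing_compute_policy.cluster_policy_id, for example as the policy_id of a databricks_cluster resource.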

Import

A cluster policy can be imported using its policy ID:

terraform import databricks_cluster_policy.this <cluster-policy-id>
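On Terraform 1.5 or later, the same import can alternatively be declared in configuration with an import block. A sketch, with the placeholder ID left as-is and the resource address taken from the command above:

```hcl
import {
  # Address of the resource block that should adopt the existing policy.
  to = databricks_cluster_policy.this
  # Replace the placeholder with the ID of the existing cluster policy.
  id = "<cluster-policy-id>"
}
```

A matching resource "databricks_cluster_policy" "this" block still needs to exist in the configuration, unless Terraform is asked to generate one during planning.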

Related Resources

The following resources are often used in the same context: