
# terraform-azurerm-collector-event-hub-vmss

A Terraform module which deploys the Snowplow Stream Collector on an Azure VM scale-set and sinks data into Event Hubs over the Kafka protocol.

## Telemetry

This module collects and forwards telemetry information to Snowplow by default, to help us understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; we collect only very simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or security advisories, set the `user_provided_id` variable to a valid email address at which we can reach you.

### How do I disable it?

To disable telemetry, simply set the variable `telemetry_enabled = false`.
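
For example (the surrounding required arguments are omitted here; see the Usage section below):

```hcl
module "collector_event_hub" {
  source = "snowplow-devops/collector-event-hub-vmss/azurerm"

  # ... required arguments as shown in the Usage section ...

  # Opt out of telemetry entirely
  telemetry_enabled = false

  # Or, to stay subscribed to update and security notices instead,
  # leave telemetry on and set a contact address:
  # user_provided_id = "you@example.com"
}
```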

### What are you collecting?

For details on what information is collected, please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

## Usage

A Collector requires two output Kafka topics and a Load Balancer deployed upstream. The Load Balancer makes it straightforward to configure TLS termination later in the setup and provides a simple mechanism for setting up DNS.

module "pipeline_eh_namespace" {
  source  = "snowplow-devops/event-hub-namespace/azurerm"
  version = "0.1.1"

  name                = "snowplow-pipeline"
  resource_group_name = var.resource_group_name
}

module "raw_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "raw-topic"
  namespace_name      = module.pipeline_eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "bad_1_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "bad-1-topic"
  namespace_name      = module.pipeline_eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "collector_lb" {
  source  = "snowplow-devops/lb/azurerm"
  version = "0.2.0"

  name                = "collector-lb"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_agw

  probe_path = "/health"
}

module "collector_event_hub" {
  source  = "snowplow-devops/collector-event-hub-vmss/azurerm"

  name                = "collector-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  application_gateway_backend_address_pool_ids = [module.collector_lb.agw_backend_address_pool_id]

  ingress_port = module.collector_lb.agw_backend_egress_port

  good_topic_name           = module.raw_eh_topic.name
  good_topic_kafka_password = module.raw_eh_topic.read_write_primary_connection_string
  bad_topic_name            = module.bad_1_eh_topic.name
  bad_topic_kafka_password  = module.bad_1_eh_topic.read_write_primary_connection_string
  kafka_brokers             = module.pipeline_eh_namespace.broker

  ssh_public_key   = "your-public-key-here"
  ssh_ip_allowlist = ["0.0.0.0/0"]
}
```
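
As mentioned above, the Load Balancer also gives you a simple hook for DNS. As a minimal sketch (the `public_ip_address` output name is an assumption about the lb module, and the zone is hypothetical), a DNS record for the collector might look like:

```hcl
# Illustrative sketch: "public_ip_address" is an assumed output name on the
# lb module; check that module's outputs for the real attribute.
resource "azurerm_dns_a_record" "collector" {
  name                = "collector"
  zone_name           = "acme.com" # hypothetical DNS zone
  resource_group_name = var.resource_group_name
  ttl                 = 300
  records             = [module.collector_lb.public_ip_address]
}
```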

## Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| azurerm | >= 3.58.0 |
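
As a minimal sketch, a root module satisfying these constraints would pin the versions as follows:

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.58.0"
    }
  }
}

# The azurerm provider requires a features block, even if empty.
provider "azurerm" {
  features {}
}
```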

## Providers

| Name | Version |
|------|---------|
| azurerm | >= 3.58.0 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| service | snowplow-devops/service-vmss/azurerm | 0.1.1 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |

## Resources

| Name | Type |
|------|------|
| azurerm_network_security_group.nsg | resource |
| azurerm_network_security_rule.egress_tcp_443 | resource |
| azurerm_network_security_rule.egress_tcp_80 | resource |
| azurerm_network_security_rule.egress_udp_123 | resource |
| azurerm_network_security_rule.ingress_tcp_22 | resource |
| azurerm_resource_group.rg | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| bad_topic_kafka_password | Password for connection to the Kafka cluster under PlainLoginModule (note: by default the Event Hubs topic connection string for writing is expected) | string | n/a | yes |
| bad_topic_name | The name of the bad Kafka topic that the collector will insert failed data into | string | n/a | yes |
| good_topic_kafka_password | Password for connection to the Kafka cluster under PlainLoginModule (note: by default the Event Hubs topic connection string for writing is expected) | string | n/a | yes |
| good_topic_name | The name of the good Kafka topic that the collector will insert good data into | string | n/a | yes |
| ingress_port | The port that the collector will be bound to and expose over HTTP | number | n/a | yes |
| kafka_brokers | The brokers to configure for access to the Kafka cluster (note: by default the Event Hubs namespace broker) | string | n/a | yes |
| name | A name which will be prepended to the resources created | string | n/a | yes |
| resource_group_name | The name of the resource group to deploy the service into | string | n/a | yes |
| ssh_public_key | The SSH public key attached for access to the servers | string | n/a | yes |
| subnet_id | The subnet id to deploy the load balancer across | string | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
| app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "3.0.1" | no |
| application_gateway_backend_address_pool_ids | The ID of an Application Gateway backend address pool to bind the VM scale-set to the load balancer | list(string) | [] | no |
| associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
| bad_topic_kafka_username | Username for connection to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) | string | "$ConnectionString" | no |
| byte_limit | The amount of bytes to buffer events before pushing them to Kafka | number | 1000000 | no |
| cookie_domain | Optional first party cookie domain for the collector to set cookies on (e.g. acme.com) | string | "" | no |
| custom_paths | Optional custom paths that the collector will respond to; typical paths to override are '/com.snowplowanalytics.snowplow/tp2', '/com.snowplowanalytics.iglu/v1' and '/r/tp2' (e.g. { "/custom/path/" : "/com.snowplowanalytics.snowplow/tp2"}) | map(string) | {} | no |
| good_topic_kafka_username | Username for connection to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) | string | "$ConnectionString" | no |
| java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
| kafka_source | The source providing the Kafka connectivity (default: azure_event_hubs) | string | "azure_event_hubs" | no |
| record_limit | The number of events to buffer before pushing them to Kafka | number | 500 | no |
| ssh_ip_allowlist | The comma-separated list of CIDR ranges to allow SSH traffic from | list(string) | ["0.0.0.0/0"] | no |
| tags | The tags to append to this resource | map(string) | {} | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| time_limit_ms | The amount of time in milliseconds to buffer events before pushing them to Kafka | number | 500 | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
| vm_instance_count | The number of instances to use | number | 1 | no |
| vm_sku | The instance type to use | string | "Standard_B1ms" | no |
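
As a sketch of how a few of the optional inputs above can be layered onto the module block from the Usage section (all values here are illustrative):

```hcl
module "collector_event_hub" {
  source = "snowplow-devops/collector-event-hub-vmss/azurerm"

  # ... required arguments as in the Usage section ...

  # Respond to a custom tracking path
  custom_paths = {
    "/custom/path/" = "/com.snowplowanalytics.snowplow/tp2"
  }

  # Set first-party cookies on a shared parent domain
  cookie_domain = "acme.com"

  # Buffer tuning for the Kafka sink: flush at 1 MB, 500 records, or 500 ms
  byte_limit    = 1000000
  record_limit  = 500
  time_limit_ms = 500

  # Scale the collector horizontally
  vm_instance_count = 2
}
```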

## Outputs

| Name | Description |
|------|-------------|
| nsg_id | ID of the network security group attached to the Collector Server nodes |
| vmss_id | ID of the VM scale-set |
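
These can be referenced from the calling module, for example to re-export the scale-set ID:

```hcl
output "collector_vmss_id" {
  description = "ID of the collector VM scale-set"
  value       = module.collector_event_hub.vmss_id
}
```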

## Copyright and license

Copyright 2023-present Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)
