A Terraform module which deploys the Snowplow Stream Collector on a VM scale-set and sinks data into Azure Event Hubs over the Kafka protocol.
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us - only very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the `user_provided_id` variable to include a valid email address which we can reach you at.

To disable telemetry, simply set the variable `telemetry_enabled = false`.

For details on what information is collected, please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
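For example, telemetry behaviour is controlled with just two of the module's inputs (a minimal sketch; the email address is a placeholder):

```hcl
module "collector_event_hub" {
  source = "snowplow-devops/collector-event-hub-vmss/azurerm"

  # ... required arguments omitted for brevity (see the usage example below) ...

  # Optional: an identifier (e.g. an email address we can reach you at)
  # attached to the telemetry events emitted by this stack.
  user_provided_id = "you@example.com"

  # Optional: set to false to disable telemetry entirely (defaults to true).
  telemetry_enabled = true
}
```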
A Collector requires two output Kafka topics and an upstream Load Balancer. The Load Balancer ensures we can easily configure TLS termination later in the setup and provides a simple mechanism for setting up DNS:
module "pipeline_eh_namespace" {
source = "snowplow-devops/event-hub-namespace/azurerm"
version = "0.1.1"
name = "snowplow-pipeline"
resource_group_name = var.resource_group_name
}
module "raw_eh_topic" {
source = "snowplow-devops/event-hub/azurerm"
version = "0.1.1"
name = "raw-topic"
namespace_name = module.pipeline_eh_namespace.name
resource_group_name = var.resource_group_name
}
module "bad_1_eh_topic" {
source = "snowplow-devops/event-hub/azurerm"
version = "0.1.1"
name = "bad-1-topic"
namespace_name = module.pipeline_eh_namespace.name
resource_group_name = var.resource_group_name
}
module "collector_lb" {
source = "snowplow-devops/lb/azurerm"
version = "0.2.0"
name = "collector-lb"
resource_group_name = var.resource_group_name
subnet_id = var.subnet_id_for_agw
probe_path = "/health"
}
module "collector_event_hub" {
source = "snowplow-devops/collector-event-hub-vmss/azurerm"
name = "collector-server"
resource_group_name = var.resource_group_name
subnet_id = var.subnet_id_for_servers
application_gateway_backend_address_pool_ids = [module.collector_lb.agw_backend_address_pool_id]
ingress_port = module.collector_lb.agw_backend_egress_port
good_topic_name = module.raw_eh_topic.name
good_topic_kafka_password = module.raw_eh_topic.read_write_primary_connection_string
bad_topic_name = module.bad_1_eh_topic.name
bad_topic_kafka_password = module.bad_1_eh_topic.read_write_primary_connection_string
kafka_brokers = module.pipeline_eh_namespace.broker
ssh_public_key = "your-public-key-here"
ssh_ip_allowlist = ["0.0.0.0/0"]
}
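The example above uses permissive defaults to keep it short. In practice you will likely want to restrict SSH access and scale the collector out; a sketch using only optional inputs documented below (the CIDR, SKU and domain values are illustrative):

```hcl
module "collector_event_hub" {
  source = "snowplow-devops/collector-event-hub-vmss/azurerm"

  # ... required arguments as in the usage example above ...

  # Restrict SSH to a trusted range rather than 0.0.0.0/0 (example CIDR).
  ssh_ip_allowlist = ["10.0.0.0/16"]

  # Run more instances, on a bigger SKU, behind the load balancer.
  vm_instance_count = 2
  vm_sku            = "Standard_B2s"

  # Optionally set a first-party cookie domain (example value).
  cookie_domain = "acme.com"
}
```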
## Requirements

| Name | Version |
|---|---|
| terraform | >= 1.0.0 |
| azurerm | >= 3.58.0 |
## Providers

| Name | Version |
|---|---|
| azurerm | >= 3.58.0 |
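In the root module that consumes this module, these constraints translate into a standard `terraform` block, for example:

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.58.0"
    }
  }
}
```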
## Modules

| Name | Source | Version |
|---|---|---|
| service | snowplow-devops/service-vmss/azurerm | 0.1.1 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
## Resources

| Name | Type |
|---|---|
| azurerm_network_security_group.nsg | resource |
| azurerm_network_security_rule.egress_tcp_443 | resource |
| azurerm_network_security_rule.egress_tcp_80 | resource |
| azurerm_network_security_rule.egress_udp_123 | resource |
| azurerm_network_security_rule.ingress_tcp_22 | resource |
| azurerm_resource_group.rg | data source |
## Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| bad_topic_kafka_password | Password for connection to the Kafka cluster under PlainLoginModule (note: by default the Event Hubs topic connection string for writing is expected) | string | n/a | yes |
| bad_topic_name | The name of the bad Kafka topic that the collector will insert failed data into | string | n/a | yes |
| good_topic_kafka_password | Password for connection to the Kafka cluster under PlainLoginModule (note: by default the Event Hubs topic connection string for writing is expected) | string | n/a | yes |
| good_topic_name | The name of the good Kafka topic that the collector will insert good data into | string | n/a | yes |
| ingress_port | The port that the collector will be bound to and expose over HTTP | number | n/a | yes |
| kafka_brokers | The brokers to configure for access to the Kafka cluster (note: by default the Event Hubs namespace broker) | string | n/a | yes |
| name | A name which will be prepended to the resources created | string | n/a | yes |
| resource_group_name | The name of the resource group to deploy the service into | string | n/a | yes |
| ssh_public_key | The SSH public key attached for access to the servers | string | n/a | yes |
| subnet_id | The subnet ID to deploy the service into | string | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
| app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value | string | "3.0.1" | no |
| application_gateway_backend_address_pool_ids | The IDs of Application Gateway backend address pools to bind the VM scale-set to | list(string) | [] | no |
| associate_public_ip_address | Whether to assign a public IP address to this instance | bool | true | no |
| bad_topic_kafka_username | Username for connection to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) | string | "$ConnectionString" | no |
| byte_limit | The number of bytes to buffer events before pushing them to Kafka | number | 1000000 | no |
| cookie_domain | Optional first-party cookie domain for the collector to set cookies on (e.g. acme.com) | string | "" | no |
| custom_paths | Optional custom paths that the collector will respond to; typical paths to override are '/com.snowplowanalytics.snowplow/tp2', '/com.snowplowanalytics.iglu/v1' and '/r/tp2' (e.g. { "/custom/path/" : "/com.snowplowanalytics.snowplow/tp2" }) | map(string) | {} | no |
| good_topic_kafka_username | Username for connection to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) | string | "$ConnectionString" | no |
| java_opts | Custom Java options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
| kafka_source | The source providing the Kafka connectivity (default: azure_event_hubs) | string | "azure_event_hubs" | no |
| record_limit | The number of events to buffer before pushing them to Kafka | number | 500 | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(string) | ["0.0.0.0/0"] | no |
| tags | The tags to append to this resource | map(string) | {} | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| time_limit_ms | The amount of time (in milliseconds) to buffer events before pushing them to Kafka | number | 500 | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
| vm_instance_count | The number of VM instances to run in the scale-set | number | 1 | no |
| vm_sku | The VM SKU (instance type) to use | string | "Standard_B1ms" | no |
## Outputs

| Name | Description |
|---|---|
| nsg_id | ID of the network security group attached to the Collector Server nodes |
| vmss_id | ID of the VM scale-set |
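These outputs can be consumed by downstream configuration, for example to re-export the scale-set ID from your root module:

```hcl
# Re-export the collector scale-set ID for downstream automation.
output "collector_vmss_id" {
  value = module.collector_event_hub.vmss_id
}
```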
Copyright 2023-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)