This open source Terraform module provisions the AWS services needed to provide a data product:
- AWS S3
- AWS Athena
- AWS Glue
- AWS Lambda
module "my_data_product" {
  source = "git@github.com:datamesh-architecture/terraform-dataproduct-aws-athena.git"

  domain   = "<data_product_domain>"
  name     = "<data_product_name>"
  schedule = "0 0 * * ? *" # Run at 00:00 (UTC) every day

  input = [
    {
      source = "<existing_s3_bucket>"
    }
  ]

  transform = {
    query = "sql/<name_of_the_transform>.sql"
  }

  output = {
    format = "<format>"
    schema = "schema/<name_of_the_schema>.schema.json"
  }
}
You also need to configure credentials for AWS. This can be done in a separate file, terraform.tfvars, with the following content:
aws = {
  region     = "REGION"
  access_key = "ACCESS_KEY"
  secret_key = "SECRET_KEY"
}
The specified credentials can then be referenced and forwarded in the other *.tf files.
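As a minimal sketch of that forwarding, the root configuration could declare a variable that matches the terraform.tfvars structure and pass it to the module's aws input. The object type below is inferred from the tfvars example above, not taken from the module source, so treat the field list as an assumption.

```hcl
# Root-module variable mirroring the structure of terraform.tfvars.
# Assumption: the three fields below are all the module expects.
variable "aws" {
  type = object({
    region     = string
    access_key = string
    secret_key = string
  })
  sensitive = true # keep the keys out of plan output
}

module "my_data_product" {
  source = "git@github.com:datamesh-architecture/terraform-dataproduct-aws-athena.git"

  aws = var.aws # forward the credentials to the module

  # ... remaining arguments (domain, name, input, transform, output)
}
```

Keeping terraform.tfvars out of version control avoids committing the access keys.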
The module creates a RESTful endpoint via AWS Lambda (e.g. https://3jopsshxxc.execute-api.eu-central-1.amazonaws.com/prod/). This endpoint can be used as an input for another data product or to retrieve information about this data product. The response has the following shape:
{
  "domain": "<data_product_domain>",
  "name": "<data_product_name>",
  "output": {
    "location": "arn:aws:s3:::<s3_bucket_name>/output/data/"
  }
}
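One possible way to read this endpoint from another Terraform configuration is the http data source. This is a sketch, not part of the module: it assumes the hashicorp/http provider (version 3 or later) is available and that the endpoint is reachable without authentication. The names upstream and upstream_output_location are hypothetical.

```hcl
# Fetch the data product description from the endpoint.
# The URL is the example endpoint from above; replace it with your own.
data "http" "upstream" {
  url = "https://3jopsshxxc.execute-api.eu-central-1.amazonaws.com/prod/"
}

locals {
  # Decode the JSON response into a Terraform object.
  upstream = jsondecode(data.http.upstream.response_body)
}

# Expose the S3 location of the upstream data product's output.
output "upstream_output_location" {
  value = local.upstream.output.location
}
```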
See examples repository.
This Terraform module is maintained by André Deuerling, Jochen Christ, and Simon Harrer.
MIT License.
Name | Version |
---|---|
aws | >= 4.56 |
Name | Version |
---|---|
archive | n/a |
aws | >= 4.56 |
local | n/a |
No modules.
Name | Description | Type | Default | Required |
---|---|---|---|---|
aws | AWS related information and credentials | object({…}) | n/a | yes |
domain | The domain of the data product | string | n/a | yes |
input | List of S3 buckets of other data products which should be used as input | list(object({…})) | n/a | yes |
name | The name of the data product | string | n/a | yes |
output | format: Output format of this data product (e.g. PARQUET); schema: Path to the JSON schema file which describes the output of this data product | object({…}) | n/a | yes |
schedule | The schedule expression to pass to the EventBridge event rule. Format: Minutes \| Hours \| Day of month \| Month \| Day of week \| Year | string | "" | no |
transform | Path to a SQL file, which should be used to transform the input data | object({…}) | n/a | yes |
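To illustrate the six-field schedule format, here are a few example expressions. They follow the EventBridge cron syntax, in which one of day-of-month or day-of-week must be `?`. The sketch assumes the module accepts the bare expression without a `cron(...)` wrapper, as in the usage example; the local names are hypothetical.

```hcl
locals {
  # Fields: Minutes | Hours | Day of month | Month | Day of week | Year
  schedule_daily_midnight = "0 0 * * ? *"       # every day at 00:00 UTC
  schedule_every_15_min   = "0/15 * * * ? *"    # every 15 minutes
  schedule_weekdays_8am   = "0 8 ? * MON-FRI *" # Monday to Friday at 08:00 UTC
}
```

Any of these values could then be passed as the schedule argument, for example `schedule = local.schedule_weekdays_8am`.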
No outputs.