---
title: Querying S3 Tables with Snowflake
description: In this tutorial, you will learn how to integrate AWS S3 Tables with Snowflake to query Iceberg tables stored in S3 Tables buckets through LocalStack.
template: doc
nav:
  label:
---

## Introduction

In this tutorial, you will explore how to connect Snowflake to AWS S3 Tables locally using LocalStack. S3 Tables is a managed Apache Iceberg table catalog that uses S3 storage, providing built-in maintenance features like automatic compaction and snapshot management.

With LocalStack's Snowflake emulator, you can create catalog integrations that connect to S3 Tables and query Iceberg tables without needing cloud resources. This integration allows you to:

- Create catalog integrations to connect Snowflake to S3 Tables.
- Query existing Iceberg tables stored in S3 Tables buckets.
- Leverage automatic schema inference from external Iceberg tables.

## Prerequisites

- [`localstack` CLI](/snowflake/getting-started/) with a [`LOCALSTACK_AUTH_TOKEN`](/aws/getting-started/auth-token/)
- [LocalStack for Snowflake](/snowflake/getting-started/)
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) & [`awslocal` wrapper](/aws/integrations/aws-native-tools/aws-cli/#localstack-aws-cli-awslocal)
- Python 3.10+ with `pyiceberg` and `pyarrow` installed

## Start LocalStack

Start your LocalStack container with the Snowflake emulator enabled.

```bash
export LOCALSTACK_AUTH_TOKEN=<your_auth_token>
localstack start --stack snowflake
```

## Create S3 Tables resources

Before configuring Snowflake, you need to create S3 Tables resources using the AWS CLI. This includes a table bucket and a namespace.

### Create a table bucket

Create a table bucket to store your Iceberg tables.

```bash
awslocal s3tables create-table-bucket --name my-table-bucket
```

```bash title="Output"
{
    "arn": "arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket"
}
```

### Create a namespace

Create a namespace within the table bucket to organize your tables.

```bash
awslocal s3tables create-namespace \
    --table-bucket-arn arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket \
    --namespace my_namespace
```

```bash title="Output"
{
    "tableBucketARN": "arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket",
    "namespace": [
        "my_namespace"
    ]
}
```

## Create and populate a table in S3 Tables

To query data from Snowflake using `CATALOG_TABLE_NAME`, the S3 Tables table must have a defined schema and contain data. Use PyIceberg to create a table with a schema and populate it with sample data.

First, install the required Python packages:

```bash
pip install "pyiceberg[s3fs,pyarrow]" boto3
```

Create a Python script named `setup_s3_tables.py` with the following content:

```python
import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, LongType

# Configuration
LOCALSTACK_URL = "http://localhost.localstack.cloud:4566"
S3TABLES_URL = "http://s3tables.localhost.localstack.cloud:4566"
TABLE_BUCKET_NAME = "my-table-bucket"
NAMESPACE = "my_namespace"
TABLE_NAME = "customer_orders"
REGION = "us-east-1"

# Create a PyIceberg REST catalog pointing to S3 Tables
catalog = RestCatalog(
    name="s3tables_catalog",
    uri=f"{S3TABLES_URL}/iceberg",
    warehouse=TABLE_BUCKET_NAME,
    **{
        "s3.region": REGION,
        "s3.endpoint": LOCALSTACK_URL,
        "client.access-key-id": "000000000000",
        "client.secret-access-key": "test",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": REGION,
    },
)

# Define the table schema
schema = Schema(
    NestedField(field_id=1, name="order_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="customer_name", field_type=StringType(), required=False),
    NestedField(field_id=3, name="amount", field_type=LongType(), required=False),
)

# Create the table in S3 Tables
catalog.create_table(
    identifier=(NAMESPACE, TABLE_NAME),
    schema=schema,
)

print(f"Created table: {NAMESPACE}.{TABLE_NAME}")

# Reload the table to get the latest metadata
table = catalog.load_table((NAMESPACE, TABLE_NAME))

# Populate the table with sample data
data = pa.table({
    "order_id": ["ORD001", "ORD002", "ORD003"],
    "customer_name": ["Alice", "Bob", "Charlie"],
    "amount": [100, 250, 175],
})

table.append(data)
print("Inserted sample data into table")

# Verify the table exists
tables = catalog.list_tables(NAMESPACE)
print(f"Tables in namespace: {tables}")
```

Run the script to create the table and populate it with data:

```bash
python setup_s3_tables.py
```

```bash title="Output"
Created table: my_namespace.customer_orders
Inserted sample data into table
Tables in namespace: [('my_namespace', 'customer_orders')]
```

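As a cross-check outside of PyIceberg, you can list the tables in the bucket with the AWS CLI; the table created by the script should appear in the output:

```bash
awslocal s3tables list-tables \
    --table-bucket-arn arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket \
    --namespace my_namespace
```
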
## Connect to the Snowflake emulator

Connect to the locally running Snowflake emulator using an SQL client of your choice (such as DBeaver). The Snowflake emulator runs on `snowflake.localhost.localstack.cloud`.

You can use the following connection parameters:

| Parameter | Value |
|-----------|-------|
| Host | `snowflake.localhost.localstack.cloud` |
| User | `test` |
| Password | `test` |
| Account | `test` |
| Warehouse | `test` |

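If you prefer a programmatic connection, the same parameters work with the Snowflake Python connector (`pip install snowflake-connector-python`). This is a minimal sketch; the `host` override pointing at the emulator is the only non-standard parameter:

```python
import snowflake.connector

# Connect to the locally running Snowflake emulator.
# All credentials are the emulator's default test values;
# the host parameter routes traffic to LocalStack.
conn = snowflake.connector.connect(
    user="test",
    password="test",
    account="test",
    warehouse="test",
    host="snowflake.localhost.localstack.cloud",
)

with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())

conn.close()
```
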
## Create a catalog integration

Create a catalog integration to connect Snowflake to your S3 Tables bucket. The catalog integration defines how Snowflake connects to the external Iceberg REST catalog provided by S3 Tables.

```sql
CREATE OR REPLACE CATALOG INTEGRATION s3tables_catalog_integration
  CATALOG_SOURCE=ICEBERG_REST
  TABLE_FORMAT=ICEBERG
  CATALOG_NAMESPACE='my_namespace'
  REST_CONFIG=(
    CATALOG_URI='http://s3tables.localhost.localstack.cloud:4566/iceberg'
    CATALOG_NAME='my-table-bucket'
  )
  REST_AUTHENTICATION=(
    TYPE=AWS_SIGV4
    AWS_ACCESS_KEY_ID='000000000000'
    AWS_SECRET_ACCESS_KEY='test'
    AWS_REGION='us-east-1'
    AWS_SERVICE='s3tables'
  )
  ENABLED=TRUE
  REFRESH_INTERVAL_SECONDS=60;
```

In the above query:

- `CATALOG_SOURCE=ICEBERG_REST` specifies that the catalog uses the Iceberg REST protocol.
- `TABLE_FORMAT=ICEBERG` indicates the table format.
- `CATALOG_NAMESPACE='my_namespace'` sets the default namespace to query tables from.
- `REST_CONFIG` configures the connection to the LocalStack S3 Tables REST API endpoint.
- `REST_AUTHENTICATION` configures AWS SigV4 authentication for the S3 Tables service.
- `REFRESH_INTERVAL_SECONDS=60` sets how often Snowflake refreshes metadata from the catalog.

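To confirm that the integration was registered, you can inspect it with standard Snowflake commands (the emulator's output may be more minimal than the real service's):

```sql
SHOW CATALOG INTEGRATIONS;
DESCRIBE CATALOG INTEGRATION s3tables_catalog_integration;
```
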
## Create an Iceberg table referencing S3 Tables

Create an Iceberg table in Snowflake that references the existing S3 Tables table using `CATALOG_TABLE_NAME`. The schema is automatically inferred from the external table.

```sql
CREATE OR REPLACE ICEBERG TABLE iceberg_customer_orders
  CATALOG='s3tables_catalog_integration'
  CATALOG_TABLE_NAME='my_namespace.customer_orders'
  AUTO_REFRESH=TRUE;
```

In the above query:

- `CATALOG` references the catalog integration created in the previous step.
- `CATALOG_TABLE_NAME` specifies the fully qualified table name in the format `namespace.table_name`.
- `AUTO_REFRESH=TRUE` enables automatic refresh of table metadata.
- No column definitions are needed, because the schema is inferred from the existing S3 Tables table.

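Before querying, you can check that the schema was inferred correctly; the columns should match the PyIceberg schema defined earlier (`order_id`, `customer_name`, `amount`):

```sql
DESCRIBE TABLE iceberg_customer_orders;
```
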
## Query the Iceberg table

You can now query the Iceberg table like any other Snowflake table. The columns are automatically available from the external table.

```sql
SELECT * FROM iceberg_customer_orders;
```

```sql title="Output"
+----------+---------------+--------+
| order_id | customer_name | amount |
+----------+---------------+--------+
| ORD001   | Alice         | 100    |
| ORD002   | Bob           | 250    |
| ORD003   | Charlie       | 175    |
+----------+---------------+--------+
```

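Standard SQL works beyond simple scans as well. For example, an aggregation over the sample data inserted earlier should report 3 orders with a total amount of 525 (100 + 250 + 175):

```sql
SELECT COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM iceberg_customer_orders;
```
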
## Conclusion

In this tutorial, you learned how to integrate AWS S3 Tables with Snowflake using LocalStack. You created S3 Tables resources, populated a table with data using PyIceberg, configured a catalog integration in Snowflake, and queried Iceberg tables stored in S3 Tables buckets using `CATALOG_TABLE_NAME`.

The S3 Tables integration enables you to:

- Query data stored in S3 Tables using familiar Snowflake SQL syntax.
- Leverage automatic schema inference from external Iceberg catalogs.
- Develop and test your data lakehouse integrations locally without cloud resources.

LocalStack's Snowflake emulator combined with S3 Tables support provides a complete local environment for developing and testing multi-platform data analytics workflows.