diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 8f8b7c1..e218c39 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -4,12 +4,7 @@ ** xref:use-cases-architectures:change-data-capture/table-schema-evolution.adoc[] ** xref:use-cases-architectures:change-data-capture/consuming-change-data.adoc[] ** xref:use-cases-architectures:change-data-capture/questions-and-patterns.adoc[] -* Data pipelines -** xref:use-cases-architectures:real-time-data-pipeline/index.adoc[] -** xref:use-cases-architectures:real-time-data-pipeline/01-create-astra-objects.adoc[] -** xref:use-cases-architectures:real-time-data-pipeline/02-create-decodable-objects.adoc[] -** xref:use-cases-architectures:real-time-data-pipeline/03-put-it-all-together.adoc[] -** xref:use-cases-architectures:real-time-data-pipeline/04-debugging-and-clean-up.adoc[] +* xref:astra-streaming:getting-started:real-time-data-pipelines-tutorial.adoc[Build real-time data pipelines with {product}] .Migrating to {pulsar} * xref:use-cases-architectures:starlight/index.adoc[] diff --git a/modules/ROOT/pages/index.adoc b/modules/ROOT/pages/index.adoc index 98906b7..8806857 100644 --- a/modules/ROOT/pages/index.adoc +++ b/modules/ROOT/pages/index.adoc @@ -19,7 +19,7 @@ We've included best practices for Apache Pulsar, a full connector reference, and examples for getting the most out of Astra's CDC feature. - xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[Build a real-time data pipeline] + xref:astra-streaming:getting-started:real-time-data-pipelines-tutorial.adoc[Build real-time data pipelines with {product}] diff --git a/modules/use-cases-architectures/attachments/web-clicks-website.zip b/modules/use-cases-architectures/attachments/web-clicks-website.zip deleted file mode 100644 index 5f83138..0000000 Binary files a/modules/use-cases-architectures/attachments/web-clicks-website.zip and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image1.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image1.png deleted file mode 100644 index a80f710..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image1.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image10.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image10.png deleted file mode 100644 index 4de5006..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image10.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image12.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image12.png deleted file mode 100644 index 8df5e14..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image12.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image14.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image14.png deleted file mode 100644 index 6376d82..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image14.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image16.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image16.png deleted file mode 100644 index 22c2be6..0000000 Binary files 
a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image16.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image17.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image17.png deleted file mode 100644 index 89f8d08..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image17.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image19.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image19.png deleted file mode 100644 index 075b29f..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image19.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image23.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image23.png deleted file mode 100644 index d5d59c2..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image23.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image25.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image25.png deleted file mode 100644 index 8412bf5..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image25.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image26.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image26.png deleted file mode 100644 index 96d86dc..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image26.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image27.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image27.png deleted file mode 100644 index 3155a67..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image27.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image29.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image29.png deleted file mode 100644 index 5fb2a20..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image29.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image3.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image3.png deleted file mode 100644 index c3ea2b5..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image3.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image32.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image32.png deleted file mode 100644 index 35843ed..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image32.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image33.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image33.png deleted file mode 100644 index 262aa97..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image33.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image35.png 
b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image35.png deleted file mode 100644 index 779ef53..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image35.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image4.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image4.png deleted file mode 100644 index 3adca04..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image4.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image5.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image5.png deleted file mode 100644 index a189b5d..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image5.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image6.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image6.png deleted file mode 100644 index e67dd09..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image6.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image7.png b/modules/use-cases-architectures/images/decodable-data-pipeline/02/image7.png deleted file mode 100644 index 9a22fe1..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/02/image7.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image1.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image1.png deleted file mode 100644 index 80e216a..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image1.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image10.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image10.png deleted file mode 100644 index 14a6f39..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image10.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image2.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image2.png deleted file mode 100644 index 911ffe8..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image2.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image3.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image3.png deleted file mode 100644 index f8f88c8..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image3.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image4.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image4.png deleted file mode 100644 index d1cde2b..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image4.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image5.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image5.png deleted file mode 100644 index 7cc14ca..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image5.png and /dev/null differ diff 
--git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image6.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image6.png deleted file mode 100644 index dbe6c14..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image6.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image9.png b/modules/use-cases-architectures/images/decodable-data-pipeline/03/image9.png deleted file mode 100644 index 4313244..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/03/image9.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/04/image1.png b/modules/use-cases-architectures/images/decodable-data-pipeline/04/image1.png deleted file mode 100644 index c7be7c3..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/04/image1.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/04/image2.png b/modules/use-cases-architectures/images/decodable-data-pipeline/04/image2.png deleted file mode 100644 index 2a8965d..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/04/image2.png and /dev/null differ diff --git a/modules/use-cases-architectures/images/decodable-data-pipeline/real-time-data-pipeline.png b/modules/use-cases-architectures/images/decodable-data-pipeline/real-time-data-pipeline.png deleted file mode 100644 index 0bf94a1..0000000 Binary files a/modules/use-cases-architectures/images/decodable-data-pipeline/real-time-data-pipeline.png and /dev/null differ diff --git a/modules/use-cases-architectures/pages/real-time-data-pipeline/01-create-astra-objects.adoc b/modules/use-cases-architectures/pages/real-time-data-pipeline/01-create-astra-objects.adoc deleted file mode 100644 index 0acbedc..0000000 --- a/modules/use-cases-architectures/pages/real-time-data-pipeline/01-create-astra-objects.adoc +++ /dev/null @@ -1,184 +0,0 @@ -= Real-time data pipeline {product-short} objects -:navtitle: 1. {product-short} objects - -[NOTE] -==== -This guide is part of a series that creates a real-time data pipeline with {product} and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here]. -==== - -== Creating message topics to capture the stream of click data - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], select *Streaming*, and then click **Create tenant**. - - -. Name the new streaming tenant `webstore-clicks`, select any cloud provider and region, and then click **Create tenant**. - -. From your tenant's overview page, click the **Namespace and Topics** tab. - -. Create a new namespace with the name `production`. -+ -In this example, namespaces represent logical development environments to illustrate how you could create a continuous delivery flow. -You could also have namespaces for `development` and `staging`. - -. Click **Add Topic** next to your new `production` namespace, name the topic `all-clicks`, make sure **Persistent** is selected, and then click **Add Topic**. - -. Create another topic in the `production` namespace, name the topic `product-clicks`, make sure **Persistent** is selected, and then click **Add Topic**. 
- -You now have a `production` namespace with two topics, as well as the `default` namespace that is automatically created by {pulsar-short} whenever you create a streaming tenant. - -== Storing the stream of click data - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *{product-short}*. - -. Click **Create Database**, and then complete the fields as follows: -+ -* **Type**: Select **Serverless (non-vector)** to follow along with this tutorial. -+ -If you select **Serverless (vector)**, you must modify the tutorial to use the `default_keyspace` keyspace or create the tutorial keyspace after you create your database. - -* **Database name**: Enter `webstore-clicks`. -* **Keyspace name**: Enter `click_data`. -* **Provider** and **Region**: Select the same cloud provider and region as your streaming tenant. - -. Click **Create Database**, and then wait for the database to initialize. -This can take several minutes. - -. From your database's overview page, click **CQL Console**, and then wait for `cqlsh` to start. - -. Enter the following CQL statement into the CQL console, and then press kbd:[Enter]. -+ -This statement creates a table named `all_clicks` in the `click_data` keyspace that will store all unfiltered web click data. -+ -[source,sql] ----- -CREATE TABLE IF NOT EXISTS click_data.all_clicks ( - click_timestamp bigint, - url_host text, - url_protocol text, - url_path text, - url_query text, - browser_type text, - operating_system text, - visitor_id uuid, - PRIMARY KEY ((operating_system, browser_type, url_host, url_path), click_timestamp) -); ----- - -. Run the following command in the CQL console to create another table that will store filtered web click data for product clicks only. -+ -[source,sql] ----- -CREATE TABLE click_data.product_clicks ( - catalog_area_name text, - product_name text, - click_timestamp timestamp, - PRIMARY KEY ((catalog_area_name), product_name, click_timestamp) -) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC); ----- - -. To verify that the tables were created, run `describe click_data;`. -+ -The console prints create statements describing the keyspace itself and the two tables. -+ -.Result -[%collapsible] -==== -[source,sql,subs="attributes+"] ----- -token@cqlsh> describe click_data; - -CREATE KEYSPACE click_data WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': '3'} AND durable_writes = true; - -CREATE TABLE click_data.all_clicks ( - operating_system text, - browser_type text, - url_host text, - url_path text, - click_timestamp bigint, - url_protocol text, - url_query text, - visitor_id uuid, - PRIMARY KEY ((operating_system, browser_type, url_host, url_path), click_timestamp) -) WITH CLUSTERING ORDER BY (click_timestamp ASC) - AND additional_write_policy = '99PERCENTILE' - AND bloom_filter_fp_chance = 0.01 - AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} - AND comment = '' - AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'} - AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} - AND crc_check_chance = 1.0 - AND default_time_to_live = 0 - AND gc_grace_seconds = 864000 - AND max_index_interval = 2048 - AND memtable_flush_period_in_ms = 0 - AND min_index_interval = 128 - AND read_repair = 'BLOCKING' - AND speculative_retry = '99PERCENTILE'; - -CREATE TABLE click_data.product_clicks ( - catalog_area_name text, - product_name text, - click_timestamp timestamp, - PRIMARY KEY (catalog_area_name, product_name, click_timestamp) -) WITH CLUSTERING ORDER BY (product_name ASC, click_timestamp DESC) - AND additional_write_policy = '99PERCENTILE' - AND bloom_filter_fp_chance = 0.01 - AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} - AND comment = '' - AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'} - AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} - AND crc_check_chance = 1.0 - AND default_time_to_live = 0 - AND gc_grace_seconds = 864000 - AND max_index_interval = 2048 - AND memtable_flush_period_in_ms = 0 - AND min_index_interval = 128 - AND read_repair = 'BLOCKING' - AND speculative_retry = '99PERCENTILE'; ----- -====
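Before connecting the topics to these tables, you can optionally smoke-test the schema from the CQL console. The following sketch is not a tutorial step: the values are illustrative (they mirror the sample click used later in this series), and the final statement removes the test row so the row counts you verify later still line up.

[source,sql]
----
-- Optional schema check: write one illustrative row, read it back, and
-- then delete it so later pipeline validation sees only pipeline data.
INSERT INTO click_data.all_clicks (
  operating_system, browser_type, url_host, url_path,
  click_timestamp, url_protocol, url_query, visitor_id
) VALUES (
  'Windows', 'Chrome/102.0.0.0', 'somedomain.com',
  '/catalog/area1/yetanother-cool-product',
  1655303592179, 'https', 'a=b&c=d',
  b56afbf3-321f-49c1-919c-b2ea3e550b07
);

SELECT * FROM click_data.all_clicks;

DELETE FROM click_data.all_clicks
WHERE operating_system = 'Windows'
  AND browser_type = 'Chrome/102.0.0.0'
  AND url_host = 'somedomain.com'
  AND url_path = '/catalog/area1/yetanother-cool-product'
  AND click_timestamp = 1655303592179;
----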
- -== Connecting the topics to the store - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. - -. Click your `webstore-clicks` streaming tenant. - -. Click the **Sinks** tab, click **Create Sink**, and then complete the fields as follows: -+ -* **Namespace**: Select your `production` namespace. -* **Sink Type**: Select **{astra-db}**. -* **Name**: Enter `all-clicks`. -* **Input topic**: Select your `all-clicks` topic in your `production` namespace. -* **Database**: Select your `webstore-clicks` database. -* **Token**: Click the link to create an {product-short} application token with the **Organization Administrator** role, and then enter the token in the sink's **Token** field. -Store the token securely; you will use it multiple times during this tutorial. -* **Keyspace**: Enter `click_data`. -* **Table Name**: Enter `all_clicks`. -* **Mapping**: Use the default mapping, which maps the topic's fields to the table's columns. - -. Click **Create**, and then wait for the sink to initialize. -+ -When the sink is ready, its status changes to **Running**. - -. Create another sink with the following configuration: -+ -* **Namespace**: Select your `production` namespace. -* **Sink Type**: Select **{astra-db}**. -* **Name**: Enter `prd-click-astradb`. -* **Input topic**: Select your `product-clicks` topic in your `production` namespace.
-* **Database**: Select your `webstore-clicks` database. -* **Token**: Use the same token that you created for the other sink. -* **Keyspace**: Enter `click_data`. -* **Table Name**: Enter `product_clicks`. -* **Mapping**: Use the default mapping, which maps the topic's fields to the table's columns. - -After the second sink initializes, you have two running sinks. - -To debug a sink, you can view the sink's logs in the {astra-ui}. -To do this, click the sink name, and then scroll to the terminal output area on the sink's overview page. -The deployment logs are printed in this terminal output area, including semi-verbose `starting`, `validating`, and `running` logs. - -== Next step - -Now that you created the required {product-short} objects, you can xref:real-time-data-pipeline/02-create-decodable-objects.adoc[set up the Decodable processing]. \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/real-time-data-pipeline/02-create-decodable-objects.adoc b/modules/use-cases-architectures/pages/real-time-data-pipeline/02-create-decodable-objects.adoc deleted file mode 100644 index 9ad38f1..0000000 --- a/modules/use-cases-architectures/pages/real-time-data-pipeline/02-create-decodable-objects.adoc +++ /dev/null @@ -1,273 +0,0 @@ -= Real-time data pipeline Decodable objects -:navtitle: 2. Decodable objects - -[NOTE] -==== -This guide is part of a series that creates a real-time data pipeline with {product} and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here]. -==== - -== The {product} connection info - -To connect {product} to Decodable, you need some information from your {product} tenant. - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], select *Streaming*, and then click your `webstore-clicks` streaming tenant. -+ -image:decodable-data-pipeline/02/image4.png[] - -. Click the **Connect** tab, and then scroll to the **Tenant Details** section. -+ -These values are required to connect to Decodable. -+ -image:decodable-data-pipeline/02/image16.png[] - -. Create a {pulsar-short} token: -+ -.. In **Tenant Details**, click **Token Manager**, and then click **Create Token**. -.. In the **Copy Token** dialog, copy the **Token**, and then store it securely. -.. Click **Close** when you are done. - -== Creating a Decodable connection to {product} for all web clicks - -In Decodable, you must create a connection and stream that will direct all web clicks to the correct topics in {product}. - -. In a new browser tab, sign in to your Decodable account. - -. Click the **Connections** tab, and then click **New Connection**. -+ -image:decodable-data-pipeline/02/image25.png[] - -. In the **Choose a Connector** dialog, find the {company} {product} connector, and then click **Connect**. -+ -image:decodable-data-pipeline/02/image14.png[] - -. Use the {product} tenant details from your other browser tab to complete the Decodable connection fields: -+ -* **Connection Type**: Select **Sink**. -* **Broker Service URL**: Enter the Pulsar broker service URL from your {product} tenant. -* **Web Service URL**: Enter the Pulsar web service URL from your {product} tenant. -* **Topic**: Enter `persistent://webstore-clicks/production/all-clicks`. -* **Authentication Token**: Enter the same token you used for your {product} sinks. -* **Value Format**: Select **JSON**. -+ -image:decodable-data-pipeline/02/image35.png[] - -. 
Click **Next**, click **New Stream**, name the stream `Webstore-Normalized-Clicks-Stream`, and then click **Next**. -+ -image:decodable-data-pipeline/02/image12.png[] - -. In **Define this Connection's schema**, select **New Schema** for the **Schema Source**, and then add fields with the following names and types: -+ -[cols=2] -|=== -|Name |Type - -|`click_timestamp` -|**TIMESTAMP(0)** - -|`url_host` -|**STRING** - -|`url_protocol` -|**STRING** - -|`url_path` -|**STRING** - -|`url_query` -|**STRING** - -|`browser_type` -|**STRING** - -|`operating_system` -|**STRING** - -|`visitor_id` -|**STRING** -|=== -+ -image:decodable-data-pipeline/02/image10.png[] -+ -For **Type**, you must select options from the dropdown menu in order for Decodable to accept the schema. - -. Click **Next**, name the connection `Astra-Streaming-All-Webclicks-Connection`, and then click **Create Connection**. -+ -image:decodable-data-pipeline/02/image26.png[] - -== Creating a Decodable connection to {product} for product web clicks - -Create another connection in Decodable to stream product clicks. - -. In Decodable, on the **Connections** tab, click **New Connection**. - -. In the **Choose a Connector** dialog, find the {company} {product} connector, and then click **Connect**. - -. Use the {product} tenant details from your other browser tab to complete the Decodable connection fields. -All values are the same as the other connection, except the **Topic**. -+ -* **Connection Type**: Select **Sink**. -* **Broker Service URL**: Enter the Pulsar broker service URL from your {product} tenant. -* **Web Service URL**: Enter the Pulsar web service URL from your {product} tenant. -* **Topic**: Enter `persistent://webstore-clicks/production/product-clicks`. -* **Authentication Token**: Enter the same token you used for your {product} sinks. -* **Value Format**: Select **JSON**. - -. Click **Next**, click **New Stream**, name the stream `Webstore-Product-Clicks-Stream`, and then click **Next**. - -. In **Define this Connection's schema**, select **New Schema** for the **Schema Source**, and then add fields with the following names and types: -+ -[cols=2] -|=== -|Name |Type - -|`click_timestamp` -|**TIMESTAMP(0)** - -|`catalog_area_name` -|**STRING** - -|`product_name` -|**STRING** -|=== -+ -image:decodable-data-pipeline/02/image3.png[] - -. Click **Next**, name the connection `Astra-Streaming-Product-Webclicks-Connection`, and then click **Create Connection**. - -== Creating an HTTP data ingestion source - -Create a third connection to use Decodable's REST API to ingest (`POST`) raw data into the pipeline: - -. In Decodable, on the **Connections** tab, click **New Connection**. - -. In the **Choose a Connector** dialog, find the **REST** connector, and then click **Connect**. -+ -image:decodable-data-pipeline/02/image19.png[] - -. On the **Create your REST connector** dialog, leave the default values for all fields, and then click **Next**. -+ -image:decodable-data-pipeline/02/image27.png[] - -. Click **New Stream**, enter the name `Webstore-Raw-Clicks-Stream`, and then click **Next**. -+ -image:decodable-data-pipeline/02/image1.png[] - -. In **Define this Connection's schema**, select **New Schema** for the **Schema Source**, and then add fields with the following names and types: -+ -[cols=2] -|=== -|Name |Type - -|`click_epoch` -|**BIGINT** - -|`UTC_offset` -|**INT** - -|`request_url` -|**STRING** - -|`browser_agent` -|**STRING** - -|`visitor_id` -|**STRING** -|=== -+ -image:decodable-data-pipeline/02/image6.png[] - -. 
Click **Next**, name the connection `Webstore-Raw-Clicks-Connection`, and then click **Create Connection**. -+ -image:decodable-data-pipeline/02/image29.png[] - -In your REST connector's settings, note that the **Endpoint** value contains a ``, which is a dynamic value that is generated when the connection is created. -Click the connector's **Details** tab to see the resolved endpoint path, such as `/v1alpha2/connections/7ef9055f/events`. -You will use this path with your account domain, such as `user.api.decodable.co` to create the full endpoint URL. -For more information about the REST connector, see the https://docs.decodable.co/docs/connector-reference-rest#endpoint-url[Decodable documentation]. -+ -image:decodable-data-pipeline/02/image7.png[] - -You now have three connectors ready to use in your streaming pipeline. - -image:decodable-data-pipeline/02/image5.png[] - -== Creating a data normalization pipeline - -In this part of the tutorial, you will create the core functions for your stream processing pipeline. - -. In Decodable, go to **Pipelines**, and then click **Create Pipeline**. - -. For the input stream, select **Webstore-Raw-Clicks-Stream**, and then click **Next**. - -. In **Define your data processing with SQL**, delete the pre-populated SQL, and then enter the following SQL statement: -+ -[source,sql] ----- -insert into `Webstore-Normalized-Clicks-Stream` -select - CURRENT_TIMESTAMP as click_timestamp - , PARSE_URL(request_url, 'HOST') as url_host - , PARSE_URL(request_url, 'PROTOCOL') as url_protocol - , PARSE_URL(request_url, 'PATH') as url_path - , PARSE_URL(request_url, 'QUERY') as url_query - , REGEXP_EXTRACT(browser_agent,'(MSIE|Trident|(?!Gecko.+)Firefox|(?!AppleWebKit.+Chrome.+)Safari(?!.+Edge)|(?!AppleWebKit.+)Chrome(?!.+Edge)|(?!AppleWebKit.+Chrome.+Safari.+)Edge|AppleWebKit(?!.+Chrome|.+Safari)|Gecko(?!.+Firefox))(?: |\/)([\d\.apre]+)') as browser_type - , CASE - WHEN (browser_agent like '%Win64%') THEN 'Windows' - WHEN (browser_agent like '%Mac%') THEN 'Macintosh' - WHEN (browser_agent like '%Linux%') THEN 'Linux' - WHEN (browser_agent like '%iPhone%') THEN 'iPhone' - WHEN (browser_agent like '%Android%') THEN 'Android' - ELSE 'unknown' - END as operating_system - , visitor_id as visitor_id -from `Webstore-Raw-Clicks-Stream` ----- -+ -image:decodable-data-pipeline/02/image17.png[] - -. Click **Next**, review the automatically generated output stream, and then click **Next**. -+ -The output stream should be correct by default if you followed along with the tutorial so far. -+ -image:decodable-data-pipeline/02/image23.png[] - -. Click **Next**, name the pipeline `Webstore-Raw-Clicks-Normalize-Pipeline`, and then click **Create Pipeline**. -+ -It can take a few minutes for the pipeline to be created. - -== Creating a data filtering pipeline - -Create a pipeline to separate product click data from all web click data: - -. In Decodable, go to **Pipelines**, and then click **Create Pipeline**. - -. For the input stream, select **Webstore-Normalized-Clicks-Stream**, and then click **Next**. - -. 
In **Define your data processing with SQL**, delete the pre-populated SQL, and then enter the following SQL statement: -+ -[source,sql] ----- -insert into `Webstore-Product-Clicks-Stream` -select - click_timestamp - , TRIM(REPLACE(SPLIT_INDEX(url_path, '/', 2),'-',' ')) as catalog_area_name - , TRIM(REPLACE(SPLIT_INDEX(url_path, '/', 3),'-',' ')) as product_name -from `Webstore-Normalized-Clicks-Stream` -where TRIM(LOWER(SPLIT_INDEX(url_path, '/', 1))) = 'catalog' ----- -+ -image:decodable-data-pipeline/02/image33.png[] - -. Click **Next**, review the automatically generated output stream, and then click **Next**. -+ -The output stream should be correct by default if you followed along with the tutorial so far. -+ -image:decodable-data-pipeline/02/image32.png[] - -. Click **Next**, name the pipeline `Webstore-Product-Clicks-Pipeline`, and then click **Create Pipeline**. -+ -It can take a few minutes for the pipeline to be created. - -== Next step - -Next, xref:real-time-data-pipeline/03-put-it-all-together.adoc[connect the {product-short} and Decodable pieces, and then run the pipeline]. \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/real-time-data-pipeline/03-put-it-all-together.adoc b/modules/use-cases-architectures/pages/real-time-data-pipeline/03-put-it-all-together.adoc deleted file mode 100644 index af86d5a..0000000 --- a/modules/use-cases-architectures/pages/real-time-data-pipeline/03-put-it-all-together.adoc +++ /dev/null @@ -1,196 +0,0 @@ -= Putting the real-time data pipeline to work -:navtitle: 3. Put it all together - -[NOTE] -==== -This guide is part of a series that creates a real-time data pipeline with {product} and Decodable. For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here]. -==== - -Now that we have all the pieces of our data processing pipeline in place, it’s time to start the connections and pipelines and send some test data through. - -== Starting the processing - -. Navigate to the “Connections” area and click the three dots at the right for each connection. -Click the “Start” option on all 3 connections. -+ -image:decodable-data-pipeline/03/image9.png[] - -. Be patient. -It might take a minute or so, but each connection should refresh with a state of “Running”. -+ -image:decodable-data-pipeline/03/image1.png[] -+ -TIP: If one of the connections has an issue starting up (like an incorrect setting or expired token), click on that connection for more information. - -. Navigate to the “Pipelines” area and use the same three-dot menu to start each pipeline. -As with the connections, they might take a minute or so to get going. -Grab a coffee while you wait - you’ve earned it. -+ -image:decodable-data-pipeline/03/image3.png[] - -Before ingesting data, let’s make sure we have all the pieces in order... - -* REST connection running? **CHECK!** -* {product} connections running? **CHECK!** -* Normalization pipeline running? **CHECK!** -* Product clicks filter pipeline running? **CHECK!** - -== Your first ingested data - -. Navigate to the “Connections” area and click the “REST” connector. - -. Choose the “Upload” tab and copy/paste the following web click data into the window.
-+ -[source,json] ----- -[ - { - "click_epoch": 1655303592179, - "request_url": "https://somedomain.com/catalog/area1/yetanother-cool-product?a=b&c=d", - "visitor_id": "b56afbf3-321f-49c1-919c-b2ea3e550b07", - "UTC_offset": -4, - "browser_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36" - } -] ----- - -. Click “Upload” to simulate data being posted to the endpoint. You will receive a confirmation that data has been received. - -No, this was not the big moment with cheers and balloons - the celebration is at the end of the next area. - -== Following the flow - -For this first record of data, let’s look at each step along the way and confirm processing is working. - -. After the data was ingested, the “Webstore-Raw-Clicks-Normalize-Pipeline” received it. -You can confirm this by inspecting the “Webstore-Raw-Clicks-Normalize-Pipeline” pipeline metrics. -The “Input Metrics” and "Output Metrics" areas report that one record has been received. -This confirms that the data passed successfully through this pipeline. -+ -image:decodable-data-pipeline/03/image2.png[] - -. In the “Connections” area, click the “Astra-Streaming-All-Webclicks-Connector”. -In “Input Metrics”, we see that 1 record has been received. -+ -image:decodable-data-pipeline/03/image4.png[] - -. Return to your “webstore-clicks” tenant and navigate to the “Namespace and Topics” area. -Expand the “production” namespace and select the “all-clicks” topic. -Confirm that “Data In” has 1 message and “Data Out” has 1 message. This means the topic took the data in and a consumer acknowledged receipt of the message. -+ -image:decodable-data-pipeline/03/image6.png[] - -. In the “Sinks” tab, select the “all-clicks” sink. In “Instance Stats” you see “Reads” has a value of 1 and “Writes” has a value of 1. This means the sink consumed a message from the topic and wrote the data to the store. -+ -image:decodable-data-pipeline/03/image5.png[] - -. Inspect the data in your {astra-db} database. -+ -In the {astra-ui}, go to your `webstore-clicks` database, click **CQL Console**, and then run the following CQL statement: -+ -[source,sql,subs="attributes+"] ----- -select * from click_data.all_clicks; ----- - -.Result -[%collapsible] -==== -[source,sql] ----- -token@cqlsh> EXPAND ON; //this cleans up the output -Now Expanded output is enabled -token@cqlsh> select * from click_data.all_clicks; -@ Row 1 -------------------+---------------------------------------- - operating_system | Windows - browser_type | Chrome/102.0.0.0 - url_host | somedomain.com - url_path | /catalog/area1/yetanother-cool-product - click_timestamp | 1675286722000 - url_protocol | https - url_query | a=b&c=d - visitor_id | b56afbf3-321f-49c1-919c-b2ea3e550b07 - -(1 rows) ----- -==== - -This confirms that the data was successfully written to the database. -Your pipeline ingested raw web click data, normalized it, and then persisted the parsed data to the database. - -== Follow the flow of the product clicks data - -Similar to how you followed the above flow of raw click data, follow this flow to confirm the filtered messages were stored. - -. In Decodable, go to your `Webstore-Product-Clicks-Pipeline` pipeline, and then check that the **Input Metrics** and the **Output Metrics** have 1 record each. - -. Go to your `Astra-Streaming-Product-Webclicks-Connection` connection, and then check that the **Input Metrics** have 1 record. 
- -. Go to your `webstore-clicks` tenant, and then go to the `product-clicks` topic in the `production` namespace. - -. Make sure **Data In** and **Data Out** have 1 message each. - -. In the {astra-ui}, go to your `webstore-clicks` database, click **CQL Console**, and then query the `product_clicks` table to inspect the data in the database: -+ -[source,sql,subs="attributes+"] ----- -select * from click_data.product_clicks; ----- -+ -.Result -[%collapsible] -==== -[source,sql] ----- -@ Row 1 --------------------+--------------------------------- - catalog_area_name | area1 - product_name | yetanother cool product - click_timestamp | 2023-02-01 21:25:22.000000+0000 ----- -==== - -The first web click data you entered was a product click, so the data was filtered in the pipeline, and then processed into the relevant table.
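Because `product_clicks` is partitioned by `catalog_area_name`, you can also aggregate clicks per catalog area directly in CQL. This is an optional sketch rather than a tutorial step, and it is only reasonable at tutorial scale because an unrestricted aggregation scans the whole table:

[source,sql]
----
-- Optional: count clicks per catalog area. GROUP BY works here because
-- catalog_area_name is the partition key. Fine for a handful of rows;
-- avoid unbounded scans like this on production-sized tables.
SELECT catalog_area_name, COUNT(*) AS clicks
FROM click_data.product_clicks
GROUP BY catalog_area_name;
----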
- -== Example real-time site data - -To simulate a production workload to test the pipeline, you need a way to continuously post data to the Decodable REST endpoint. -For this tutorial, you can use the following sample website: - -. Download the sample `xref:attachment$web-clicks-website.zip[web-clicks-website.zip]`, which is a static HTML e-commerce catalog that silently posts click data to an endpoint. -The sample site is a copy of the https://www.demoblaze.com/[Demoblaze site] from https://www.blazemeter.com/[BlazeMeter]. - -. Extract the zip, open the folder in an IDE or text editor, and then open `script.js`. - -. Replace the `decodable_token` and `endpoint_url` values with actual values from your Decodable account: -+ -[source,javascript] ----- -function post_click(url){ - let decodable_token = "access token: "; - let endpoint_url = "https://ddieruf.api.decodable.co/v1alpha2/connections/4f003544/events"; - ... -} ----- -+ -Replace the following: -+ -* ``: The value of `access_token` from your `.decodable/auth` file -* `https://ddieruf.api.decodable.co/v1alpha2/connections/4f003544/events`: Your REST connection's complete endpoint URL, including the generated endpoint path and your Decodable account's REST API base URL. -+ -For more information, see the https://docs.decodable.co/docs/connector-reference-rest#authentication[Decodable authentication documentation]. - -. Save and close `script.js`. - -. Open the `phones.html` file in your browser as a local file, and then click on some products. -+ -Each click should send a `POST` request to your Decodable endpoint, which you can monitor in Decodable. -+ -image:decodable-data-pipeline/03/image10.png[] - -== Next step - -If the pipeline succeeded, you can clean up the resources created for this tutorial, as explained in xref:real-time-data-pipeline/04-debugging-and-clean-up.adoc[]. - -If the pipeline isn't working as expected, see the troubleshooting advice in xref:real-time-data-pipeline/04-debugging-and-clean-up.adoc[]. \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/real-time-data-pipeline/04-debugging-and-clean-up.adoc b/modules/use-cases-architectures/pages/real-time-data-pipeline/04-debugging-and-clean-up.adoc deleted file mode 100644 index a51ccdf..0000000 --- a/modules/use-cases-architectures/pages/real-time-data-pipeline/04-debugging-and-clean-up.adoc +++ /dev/null @@ -1,60 +0,0 @@ -= Debugging and cleaning up the real-time data pipeline -:navtitle: 4. Debugging and cleanup - -[NOTE] -==== -This guide is part of a series that creates a real-time data pipeline with {product} and Decodable. -For context and prerequisites, start xref:streaming-learning:use-cases-architectures:real-time-data-pipeline/index.adoc[here]. -==== - -== Debugging the pipeline - -If your pipeline didn't work as expected, there are a few things you can do to debug the issue. - -First, find where the pipeline broke by following the data flows described in xref:real-time-data-pipeline/03-put-it-all-together.adoc#following-the-flow[Following the flow]. - -With Decodable, you can click through the connections, streams, and pipelines and visualize where the connection failed. -For example, if you accidentally named a stream `click-stream` instead of `clicks-stream`, you can follow the click event to the outbound `clicks-stream` and then to the pipeline. -There, you'll see that the pipeline isn't receiving the message because its inbound stream is misnamed `click-stream`. -Then, you can stop the pipeline, fix the inbound stream name, and then restart the pipeline to get data flowing again. - -If that doesn't resolve the issue, you can test input data at the point of failure. -In this case, you want to determine if the input data is malformed, or if the object itself is failing. -There are two tools you can use to debug this at various stages of the pipeline: - -* In Decodable, you can use the **Preview** feature to see a sample of data processing in each pipeline. -* In {product}, each tenant has a **Try Me** feature where you can simulate producing and consuming messages in specific topics.
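A third check is to look at the end of the pipeline (an optional sketch, not one of the tools above): query the destination tables from the CQL console. Empty tables combined with non-zero Decodable output metrics point at the sink configuration; populated tables point upstream.

[source,sql]
----
-- Optional end-of-pipeline check. Unrestricted counts are acceptable
-- here only because this tutorial writes a handful of rows.
SELECT COUNT(*) FROM click_data.all_clicks;
SELECT COUNT(*) FROM click_data.product_clicks;
----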
- -== Clean up - -To clean up the resources created for this tutorial, you need to remove the Decodable and {product-short} objects you created. - -=== Removing Decodable objects - -. For each "Connection", click the 3 dots and choose "Stop". -+ -image:decodable-data-pipeline/04/image1.png["Stop decodable connection"] - -. Once "Stopped", click the 3 dots again and choose "Delete". This removes the object from your account. -+ -image:decodable-data-pipeline/04/image2.png["Delete decodable connection"] - -. Repeat this for each "Connection" and "Pipeline". You can also remove the "Streams" if you prefer. - -=== Removing {product-short} objects - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. - -. Find the `webstore-clicks` tenant, click icon:ellipsis-vertical[name="More"], and then select **Delete**. -+ -This removes the tenant and all associated sinks, topics, messages, and namespaces. - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select **{product-short}**. - -. Find the `webstore-clicks` database, click icon:ellipsis-vertical[name="More"], and then select **Terminate**. -+ -This removes the database and all associated keyspaces and tables. -This can take a few minutes. - -== Next step - -Now that you've created a sample pipeline, try modifying the tutorial to create a pipeline for your own use cases, like fraud detection or removing personally identifiable information (PII).
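For example, a PII-scrubbing variant could reuse the normalization pipeline pattern almost unchanged. The sketch below is hypothetical: `Webstore-Masked-Clicks-Stream` is a placeholder stream name and the regular expression is illustrative only, but it relies on the same kind of standard Flink SQL functions the pipelines in this series already use:

[source,sql]
----
-- Hypothetical PII-masking pipeline: redact email addresses captured in
-- the query string before click events reach downstream storage.
insert into `Webstore-Masked-Clicks-Stream`
select
  click_timestamp
  , url_host
  , url_protocol
  , url_path
  , REGEXP_REPLACE(url_query, 'email=[^&]*', 'email=REDACTED') as url_query
  , browser_type
  , operating_system
  , visitor_id
from `Webstore-Normalized-Clicks-Stream`
----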
\ No newline at end of file diff --git a/modules/use-cases-architectures/pages/real-time-data-pipeline/index.adoc b/modules/use-cases-architectures/pages/real-time-data-pipeline/index.adoc deleted file mode 100644 index a463acf..0000000 --- a/modules/use-cases-architectures/pages/real-time-data-pipeline/index.adoc +++ /dev/null @@ -1,77 +0,0 @@ -= Real-time data pipelines with {product-short} and Decodable -:navtitle: Build a real-time data pipeline - -This guide presents a hands-on approach for defining the objects that make up a real-time data processing pipeline. -You'll create and configure an {product} tenant and an {astra-db} database, connect them with data processing pipelines in Decodable, and send a single data record through to validate your real-time data pipeline. + -For extra credit, we'll demonstrate processing under load with bulk data. - -This guide uses the {astra-ui} and Decodable UI in your web browser, so no terminal or scripting is required! -You just need a safe place to temporarily store access tokens. - -== Architecture - -Before we get started on our journey, let’s discuss the objects we’re creating and why we need to create them. + -We want to build a pipeline that takes in raw web click data, breaks it into queryable values, saves the data, and filters for certain values. Both the parsed click data and the filtered data will be saved. We will use Decodable’s real-time stream processing (powered by Apache Flink) as well as {astra-db} and {product} (powered by {pulsar-reg} and {cass-reg}). -This pipeline is intended to be production-ready, because we’re using cloud-based services that automatically set values for scaling, latency, and security. + - -The pipeline components are outlined below. - -image:decodable-data-pipeline/real-time-data-pipeline.png[Real-time data pipelines with {product} and Decodable] - -*E-Commerce Site Clicks* - -- Where the data comes from - -*{product} and {astra-db}* - -- All Clicks Topic: a collection of messages with normalized click data -- Product Clicks Topic: a collection of messages with normalized and filtered click data -- All Clicks Sink: a function that writes message data to a certain DB table -- Product Clicks Sink: a function that writes message data to a certain DB table -- {cass-short}: data store - -*Decodable* - -- HTTP Connection: a managed endpoint for posting click data -- Raw Click Stream: the flow of click data that other objects can “listen” to -- Click Normalization Pipeline: a SQL-based pipeline that takes in raw click data, parses certain pieces, gives context to other data, and transforms some data -- All Clicks Stream: the flow of normalized click data that other objects can “listen” to -- {product} Connector: a sink that objects can publish data to; published data is transformed into a {pulsar-short} message and produced to a given topic -- Product Clicks Pipeline: a SQL-based pipeline that takes normalized data and filters for only clicks associated with a product -- Product Clicks Stream: the flow of filtered product click data that other objects can “listen” to - -== Prerequisites - -* An {astra-url}/signupstreaming[{product-short} account^] - -* A https://app.decodable.co/-/accounts/create[Decodable account] - -[NOTE] -==== -This guide stays within the free tiers of both {astra-db} and Decodable. -You won’t need a credit card for this guide. -==== - -== Getting started - -The guide is broken into a few milestones. You'll want to follow these milestones in order for everything to work. - -. xref:use-cases-architectures:real-time-data-pipeline/01-create-astra-objects.adoc[] -+ -In this guide, you will create a new streaming tenant in {product} with a namespace and topics. -Then, you’ll create an {astra-db} database, and hook the streaming topics and database together with a sink connector. - -. xref:use-cases-architectures:real-time-data-pipeline/02-create-decodable-objects.adoc[] -+ -In this guide, you will create pipelines for processing incoming data and connectors that bind a Decodable stream of data with the {product} topics created in step 1. - -. xref:use-cases-architectures:real-time-data-pipeline/03-put-it-all-together.adoc[] -+ -This is where the magic happens! -In this guide, you will start the processing pipelines, send a single record of data through them, and then validate that everything happened as expected. -For extra credit, you are also given the opportunity to put the processing under load with bulk data. - -. xref:use-cases-architectures:real-time-data-pipeline/04-debugging-and-clean-up.adoc[] -+ -This final milestone helps with debugging the pipelines in case something doesn't go quite right. -You are also given instructions on how to tear down and clean up all the objects previously created, because we're all about being good citizens of the cloud.