1 change: 0 additions & 1 deletion modules/functions/pages/astream-functions.adoc
@@ -1,6 +1,5 @@
= {pulsar-reg} functions
:navtitle: {pulsar-short} functions
-:page-tag: astra-streaming,dev,develop,pulsar,java,python

Functions are lightweight compute processes that enable you to process each message received on a topic.
You can apply custom logic to that message, transforming or enriching it, and then output it to a different topic.
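To make this concrete, here is a minimal sketch of a {pulsar-short} function written with the Python SDK; the class name and the uppercase transformation are purely illustrative:

[source,python]
----
from pulsar import Function

class Uppercase(Function):
    def process(self, input, context):
        # Called once per message consumed from the input topic; the
        # returned value is published to the configured output topic.
        return input.upper()
----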
1 change: 0 additions & 1 deletion modules/functions/pages/cast.adoc
@@ -1,6 +1,5 @@
= Cast
:functionName: cast
-:page-tag: cast, transform-function

The cast transform function transforms the data to a compatible target schema.
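For example, a step that casts the value to a string schema might be configured as `{"steps": [{"type": "cast", "schema-type": "STRING"}]}`; the `schema-type` option name is assumed here, so verify it against the step reference.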

1 change: 0 additions & 1 deletion modules/functions/pages/compute.adoc
@@ -1,6 +1,5 @@
= Compute
:functionName: compute
-:page-tag: compute, transform-function

The `compute` transform function computes field values based on an `expression` evaluated at runtime. +
If the field already exists, it will be overwritten. +
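As an assumed illustration, a compute step that adds a tax-inclusive price could look like `{"steps": [{"type": "compute", "fields": [{"name": "value.price_with_tax", "expression": "value.price * 1.19", "type": "DOUBLE"}]}]}`; the field name, expression, and type shown are hypothetical.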
3 changes: 1 addition & 2 deletions modules/functions/pages/deploy-in-sink.adoc
@@ -1,7 +1,6 @@
= Deploy transform function in sink
-:page-tag: cast, transform-function

-As of https://www.datastax.com/products/luna-streaming[Luna Streaming] version 2.10.1.6, transform functions can be deployed inside of a sink process. +
+As of https://www.ibm.com/docs/en/supportforpulsar[IBM Elite Support for Apache Pulsar (formerly Luna Streaming)] version 2.10.1.6, transform functions can be deployed inside of a sink process. +
Before this update, functions transformed data either after it was written to a topic by a source connector, or before it was read from a topic by a sink connector. +
This required either an intermediate topic, with additional storage, IO, and latency, or a custom connector. +
Now, functions can be deployed at sink creation and apply preprocessing to sink topic writes. +
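With a sufficiently recent `pulsar-admin`, this is done by passing the transform function at sink creation, for example via the `--transform-function` and `--transform-function-config` options on `pulsar-admin sinks create`; these option names come from the later sink-transform feature, so verify them against your release.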
1 change: 0 additions & 1 deletion modules/functions/pages/drop-fields.adoc
@@ -1,7 +1,6 @@

= Drop fields
:functionName: drop-fields
-:page-tag: drop-fields, transform-function

The {functionName} transform function drops fields of structured data (currently only AVRO is supported).
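For example, `{"steps": [{"type": "drop-fields", "fields": "password"}]}` would drop a sensitive field; the `fields` option name is assumed here.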
1 change: 0 additions & 1 deletion modules/functions/pages/drop.adoc
@@ -1,6 +1,5 @@
= Drop
:functionName: drop
-:page-tag: drop, transform-function

The {functionName} transform function drops a record from further processing. +
Use it in conjunction with `when` to selectively drop records. +
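For example, a step such as `{"steps": [{"type": "drop", "when": "value.firstName == 'Jane'"}]}` would drop only matching records; the condition syntax shown is illustrative.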
1 change: 0 additions & 1 deletion modules/functions/pages/flatten.adoc
@@ -1,6 +1,5 @@
= Flatten
:functionName: flatten
-:page-tag: flatten, transform-function

The {functionName} transform function converts structured, nested data into a new structure with a single hierarchy level. +
The names of the new fields are built by concatenating the intermediate level field names. +
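For example, with `{"steps": [{"type": "flatten"}]}`, a field nested as `address` -> `city` would surface as a top-level field such as `address_city`, assuming the default `_` delimiter.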
1 change: 0 additions & 1 deletion modules/functions/pages/merge-key-value.adoc
@@ -1,6 +1,5 @@
= Merge KeyValue
:functionName: merge-key-value
-:page-tag: merge-key-value, transform-function

The {functionName} transform function merges the fields of KeyValue records where both the key and value are structured types of the same schema type (currently only AVRO is supported). +
The step name is `merge-key-value` and the `UserConfig` is controlled here: `{"steps": [{"type": "merge-key-value"}]}`.
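For example, a key of `{"a": 1}` and a value of `{"b": 2}` would merge into a record value of roughly `{"a": 1, "b": 2}` (illustrative data).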
1 change: 0 additions & 1 deletion modules/functions/pages/unwrap-key-value.adoc
@@ -1,6 +1,5 @@
= Unwrap KeyValue
:functionName: unwrap-key-value
-:page-tag: unwrap-key-value, transform-function

If the record value is a KeyValue, the {functionName} transform function extracts the KeyValue's key or value and makes it the record value. +
The step name is `unwrap-key-value`, and the `UserConfig` is controlled here: `{"steps": [{"type": "unwrap-key-value"}]}`.
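For example, `{"steps": [{"type": "unwrap-key-value", "unwrap-key": true}]}` would promote the key rather than the value; the `unwrap-key` option name is assumed here.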
1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/index.adoc
@@ -1,6 +1,5 @@
= Connectors
:navtitle: Connector Overview
-:page-tag: connectors,sinks,sources,astra-streaming,dev,develop,pulsar,go

{product} offers fully managed {pulsar-reg} connectors.

5 changes: 2 additions & 3 deletions modules/pulsar-io/pages/connectors/sinks/astra-db.adoc
@@ -1,9 +1,8 @@
= {astra-db} ({cass-reg} enhanced)
:connectorName: astra-db-sink
:connectorType: astra-db
-:page-tag: astra-db,cdc,sink-connector

-{company} {astra-db} Sink Connector is based on the open-source https://docs.datastax.com/en/pulsar-connector/docs/index.html[{cass-reg} sink connector for {pulsar-reg}]. Depending on how you deploy the connector, it can be used to sink topic messages with a table in {astra-db} or a table in a {cass-short} cluster outside of DB.
+{company} {astra-db} Sink Connector is based on the open-source xref:pulsar-connector:ROOT:index.adoc[{cass-reg} sink connector for {pulsar-reg}]. Depending on how you deploy the connector, it can be used to sink topic messages with a table in {astra-db} or a table in a {cass-short} cluster outside of {astra-db}.

The {product} portal provides a simple way to connect this sink to a table in {astra-db} with just a token. Using `pulsar-admin` or the REST API, you can manually configure the sink with a {cass-short} connection.

@@ -140,7 +139,7 @@ These values are provided in the `auth` area of the preceding {cass-short} connection parameters.

These values are provided in the `topic` area of the preceding {cass-short} connection parameters.

-Refer to the official documentation for a https://docs.datastax.com/en/pulsar-connector/docs/cfgRefPulsarDseConnection.html[connection properties reference].
+Refer to the official documentation for a xref:pulsar-connector:ROOT:cfgRefPulsarDseConnection.adoc[connection properties reference].

=== Mapping topic data to table columns
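Although the details are collapsed here, the mapping typically follows the {cass-short} sink convention of a per-topic setting along the lines of `topic.my-topic.my-keyspace.my-table.mapping=col1=value.field1,col2=key`; treat this shape as an assumed illustration rather than the exact syntax.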

@@ -1,7 +1,6 @@
= Cloud Storage
:connectorName: cloud-storage-sink
:connectorType: cloud-storage
-:page-tag: cloud storage,sink-connector, aws, azure, gcp

Each public cloud persists data to its storage systems differently, with its own way of formatting and storing the bytes. The Cloud Storage sink connector is a general interface to a chosen cloud storage system that exports data from a {pulsar-short} topic to that system in a desired format.

@@ -1,7 +1,6 @@
= Elasticsearch
:connectorName: es-sink
:connectorType: elastic-search
-:page-tag: elasticsearch,sink-connector

Elasticsearch is the distributed, RESTful search and analytics engine at the heart of the Elastic Stack.

11 changes: 2 additions & 9 deletions modules/pulsar-io/pages/connectors/sinks/google-bigquery.adoc
@@ -1,19 +1,12 @@
= Google BigQuery
:connectorName: bigquery-sink
:connectorType: bigquery
-:page-tag: bigquery,sink-connector

https://cloud.google.com/bigquery[Google BigQuery] is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.

-BigQuery {pulsar-short} Sink is not integrated with BigQuery directly. It uses {pulsar-short}’s built-in https://pulsar.apache.org/docs/adaptors-kafka/[Kafka Connect adapter] library to transform message data into a Kafka compatible format. Then the https://docs.confluent.io/kafka-connectors/bigquery/current/kafka_connect_bigquery_config.html[Kafka Connect BigQuery Sink] is used as the actual BigQuery integration. The adaptor provides a flexible and extensible framework for data transformation and processing. It supports various data formats, including JSON, Avro, and Protobuf, and enables users to apply transformations on the data as it is being streamed from {pulsar-short}.

-You will notice references to Kafka throughout the configuration. *You don’t need a running instance of Kafka to use this connector.* The Kafka references are used as "translation points” by this connector.
+BigQuery {pulsar-short} Sink is not integrated with BigQuery directly. It uses {pulsar-short}'s built-in https://pulsar.apache.org/docs/adaptors-kafka/[Kafka Connect adapter] library to transform message data into a Kafka compatible format. Then the https://docs.confluent.io/kafka-connectors/bigquery/current/kafka_connect_bigquery_config.html[Kafka Connect BigQuery Sink] is used as the actual BigQuery integration. The adaptor provides a flexible and extensible framework for data transformation and processing. It supports various data formats, including JSON, Avro, and Protobuf, and enables users to apply transformations on the data as it is being streamed from {pulsar-short}.

-[NOTE]
-====
-For more information on the Kafka Connect adapter, see https://www.datastax.com/blog/simplify-migrating-kafka-to-pulsar-kafka-connect-support[Simplify migrating from Kafka to {pulsar-short} with Kafka Connect Support] and the "https://medium.com/building-the-open-data-stack/datastax-presents-snowflake-sink-connector-for-apache-pulsar-53629b196064[{company} Snowflake Sink Connector for {pulsar-reg}].
-====
+You will notice references to Kafka throughout the configuration. *You don't need a running instance of Kafka to use this connector.* The Kafka references are used as "translation points" by this connector.

== Get Started

@@ -1,7 +1,6 @@
= JDBC Clickhouse
:connectorName: jdbc-clickhouse
:connectorType: jdbc-clickhouse
-:page-tag: jdbc,clickhouse,sink-connector

ClickHouse is an open-source, column-oriented database management system for online analytical processing that allows users to generate analytical reports in real time using SQL queries.
1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sinks/jdbc-mariadb.adoc
@@ -1,7 +1,6 @@
= JDBC MariaDB
:connectorName: jdbc-mariadb
:connectorType: jdbc-mariadb
-:page-tag: jdbc,mariadb,sink-connector

MariaDB is the open source relational database loved by developers all over the world. Created by MySQL’s original developers, MariaDB is compatible with MySQL and guaranteed to stay open source forever. MariaDB powers some of the world’s most popular websites such as Wikipedia and WordPress.com. It is also the core engine behind banking, social media, mobile and e-commerce sites worldwide. MariaDB Connector/J is a Type 4 JDBC driver. It was developed specifically as a lightweight JDBC connector for use with MariaDB and MySQL database servers. It was originally based on the Drizzle JDBC code with numerous additions and bug fixes. Learn more about MariaDB on https://mariadb.org/[their site].

@@ -1,7 +1,6 @@
= JDBC PostgreSQL
:connectorName: jdbc-postgres
:connectorType: jdbc-postgres
-:page-tag: jdbc,postgres,sink-connector

PostgreSQL is a powerful, open source, object-relational database system with over 30 years of active development.

1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sinks/jdbc-sqllite.adoc
@@ -1,7 +1,6 @@
= JDBC SQLite
:connectorName: jdbc-sqlite
:connectorType: jdbc-sqlite
-:page-tag: jdbc,sqlite,sink-connector

SQLite is the most used database engine in the world.

1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sinks/kafka.adoc
@@ -1,7 +1,6 @@
= Kafka
:connectorName: kafka-sink
:connectorType: kafka
-:page-tag: kafka,sink-connector

Apache Kafka(R) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sinks/kinesis.adoc
@@ -1,7 +1,6 @@
= Kinesis
:connectorName: kinesis-sink
:connectorType: kinesis
-:page-tag: kinesis,sink-connector

Amazon Kinesis collects, processes, and analyzes real-time streaming data for timely insights and quick reactions to new information.

1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sinks/snowflake.adoc
@@ -1,7 +1,6 @@
= Snowflake
:connectorName: snowflake-sink
:connectorType: snowflake
-:page-tag: snowflake,sink-connector

A Snowflake database is where an organization's uploaded structured and semi-structured data sets are held for processing and analysis.

@@ -1,7 +1,6 @@
= Data Generator
:connectorName: data-gen-src
:connectorType: data-generator
-:page-tag: data-generator,source-connector

The Data Generator source connector creates fake data on an {pulsar-reg} topic using the https://github.com/Codearte/jfairy[JFAIRY library] to generate a message containing "person" data.

@@ -1,7 +1,6 @@
= Debezium MongoDB
:connectorName: debezium-mongo-src
:connectorType: debezium-mongodb
-:page-tag: debezium,cdc,mongodb,source-connector

Debezium’s MongoDB connector tracks a MongoDB replica set or a MongoDB sharded cluster for document changes in databases and collections and records those changes as messages in an {pulsar-reg} topic.

@@ -1,7 +1,6 @@
= Debezium MySQL
:connectorName: debezium-mysql-src
:connectorType: debezium-mysql
-:page-tag: mysql,debezium,cdc,source-connector

The Debezium MySQL connector reads the binlog, produces change events for row-level INSERT, UPDATE, and DELETE operations, and emits these change events as messages in an {pulsar-reg} topic.
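Each change event carries the usual Debezium envelope, roughly `{"before": {...}, "after": {...}, "op": "c"}` in abbreviated form, with `op` distinguishing creates, updates, and deletes.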

@@ -1,7 +1,6 @@
= Debezium Oracle
:connectorName: debezium-orcl-src
:connectorType: debezium-oracle
-:page-tag: oracle,debezium,cdc,source-connector

Debezium’s Oracle connector captures and records row-level changes that occur in databases on Oracle servers, including tables that are added while the connector is running.

@@ -1,7 +1,6 @@
= Debezium PostgreSQL
:connectorName: debezium-pg-src
:connectorType: debezium-postgres
-:page-tag: postgres,debezium,cdc,source-connector

The PostgreSQL connector produces a change event for every row-level insert, update, and delete operation that it captures, and sends change event records for each table in a separate {pulsar-reg} topic.

@@ -1,7 +1,6 @@
= Debezium SQL Server
:connectorName: debezium-mssql-src
:connectorType: debezium-sqlserver
-:page-tag: sql-server,cdc,debezium,source-connector

The Debezium SQL Server connector is based on the change data capture feature available in SQL Server 2016 Service Pack 1 (SP1) and later editions.

1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sources/kafka.adoc
@@ -1,7 +1,6 @@
= Kafka
:connectorName: kafka-src
:connectorType: kafka
-:page-tag: kafka,source-connector

The Kafka source connector pulls data from a Kafka topic and persists the data into an {pulsar-reg} topic.

1 change: 0 additions & 1 deletion modules/pulsar-io/pages/connectors/sources/kinesis.adoc
@@ -1,7 +1,6 @@
= Kinesis
:connectorName: kinesis-src
:connectorType: kinesis
-:page-tag: kinesis,source-connector

The Kinesis source connector pulls data from Amazon Kinesis and persists data into an {pulsar-reg} topic.

@@ -1,6 +1,5 @@
= Exclusive subscriptions in {pulsar-reg}
:navtitle: Exclusive
-:page-tag: pulsar-subscriptions,quickstart,admin,dev,pulsar

*Subscriptions* in {pulsar-reg} describe which consumers are consuming data from a topic and how they want to consume that data. +

@@ -1,6 +1,5 @@
= Failover subscriptions in {pulsar-reg}
:navtitle: Failover
-:page-tag: pulsar-subscriptions,quickstart,admin,dev,pulsar

*Subscriptions* in {pulsar-reg} describe which consumers are consuming data from a topic and how they want to consume that data. +

@@ -1,6 +1,5 @@
= Key_Shared subscriptions in {pulsar-reg}
:navtitle: Key Shared
-:page-tag: pulsar-subscriptions,quickstart,admin,dev,pulsar

*Subscriptions* in {pulsar-reg} describe which consumers are consuming data from a topic and how they want to consume that data. +

@@ -1,6 +1,5 @@
= Shared subscriptions in {pulsar-reg}
:navtitle: Shared
-:page-tag: pulsar-subscriptions,quickstart,admin,dev,pulsar

*Subscriptions* in {pulsar-reg} describe which consumers are consuming data from a topic and how they want to consume that data. +

1 change: 0 additions & 1 deletion modules/subscriptions/pages/index.adoc
@@ -1,6 +1,5 @@
= Subscriptions in {pulsar-reg}
:navtitle: Overview
-:page-tag: pulsar-subscriptions,quickstart,admin,dev,pulsar

*Subscriptions* in {pulsar-reg} describe which consumers are consuming data from a topic and how they want to consume that data. +
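As a rough sketch with the Python client (the service URL and topic are placeholders), the subscription type is chosen when a consumer subscribes:

[source,python]
----
import pulsar

client = pulsar.Client("pulsar://localhost:6650")  # placeholder service URL
consumer = client.subscribe(
    "persistent://my-tenant/my-namespace/my-topic",
    subscription_name="my-subscription",
    # One of Exclusive, Failover, Shared, or KeyShared
    consumer_type=pulsar.ConsumerType.Exclusive,
)
msg = consumer.receive()
consumer.acknowledge(msg)
client.close()
----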

2 changes: 1 addition & 1 deletion modules/subscriptions/partials/subscription-prereq.adoc
@@ -6,7 +6,7 @@ To run this example, you'll need:

* https://openjdk.java.net/install/[Java OpenJDK 11]

-* A configured {product} instance with at least *one streaming tenant* and *one topic*. See the https://docs.datastax.com/en/astra-streaming/docs/astream-quick-start.html[{product} quick start] for instructions.
+* A configured {product} instance with at least one streaming tenant and one topic. See the xref:astra-streaming:getting-started:index.adoc[{product} quick start] for instructions.

* A local clone of the https://github.com/datastax/pulsar-subscription-example[{company} {pulsar-short} Subscription Example repository]

Binary file removed modules/use-cases-architectures/images/java-icon.png
Binary file removed modules/use-cases-architectures/images/node-icon.png
@@ -17,21 +17,21 @@ Below are example implementations for each runtime consuming messages from the CDC data topic.
While these examples are in the `astra-streaming-examples` repository, they are not {product}-specific.
You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters.

-* image:csharp-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/csharp/astra-cdc/Program.cs[{csharp} CDC project example]
-* image:golang-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example]
-* image:java-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/consumers/CDCConsumer.java[Java CDC consumer example]
-* image:node-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/nodejs/astra-cdc/consumer.js[Node.js CDC consumer example]
-* image:python-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/python/astra-cdc/cdc_consumer.py[Python CDC consumer example]
+* svg:common::icons/logos/csharp.svg[role="icon text-xl",name="C#"] https://github.com/datastax/astra-streaming-examples/blob/master/csharp/astra-cdc/Program.cs[{csharp} CDC project example]
+* svg:common::icons/logos/go.svg[role="icon text-xl",name="Go"] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example]
+* svg:common::icons/logos/java.svg[role="icon text-xl",name="Java"] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/consumers/CDCConsumer.java[Java CDC consumer example]
+* svg:common::icons/logos/nodejs.svg[role="icon text-xl",name="Node.js"] https://github.com/datastax/astra-streaming-examples/blob/master/nodejs/astra-cdc/consumer.js[Node.js CDC consumer example]
+* svg:common::icons/logos/python.svg[role="icon text-xl",name="Python"] https://github.com/datastax/astra-streaming-examples/blob/master/python/astra-cdc/cdc_consumer.py[Python CDC consumer example]
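For orientation, a bare-bones Python consumer of a CDC data topic might look like the following sketch; the service URL, token, and topic name are placeholders, and a real consumer would deserialize the Avro payload with the table's schema:

[source,python]
----
import pulsar

client = pulsar.Client(
    "pulsar+ssl://<broker-hostname>:6651",                       # placeholder
    authentication=pulsar.AuthenticationToken("<astra-token>"),  # placeholder
)
consumer = client.subscribe(
    "persistent://<tenant>/astracdc/data-<topic>",  # placeholder CDC data topic
    subscription_name="cdc-example",
)
while True:
    msg = consumer.receive()
    # CDC records are Avro-encoded; deserialize msg.data() with the table schema.
    print(msg.data())
    consumer.acknowledge(msg)
----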

== {pulsar-short} functions

It is very common to have a function consume the CDC data, perform additional processing, and pass the result to another topic. Like a client consumer, a function needs to deserialize the message data. Below are examples of functions consuming messages from the CDC data topic.

While these examples are in the `astra-streaming-examples` repository, they are not {product}-specific. You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters.

-* image:golang-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example]
-* image:java-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/functions/CDCFunction.java[Java CDC function example]
-* image:python-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/python/cdc-in-pulsar-function/deschemaer.py[Python CDC function example]
+* svg:common::icons/logos/go.svg[role="icon text-xl",name="Go"] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example]
+* svg:common::icons/logos/java.svg[role="icon text-xl",name="Java"] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/functions/CDCFunction.java[Java CDC function example]
+* svg:common::icons/logos/python.svg[role="icon text-xl",name="Python"] https://github.com/datastax/astra-streaming-examples/blob/master/python/cdc-in-pulsar-function/deschemaer.py[Python CDC function example]

== See also
