diff --git a/modules/functions/examples/functions/deploy-sink-with-function.sh b/modules/functions/examples/functions/deploy-sink-with-function.sh deleted file mode 100644 index 3f26767..0000000 --- a/modules/functions/examples/functions/deploy-sink-with-function.sh +++ /dev/null @@ -1,8 +0,0 @@ -./bin/pulsar-admin sinks create \ ---sink-type \ ---inputs my-input-topic \ ---tenant public \ ---namespace default \ ---name my-sink \ ---transform-function "builtin://transforms" \ ---transform-function-config '{"steps": [{"type": "drop-fields", "fields": "password"}, {"type": "merge-key-value"}, {"type": "unwrap-key-value"}, {"type": "cast", "schema-type": "STRING"}]}' \ No newline at end of file diff --git a/modules/functions/examples/functions/pulsar-admin-create.sh b/modules/functions/examples/functions/pulsar-admin-create.sh deleted file mode 100644 index 2787e9e..0000000 --- a/modules/functions/examples/functions/pulsar-admin-create.sh +++ /dev/null @@ -1,9 +0,0 @@ -./bin/pulsar-admin sinks create --namespace ${namespace} --tenant ${TENANT} \ - --sink-type cassandra-enhanced \ - --name cassandra-sink \ - --inputs dbserver1.public.pulsar_source_table \ - --subs-position Earliest \ - --sink-config '{"verbose": true, "tasks.max":1, ...}' \ - --parallelism 1 \ - --transform-function builtin://transform \ - --transform-function-config `{"steps": [{"type": "flatten"}]}` \ No newline at end of file diff --git a/modules/functions/pages/astream-functions.adoc b/modules/functions/pages/astream-functions.adoc index 3323e74..ad5c3eb 100644 --- a/modules/functions/pages/astream-functions.adoc +++ b/modules/functions/pages/astream-functions.adoc @@ -1,12 +1,15 @@ -= {pulsar-reg} Functions -:navtitle: {pulsar-short} Functions += {pulsar-reg} functions +:navtitle: {pulsar-short} functions :page-tag: astra-streaming,dev,develop,pulsar,java,python -Functions are lightweight compute processes that enable you to process each message received on a topic. You can apply custom logic to that message, transforming or enriching it, and then output it to a different topic. +Functions are lightweight compute processes that enable you to process each message received on a topic. +You can apply custom logic to that message, transforming or enriching it, and then output it to a different topic. -Functions run inside {product} and are therefore serverless. You write the code for your function in Java, Python, or Go, then upload the code. It will be automatically run for each message published to the specified input topic. +Functions run inside {product} and are therefore serverless. +You write the code for your function in Java, Python, or Go, then upload the code. +It is automatically run for each message published to the specified input topic. -Functions are implemented using {pulsar-reg} functions. See https://pulsar.apache.org/docs/en/functions-overview/[{pulsar-short} Functions overview] for more information about {pulsar-short} functions. +Functions are implemented using https://pulsar.apache.org/docs/en/functions-overview/[{pulsar-reg} functions]. [IMPORTANT] ==== @@ -15,14 +18,34 @@ Custom functions require a xref:astra-streaming:operations:astream-pricing.adoc[ Organizations on the *Free* plan can use xref:functions:index.adoc[transform functions] only. ==== -== Deploy Python functions in a ZIP file +== Deploy Python functions in a zip file -{product} supports Python-based {pulsar-short} Functions. -These functions can be packaged in a ZIP file and deployed to {product} or {pulsar-short}. 
The same ZIP file can be deployed to either environment. -We’ll create a ZIP file in the proper format, then use the pulsar-admin command to deploy the ZIP. -We’ll pass a “create function" configuration file (a .yaml file) as a parameter to pulsar-admin, which defines the {pulsar-short} Function options and parameters. +{product} supports Python-based {pulsar-short} functions. +These functions can be packaged in a zip file and deployed to {product} or {pulsar-short}. +The same zip file can be deployed to either environment. -Assuming the ZIP file is named `testpythonfunction.zip`, an unzipped `testpythonfunction.zip` folder looks like this: +To demonstrate this, the following steps create function configuration YAML file, package all necessary function files as a zip archive, and then use the `pulsar-admin` CLI to deploy the zip. +The configuration file defines the {pulsar-short} function options and parameters. + +[TIP] +==== +For video demos of a {pulsar-short} Python function, see the *Five Minutes About {pulsar-short}* series provides: + +video::OCqxcNK0HEo[youtube, list=PL2g2h-wyI4SqeKH16czlcQ5x4Q_z-X7_m, height=445px,width=100%] +==== + +. Create a directory and subdirectories for your function zip archive with the following structure: ++ +[source,plain] +---- +/parent-directory + /python-code + /deps + /src +---- ++ +For example, a function called `my-python-function` could have the following structure: ++ [source,plain] ---- /my-python-function @@ -30,20 +53,27 @@ Assuming the ZIP file is named `testpythonfunction.zip`, an unzipped `testpython python-code/deps/sh-1.12.14-py2.py3-none-any.whl python-code/src/my-python-function.py ---- - -. To deploy a ZIP, first create the proper ZIP file directory structure. That file format/layout looks like this: + -[source, python] +The following commands create the necessary directories for a function called `demo-function`: ++ +[source,bash] +---- +mkdir demo-function +mkdir demo-function/python-code +mkdir demo-function/python-code/deps/ +mkdir demo-function/python-code/src/ ---- -mkdir my-python-function -mkdir my-python-function/python-code -mkdir my-python-function/python-code/deps/ -mkdir my-python-function/python-code/src/ -touch my-python-function/python-code/src/my-python-function.py +. Create a Python file in the `/src` directory. +For example: ++ +[source,bash] +---- +touch demo-function/python-code/src/demo-function.py ---- -. Add your code to my-python-function.py. For this example, we'll just use a basic exclamation function: +. Add your function code to your Python file. +This example function adds an exclamation point to the end of each message: + [source,python] ---- @@ -57,7 +87,8 @@ class ExclamationFunction(Function): return input + '!' ---- -. Add your dependencies to the /deps folder. For this example, we'll use the pulsar-client library. +. Add your function's dependencies to the `demo-function/python-code/deps` directory. +This example uses the `pulsar-client` library: + [source,bash] ---- @@ -65,156 +96,170 @@ cd deps pip install pulsar-client==2.10.0 ---- -. Run the following command to add my-pulsar-function.zip to the root of the file structure: +. Create the zip archive for your function in the `python-code` directory. ++ +For example, the following command is run from within the `/deps` directory and creates the `demo-function.zip` file in the parent `python-code` directory. + [source,bash] ---- cd deps -zip -r ../my-python-function.zip . 
- adding: sh-1.12.14-py2.py3-none-any.whl (deflated 2%) +zip -r ../demo-function.zip . ---- ++ +Wait while the archive is packaged. -. Ensure your package has the ZIP file at the root of the file structure: +. Verify that the zip file is in the `python-code` directory: + -[source,plain] +[source,bash] ---- python-code ls -al +---- ++ +.Result +[%collapsible] +==== +[source,console] +---- deps -my-python-function.zip +demo-function.zip src ---- +==== === Deploy a Python function with configuration file -. Create a deployment configuration file. In this example we'll call this file “func-create-config.yaml”. -This file will be passed to the pulsar-admin create function command. + -The contents of the YAML file should be: +. Create a deployment configuration file named `func-create-config.yaml` with the following contents. +This file is passed to the `pulsar-admin` create function command. + -[source,yaml] +[source,yaml,subs="+quotes"] ---- -py: +py: /absolute/path/to/demo-function.zip className: pythonfunc.ExclamationFunction parallelism: 1 inputs: - - persistent://mytenant/n0/t1 -output: persistent://mytenant/ns/t2 + - persistent://**TENANT_NAME**/**NAMESPACE_NAME**/**INPUT_TOPIC_NAME** +output: persistent://**TENANT_NAME**/**NAMESPACE_NAME**/**OUTPUT_TOPIC_NAME** autoAck: true -tenant: mytenant -namespace: ns0 -name: testpythonfunction +tenant: **TENANT_NAME** +namespace: **NAMESPACE_NAME** +name: demofunction logTopic: userConfig: logging_level: ERROR ---- ++ +Replace the following: ++ +* `**TENANT_NAME**`: The tenant where you want to deploy the function +* `**NAMESPACE_NAME**`: The namespace where you want to deploy the function +* `**INPUT_TOPIC_NAME**`: The input topic for the function +* `**OUTPUT_TOPIC_NAME**`: The output topic for the function -. Use pulsar-admin to deploy the Python ZIP to {product} or {pulsar-short}. -The command below assumes you've properly configured the client.conf file for pulsar-admin commands against your {pulsar-short} cluster. If you are using {product} see the documentation xref:astra-streaming:developing:configure-pulsar-env.adoc[here] for more information. +. Use `pulsar-admin` to deploy the Python zip to {product} or {pulsar-short}. +The command below assumes you've properly configured the `client.conf` file for `pulsar-admin` commands against your {pulsar-short} cluster. If you are using {product}, see xref:astra-streaming:developing:configure-pulsar-env.adoc[] for more information. + [source,console] ---- -bin/pulsar-admin functions create --function-config-file +bin/pulsar-admin functions create --function-config-file /absolute/path/to/func-create-config.yml ---- -. Check results: Go to the {astra-ui} to see your newly deployed function listed under the “Functions” tab for your Tenant. See <> for more information on testing and monitoring your function in {product}. - -.. You can also use the pulsar-admin command to list your functions: +. Verify that the function was deployed: + -[source,bash] +* Go to the {astra-ui} to see your newly deployed function listed under the **Functions** tab for your tenant. +See <> for more information on testing and monitoring your function in {product}. 
+* Use the `pulsar-admin` CLI to list functions for a specific tenant and namespace: ++ +[source,bash,subs="+quotes"] ---- -bin/pulsar-admin functions list --tenant --namespace +bin/pulsar-admin functions list --tenant **TENANT_NAME** --namespace **NAMESPACE_NAME** ---- == Deploy Java functions in a JAR file -{product} supports Java-based {pulsar-short} Functions which are packaged in a JAR file. -The JAR can be deployed to {product} or {pulsar-short}. The same JAR file can be deployed to either environment. +{product} supports Java-based {pulsar-short} functions which are packaged in a JAR file. +The JAR can be deployed to {product} or {pulsar-short}. +The same JAR file can be deployed to either environment. -We’ll create a JAR file using Maven, then use the pulsar-admin command to deploy the JAR. -We’ll pass a "create function" configuration file (a .yaml file) as a parameter to pulsar-admin, which defines the {pulsar-short} function options and parameters. +In this example, you'll create a function JAR file using Maven, then use the `pulsar-admin` CLI to deploy the JAR. +You'll also create a function configuration YAML file that defines the {pulsar-short} function options and parameters. -. To deploy a JAR, first create the proper JAR with the Java code of the {pulsar-short} Function. -An example pom.xml file is shown below: +. Create a properly-structured JAR with your function's Java code. +For example: + -.Function pom.xml +.Example: Function pom.xml [%collapsible] ==== -[source,pom] ----- - - - 4.0.0 - - java-function - java-function - 1.0-SNAPSHOT - - - - org.apache.pulsar - pulsar-functions-api - 3.0.0 - - - - - - - maven-assembly-plugin - - false - - jar-with-dependencies - - - - org.example.test.ExclamationFunction - - - - - - make-assembly - package - - assembly - - - - - - org.apache.maven.plugins - maven-compiler-plugin - 3.11.0 - - 17 - - - - - - +[source,xml] +---- + + + 4.0.0 + + java-function + java-function + 1.0-SNAPSHOT + + + + org.apache.pulsar + pulsar-functions-api + 3.0.0 + + + + + + + maven-assembly-plugin + + false + + jar-with-dependencies + + + + org.example.test.ExclamationFunction + + + + + + make-assembly + package + + assembly + + + + + + org.apache.maven.plugins + maven-compiler-plugin + 3.11.0 + + 17 + + + + + + ---- ==== -. Package the JAR file with Maven. -+ -[tabs] -==== -Maven:: +. Package the JAR file with Maven: + --- [source,bash] ---- mvn package ---- --- - -Result:: + --- -[source,bash] +.Result +[%collapsible] +==== +[source,console] ---- [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS @@ -223,16 +268,14 @@ Result:: [INFO] Finished at: 2023-05-16T16:19:05-04:00 [INFO] ------------------------------------------------------------------------ ---- --- ==== -. Create a deployment configuration file. In this example we'll call this file “func-create-config.yaml”. -This file will be passed to the pulsar-admin create function command. + -The contents of the YAML file should be: +. Create a deployment configuration file named `func-create-config.yaml` with the following contents. +This file is passed to the `pulsar-admin` create function command. + [source,yaml] ---- -jar: +jar: /absolute/path/to/java-function.jar className: com.example.pulsar.ExclamationFunction parallelism: 1 inputs: @@ -247,81 +290,100 @@ userConfig: logging_level: ERROR ---- + -[NOTE] +[IMPORTANT] ==== -{product} requires the “inputs” topic to have a message schema defined before deploying the function. 
Otherwise, deployment errors may occur. Use the {astra-ui} to define the message schema for a topic. +{product} requires the `inputs` topic to have a message schema defined before deploying the function. +Otherwise, deployment errors may occur. +Use the {astra-ui} to define the message schema for a topic. ==== + +. Use the `pulsar-admin` CLI to deploy your function JAR to {product} or {pulsar-short}. + -. Use pulsar-admin to deploy your new JAR to {product} or {pulsar-short}. -The command below assumes you've properly configured the client.conf file for pulsar-admin commands against your {pulsar-short} cluster. If you are using {product} see the documentation xref:astra-streaming:developing:configure-pulsar-env.adoc[here] for more information. +The following command assumes you've properly configured the `client.conf` file for `pulsar-admin` commands against your {pulsar-short} cluster. +If you are using {product}, see xref:astra-streaming:developing:configure-pulsar-env.adoc[] for more information. + [source,bash] ---- -bin/pulsar-admin functions create --function-config-file +bin/pulsar-admin functions create --function-config-file /absolute/path/to/func-create-config.yml ---- -. Check results: Go to the {astra-ui} to see your newly deployed function listed under the “Functions” tab for your Tenant. See <> for more information on testing and monitoring your function in {product}. - -.. You can also use the pulsar-admin command to list your functions: +. Verify that the function was deployed: + -[source,bash] +* Go to the {astra-ui} to see your newly deployed function listed under the **Functions** tab for your tenant. +See <> for more information on testing and monitoring your function in {product}. +* Use the `pulsar-admin` CLI to list functions for a specific tenant and namespace: ++ +[source,bash,subs="+quotes"] ---- -bin/pulsar-admin functions list --tenant --namespace +bin/pulsar-admin functions list --tenant **TENANT_NAME** --namespace **NAMESPACE_NAME** ---- == Add functions in {product} dashboard -Add functions in the Functions tab of the {product} dashboard. +Add functions in the **Functions** tab of the {product} dashboard. . Select *Create Function* to get started. + . Choose your function name and namespace. + image::astream-name-function.png[Function and Namespace] . Select the file you want to pull the function from and which function you want to use within that file. - -{product} generates a list of acceptable classes. Python and Java functions are added a little differently from each other. - -Python functions are added by loading a Python file (.py) or a zipped Python file (.zip). When adding Python files, the Class Name is specified as the name of the Python file without the extension plus the class you want to execute. - -For example, if the Python file is called `testfunction.py` and the class is `ExclamationFunction`, then the class name is `testfunction.ExclamationFunction`. The file can contain multiple classes, but only one is used. If there is no class in the Python file (when using a basic function, for example), specify the filename without the extension (ex. `function`). - -Java functions are added by loading a Java jar file (.jar). When adding Java files, you also need to specify the name of the class to execute as the function. - +{product} generates a list of acceptable classes. ++ image::astream-exclamation-function.png[Exclamation Function] -[start=4] -. Choose your input topics. 
++ +There are differences depending on the function language: ++ +* Python functions are added by loading a Python file (`.py`) or a zipped Python file (`.zip`). ++ +When adding Python files, the Class Name is specified as the name of the Python file without the extension plus the class you want to execute. +For example, if the Python file is called `testfunction.py` and the class is `ExclamationFunction`, then the class name is `testfunction.ExclamationFunction`. ++ +The file can contain multiple classes, but only one is used. +If there is no class in the Python file (when using a basic function, for example), specify the filename without the extension, such as `testfunction`. ++ +* Java functions are added by loading a Java jar file (`.jar`). +When adding Java files, you must specify the name of the class to execute as the function. + +. Select your input topics. + image:streaming-learning:functions:astream-io-topics.png[IO Topics] -. Choose *Optional Destination Topics* for output and logging. +. Select **Optional Destination Topics** for output and logging. + image:streaming-learning:functions:astream-optional-destination-topics.png[Optional Topics] -. Choose *Advanced Options* and run at least one sink instance. +. If applicable, configure the *Advanced Options*. + image:streaming-learning:functions:astream-advanced-config.png[Advanced Configuration] -. Choose your *Processing Guarantee*. The default value is *ATLEAST_ONCE*. Processing Guarantee offers three options: +. Run at least one sink instance. + +. Select an option for *Processing Guarantee*: + -* *ATLEAST_ONCE*: Each message sent to the function can be processed more than once. +* *ATLEAST_ONCE* (default): Each message sent to the function can be processed more than once. * *ATMOST_ONCE*: The message sent to the function is processed at most once. Therefore, there is a chance that the message is not processed. * *EFFECTIVELY_ONCE*: Each message sent to the function will have one output associated with it. -. Provide an *Option Configuration Key*. See the https://pulsar.apache.org/functions-rest-api/#operation/registerFunction[{pulsar-short} Docs] for a list of configuration keys. +. Provide an *Option Configuration Key*. +See the https://pulsar.apache.org/functions-rest-api/#operation/registerFunction[{pulsar-short} documentation] for a list of configuration keys. + image:streaming-learning:functions:astream-provide-config-keys.png[Provide Config Key] -. Select *Create*. +. Click *Create*. -You have created a function for this namespace. You can confirm your function was created in the *Functions* tab. +. To verify that the function was created, review the list of functions on the *Functions* tab. == Add function with {pulsar-short} CLI -You can also add functions using the {pulsar-short} CLI. We will create a new Python function to consume a message from one topic, add an exclamation point, and publish the results to another topic. +You can add functions using the {pulsar-short} CLI. + +The following example creates a Python function that consumes a message from one topic, adds an exclamation point, and then publishes the results to another topic. -. Create the following Python function in `testfunction.py`: +. Add the following Python function code to a file named `testfunction.py`: + +.testfunction.py [source, python] ---- from pulsar import Function @@ -333,57 +395,71 @@ class ExclamationFunction(Function): def process(self, input, context): return input + '!' ---- -+ + . 
Deploy `testfunction.py` to your {pulsar-short} cluster using the {pulsar-short} CLI: + -[source, bash] +[source,bash,subs="+quotes"] ---- $ ./pulsar-admin functions create \ - --py /full/path/to/testfunction.py \ + --py /absolute/path/to/testfunction.py \ --classname testfunction.ExclamationFunction \ - --tenant \ + --tenant **TENANT_NAME** \ --namespace default \ --name exclamation \ --auto-ack true \ - --inputs persistent:///default/in \ - --output persistent:///default/out \ - --log-topic persistent:///default/log + --inputs persistent://**TENANT_NAME**/default/in \ + --output persistent://**TENANT_NAME**/default/out \ + --log-topic persistent://**TENANT_NAME**/default/log ---- + -A response of `Created Successfully!` indicates the function is deployed and ready to accept messages. +Replace **TENANT_NAME** with the name of the tenant where you want to deploy the function. +If you want to use a different namespace, replace `default` with another namespace name. +If you want to use different topics, change `in`, `out`, and `log` accordingly. + +. Verify that the response is `Created Successfully!`. +This indicates that the function was deployed and ready to run when triggered by incoming messages. + If the response is `402 Payment Required` with `Reason: only qualified organizations can create functions`, then you must upgrade to a xref:astra-streaming:operations:astream-pricing.adoc[paid {product} plan]. Organizations on the *Free* plan can use xref:functions:index.adoc[transform functions] only. - -. Use `./pulsar-admin functions list --tenant ` to list the functions in your tenant and confirm your new function was created. ++ +You can also verify that a function was created by checking the **Functions** tab or by running `./pulsar-admin functions list --tenant **TENANT_NAME**`. == Testing Your Function -Triggering a function is a convenient way to test that the function is working. When you trigger a function, you are publishing a message on the function’s input topic, which triggers the function to run. If the function has an output topic and the function returns data to the output topic, that data is displayed. +Triggering a function is a convenient way to test that the function is working. +When you trigger a function, you publish a message on the function's input topic, which triggers the function. + + + -Send a test value with {pulsar-short} CLI's `trigger` to test a function you've set up. . Listen for messages on the output topic: + -[source, bash] +[source,bash,subs="+quotes"] ---- -$ ./pulsar-client consume persistent:///default/ \ +$ ./pulsar-client consume persistent://**TENANT_NAME**/default/out \ --subscription-name my-subscription \ --num-messages 0 # Listen indefinitely ---- + -. Test your exclamation function with `trigger`: +Replace **TENANT_NAME** with the name of the tenant where you deployed the function. +If your function uses a different namespace and output topic name, replace `default` and `out` accordingly. ++ +If the function has an output topic, and the function returns data to the output topic, then that data is returned by the listener when you run the function. + +. Send a test value with the {pulsar-short} CLI `trigger` command: + -[source, bash] +[source,bash,subs="+quotes"] ---- $ ./pulsar-admin functions trigger \ --name exclamation \ - --tenant \ + --tenant **TENANT_NAME** \ --namespace default \ --trigger-value "Hello world" ---- + -The trigger sends the string `Hello world` to your exclamation function. 
Your function should output `Hello world!` to your consumed output. +This command sends the string `Hello world` to the exclamation function. +If deployed and configured correctly, the function should output `Hello world!` to the `out` topic. [#controlling-your-function] == Controlling Your Function @@ -394,7 +470,8 @@ image:streaming-learning:functions:astream-function-controls.png[Function Contro == Monitoring Your Function -Functions produce logs to help you in debugging. To view your function's logs, open your function in the *Functions* dashboard. +Functions produce logs to help you in debugging. +To view your function's logs, open your function in the *Functions* dashboard. image:streaming-learning:functions:astream-function-log.png[Function Log] @@ -402,7 +479,8 @@ In the upper right corner of the function log are controls to *Refresh*, *Copy t == Updating Your Function -A function that is already running can be updated with new configuration. The following settings can be updated: +A function that is already running can be updated with new configuration. +The following settings can be updated: * Function code * Output topic @@ -412,32 +490,30 @@ A function that is already running can be updated with new configuration. The fo If you need to update any other setting of the function, delete and then re-add the function. -To update your function, select your function in the *Functions* dashboard. +. To update your function, select the function in the *Functions* dashboard. image::astream-function-update.png[Update Function] -. Select *Change File* to find your function locally and click *Open*. +. Click *Change File* to select a local function file, and then click *Open*. -. Update your function's *Instances* and *Timeout*. When you're done, click *Update*. +. Update your function's *Instances* and *Timeout*. -. An *Updates Submitted Successfully* flag will appear to let you know your function has been updated. +. Click *Update*. ++ +An *Updates Submitted Successfully* message confirms that the function was updated. == Deleting Your Function -To delete a function, select the function to be deleted in the *Functions* dashboard. +. Select the function to be deleted in the *Functions* dashboard. image::astream-delete-function.png[Delete Function] . Click *Delete*. -. A popup will ask you to confirm deletion by entering the function's name and clicking *Delete*. -. A *Function-name Deleted Successfully!* flag will appear to let you know you've deleted your function. - -== {pulsar-short} functions video - -Follow along with this video from our *Five Minutes About {pulsar-short}* series to see a {pulsar-short} Python function in action. -video::OCqxcNK0HEo[youtube, list=PL2g2h-wyI4SqeKH16czlcQ5x4Q_z-X7_m, height=445px,width=100%] +. To confirm the deletion, enter the function's name, and then click *Delete*. ++ +A *Function-name Deleted Successfully!* message confirms the function was permanently deleted. -== Next +== Next steps Learn more about developing functions for {product} and {pulsar-short} https://pulsar.apache.org/docs/en/functions-develop/[here]. 
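+
+You can also exercise a deployed function end to end by publishing directly to its input topic instead of using `trigger`.
+The following commands assume the exclamation function from this page is running with the `in` and `out` topics in the `default` namespace; adjust the tenant, namespace, topic, and subscription names to match your own deployment.
+
+[source,bash,subs="+quotes"]
+----
+# Publish a test message to the function's input topic
+./pulsar-client produce persistent://**TENANT_NAME**/default/in --messages "Hello again"
+
+# In another terminal, watch the output topic for the transformed message "Hello again!"
+./pulsar-client consume persistent://**TENANT_NAME**/default/out \
+  --subscription-name my-subscription \
+  --num-messages 0
+----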
\ No newline at end of file diff --git a/modules/functions/pages/deploy-in-sink.adoc b/modules/functions/pages/deploy-in-sink.adoc index cadfbfa..48bd53b 100644 --- a/modules/functions/pages/deploy-in-sink.adoc +++ b/modules/functions/pages/deploy-in-sink.adoc @@ -11,33 +11,33 @@ Now, functions can be deployed at sink creation and apply preprocessing to sink Creating a sink function is similar to creating a sink in the {astra-ui}, but with a few additional steps. . xref:pulsar-io:connectors/index.adoc[Create a sink] as described in the {product} documentation. + . During sink creation, select the transform function you want to run inside the sink. + image::astream-transform-functions.png[Connect Topics] -. When the sink is up and running, inspect the sink connector's log. + + +. When the sink is up and running, inspect the sink connector's log. The function is loaded at sink creation: + -[source,shell] +[source,console] ---- -2022-11-14T15:01:02.398190413Z 2022-11-14T15:01:02,397+0000 [main] INFO org.apache.pulsar.functions.runtime.thread.ThreadRuntime - ThreadContainer starting function with instanceId 0 functionId f584ae69-2eda-449b-9759-2d19fd7c4da5 namespace astracdc +2022-11-14T15:01:02.398190413Z 2022-11-14T15:01:02,397+0000 [main] INFO org.apache.pulsar.functions.runtime.thread.ThreadRuntime - ThreadContainer starting function with instanceId 0 functionId f584ae69-2eda-449b-9759-2d19fd7c4da5 namespace astracdc ---- . The function then applies preprocessing to outgoing messages, in this case casting an AVRO record to `String` to your selected topic: + -[source,shell] +[source,json] ---- {{"field1": "value1", "field2": "value2"}} ---- == Create sink function in pulsar-admin -[NOTE] -==== https://github.com/datastax/pulsar[Luna Streaming 2.10+] is required to deploy custom functions in {pulsar-short}. -==== Create a sink connector, and include the path to the transform function and configuration at creation: -[source,shell,subs="attributes+"] + +[source,shell] ---- pulsar-admin sinks create \ --sink-type elastic-sink \ diff --git a/modules/functions/pages/index.adoc b/modules/functions/pages/index.adoc index 39daa7b..1f1d922 100644 --- a/modules/functions/pages/index.adoc +++ b/modules/functions/pages/index.adoc @@ -12,24 +12,201 @@ Unqualified orgs can use transform functions without upgrading to a Pay As You G [#transform-list] == Transforms -include::partial$/function-list.adoc[] + +* **Cast**: The xref:cast.adoc[cast transform function] modifies the key or value schema to a target compatible schema. +* **Compute**: The xref:compute.adoc[compute transform function] computes new field values based on an expression evaluated at runtime. If the field already exists, it will be overwritten. +* **Drop-fields**: The xref:drop-fields.adoc[drop-fields transform function] drops fields from structured data. +* **Drop**: The xref:drop.adoc[drop transform function] drops a record from further processing. +* **Flatten**: The xref:flatten.adoc[flatten transform function] flattens structured data. +* **Merge KeyValue**: The xref:merge-key-value.adoc[merge KeyValue transform function] merges the fields of KeyValue records where both the key and value are structured data with the same schema type. +* **Unwrap KeyValue**: The xref:unwrap-key-value.adoc[unwrap KeyValue transform function] extracts the KeyValue's key or value, and then makes it the record value if the record is a KeyValue. 
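+
+These transforms are chained through the `steps` array described in the next section, and each step accepts an optional `when` condition that controls whether it runs for a given record.
+For example, a pipeline that flattens records and then discards inactive ones might use a configuration like the following; the `value.status` field and the condition expression are illustrative only, so check the individual transform pages for the exact condition syntax.
+
+[source,json]
+----
+{
+  "steps": [
+    {"type": "flatten"},
+    {"type": "drop", "when": "value.status == 'inactive'"}
+  ]
+}
+----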
[#transform-config] == Configuration -include::partial$/configuration.adoc[] +The `TransformFunction` reads its configuration as `JSON` from the Function `userConfig` parameter in the format: + +[source,json] +---- +{ + "steps": [ + { + "type": "drop-fields", "fields": "keyField1,keyField2", "part": "key" + }, + { + "type": "merge-key-value" + }, + { + "type": "unwrap-key-value" + }, + { + "type": "cast", "schema-type": "STRING" + } + ] +} +---- + +Transform functions are performed in the order in which they appear in the `steps` array. +Each step is defined by its `type` and uses its own arguments. +Each step can be dynamically toggled on or off by supplying a `when` condition that evaluates to true or false. + +For example, if the previous configuration is applied to a `KeyValue` input record, the following transformed values are returned after each step: + +[source,avro] +---- +# Original input record +{key={keyField1: key1, keyField2: key2, keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}} + +# Transformations +(KeyValue) + | + | "type": "drop-fields", "fields": "keyField1,keyField2", "part": "key" + | +{key={keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}} (KeyValue) + | + | "type": "merge-key-value" + | +{key={keyField3: key3}, value={keyField3: key3, valueField1: value1, valueField2: value2, valueField3: value3}} (KeyValue) + | + | "type": "unwrap-key-value" + | +{keyField3: key3, valueField1: value1, valueField2: value2, valueField3: value3} (AVRO) + | + | "type": "cast", "schema-type": "STRING" + | +{"keyField3": "key3", "valueField1": "value1", "valueField2": "value2", "valueField3": "value3"} (STRING) +---- [#deploy-cli] == Deploy with {pulsar-short} CLI -include::partial$/deploy-cli.adoc[] +https://github.com/datastax/pulsar[Luna Streaming 2.10+] is required to deploy custom functions in {pulsar-short}. + +The transform function `.nar` lives in the `/functions` directory of your {pulsar-short} deployment. + +[tabs] +====== +{pulsar-short} standalone:: ++ +-- +To deploy the built-in transform function locally in {pulsar-short} standalone, do the following: + +. Start {pulsar-short} standalone: ++ +[source,shell] +---- +./bin/pulsar standalone +---- + +. Create a transform function in `localrun` mode: ++ +[source,shell,subs="attributes+"] +---- +./bin/pulsar-admin functions localrun \ +--jar functions/pulsar-transformations-2.0.1.nar \ +--classname com.datastax.oss.pulsar.functions.transforms.TransformFunction \ +--inputs my-input-topic \ +--output my-output-topic \ +--user-config '{"steps": [{"type": "drop-fields", "fields": "password"}, {"type": "merge-key-value"}, {"type": "unwrap-key-value"}, {"type": "cast", "schema-type": "STRING"}]}' +---- +-- + +{pulsar-short} cluster:: ++ +-- +To deploy a built-in transform function to a {pulsar-short} cluster, do the following: + +. Create a built-in transform function with the {pulsar-short} CLI: ++ +---- +./bin/pulsar-admin functions create \ +--tenant $TENANT \ +--namespace $NAMESPACE \ +--name transform-function \ +--inputs persistent://$TENANT/$NAMESPACE/$INPUT_TOPIC +--output persistent://$TENANT/$NAMESPACE/$OUTPUT_TOPIC \ +--classname com.datastax.oss.pulsar.functions.transforms.TransformFunction \ +--jar functions/pulsar-transformations-2.0.1.nar +---- ++ +.Result +[%collapsible] +==== +[source,console] +---- +Created successfully +---- +==== + +. 
Confirm your function has been created: ++ +[source,shell] +---- +./bin/pulsar-admin functions list --tenant $TENANT +---- ++ +.Result +[%collapsible] +==== +[source,console] +---- +cast-function +flatten-function +transform-function +transform-function-2 +---- +==== +-- +====== [#deploy-as] == Deploy with {product} -include::partial$/deploy-as.adoc[] +Deploy transform functions in the *Functions* tab of the {astra-ui}. + +The process is similar to xref:astra-streaming:developing:astream-functions.adoc[creating a function in the {astra-ui}], but with a few additional steps. + +. After naming your new function, select the *Use {company} transform function* option. + +. Select a transform function from the list of available functions: ++ +image::astream-transform-functions.png[Connect Topics] + +. Select the transform function's namespace and input topic(s). + +. Select the transform function's namespace, output topic, and log topic. ++ +The log topic is a separate output topic for messages containing additional `loglevel`, `fqn`, and `instance` properties. + +. Specify advanced configuration options, if applicable. + +. Pass JSON configuration values with your function, if applicable. ++ +For more, see the transform function <> table. + +. Select *Create*. +The transform function will initialize and begin processing data changes. -== What's next? +. Confirm your function has been created with the {pulsar-short} CLI: ++ +[source,shell] +---- +./bin/pulsar-admin functions list --tenant $TENANT +---- ++ +.Result +[%collapsible] +==== +[source,console] +---- +cast-function +flatten-function +transform-function +transform-function-2 +---- +==== -For more, see xref:astra-streaming:developing:astream-functions.adoc[] or the https://pulsar.apache.org/docs/functions-overview[{pulsar-short} documentation]. +== See also +* xref:astra-streaming:developing:astream-functions.adoc[] +* https://pulsar.apache.org/docs/functions-overview[{pulsar-short} documentation] \ No newline at end of file diff --git a/modules/functions/partials/configuration.adoc b/modules/functions/partials/configuration.adoc deleted file mode 100644 index e793680..0000000 --- a/modules/functions/partials/configuration.adoc +++ /dev/null @@ -1,47 +0,0 @@ -The `TransformFunction` reads its configuration as `JSON` from the Function `userConfig` parameter in the format: - -[source,json] ----- -{ - "steps": [ - { - "type": "drop-fields", "fields": "keyField1,keyField2", "part": "key" - }, - { - "type": "merge-key-value" - }, - { - "type": "unwrap-key-value" - }, - { - "type": "cast", "schema-type": "STRING" - } - ] -} ----- - -Transform functions are performed in the order in which they appear in the `steps` array. -Each step is defined by its `type` and uses its own arguments. -Each step can be dynamically toggled on or off by supplying a `when` condition that evaluates to `true` or `false`. 
- -For example, the above configuration applied on a `KeyValue` input record with value `{key={keyField1: key1, keyField2: key2, keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}}` will return transformed values after each step: -[source,shell] ----- -{key={keyField1: key1, keyField2: key2, keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}}(KeyValue) - | - | ”type": "drop-fields", "fields": "keyField1,keyField2”, "part": "key” - | -{key={keyField3: key3}, value={valueField1: value1, valueField2: value2, valueField3: value3}} (KeyValue) - | - | "type": "merge-key-value" - | -{key={keyField3: key3}, value={keyField3: key3, valueField1: value1, valueField2: value2, valueField3: value3}} (KeyValue) - | - | "type": "unwrap-key-value" - | -{keyField3: key3, valueField1: value1, valueField2: value2, valueField3: value3} (AVRO) - | - | "type": "cast", "schema-type": "STRING" - | -{"keyField3": "key3", "valueField1": "value1", "valueField2": "value2", "valueField3": "value3"} (STRING) ----- \ No newline at end of file diff --git a/modules/functions/partials/deploy-as.adoc b/modules/functions/partials/deploy-as.adoc deleted file mode 100644 index 3b69f64..0000000 --- a/modules/functions/partials/deploy-as.adoc +++ /dev/null @@ -1,40 +0,0 @@ -Deploy transform functions in the *Functions* tab of the {astra-ui}. - -The process is similar to xref:astra-streaming:developing:astream-functions.adoc[creating a function in the {astra-ui}], but with a few additional steps. - -. After naming your new function, select the *Use {company} transform function* option. -. Select a transform function from the list of available functions: -+ -image::astream-transform-functions.png[Connect Topics] -. Select the transform function's namespace and input topic(s). -. Select the transform function's namespace, output topic, and log topic. -The *log topic* is a separate output topic for messages containing additional `loglevel`, `fqn`, and `instance` properties. -. Specify advanced configuration options. -. Pass JSON configuration values with your function, if applicable. -For more, see the transform function <> table. -. Select *Create*. The transform function will initialize and begin processing data changes. -. Confirm your function has been created with the {pulsar-short} CLI: -+ -[tabs] -==== -{pulsar-short} Admin:: -+ --- -[source,shell,subs="attributes+"] ----- -./bin/pulsar-admin functions list --tenant $TENANT ----- --- - -Result:: -+ --- -[source,shell,subs="attributes+"] ----- -cast-function -flatten-function -transform-function -transform-function-2 ----- --- -==== \ No newline at end of file diff --git a/modules/functions/partials/deploy-cli.adoc b/modules/functions/partials/deploy-cli.adoc deleted file mode 100644 index 6e79d32..0000000 --- a/modules/functions/partials/deploy-cli.adoc +++ /dev/null @@ -1,88 +0,0 @@ -[NOTE] -==== -https://github.com/datastax/pulsar[Luna Streaming 2.10+] is required to deploy custom functions in {pulsar-short}. -==== - -The transform function `.nar` lives in the `/functions` directory of your {pulsar-short} deployment. - -=== {pulsar-short} standalone - -To deploy the built-in transform function locally in {pulsar-short} standalone: - -. Start {pulsar-short} standalone: -+ -[source,shell] ----- -./bin/pulsar standalone ----- - -. 
Create a transform function in `localrun` mode: -+ -[source,shell,subs="attributes+"] ----- -./bin/pulsar-admin functions localrun \ ---jar functions/pulsar-transformations-2.0.1.nar \ ---classname com.datastax.oss.pulsar.functions.transforms.TransformFunction \ ---inputs my-input-topic \ ---output my-output-topic \ ---user-config '{"steps": [{"type": "drop-fields", "fields": "password"}, {"type": "merge-key-value"}, {"type": "unwrap-key-value"}, {"type": "cast", "schema-type": "STRING"}]}' ----- - -=== {pulsar-short} cluster - -To deploy a built-in transform function to a {pulsar-short} cluster: - -. Create a built-in transform function: -+ -[tabs] -==== -{pulsar-short} Admin:: -+ --- ----- -./bin/pulsar-admin functions create \ ---tenant $TENANT \ ---namespace $NAMESPACE \ ---name transform-function \ ---inputs persistent://$TENANT/$NAMESPACE/$INPUT_TOPIC ---output persistent://$TENANT/$NAMESPACE/$OUTPUT_TOPIC \ ---classname com.datastax.oss.pulsar.functions.transforms.TransformFunction \ ---jar functions/pulsar-transformations-2.0.1.nar ----- --- - -Result:: -+ --- -[source,shell,subs="attributes+"] ----- -Created successfully ----- --- -==== - -. Confirm your function has been created with the {pulsar-short} CLI: -+ -[tabs] -==== -{pulsar-short} Admin:: -+ --- -[source,shell,subs="attributes+"] ----- -./bin/pulsar-admin functions list --tenant $TENANT ----- --- - -Result:: -+ --- -[source,shell,subs="attributes+"] ----- -cast-function -flatten-function -transform-function -transform-function-2 ----- --- -==== \ No newline at end of file diff --git a/modules/functions/partials/function-list.adoc b/modules/functions/partials/function-list.adoc deleted file mode 100644 index dc9e705..0000000 --- a/modules/functions/partials/function-list.adoc +++ /dev/null @@ -1,49 +0,0 @@ -[#cast] -=== Cast - -The `cast` transform function modifies the key or value schema to a target compatible schema. - -xref:cast.adoc[Cast documentation]. - -[#compute] -=== Compute - -The `compute` transform function computes new field values based on an expression evaluated at runtime. If the field already exists, it will be overwritten. - -xref:compute.adoc[Compute documentation]. - -[#drop-fields] -=== Drop-fields - -The `drop-fields` transform function drops fields from structured data. - -xref:drop-fields.adoc[Drop-fields documentation]. - -[#drop] -=== Drop - -The `drop` transform function drops a record from further processing. - -xref:drop.adoc[Drop documentation]. - -[#flatten] -=== Flatten - -The `flatten` transform function flattens structured data. - -xref:flatten.adoc[Flatten documentation]. - -[#merge-key-value] -=== Merge KeyValue - -The `merge-key-value` transform function merges the fields of KeyValue records where both the key and value are structured data with the same schema type. - -xref:merge-key-value.adoc[Merge KeyValue documentation]. - -[#unwrap-key-value] -=== Unwrap key value - -The `unwrap-key-value` transform function extracts the KeyValue's key or value and makes it the record value (if the record is a KeyValue). - -xref:unwrap-key-value.adoc[Unwrap KeyValue documentation]. 
- diff --git a/modules/pulsar-io/examples/connectors/sinks/astra-db/auth.csv b/modules/pulsar-io/examples/connectors/sinks/astra-db/auth.csv deleted file mode 100644 index e786c76..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/astra-db/auth.csv +++ /dev/null @@ -1,5 +0,0 @@ -"Name","Required","Default","Description" -"gssapi","true","{""service"": ""dse""}","" -"password","true","","" -"provider","true","None","" -"username","true","token","" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/astra-db/config.csv b/modules/pulsar-io/examples/connectors/sinks/astra-db/config.csv deleted file mode 100644 index f4de9c7..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/astra-db/config.csv +++ /dev/null @@ -1,14 +0,0 @@ -"Name","Required","Default","Description" -"auth","true","{}","Refer to the auth properties ref" -"cloud.secureConnectBundle","true","","" -"compression","true","None","" -"connectionPoolLocalSize","true","4","" -"ignoreErrors","true","None","" -"jmx","true","true","" -"maxConcurrentRequests","true","500","" -"maxNumberOfRecordsInBatch","true","32","" -"queryExecutionTimeout","true","30","" -"task.max","true","1","" -"tasks.max","true","1","" -"topic","true","{}","Refer to the topic properties ref" -"topics","true","","The topic name to watch" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/bigquery/config.csv b/modules/pulsar-io/examples/connectors/sinks/bigquery/config.csv deleted file mode 100644 index 72cda9f..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/bigquery/config.csv +++ /dev/null @@ -1,15 +0,0 @@ -Name,Required,Default,Description -kafkaConnectorSinkClass,true,,"A Kafka-connector sink class to use. Unless you've developed your own, use the value ""com.wepay.kafka.connect.bigquery.BigQuerySinkConnector""." -offsetStorageTopic,true,,Pulsar topic to store offsets at. This is an additional topic to your topic with the actual data going to BigQuery. -sanitizeTopicName,true,,"Some connectors cannot handle Pulsar topic names like persistent://a/b/topic and do not sanitize the topic name themselves. If enabled, all non alpha-digital characters in topic name will be replaced with underscores. In some cases this may result in topic name collisions (topic_a and topic.a will become the same) - -This value MUST be set to `true`. Any other value will result in an error." -batchSize,false,16384,Size of messages in bytes the sink will attempt to batch messages together before flush. -collapsePartitionedTopics,false,false,Supply Kafka record with topic name without -partition- suffix for partitioned topics. -kafkaConnectorConfigProperties,false,{},A key/value map of config properties to pass to the Kafka connector. See the reference table below. -lingerTimeMs ,false,2147483647L,Time interval in milliseconds the sink will attempt to batch messages together before flush. -maxBatchBitsForOffset,false,12,Number of bits (0 to 20) to use for index of message in the batch for translation into an offset. 0 to disable this behavior (Messages from the same batch will have the same offset which can affect some connectors.) -topic,true,,The Kafka topic name that is passed to the Kafka sink. -unwrapKeyValueIfAvailable ,false,true,In case of Record> data use key from KeyValue<> instead of one from Record. -useIndexAsOffset,false,true,"Allows use of message index instead of message sequenceId as offset, if available. 
Requires AppendIndexMetadataInterceptor and exposingBrokerEntryMetadataToClientEnabled=true on brokers." -useOptionalPrimitives,false,false,"Pulsar schema does not contain information whether the Schema is optional, Kafka's does. This provides a way to force all primitive schemas to be optional for Kafka." \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/bigquery/kafkaConnectorConfigProperties.csv b/modules/pulsar-io/examples/connectors/sinks/bigquery/kafkaConnectorConfigProperties.csv deleted file mode 100644 index 1c27365..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/bigquery/kafkaConnectorConfigProperties.csv +++ /dev/null @@ -1,61 +0,0 @@ -Name,Required,Default,Description -allBQFieldsNullable,false,false,"If `true`, no fields in any produced BigQuery schema are REQUIRED. All non-nullable Avro fields are translated as NULLABLE (or REPEATED, if arrays)." -allowBigQueryRequiredFieldRelaxation,false,false,"If true, fields in BigQuery Schema can be changed from REQUIRED to NULLABLE." -allowNewBigQueryFields,false,false,"If true, new fields can be added to BigQuery tables during subsequent schema updates." -allowSchemaUnionization,false,false,"If true, the existing table schema (if one is present) will be unionized with new record schemas during schema updates. If false, the record of the last schema in a batch will be used for any necessary table creation and schema update attempts. - -Setting allowSchemaUnionization to false and allowNewBigQueryFields and allowBigQueryRequiredFieldRelaxation to true is equivalent to setting autoUpdateSchemas to true in older (pre-2.0.0) versions of this connector. - -In this case, if BigQuery raises a schema validation exception or a table doesn’t exist when a writing a batch, the connector will try to remediate by required field relaxation and/or adding new fields. - -If allowSchemaUnionization, allowNewBigQueryFields, and allowBigQueryRequiredFieldRelaxation are true, the connector will create or update tables with a schema whose fields are a union of the existing table schema’s fields and the ones present in all of the records of the current batch. - -The key difference is that with unionization disabled, new record schemas have to be a superset of the table schema in BigQuery. - -allowSchemaUnionization is a useful tool to make things work. For example, if you’d like to remove fields from data upstream, the updated schemas still work in the connector. It is similarly useful when different tasks see records whose schemas contain different fields that are not in the table. However, note with caution that if allowSchemaUnionization is set and some bad records are in the topic, the BigQuery schema may be permanently changed. This presents two issues: first, since BigQuery doesn’t allow columns to be dropped from tables, they’ll add unnecessary noise to the schema. Second, since BigQuery doesn’t allow column types to be modified, they could completely break pipelines down the road where well-behaved records have schemas whose field names overlap with the accidentally-added columns in the table, but use a different type." -autoCreateBucket,false,true,"Whether to automatically create the given bucket, if it does not exist." -autoCreateTables,false,false,Automatically create BigQuery tables if they don’t already exist -avroDataCacheSize,false,100,The size of the cache to use when converting schemas from Avro to Kafka Connect. 
-batchLoadIntervalSec,false,120,"The interval, in seconds, in which to attempt to run GCS to BigQuery load jobs. Only relevant if `enableBatchLoad` is configured." -bigQueryMessageTimePartitioning,false,false,Whether or not to use the message time when inserting records. Default uses the connector processing time. -bigQueryPartitionDecorator,false,true,Whether or not to append partition decorator to BigQuery table name when inserting records. Default is true. Setting this to true appends partition decorator to table name (e.g. table$yyyyMMdd depending on the configuration set for bigQueryPartitionDecorator). Setting this to false bypasses the logic to append the partition decorator and uses raw table name for inserts. -bigQueryRetry,false,0,The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error. -bigQueryRetryWait,false,1000,"The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error." -clusteringPartitionFieldNames,false,,Comma-separated list of fields where data is clustered in BigQuery. -convertDoubleSpecialValues,false,false,Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successfull delivery to BigQuery. -defaultDataset,true,,The default dataset to be used -deleteEnabled,false,false,"Enable delete functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. A delete will be performed when a record with a null value (that is–a tombstone record) is read. This feature will not work with SMTs that change the name of the topic." -enableBatchLoad,false,“”,Beta Feature Use with caution. The sublist of topics to be batch loaded through GCS. -gcsBucketName,false,"""”",The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if `enableBatchLoad` is configured. -includeKafkaData,false,false,"Whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows." -intermediateTableSuffix,false,“.tmp”,"A suffix that will be appended to the names of destination tables to create the names for the corresponding intermediate tables. Multiple intermediate tables may be created for a single destination table, but their names will always start with the name of the destination table, followed by this suffix, and possibly followed by an additional suffix." -kafkaDataFieldName,false,,"The Kafka data field name. The default value is null, which means the Kafka Data field will not be included." -kafkaKeyFieldName,false,,"The Kafka key field name. The default value is null, which means the Kafka Key field will not be included." -keyfile,true,,"Can be either a string representation of the Google credentials file or the path to the Google credentials file itself. - -When using the Astra Streaming UI, the string representation must be used. If using pulsar-admin with Astra Streaming, either the representation or file can be used." -keySource,true,FILE,"Determines whether the keyfile configuration is the path to the credentials JSON file or to the JSON itself. Available values are `FILE` and `JSON`. - -When using the Astra Streaming UI, JSON will be the only option. If using pulsar-admin with Astra Streaming, either the representation or file can be used." -name,true,,The name of the connector. 
Use the same value as Pulsar sink name. -mergeIntervalMs,false,60_000L,"How often (in milliseconds) to perform a merge flush, if upsert/delete is enabled. Can be set to -1 to disable periodic flushing." -mergeRecordsThreshold,false,-1,"How many records to write to an intermediate table before performing a merge flush, if upsert/delete is enabled. Can be set to -1 to disable record count-based flushing." -project,true,,The BigQuery project to write to -queueSize,false,-1,The maximum size (or -1 for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is triggered or the size of the queue drops under half of the maximum size. -sanitizeTopics,true,false,"Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names. - -The only accepted value is `false`. Providing any other value will result in an error." -schemaRetriever,false,com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever,A class that can be used for automatically creating tables and/or updating schemas. -threadPoolSize,false,10,The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery. -timePartitioningType,false,DAY,"The time partitioning type to use when creating tables. Existing tables will not be altered to use this partitioning type. Valid Values: (case insensitive) [MONTH, YEAR, HOUR, DAY]" -timestampPartitionFieldName,false,,"The name of the field in the value that contains the timestamp to partition by in BigQuery and enable timestamp partitioning for each table. Leave this configuration blank, to enable ingestion time partitioning for each table." -topic2TableMap,false,,"Map of topics to tables (optional). - -Format: comma-separated tuples, e.g. :,:,.. - -Note, because `sanitizeTopicName` must be `true`, that in-turn means any alphanumeric character in the topic name will be replaced as underscore “_”. So when creating a mapping you need to take the underscores into account. - -For example, if the topic name is provided as “persistent://a/b/c-d” then the mapping topic name would be “persistent___a_b_c_d”. - -topics,true,,"A list of Kafka topics to read from. Use the same name as the Pulsar topic (not the whole address, just the topic name)." -upsertEnabled,false,false,"Enable upsert functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. Row-matching will be performed based on the contents of record keys. This feature won’t work with SMTs that change the name of the topic." \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/aws-S3.csv b/modules/pulsar-io/examples/connectors/sinks/cloud-storage/aws-S3.csv deleted file mode 100644 index e97c879..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/aws-S3.csv +++ /dev/null @@ -1,31 +0,0 @@ -Name,Required,Default,Description -accessKeyId,true,null,The Cloud Storage access key ID. It requires permission to write objects. -bucket,true,null,The Cloud Storage bucket. -endpoint,true,null,The Cloud Storage endpoint. -provider,true,null,"The Cloud Storage type, such as aws-s3,s3v2(s3v2 uses the AWS client but not the JCloud client)." -secretAccessKey,true,null,The Cloud Storage secret access key. 
-avroCodec,false,snappy,"Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy." -avroCodec,false,snappy,"Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy." -batchSize,false,10,The number of records submitted in batch. -batchTimeMs,false,1000,The interval for batch submission. -bytesFormatTypeSeparator,false,0x10,"It is inserted between records for the formatType of bytes. By default, it is set to '0x10'. An input record that contains the line separator looks like multiple records in the output object." -formatType,false,json,"The data format type. Available options are JSON, Avro, Bytes, or Parquet. By default, it is set to JSON." -jsonAllowNaN,false,false,"Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. Since JSON specification does not allow such values this is a non-standard feature and disabled by default." -jsonAllowNaN,false,false,"Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. Since JSON specification does not allow such values this is a non-standard feature and disabled by default." -maxBatchBytes,false,10000000,The maximum number of bytes in a batch. -parquetCodec,false,gzip,"Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd." -parquetCodec,false,gzip,"Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd." -partitionerType,false,partition,"The partitioning type. It can be configured by topic partitions or by time. By default, the partition type is configured by topic partitions." -partitionerUseIndexAsOffset,false,false,"Whether to use the Pulsar's message index as offset or the record sequence. It's recommended if the incoming messages may be batched. The brokers may or not expose the index metadata and, if it's not present on the record, the sequence will be used. See PIP-70 for more details." -pathPrefix,false,false,"If it is set, the output files are stored in a folder under the given bucket path. The pathPrefix must be in the format of xx/xxx/." -pendingQueueSize,false,10,"The number of records buffered in queue. By default, it is equal tobatchSize. You can set it manually." -role,false,null,The Cloud Storage role. -roleSessionName,false,null,The Cloud Storage role session name. -skipFailedMessages,false,false,"Configure whether to skip a message which it fails to be processed. If it is set to true, the connector will skip the failed messages by ack it. Otherwise, the connector will fail the message." -sliceTopicPartitionPath,false,false,"When it is set to true, split the partitioned topic name into separate folders in the bucket path." -timePartitionDuration,false,86400000,"The time interval for time-based partitioning. Support formatted interval string, such as 30d, 24h, 30m, 10s, and also support number in milliseconds precision, such as 86400000 refers to 24h or 1d." -timePartitionPattern,false,yyyy-MM-dd,"The format pattern of the time-based partitioning. For details, refer to the Java date and time format." -useHumanReadableMessageId,false,false,"Use a human-readable format string for messageId in message metadata. The messageId is in a format like ledgerId:entryId:partitionIndex:batchIndex. Otherwise, the messageId is a Hex-encoded string." 
-useHumanReadableSchemaVersion,false,false,"Use a human-readable format string for the schema version in the message metadata. If it is set to true, the schema version is in plain string format. Otherwise, the schema version is in hex-encoded string format." -withMetadata,false,false,Save message attributes to metadata. -withTopicPartitionNumber,false,true,"When it is set to true, include the topic partition number to the object path." \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/azure-blob.csv b/modules/pulsar-io/examples/connectors/sinks/cloud-storage/azure-blob.csv deleted file mode 100644 index 3049c93..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/azure-blob.csv +++ /dev/null @@ -1,28 +0,0 @@ -Name,Required,Default,Description -azureStorageAccountConnectionString,true,,The Azure Blob Storage connection string. Required when authenticating via connection string. -azureStorageAccountKey,true,,The Azure Blob Storage account key. Required when authenticating via account name and account key. -azureStorageAccountName,true,,The Azure Blob Storage account name. Required when authenticating via account name and account key. -azureStorageAccountSASToken,true,,The Azure Blob Storage account SAS token. Required when authenticating via SAS token. -bucket,true,null,The Cloud Storage bucket. -endpoint,true,null,The Azure Blob Storage endpoint. -provider,true,null,The Cloud Storage type. Azure Blob Storage only supports the azure-blob-storage provider. -avroCodec,false,snappy,"Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy." -batchSize,false,10,The number of records submitted in batch. -batchTimeMs,false,1000,The interval for batch submission. -bytesFormatTypeSeparator,false,0x10,"It is inserted between records for the formatType of bytes. By default, it is set to '0x10'. An input record that contains the line separator looks like multiple records in the output object." -formatType,false,json,"The data format type. Available options are JSON, Avro, Bytes, or Parquet. By default, it is set to JSON." -jsonAllowNaN,false,false,"Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. Since JSON specification does not allow such values this is a non-standard feature and disabled by default." -maxBatchBytes,false,10000000,The maximum number of bytes in a batch. -parquetCodec,false,gzip,"Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd." -partitionerType,false,partition,"The partitioning type. It can be configured by topic partitions or by time. By default, the partition type is configured by topic partitions." -partitionerUseIndexAsOffset,false,false,"Whether to use the Pulsar's message index as offset or the record sequence. It's recommended if the incoming messages may be batched. The brokers may or not expose the index metadata and, if it's not present on the record, the sequence will be used. See PIP-70 for more details." -pathPrefix,false,false,"If it is set, the output files are stored in a folder under the given bucket path. The pathPrefix must be in the format of xx/xxx/." -pendingQueueSize,false,10,"The number of records buffered in queue. By default, it is equal to batchSize. You can set it manually." -skipFailedMessages,false,false,"Configure whether to skip a message which it fails to be processed. 
If it is set to true, the connector will skip the failed messages by ack it. Otherwise, the connector will fail the message." -sliceTopicPartitionPath,false,false,"When it is set to true, split the partitioned topic name into separate folders in the bucket path." -timePartitionDuration,false,86400000,"The time interval for time-based partitioning. Support formatted interval string, such as 30d, 24h, 30m, 10s, and also support number in milliseconds precision, such as 86400000 refers to 24h or 1d." -timePartitionPattern,false,yyyy-MM-dd,"The format pattern of the time-based partitioning. For details, refer to the Java date and time format." -useHumanReadableMessageId,false,false,"Use a human-readable format string for messageId in message metadata. The messageId is in a format like ledgerId:entryId:partitionIndex:batchIndex. Otherwise, the messageId is a Hex-encoded string." -useHumanReadableSchemaVersion,false,false,"Use a human-readable format string for the schema version in the message metadata. If it is set to true, the schema version is in plain string format. Otherwise, the schema version is in hex-encoded string format." -withMetadata,false,false,Save message attributes to metadata. -withTopicPartitionNumber,false,true,"When it is set to true, include the topic partition number to the object path." \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/data-format.csv b/modules/pulsar-io/examples/connectors/sinks/cloud-storage/data-format.csv deleted file mode 100644 index 2153c95..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/data-format.csv +++ /dev/null @@ -1,6 +0,0 @@ -Pulsar Schema,Writer: Avro,Writer: JSON,Writer: Parquet,Writer: Bytes -Primitive,❌,✅ *,❌,✅ -Avro,✅,✅,✅,✅ -Json,✅,✅,✅,✅ -Protobuf **,✅,✅,✅,✅ -ProtobufNative,✅ * * *,❌,✅,✅ \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/gcp-gcs.csv b/modules/pulsar-io/examples/connectors/sinks/cloud-storage/gcp-gcs.csv deleted file mode 100644 index 8b5a053..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/gcp-gcs.csv +++ /dev/null @@ -1,25 +0,0 @@ -Name,Required,Default,Description -bucket,true,null,The Cloud Storage bucket. -provider,true,null,The Cloud Storage type. Google cloud storage only supports the google-cloud-storage provider. -avroCodec,false,snappy,"Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy." -batchSize,false,10,The number of records submitted in batch. -batchTimeMs,false,1000,The interval for batch submission. -bytesFormatTypeSeparator,false,0x10,"It is inserted between records for the formatType of bytes. By default, it is set to '0x10'. An input record that contains the line separator looks like multiple records in the output object." -formatType,false,json,"The data format type. Available options are JSON, Avro, Bytes, or Parquet. By default, it is set to JSON." -gcsServiceAccountKeyFileContent,false,,"The contents of the JSON service key file. If empty, credentials are read from gcsServiceAccountKeyFilePath file." -gcsServiceAccountKeyFilePath,false,,"Path to the GCS credentials file. If empty, the credentials file are read from the GOOGLE_APPLICATION_CREDENTIALS environment variable." -jsonAllowNaN,false,false,"Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. 
Since JSON specification does not allow such values this is a non-standard feature and disabled by default." -maxBatchBytes,false,10000000,The maximum number of bytes in a batch. -parquetCodec,false,gzip,"Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd." -partitionerType,false,partition,"The partitioning type. It can be configured by topic partitions or by time. By default, the partition type is configured by topic partitions." -partitionerUseIndexAsOffset,false,false,"Whether to use the Pulsar's message index as offset or the record sequence. It's recommended if the incoming messages may be batched. The brokers may or not expose the index metadata and, if it's not present on the record, the sequence will be used. See PIP-70 for more details." -pathPrefix,false,false,"If it is set, the output files are stored in a folder under the given bucket path. The pathPrefix must be in the format of xx/xxx/." -pendingQueueSize,false,10,"The number of records buffered in queue. By default, it is equal to batchSize. You can set it manually." -skipFailedMessages,false,false,"Configure whether to skip a message which it fails to be processed. If it is set to true, the connector will skip the failed messages by ack it. Otherwise, the connector will fail the message." -sliceTopicPartitionPath,false,false,"When it is set to true, split the partitioned topic name into separate folders in the bucket path." -timePartitionDuration,false,86400000,"The time interval for time-based partitioning. Support formatted interval string, such as 30d, 24h, 30m, 10s, and also support number in milliseconds precision, such as 86400000 refers to 24h or 1d." -timePartitionPattern,false,yyyy-MM-dd,"The format pattern of the time-based partitioning. For details, refer to the Java date and time format." -useHumanReadableMessageId,false,false,"Use a human-readable format string for messageId in message metadata. The messageId is in a format like ledgerId:entryId:partitionIndex:batchIndex. Otherwise, the messageId is a Hex-encoded string." -useHumanReadableSchemaVersion,false,false,"Use a human-readable format string for the schema version in the message metadata. If it is set to true, the schema version is in plain string format. Otherwise, the schema version is in hex-encoded string format." -withMetadata,false,false,Save message attributes to metadata. -withTopicPartitionNumber,false,true,"When it is set to true, include the topic partition number to the object path." 
\ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/with-meta-data.csv b/modules/pulsar-io/examples/connectors/sinks/cloud-storage/with-meta-data.csv deleted file mode 100644 index 7165171..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/cloud-storage/with-meta-data.csv +++ /dev/null @@ -1,5 +0,0 @@ -Writer Format,withMetadata -Avro,✅ -JSON,✅ -Parquet,✅ * -Bytes,❌ \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/curl-delete.sh b/modules/pulsar-io/examples/connectors/sinks/curl-delete.sh deleted file mode 100644 index ae11d67..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/curl-delete.sh +++ /dev/null @@ -1,3 +0,0 @@ -# Delete all instances of a connector -curl -sS --fail --location --request DELETE ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'' \ - --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/curl-info.sh b/modules/pulsar-io/examples/connectors/sinks/curl-info.sh deleted file mode 100644 index 4b3b4e1..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/curl-info.sh +++ /dev/null @@ -1,2 +0,0 @@ -curl -sS --fail --location ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'' \ - --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/curl-restart.sh b/modules/pulsar-io/examples/connectors/sinks/curl-restart.sh deleted file mode 100644 index 62725b7..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/curl-restart.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Restart all instances of a connector -curl -sS --fail --request POST ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/restart' \ - --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" - -# Restart an individual instance of a connector -curl -X POST "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/restart" \ --H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/curl-start.sh b/modules/pulsar-io/examples/connectors/sinks/curl-start.sh deleted file mode 100644 index 5408f5f..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/curl-start.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Start all instances of a connector -curl -sS --fail --location --request POST ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/start' \ - --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" - -# Start an individual instance of a connector -curl -X POST "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/start" \ --H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/curl-status.sh b/modules/pulsar-io/examples/connectors/sinks/curl-status.sh deleted file mode 100644 index f83eb24..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/curl-status.sh +++ /dev/null @@ -1,8 +0,0 @@ -# Get the status of all connector instances -curl -sS --fail --location ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/status' \ - --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" - -# Get the status of an individual connector instance -curl "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/status" \ - -H "accept: application/json" \ - -H 
"Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/curl-stop.sh b/modules/pulsar-io/examples/connectors/sinks/curl-stop.sh deleted file mode 100644 index 3b4373c..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/curl-stop.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Stop all instances of a connector -curl -sS --fail --request POST ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/stop' \ - --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" - -# Stop an individual instance of a connector -curl -X POST "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/stop" \ - --H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-delete.sh b/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-delete.sh deleted file mode 100644 index ce02c85..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-delete.sh +++ /dev/null @@ -1,5 +0,0 @@ -# Delete all instances of a connector -./bin/pulsar-admin sinks delete \ - --namespace "$NAMESPACE" \ - --name "$SINK_NAME" \ - --tenant "$TENANT" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-info.sh b/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-info.sh deleted file mode 100644 index 28c652e..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-info.sh +++ /dev/null @@ -1,5 +0,0 @@ -# Get information about connector -./bin/pulsar-admin sinks get \ - --namespace "$NAMESPACE" \ - --name "$SINK_NAME" \ - --tenant "$TENANT" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-restart.sh b/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-restart.sh deleted file mode 100644 index 50ba5e1..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-restart.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Restart all instances of a connector -./bin/pulsar-admin sinks restart \ - --namespace "$NAMESPACE" \ - --name "$SINK_NAME" \ - --tenant "$TENANT" - -# optionally add --instance-id to only restart an individual instance \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-start.sh b/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-start.sh deleted file mode 100644 index 05a1d43..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-start.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Start all instances of a connector -./bin/pulsar-admin sinks start \ - --namespace "$NAMESPACE" \ - --name "$SINK_NAME" \ - --tenant "$TENANT" - -# optionally add --instance-id to only start an individual instance \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-status.sh b/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-status.sh deleted file mode 100644 index d9f2f50..0000000 --- a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-status.sh +++ /dev/null @@ -1,6 +0,0 @@ -# Check connector status -./bin/pulsar-admin sinks status \ - --instance-id "$SINK_INSTANCEID" \ - --namespace "$NAMESPACE" \ - --name "$SINK_NAME" \ - --tenant "$TENANT" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-stop.sh b/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-stop.sh deleted file mode 100644 index c81b079..0000000 --- 
a/modules/pulsar-io/examples/connectors/sinks/pulsar-admin-stop.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Stop all instances of a connector -./bin/pulsar-admin sinks stop \ - --namespace "$NAMESPACE" \ - --name "$SINK_NAME" \ - --tenant "$TENANT" - -# optionally add --instance-id to only stop an individual instance \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/curl-delete.sh b/modules/pulsar-io/examples/connectors/sources/curl-delete.sh deleted file mode 100644 index f05cb79..0000000 --- a/modules/pulsar-io/examples/connectors/sources/curl-delete.sh +++ /dev/null @@ -1,3 +0,0 @@ -# Delete all instances of a connector -curl -sS --fail -X DELETE "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/curl-info.sh b/modules/pulsar-io/examples/connectors/sources/curl-info.sh deleted file mode 100644 index 49ce5a3..0000000 --- a/modules/pulsar-io/examples/connectors/sources/curl-info.sh +++ /dev/null @@ -1,4 +0,0 @@ -# Get a connector's information -curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \ - -H "accept: application/json" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/curl-restart.sh b/modules/pulsar-io/examples/connectors/sources/curl-restart.sh deleted file mode 100644 index c82083a..0000000 --- a/modules/pulsar-io/examples/connectors/sources/curl-restart.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Restart all instances of a connector -curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/restart" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" - -# Restart an individual instance of a connector -#curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/restart" \ -# -H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/curl-start.sh b/modules/pulsar-io/examples/connectors/sources/curl-start.sh deleted file mode 100644 index dc7f6d9..0000000 --- a/modules/pulsar-io/examples/connectors/sources/curl-start.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Start all instances of a connector -curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/start" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" - -# Start an individual instance of a connector -curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/start" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/curl-status.sh b/modules/pulsar-io/examples/connectors/sources/curl-status.sh deleted file mode 100644 index 7ff9de4..0000000 --- a/modules/pulsar-io/examples/connectors/sources/curl-status.sh +++ /dev/null @@ -1,9 +0,0 @@ -# Get the status of all connector instances -curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/status" \ - -H "accept: application/json" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" - -# Get the status of an individual connector instance -curl "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/status" \ - -H "accept: application/json" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" diff --git 
a/modules/pulsar-io/examples/connectors/sources/curl-stop.sh b/modules/pulsar-io/examples/connectors/sources/curl-stop.sh deleted file mode 100644 index e6084ed..0000000 --- a/modules/pulsar-io/examples/connectors/sources/curl-stop.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Stop all instances of a connector -curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/stop" \ - -H "Authorization: $ASTRA_STREAMING_TOKEN" - -# Stop an individual instance of a connector -#curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/stop" \ -# -H "Authorization: $ASTRA_STREAMING_TOKEN" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/kafka/kafka-source-configuration.adoc b/modules/pulsar-io/examples/connectors/sources/kafka/kafka-source-configuration.adoc deleted file mode 100644 index 8f9f1dd..0000000 --- a/modules/pulsar-io/examples/connectors/sources/kafka/kafka-source-configuration.adoc +++ /dev/null @@ -1,26 +0,0 @@ -[cols="1,1,1,1,3",options=header] -|=== -|*Name* -|*Type* -|*Required* -|*Default* -|*Description* - -| `bootstrapServers` |String| true | " " (empty string) | A comma-separated list of host and port pairs for establishing the initial connection to the Kafka cluster. -| `groupId` |String| true | " " (empty string) | A unique string that identifies the group of consumer processes to which this consumer belongs. -| `fetchMinBytes` | long|false | 1 | The minimum byte expected for each fetch response. -| `autoCommitEnabled` | boolean |false | true | If set to true, the consumer's offset is periodically committed in the background. + -This committed offset is used when the process fails as the position from which a new consumer begins. -| `autoCommitIntervalMs` | long|false | 5000 | The frequency in milliseconds that the consumer offsets are auto-committed to Kafka if `autoCommitEnabled` is set to true. -| `heartbeatIntervalMs` | long| false | 3000 | The interval between heartbeats to the consumer when using Kafka's group management facilities. + -**Note: `heartbeatIntervalMs` must be smaller than `sessionTimeoutMs`**. -| `sessionTimeoutMs` | long|false | 30000 | The timeout used to detect consumer failures when using Kafka's group management facility. -| `topic` | String|true | " " (empty string)| The Kafka topic that sends messages to {pulsar-short}. -| `consumerConfigProperties` | Map| false | " " (empty string) | The consumer configuration properties to be passed to consumers. + -**Note: other properties specified in the connector configuration file take precedence over this configuration**. -| `keyDeserializationClass` | String|false | org.apache.kafka.common.serialization.StringDeserializer | The deserializer class for Kafka consumers to deserialize keys. + -The deserializer is set by a specific implementation of https://github.com/apache/pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaAbstractSource.java[`KafkaAbstractSource`]. -| `valueDeserializationClass` | String|false | org.apache.kafka.common.serialization.ByteArrayDeserializer | The deserializer class for Kafka consumers to deserialize values. -| `autoOffsetReset` | String | false | earliest | The default offset reset policy. 
- -|=== \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/kinesis/kinesis-source-configuration.adoc b/modules/pulsar-io/examples/connectors/sources/kinesis/kinesis-source-configuration.adoc deleted file mode 100644 index 1006c1e..0000000 --- a/modules/pulsar-io/examples/connectors/sources/kinesis/kinesis-source-configuration.adoc +++ /dev/null @@ -1,37 +0,0 @@ -[cols="2,1,1,1,3",options=header] -|=== -|*Name* -|*Type* -|*Required* -|*Default* -|*Description* - -|`initialPositionInStream`|InitialPositionInStream|false|LATEST|The position where the connector starts from. Below are the available options: + -* `AT_TIMESTAMP`: start from the record at or after the specified timestamp. + -* `LATEST`: start after the most recent data record. + -* `TRIM_HORIZON`: start from the oldest available data record. -|`startAtTime`|Date|false|" " (empty string)|If set to `AT_TIMESTAMP`, it specifies the point in time to start consumption. -|`applicationName`|String|false|{pulsar-short} IO connector|The name of the Amazon Kinesis application. + -By default, the application name is included in the user agent string used to make AWS requests. This can assist with troubleshooting, for example, distinguish requests made by separate connector instances. -|`checkpointInterval`|long|false|60000|The frequency of the Kinesis stream checkpoint in milliseconds. -|`backoffTime`|long|false|3000|The amount of time to delay between requests when the connector encounters a throttling exception from AWS Kinesis in milliseconds. -|`numRetries`|int|false|3|The number of re-attempts when the connector encounters an exception while trying to set a checkpoint. -|`receiveQueueSize`|int|false|1000|The maximum number of AWS records that can be buffered inside the connector. + -Once the `receiveQueueSize` is reached, the connector does not consume any messages from Kinesis until some messages in the queue are successfully consumed. -|`dynamoEndpoint`|String|false|" " (empty string)|The Dynamo end-point URL, which can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html[here]. -|`cloudwatchEndpoint`|String|false|" " (empty string)|The Cloudwatch end-point URL, which can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html[here]. -|`useEnhancedFanOut`|boolean|false|true|If set to true, it uses Kinesis enhanced fan-out. +If set to false, it uses polling. -|`awsEndpoint`|String|false|" " (empty string)|The Kinesis end-point URL, which can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html[here]. -|`awsRegion`|String|false|" " (empty string)|The AWS region. **Example** `us-west-1`, `us-west-2` -|`awsKinesisStreamName`|String|true|" " (empty string)|The Kinesis stream name. -|`awsCredentialPluginParam`|String |false|" " (empty string)|The JSON parameter to initialize `awsCredentialsProviderPlugin`. -|`awsCredentialPluginName`|String|false|" " (empty string)|The fully-qualified class name of implementation of {@inject: github:AwsCredentialProviderPlugin:/pulsar-io/aws/src/main/java/org/apache/pulsar/io/aws/AwsCredentialProviderPlugin.java} + -`awsCredentialProviderPlugin` has the following built-in plugs: + -`org.apache.pulsar.io.kinesis.AwsDefaultProviderChainPlugin`: this plugin uses the default AWS provider chain. For more information, see https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-default[using the default credential provider chain]. 
+ -`org.apache.pulsar.io.kinesis.STSAssumeRoleProviderPlugin`: this plugin takes a configuration via the `awsCredentialPluginParam` that describes a role to assume when running the KCL. + -**JSON configuration example** + -`{"roleArn": "arn...", "roleSessionName": "name"}` + -`awsCredentialPluginName` is a factory class which creates an AWSCredentialsProvider that is used by Kinesis sink. + -If `awsCredentialPluginName` set to empty, the Kinesis sink creates a default AWSCredentialsProvider which accepts json-map of credentials in `awsCredentialPluginParam`. - -|=== \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-delete.sh b/modules/pulsar-io/examples/connectors/sources/pulsar-admin-delete.sh deleted file mode 100644 index 97694e8..0000000 --- a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-delete.sh +++ /dev/null @@ -1,5 +0,0 @@ -# Delete all instances of a connector -./bin/pulsar-admin sources delete \ - --namespace "$NAMESPACE" \ - --name "$SOURCE_NAME" \ - --tenant "$TENANT" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-info.sh b/modules/pulsar-io/examples/connectors/sources/pulsar-admin-info.sh deleted file mode 100644 index a219d0b..0000000 --- a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-info.sh +++ /dev/null @@ -1,5 +0,0 @@ -# Get information about connector -./bin/pulsar-admin sources get \ - --namespace "$NAMESPACE" \ - --name "$SOURCE_NAME" \ - --tenant "$TENANT" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-restart.sh b/modules/pulsar-io/examples/connectors/sources/pulsar-admin-restart.sh deleted file mode 100644 index 379325f..0000000 --- a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-restart.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Restart all instances of a connector -./bin/pulsar-admin sources restart \ - --namespace "$NAMESPACE" \ - --name "$SOURCE_NAME" \ - --tenant "$TENANT" - -# optionally add --instance-id to only restart an individual instance \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-start.sh b/modules/pulsar-io/examples/connectors/sources/pulsar-admin-start.sh deleted file mode 100644 index 4540ef1..0000000 --- a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-start.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Start all instances of a connector -./bin/pulsar-admin sources start \ - --namespace "$NAMESPACE" \ - --name "$SOURCE_NAME" \ - --tenant "$TENANT" - -# optionally add --instance-id to only start an individual instance \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-status.sh b/modules/pulsar-io/examples/connectors/sources/pulsar-admin-status.sh deleted file mode 100644 index f16a7d2..0000000 --- a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-status.sh +++ /dev/null @@ -1,6 +0,0 @@ -# Stop all instances of a connector -./bin/pulsar-admin sources status \ - --instance-id "$SOURCE_INSTANCEID" \ - --namespace "$NAMESPACE" \ - --name "$SOURCE_NAME" \ - --tenant "$TENANT" \ No newline at end of file diff --git a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-stop.sh b/modules/pulsar-io/examples/connectors/sources/pulsar-admin-stop.sh deleted file mode 100644 index 72fc90a..0000000 --- a/modules/pulsar-io/examples/connectors/sources/pulsar-admin-stop.sh +++ /dev/null @@ -1,7 +0,0 @@ -# Stop all instances of a connector 
-./bin/pulsar-admin sources stop \ - --namespace "$NAMESPACE" \ - --name "$SOURCE_NAME" \ - --tenant "$TENANT" - -# optionally add --instance-id to only stop an individual instance \ No newline at end of file diff --git a/modules/pulsar-io/pages/connectors/index.adoc b/modules/pulsar-io/pages/connectors/index.adoc index b4f8156..3b08dfc 100644 --- a/modules/pulsar-io/pages/connectors/index.adoc +++ b/modules/pulsar-io/pages/connectors/index.adoc @@ -9,20 +9,149 @@ Connect popular data sources to {pulsar} topics or sink data from {pulsar-short} Below is a list of {pulsar} source and sink connectors supported by {product}. -[NOTE] +[IMPORTANT] ==== -{product} does not support custom sink or source connectors. +{product} doesn't support custom sink or source connectors. ==== +[#sink-connectors] == Sink Connectors -[#sink-connectors] -include::partial$connectors/sinks/sink-connectors.adoc[tag=production] +[#astradb-sink] +=== AstraDB sink -== Source Connectors +The AstraDB sink connector reads messages from {pulsar} topics and writes them to AstraDB systems. + +xref:connectors/sinks/astra-db.adoc[AstraDB sink documentation]. + +[#cloudstorage-sink] +=== Cloud Storage sink + +The Cloud Storage sink connector reads messages from {pulsar} topics and writes them to Cloud Storage systems. + +xref:connectors/sinks/cloud-storage.adoc[Cloud Storage sink documentation]. + +[#elasticsearch-sink] +=== ElasticSearch sink + +The Elasticsearch sink connector reads messages from {pulsar} topics and writes them to Elasticsearch systems. + +xref:connectors/sinks/elastic-search.adoc[Elasticsearch sink documentation]. + +[#bigquery-sink] +=== Google BigQuery sink + +The Google BigQuery sink connector reads messages from {pulsar} topics and writes them to BigQuery systems. + +xref:connectors/sinks/google-bigquery.adoc[Google BigQuery sink documentation]. + +[#jdbc-clickhouse-sink] +=== JDBC-Clickhouse sink + +The JDBC-ClickHouse sink connector reads messages from {pulsar} topics and writes them to JDBC-ClickHouse systems. + +xref:connectors/sinks/jdbc-clickhouse.adoc[JDBC ClickHouse sink documentation]. + +[#jdbc-mariadb-sink] +=== JDBC-MariaDB sink + +The JDBC-MariaDB sink connector reads messages from {pulsar} topics and writes them to JDBC-MariaDB systems. + +xref:connectors/sinks/jdbc-mariadb.adoc[JDBC MariaDB sink documentation]. + +[#jdbc-postgres-sink] +=== JDBC-PostgreSQL sink + +The JDBC-PostgreSQL sink connector reads messages from {pulsar} topics and writes them to JDBC-PostgreSQL systems. + +xref:connectors/sinks/jdbc-postgres.adoc[JDBC PostgreSQL sink documentation]. + +[#jdbc-sqlite-sink] +=== JDBC-SQLite + +The JDBC-SQLite sink connector reads messages from {pulsar} topics and writes them to JDBC-SQLite systems. + +xref:connectors/sinks/jdbc-sqllite.adoc[JDBC SQLite sink documentation]. + +[#kafka-sink] +=== Kafka + +The Kafka sink connector reads messages from {pulsar} topics and writes them to Kafka systems. + +xref:connectors/sinks/kafka.adoc[Kafka sink documentation]. + +[#kinesis-sink] +=== Kinesis + +The Kinesis sink connector reads messages from {pulsar} topics and writes them to Kinesis systems. + +xref:connectors/sinks/kinesis.adoc[Kinesis sink documentation]. + +[#snowflake-sink] +=== Snowflake + +The Snowflake sink connector reads messages from {pulsar} topics and writes them to Snowflake systems. + +xref:connectors/sinks/snowflake.adoc[Snowflake sink documentation]. 
[#source-connectors] -include::partial$connectors/sources/source-connectors.adoc[tag=production] +== Source Connectors + +[#datagenerator-source] +=== Data Generator source + +The Data generator source connector produces messages for testing and persists the messages to {pulsar-short} topics. + +xref:connectors/sources/data-generator.adoc[Data Generator source documentation] + +[#debezium-mongodb-source] +=== Debezium MongoDB source + +The Debezium MongoDB source connector reads data from Debezium MongoDB systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/debezium-mongodb.adoc[Debezium MongoDB source documentation] + +[#debezium-mysql-source] +=== Debezium MySQL source + +The Debezium MySQL source connector reads data from Debezium MySQL systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/debezium-mysql.adoc[Debezium MySQL source documentation] + +[#debezium-oracle-source] +=== Debezium Oracle source + +The Debezium Oracle source connector reads data from Debezium Oracle systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/debezium-oracle.adoc[Debezium Oracle source documentation] + +[#debezium-postgres-source] +=== Debezium Postgres source + +The Debezium PostgreSQL source connector reads data from Debezium PostgreSQL systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/debezium-postgres.adoc[Debezium PostgreSQL source documentation] + +[#debezium-sql-server-source] +=== Debezium SQL Server source + +The Debezium SQL Server source connector reads data from Debezium SQL Server systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/debezium-sqlserver.adoc[Debezium SQL Server source documentation] + +[#kafka-source] +=== Kafka source + +The Kafka source connector reads data from Kafka systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/kafka.adoc[Kafka source connector documentation] + +[#kinesis-source] +=== AWS Kinesis source + +The AWS Kinesis source connector reads data from Kinesis systems and produces data to {pulsar-short} topics. + +xref:connectors/sources/kinesis.adoc[Kinesis source connector documentation] == Experimental Connectors @@ -30,15 +159,83 @@ include::partial$connectors/sources/source-connectors.adoc[tag=production] To get access to these connectors, contact {support-url}[{company} Support]. 
+[#sink-experimental] === Sink Connectors (experimental) -[#sink-experimental] -include::partial$connectors/sinks/sink-connectors.adoc[tag=sink-experimental] +Kinetica + +Aerospike + +Azure DocumentDB + +Azure Data Explorer (Kusto) + +Batch Data Generator + +CoAP + +Couchbase + +DataDog + +Diffusion + +Flume + +Apache Geode + +Hazelcast + +Apache HBase + +HDFS 2 + +HDFS 3 + +Humio + +InfluxDB + +JMS + +Apache Kudu + +MarkLogic + +MongoDB + +MQTT + +Neo4J + +New Relic + +OrientDB + +Apache Phoenix + +PLC4X + +RabbitMQ + +Redis + +SAP HANA + +SingleStore + +Apache Solr + +Splunk + +XTDB + +Zeebe + +[#source-experimental] === Source Connectors (experimental) -[#source-experimental] -include::partial$connectors/sources/source-connectors.adoc[tag=source-experimental] +{cass-short} Source + +Kinetica + +Azure DocumentDB + +Batch Data Generator + +Big Query + +canal + +CoAP + +Couchbase + +datadog + +diffusion + +DynamoDB + +file + +flume + +Apache Geode + +Hazelcast + +Humio + +JMS + +Apache Kudu + +MarkLogic + +MongoDB + +MQTT + +Neo4J + +New Relic + +NSQ + +OrientDB + +Apache Phoenix + +PLC4X + +RabbitMQ + +Redis + +SAP HANA + +SingleStore + +Splunk + +Twitter + +XTDB + +Zeebe + == Listing Sink Connectors @@ -49,7 +246,7 @@ To list available sink connectors in your {product} tenant, use any of the follo {pulsar-short} Admin:: + -- -Assuming you have downloaded client.conf to the {pulsar-short} folder: +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] [source,shell,subs="attributes+"] ---- @@ -60,7 +257,6 @@ Assuming you have downloaded client.conf to the {pulsar-short} folder: curl:: + -- -// tag::rest-env-vars[] You need a {pulsar-short} token for REST API authentication. This is different from your {astra-db} application tokens. @@ -81,7 +277,6 @@ This is different from your {astra-db} application tokens. export WEB_SERVICE_URL= export ASTRA_STREAMING_TOKEN= ---- -// end::rest-env-vars[] . Use these values to form curl commands to the REST API: + @@ -101,7 +296,7 @@ To list available source connectors in your {product} tenant, use any of the fol {pulsar-short} Admin:: + -- -Assuming you have downloaded client.conf to the {pulsar-short} folder: +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] [source,shell,subs="attributes+"] ---- @@ -109,11 +304,32 @@ Assuming you have downloaded client.conf to the {pulsar-short} folder: ---- -- -cURL:: +curl:: + -- -include::pulsar-io:connectors/index.adoc[tag=rest-env-vars] +You need a {pulsar-short} token for REST API authentication. +This is different from your {astra-db} application tokens. + +. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. + +. Click your tenant's name, and then click the *Settings* tab. +. Click *Create Token*. + +. Copy the token, store it securely, and then click *Close*. + +. Click the *Connect* tab, and then copy the *Web Service URL*. + +. Create environment variables for your tenant's token and web service URL: ++ +[source,shell,subs="attributes+"] +---- +export WEB_SERVICE_URL= +export ASTRA_STREAMING_TOKEN= +---- + +. Use the token to authenticate requests: ++ [source,shell,subs="attributes+"] ---- curl "$WEB_SERVICE_URL/admin/v3/sources/builtinsources" -H "Authorization: $ASTRA_STREAMING_TOKEN" @@ -121,6 +337,6 @@ curl "$WEB_SERVICE_URL/admin/v3/sources/builtinsources" -H "Authorization: $ASTR -- ==== -== What's next? 
+== See also  For more on {pulsar-short} IO connectors, see the https://pulsar.apache.org/docs/en/io-overview/[{pulsar-short} documentation]. \ No newline at end of file diff --git a/modules/pulsar-io/pages/connectors/sinks/astra-db.adoc b/modules/pulsar-io/pages/connectors/sinks/astra-db.adoc index 3448f1f..92b4885 100644 --- a/modules/pulsar-io/pages/connectors/sinks/astra-db.adoc +++ b/modules/pulsar-io/pages/connectors/sinks/astra-db.adoc @@ -5,7 +5,7 @@ {company} {astra-db} Sink Connector is based on the open-source https://docs.datastax.com/en/pulsar-connector/docs/index.html[{cass-reg} sink connector for {pulsar-reg}]. Depending on how you deploy the connector, it can be used to sink topic messages with a table in {astra-db} or a table in a {cass-short} cluster outside of DB. -The {product} portal provides simple way to connect this sink and a table in {astra-db} with simply a token. Using pulsar-admin or the REST API, you can configure the sink to connect with a {cass-short} connection manually. +The {product} portal provides a simple way to connect this sink to a table in {astra-db} with just a token. Using `pulsar-admin` or the REST API, you can configure the sink to connect with a {cass-short} connection manually.  This reference assumes you are manually connecting to a {cass-short} table.  @@ -39,69 +39,169 @@ include::example$connectors/sinks/astra.csv[]  === {cass-short} Connection  -These values are provided in the *Configs* area. +These values are provided in the `Configs` area:  -The "cloud.secureConnectBundle" can either be a path to your bundle zip or you can base64 encode the zip and provide it in the format: "base64:". - -[%header,format=csv,cols="2,1,1,3"] +[cols=4] |=== -include::example$connectors/sinks/{connectorType}/config.csv[] +| Name | Required | Default | Description + +| auth +| yes +| `{}` +| Refer to the auth properties reference. + +| cloud.secureConnectBundle +| yes +| +|Can be either a path to your database's Secure Connect Bundle (SCB) zip or a base64 encoding of the zip, provided in the format `base64:`. + +| compression +| yes +| None +| + +| connectionPoolLocalSize +| yes +| 4 +| + +| ignoreErrors +| yes +| None +| + +| jmx +| yes +| true +| + +| maxConcurrentRequests +| yes +| 500 +| + +| maxNumberOfRecordsInBatch +| yes +| 32 +| + +| queryExecutionTimeout +| yes +| 30 +| + +| task.max +| yes +| 1 +| + +| tasks.max +| yes +| 1 +| + +| topic +| yes +| `{}` +| Refer to the topic properties reference. + +| topics +| yes +| +|The topic name to watch. |=== -// TODO: Need descriptions of every param  === Auth Properties  -These values are provided in the *auth* area in the above {cass-short} connection parameters. +These values are provided in the `auth` area of the preceding {cass-short} connection parameters:  -[%header,format=csv,cols="2,1,1,3"] +[cols=3] |=== -include::example$connectors/sinks/{connectorType}/auth.csv[] +| Name | Required | Default + +| gssapi +| yes +| `{ "service": "dse" }` + +| password +| yes +| + +| provider +| yes +| None + +| username +| yes +| `token` |===  === Topic Properties  -These values are provided in the *topic* area in the above {cass-short} connection parameters. +These values are provided in the `topic` area of the preceding {cass-short} connection parameters. Refer to the official documentation for a https://docs.datastax.com/en/pulsar-connector/docs/cfgRefPulsarDseConnection.html[connection properties reference].
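+
+For illustration only, the following sketch shows how the `auth` block could sit alongside the connection parameters in the `configs` area when authenticating with an {astra-db} application token.
+The provider, topic, and credential values below are placeholders rather than required names, and the available options are described in the preceding tables.
+For a complete configuration file, see the next section.
+
+[source,json]
+----
+{
+  "topics": "my-input-topic",
+  "cloud.secureConnectBundle": "base64:<encoded-scb-zip>",
+  "auth": {
+    "provider": "PLAIN",
+    "username": "token",
+    "password": "<astra-db-application-token>"
+  }
+}
+----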
=== Mapping topic data to table columns  -[TIP] -==== -There are quite a few examples in the "https://docs.datastax.com/en/pulsar-connector/docs/cfgPulsarMapTopicTable.html[Mapping pulsar topics to database tables]" area of the official documentation -==== +An essential part of using this sink connector is mapping message values to table columns. +There are many factors that influence how this is done and what is possible.  -An essential part of using this sink connector is mapping message values to table columns. There are many factors that influence how this done and what is possible. The 'mapping' string is a simple comma-separated list of column names and message value fields. +While the preceding examples showed how to configure the connector in one large command, it is easier to manage this as a separate file. +The following steps explain how to configure the connector using a JSON configuration file with the minimum required values.  -While the getting started examples above show how to configure the connector in one large command, it is easier to manage this as a separate file. The following example show how to configure the connector using a configuration in json format. The "https://docs.datastax.com/en/pulsar-connector/docs/pulsarQuickStart.html[{pulsar-short} Connector single instance quick start]" guide provides a good example of this. Below are the minimum requirements. +For a more detailed example of this pattern, see the xref:pulsar-connector:ROOT:pulsarQuickStart.adoc[{pulsar-short} Connector single instance quickstart].  -Create a file named configs.json using the following structure: - -[source] +. Create a file named `configs.json` with the following content: ++ +[source,json,subs="+quotes"] ---- +{ "archive": "builtin://cassandra-enhanced", -"tenant": "", -"namespace": "", -"name": "", -"inputs": [""], -"configs:": { - "topics": , - "cloud.secureConnectBundle": , +"tenant": "**TENANT_NAME**", +"namespace": "**NAMESPACE_NAME**", +"name": "**CONNECTOR_NAME**", +"inputs": ["**TOPIC_NAME**"], +"configs": { + "topics": "**TOPIC_NAME**", + "cloud.secureConnectBundle": "**SCB**", "topic": { - "": { - "": { - "": { - "", - ... - "mapping": + "**TOPIC_NAME**": { + "**KEYSPACE_NAME**": { + "**TABLE_NAME**": { + **CONNECTION_PROPERTIES**, + "mapping": "**MAPPING_STRING**" } } } } } +} ---- ++ +Replace the following: + +* **TENANT_NAME**: Your tenant's name. +* **NAMESPACE_NAME**: The tenant's associated namespace name. +* **CONNECTOR_NAME**: The name of the connector. +* **TOPIC_NAME**: In `inputs` and `configs.topics`, specify the names of the topics to connect. +Specify topic names only; don't use the full topic addresses. +You can specify multiple topics. +Define one `configs.topic` object for each topic that you want to connect. +* **SCB**: The path to your database's Secure Connect Bundle (SCB) zip or a base64 encoding of the SCB zip. +* **KEYSPACE_NAME**: The name of the keyspace in your database that contains the table you want to connect to a topic. +* **TABLE_NAME**: The name of the table to connect to a topic. +* **CONNECTION_PROPERTIES**: Additional topic-to-table connection properties, if required. +For more information, see xref:pulsar-connector:ROOT:cfgRefPulsarDseTable.adoc[]. +* **MAPPING_STRING**: The mapping string for the table columns as a comma-separated list of column names and message value fields. 
+For example: ++ +[source,text] +---- +symbol=value.symbol, ts=value.ts, exchange=value.exchange, industry=value.industry, name=key, value=value.value +---- ++ +For more mapping examples, see xref:pulsar-connector:ROOT:cfgPulsarMapTopicTable.adoc[Mapping pulsar topics to database tables]. -Use the pulsar-admin cli to create the connector +. Use the `pulsar-admin` CLI to create the connector with your JSON file: [source,shell] ---- @@ -110,6 +210,4 @@ Use the pulsar-admin cli to create the connector --classname com.datastax.oss.sink.pulsar.StringCassandraSinkTask \ --sink-config-file configs.json \ --sink-type cassandra-enhanced ----- - -To create the value for the mapping parameter you would provide direction how each column value will be filled. An example string is: `symbol=value.symbol, ts=value.ts, exchange=value.exchange, industry=value.industry, name=key, value=value.value`. \ No newline at end of file +---- \ No newline at end of file diff --git a/modules/pulsar-io/pages/connectors/sinks/cloud-storage.adoc b/modules/pulsar-io/pages/connectors/sinks/cloud-storage.adoc index d6eea1e..b8b74f6 100644 --- a/modules/pulsar-io/pages/connectors/sinks/cloud-storage.adoc +++ b/modules/pulsar-io/pages/connectors/sinks/cloud-storage.adoc @@ -7,11 +7,9 @@ Each public cloud has different ways of persisting data to their storage systems The cloud storage system supported are: -- Google Cloud Storage (GCP) -- S3 (AWS) -- Azure Blob (Azure) - -(see below for supported data formats) +* https://cloud.google.com/storage[Google's Cloud Storage (GCP)] +* https://azure.microsoft.com/en-us/products/storage/blobs[Azure Blob Store (Azure)] +* https://aws.amazon.com/s3/[Amazon Web Services S3 (AWS)] == Get Started @@ -19,34 +17,82 @@ include::partial$connectors/sinks/get-started.adoc[] == Data format types -The Cloud Storage sink connector provides multiple output format options, including JSON, Avro, Bytes, or Parquet. The default format is JSON. With current implementation, there are some limitations for different formats: +The Cloud Storage sink connector provides multiple output format options, including JSON (default), Avro, Bytes, or Parquet. +There are some limitations for certain formats, as explained in the following sections. -{pulsar-short} Schema types supported by the writers: +=== {pulsar-short} Schema types supported by the writers -[%header,format=csv,cols="1,^1,^1,^1,^1"] -|=== -include::example$connectors/sinks/cloud-storage/data-format.csv[] +[cols=5] |=== +|Pulsar Schema |Writer: Avro |Writer: JSON |Writer: Parquet |Writer: Bytes + +|Primitive +a|❌ +a|✅ + +The JSON writer will try to convert data with a String or Bytes schema to JSON-format data if convertible. + +a|❌ +a|✅ + +|Avro +a|✅ +a|✅ +a|✅ +a|✅ + +|Json +a|✅ +a|✅ +a|✅ +a|✅ -____ -*The JSON writer will try to convert data with a String or Bytes schema to JSON-format data if convertable. +|Protobuf -**The Protobuf schema is based on the Avro schema. It uses Avro as an intermediate format, so it may not provide the best effort conversion. +The Protobuf schema is based on the Avro schema. +It uses Avro as an intermediate format, so it may not provide the best effort conversion. -\*** The ProtobufNative record holds the Protobuf descriptor and the message. When writing to Avro format, the connector uses avro-protobuf to do the conversion. 
-____ +a|✅ +a|✅ +a|✅ +a|✅ -Supported `withMetadata` configurations for different writer formats: +|ProtobufNative +a|✅ -[%header,format=csv,cols="1,^1",width="50%"] +The ProtobufNative record holds the Protobuf descriptor and the message. +When writing to Avro format, the connector uses `avro-protobuf` to do the conversion. + +a|❌ +a|✅ +a|✅ +|=== + +=== Supported withMetadata configurations for writer formats + +[cols=2] |=== -include::example$connectors/sinks/cloud-storage/with-meta-data.csv[] +|Writer Format |withMetadata + +|Avro +a|✅ + +|JSON +a|✅ + +|Parquet +a|✅ + +|Bytes +a|❌ |=== -____ -*When using Parquet with PROTOBUF_NATIVE format, the connector will write the messages with the DynamicMessage format. When withMetadata is set to true, the connector will add __message_metadata__ to the messages with PulsarIOCSCProtobufMessageMetadata format. +==== Parquet with PROTOBUF_NATIVE format -For example, if a message User has the following schema: +When using Parquet with `PROTOBUF_NATIVE` format, the connector writes the messages with the `DynamicMessage` format. +When `withMetadata` is true, the connector adds __message_metadata__ to the messages with `PulsarIOCSCProtobufMessageMetadata` format. + +For example, if a message `User` has the following schema: [source,protobuf] ---- @@ -57,7 +103,7 @@ message User { } ---- -When withMetadata is set to true, the connector will write the message DynamicMessage with the following schema: +When `withMetadata` is set to true, the connector writes the message `DynamicMessage` with the following schema: [source,protobuf] ---- @@ -73,16 +119,18 @@ message User { PulsarIOCSCProtobufMessageMetadata __message_metadata__ = 3; } ---- -____ -[NOTE] -==== -By default, when the connector receives a message with a non-supported schema type, the connector will fail the message. If you want to skip the non-supported messages, you can set skipFailedMessages to true. -==== +==== Skip unsupported messages + +By default, when the connector receives a message with a non-supported schema type, the connector will fail the message. +If you want to skip the non-supported messages, you can set `skipFailedMessages` to true. == Dead-letter topics -To use a dead-letter topic, set `skipFailedMessages` to `false` in the cloud provider config. Then using either pulsar-admin or curl, set `--max-redeliver-count` and `--dead-letter-topic`. For more info about dead-letter topics, see the https://pulsar.apache.org/docs/en/concepts-messaging/#dead-letter-topic[{pulsar-short} documentation]. If a message fails to be sent to the Cloud Storage sink and there is a dead-letter topic, the connector will send the message to the assigned topic. +If a message fails to send to a Cloud Storage sink, the connector can send the message to a https://pulsar.apache.org/docs/en/concepts-messaging/#dead-letter-topic[dead-letter topic] instead, if a dead-letter topic is assigned. + +To use a dead-letter topic, set `skipFailedMessages` to `false` in the cloud provider config. +Then, using either `pulsar-admin` or curl, set `--max-redeliver-count` and `--dead-letter-topic`. == Managing the Connector @@ -94,63 +142,444 @@ include::partial$connectors/sinks/monitoring.adoc[] == Connector Reference -With the Cloud Storage Sink there are two sets of parameters. First, the {product} parameters, then the parameters specific to your chosen cloud store. +With the Cloud Storage Sink there are two sets of parameters: {product} parameters and cloud storage provider parameters. 
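+
+As a hypothetical illustration of the provider half, the sketch below shows a minimal `configs` block for a Google Cloud Storage bucket, using only parameters documented in the tables that follow; the bucket name and key-file path are placeholders.
+
+[source,json]
+----
+{
+  "provider": "google-cloud-storage",
+  "bucket": "my-gcs-bucket",
+  "formatType": "json",
+  "partitionerType": "partition",
+  "gcsServiceAccountKeyFilePath": "/path/to/gcs-credentials.json"
+}
+----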
-=== {product} +=== {product} parameters for Cloud Storage Sink [%header,format=csv,cols="2,1,1,3"] |=== include::example$connectors/sinks/astra.csv[] |=== -=== Cloud specific parameters (configs) +=== Cloud storage provider parameters for Cloud Storage Sink -Choose the storage provider and set the parameter values in the "configs" area. +Set your cloud storage provider and other required values in the `configs` area. [tabs] -==== +====== Google Cloud Storage:: + -- -[%header,format=csv,cols="2,1,1,3"] +[cols="2,1,1,3"] |=== -include::example$connectors/sinks/cloud-storage/gcp-gcs.csv[] +|Name |Required |Default |Description + +|bucket +|yes +|null +|The Cloud Storage bucket + +|provider +|yes +|null +|The Cloud Storage type. Google cloud storage only supports the google-cloud-storage provider. + +|avroCodec +|no +|snappy +|Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy. + +|batchSize +|no +|10 +|The number of records submitted in batch. + +|batchTimeMs +|no +|1000 +|The interval for batch submission. + +|bytesFormatTypeSeparator +|no +|0x10 +|It is inserted between records for the formatType of bytes. By default, it is set to '0x10'. An input record that contains the line separator looks like multiple records in the output object. + +|formatType +|no +|json +|The data format type. Available options are JSON, Avro, Bytes, or Parquet. By default, it is set to JSON. + +|gcsServiceAccountKeyFileContent +|no +|empty +|The contents of the JSON service key file. If empty, credentials are read from gcsServiceAccountKeyFilePath file. + +|gcsServiceAccountKeyFilePath +|no +|empty +|Path to the GCS credentials file. If empty, the credentials file are read from the GOOGLE_APPLICATION_CREDENTIALS environment variable. + +|jsonAllowNaN +|no +|false +|Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. Since JSON specification does not allow such values this is a non-standard feature and disabled by default. + +|maxBatchBytes +|no +|10000000 +|The maximum number of bytes in a batch. + +|parquetCodec +|no +|gzip +|Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd. + +|partitionerType +|no +|partition +|The partitioning type. It can be configured by topic partitions or by time. By default, the partition type is configured by topic partitions. + +|partitionerUseIndexAsOffset +|no +|false +|Whether to use the Pulsar's message index as offset or the record sequence. It's recommended if the incoming messages may be batched. The brokers may or not expose the index metadata and, if it's not present on the record, the sequence will be used. See PIP-70 for more details. + +|pathPrefix +|no +|false +|If it is set, the output files are stored in a folder under the given bucket path. The pathPrefix must be in the format of xx/xxx/. + +|pendingQueueSize +|no +|10 +|The number of records buffered in queue. By default, it is equal to batchSize. You can set it manually. + +|skipFailedMessages +|no +|false +|Configure whether to skip a message which it fails to be processed. If it is set to true, the connector will skip the failed messages by ack it. Otherwise, the connector will fail the message. + +|sliceTopicPartitionPath +|no +|false +|When it is set to true, split the partitioned topic name into separate folders in the bucket path. 
+ +|timePartitionDuration +|no +|86400000 +|The time interval for time-based partitioning. Support formatted interval string, such as 30d, 24h, 30m, 10s, and also support number in milliseconds precision, such as 86400000 refers to 24h or 1d. + +|timePartitionPattern +|no +|yyyy-MM-dd +|The format pattern of the time-based partitioning. For details, refer to the Java date and time format. + +|useHumanReadableMessageId +|no +|false +|Use a human-readable format string for messageId in message metadata. The messageId is in a format like ledgerId:entryId:partitionIndex:batchIndex. Otherwise, the messageId is a Hex-encoded string. + +|useHumanReadableSchemaVersion +|no +|false +|Use a human-readable format string for the schema version in the message metadata. If it is set to true, the schema version is in plain string format. Otherwise, the schema version is in hex-encoded string format. + +|withMetadata +|no +|false +|Save message attributes to metadata. + +|withTopicPartitionNumber +|no +|true +|When it is set to true, include the topic partition number to the object path. |=== -- AWS S3 Storage:: + -- - The suggested permission policies for AWS S3 are: -- s3:AbortMultipartUpload -- s3:GetObject* -- s3:PutObject* -- s3:List* +- `s3:AbortMultipartUpload` +- `s3:GetObject*` +- `s3:PutObject*` +- `s3:List*` -If you do not want to provide a region in the configuration, you should enable s3:GetBucketLocation permission policy as well. +If you don't want to provide a region in the configuration, then enable the `s3:GetBucketLocation` permission policy as well. -[%header,format=csv,cols="2,1,1,3"] +[cols="2,1,1,3"] |=== -include::example$connectors/sinks/cloud-storage/aws-S3.csv[] +|Name |Required |Default |Description + +|accessKeyId +|yes +|null +|The Cloud Storage access key ID. It requires permission to write objects. + +|bucket +|yes +|null +|The Cloud Storage bucket. + +|endpoint +|yes +|null +|The Cloud Storage endpoint. + +|provider +|yes +|null +|The Cloud Storage type, such as aws-s3, s3v2 (s3v2 uses the AWS client but not the JCloud client). + +|secretAccessKey +|yes +|null +|The Cloud Storage secret access key. + +|avroCodec +|no +|snappy +|Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy. + +|batchSize +|no +|10 +|The number of records submitted in batch. + +|batchTimeMs +|no +|1000 +|The interval for batch submission. + +|bytesFormatTypeSeparator +|no +|0x10 +|It is inserted between records for the formatType of bytes. By default, it is set to '0x10'. An input record that contains the line separator looks like multiple records in the output object. + +|formatType +|no +|json +|The data format type. Available options are JSON, Avro, Bytes, or Parquet. By default, it is set to JSON. + +|jsonAllowNaN +|no +|false +|Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. Since JSON specification does not allow such values this is a non-standard feature and disabled by default. + +|maxBatchBytes +|no +|10000000 +|The maximum number of bytes in a batch. + +|parquetCodec +|no +|gzip +|Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd. + +|partitionerType +|no +|partition +|The partitioning type. It can be configured by topic partitions or by time. By default, the partition type is configured by topic partitions. 
+ +|partitionerUseIndexAsOffset +|no +|false +|Whether to use the Pulsar's message index as offset or the record sequence. It's recommended if the incoming messages may be batched. The brokers may or not expose the index metadata and, if it's not present on the record, the sequence will be used. See PIP-70 for more details. + +|pathPrefix +|no +|false +|If it is set, the output files are stored in a folder under the given bucket path. The pathPrefix must be in the format of xx/xxx/. + +|pendingQueueSize +|no +|10 +|The number of records buffered in queue. By default, it is equal to batchSize. You can set it manually. + +|role +|no +|null +|The Cloud Storage role. + +|roleSessionName +|no +|null +|The Cloud Storage role session name. + +|skipFailedMessages +|no +|false +|Configure whether to skip a message which it fails to be processed. If it is set to true, the connector will skip the failed messages by ack it. Otherwise, the connector will fail the message. + +|sliceTopicPartitionPath +|no +|false +|When it is set to true, split the partitioned topic name into separate folders in the bucket path. + +|timePartitionDuration +|no +|86400000 +|The time interval for time-based partitioning. Support formatted interval string, such as 30d, 24h, 30m, 10s, and also support number in milliseconds precision, such as 86400000 refers to 24h or 1d. + +|timePartitionPattern +|no +|yyyy-MM-dd +|The format pattern of the time-based partitioning. For details, refer to the Java date and time format. + +|useHumanReadableMessageId +|no +|false +|Use a human-readable format string for messageId in message metadata. The messageId is in a format like ledgerId:entryId:partitionIndex:batchIndex. Otherwise, the messageId is a Hex-encoded string. + +|useHumanReadableSchemaVersion +|no +|false +|Use a human-readable format string for the schema version in the message metadata. If it is set to true, the schema version is in plain string format. Otherwise, the schema version is in hex-encoded string format. + +|withMetadata +|no +|false +|Save message attributes to metadata. + +|withTopicPartitionNumber +|no +|true +|When it is set to true, include the topic partition number to the object path. |=== -- Azure Blob Storage:: + -- -[%header,format=csv,cols="2,1,1,3"] +[cols="2,1,1,3"] |=== -include::example$connectors/sinks/cloud-storage/azure-blob.csv[] +|Name |Required |Default |Description + +|azureStorageAccountConnectionString +|yes +| +|The Azure Blob Storage connection string. Required when authenticating via connection string. + +|azureStorageAccountKey +|yes +| +|The Azure Blob Storage account key. Required when authenticating via account name and account key. + +|azureStorageAccountName +|yes +| +|The Azure Blob Storage account name. Required when authenticating via account name and account key. + +|azureStorageAccountSASToken +|yes +| +|The Azure Blob Storage account SAS token. Required when authenticating via SAS token. + +|bucket +|yes +|null +|The Cloud Storage bucket. + +|endpoint +|yes +|null +|The Azure Blob Storage endpoint. + +|provider +|yes +|null +|The Cloud Storage type. Azure Blob Storage only supports the azure-blob-storage provider. + +|avroCodec +|no +|snappy +|Compression codec used when formatType=avro. Available compression types are: null (no compression), deflate, bzip2, xz, zstandard, snappy. + +|batchSize +|no +|10 +|The number of records submitted in batch. + +|batchTimeMs +|no +|1000 +|The interval for batch submission. 
+ +|bytesFormatTypeSeparator +|no +|0x10 +|It is inserted between records for the formatType of bytes. By default, it is set to '0x10'. An input record that contains the line separator looks like multiple records in the output object. + +|formatType +|no +|json +|The data format type. Available options are JSON, Avro, Bytes, or Parquet. By default, it is set to JSON. + +|jsonAllowNaN +|no +|false +|Recognize 'NaN', 'INF', '-INF' as legal floating number values when formatType=json. Since JSON specification does not allow such values this is a non-standard feature and disabled by default. + +|maxBatchBytes +|no +|10000000 +|The maximum number of bytes in a batch. + +|parquetCodec +|no +|gzip +|Compression codec used when formatType=parquet. Available compression types are: null (no compression), snappy, gzip, lzo, brotli, lz4, zstd. + +|partitionerType +|no +|partition +|The partitioning type. It can be configured by topic partitions or by time. By default, the partition type is configured by topic partitions. + +|partitionerUseIndexAsOffset +|no +|false +|Whether to use the Pulsar's message index as offset or the record sequence. It's recommended if the incoming messages may be batched. The brokers may or not expose the index metadata and, if it's not present on the record, the sequence will be used. See PIP-70 for more details. + +|pathPrefix +|no +|false +|If it is set, the output files are stored in a folder under the given bucket path. The pathPrefix must be in the format of xx/xxx/. + +|pendingQueueSize +|no +|10 +|The number of records buffered in queue. By default, it is equal to batchSize. You can set it manually. + +|skipFailedMessages +|no +|false +|Configure whether to skip a message which it fails to be processed. If it is set to true, the connector will skip the failed messages by ack it. Otherwise, the connector will fail the message. + +|sliceTopicPartitionPath +|no +|false +|When it is set to true, split the partitioned topic name into separate folders in the bucket path. + +|timePartitionDuration +|no +|86400000 +|The time interval for time-based partitioning. Support formatted interval string, such as 30d, 24h, 30m, 10s, and also support number in milliseconds precision, such as 86400000 refers to 24h or 1d. + +|timePartitionPattern +|no +|yyyy-MM-dd +|The format pattern of the time-based partitioning. For details, refer to the Java date and time format. + +|useHumanReadableMessageId +|no +|false +|Use a human-readable format string for messageId in message metadata. The messageId is in a format like ledgerId:entryId:partitionIndex:batchIndex. Otherwise, the messageId is a Hex-encoded string. + +|useHumanReadableSchemaVersion +|no +|false +|Use a human-readable format string for the schema version in the message metadata. If it is set to true, the schema version is in plain string format. Otherwise, the schema version is in hex-encoded string format. + +|withMetadata +|no +|false +|Save message attributes to metadata. + +|withTopicPartitionNumber +|no +|true +|When it is set to true, include the topic partition number to the object path. |=== -- -==== - -== What's next? - -Learn more about https://cloud.google.com/storage[Google’s Cloud Storage]. - -Learn more about https://azure.microsoft.com/en-us/products/storage/blobs[Azure Blob Store]. - -Learn more about https://aws.amazon.com/s3/[AWS S3]. 
\ No newline at end of file +====== \ No newline at end of file diff --git a/modules/pulsar-io/pages/connectors/sinks/google-bigquery.adoc b/modules/pulsar-io/pages/connectors/sinks/google-bigquery.adoc index 14a00fa..4937456 100644 --- a/modules/pulsar-io/pages/connectors/sinks/google-bigquery.adoc +++ b/modules/pulsar-io/pages/connectors/sinks/google-bigquery.adoc @@ -3,7 +3,8 @@ :connectorType: bigquery :page-tag: bigquery,sink-connector -BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. + +https://cloud.google.com/bigquery[Google BigQuery] is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. BigQuery {pulsar-short} Sink is not integrated with BigQuery directly. It uses {pulsar-short}’s built-in https://pulsar.apache.org/docs/adaptors-kafka/[Kafka Connect adapter] library to transform message data into a Kafka compatible format. Then the https://docs.confluent.io/kafka-connectors/bigquery/current/kafka_connect_bigquery_config.html[Kafka Connect BigQuery Sink] is used as the actual BigQuery integration. The adaptor provides a flexible and extensible framework for data transformation and processing. It supports various data formats, including JSON, Avro, and Protobuf, and enables users to apply transformations on the data as it is being streamed from {pulsar-short}. @@ -39,22 +40,302 @@ include::example$connectors/sinks/astra.csv[] === Kafka Connect Adapter Configuration (configs) -These values are provided in the “configs” area. View the code for these configurations https://github.com/apache/pulsar/blob/master/pulsar-io/kafka-connect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/PulsarKafkaConnectSinkConfig.java[here]. +These values are provided in the `configs` area. -[%header,format=csv,cols="2,1,1,3"] +For source code for these configuration, see `https://github.com/apache/pulsar/blob/master/pulsar-io/kafka-connect-adaptor/src/main/java/org/apache/pulsar/io/kafka/connect/PulsarKafkaConnectSinkConfig.java[PulsarKafkaConnectSinkConfig.java]`. + +[%header,cols="1,1,1,4"] |=== -include::example$connectors/sinks/bigquery/config.csv[] +| Name | Required | Default | Description + +| kafkaConnectorSinkClass +| yes +| +a| A Kafka-connector sink class to use. Unless you've developed your own, use the value `com.wepay.kafka.connect.bigquery.BigQuerySinkConnector`. + +| offsetStorageTopic +| yes +| +| Pulsar topic to store offsets at. This is an additional topic to your topic with the actual data going to BigQuery. + +| sanitizeTopicName +| yes +| +a| Some connectors cannot handle Pulsar topic names like `persistent://a/b/topic`, and they won't sanitize the topic name themselves. +If enabled, all non alpha-digital characters in topic name are replaced with underscores. 
+In some cases this may result in topic name collisions (`topic_a` and `topic.a` both resolve to `topic_a`). + +This value _must_ be `true`. +Any other value causes an error. + +| batchSize +| no +| 16384 +| Size of messages in bytes the sink will attempt to batch messages together before flush. + +| collapsePartitionedTopics +| no +| false +| Supply Kafka record with topic name without -partition- suffix for partitioned topics. + +| kafkaConnectorConfigProperties +| no +| `{}` +| A key/value map of config properties to pass to the Kafka connector. See the reference table below. + +| lingerTimeMs +| no +| 2147483647L +| Time interval in milliseconds the sink will attempt to batch messages together before flush. + +| maxBatchBitsForOffset +| no +| 12 +| Number of bits (0 to 20) to use for index of message in the batch for translation into an offset. 0 to disable this behavior (Messages from the same batch will have the same offset which can affect some connectors.) + +| topic +| yes +| +| The Kafka topic name that is passed to the Kafka sink. + +| unwrapKeyValueIfAvailable +| no +| true +| In case of Record> data use key from KeyValue<> instead of one from Record. + +| useIndexAsOffset +| no +| true +| Allows use of message index instead of message sequenceId as offset, if available. Requires AppendIndexMetadataInterceptor and exposingBrokerEntryMetadataToClientEnabled=true on brokers. + +| useOptionalPrimitives +| no +| false +| Pulsar schema does not contain information whether the Schema is optional, Kafka's does. This provides a way to force all primitive schemas to be optional for Kafka. |=== === Google BigQuery Configuration (kafkaConnectorConfigProperties) -These values are provided in the "kafkaConnectorConfigProperties" area. View the code for these configurations https://github.com/confluentinc/kafka-connect-bigquery/blob/master/kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkConfig.java[here]. +These values are provided in the `kafkaConnectorConfigProperties` area. -[%header,format=csv,cols="2,1,1,3"] -|=== -include::example$connectors/sinks/bigquery/kafkaConnectorConfigProperties.csv[] +For the source code for these configurations, see `https://github.com/confluentinc/kafka-connect-bigquery/blob/master/kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkConfig.java[BigQuerySinkConfig.java]`. + +[%header,cols="1,1,1,4"] |=== +| Name | Required | Default | Description + +| allBQFieldsNullable +| no +| false +| If true, no fields in any produced BigQuery schemas are `REQUIRED`. +All non-nullable Avro fields are translated as `NULLABLE` (or `REPEATED`, if arrays). + +| allowBigQueryRequiredFieldRelaxation +| no +| false +| If true, fields in BigQuery Schema can be changed from `REQUIRED` to `NULLABLE`. + +| allowNewBigQueryFields +| no +| false +| If true, new fields can be added to BigQuery tables during subsequent schema updates. + +| allowSchemaUnionization +| no +| false +a| If true, the existing table schema (if one is present) is unionized with new record schemas during schema updates. + +If false, the record of the last schema in a batch is used for any necessary table creation and schema update attempts. + +Setting `allowSchemaUnionization` to false _and_ `allowNewBigQueryFields` and `allowBigQueryRequiredFieldRelaxation` to true is equivalent to setting `autoUpdateSchemas` to true in older (pre-2.0.0) versions of this connector. 
+In this case, if BigQuery raises a schema validation exception or a table doesn't exist when a writing a batch, the connector tries to remediate by required field relaxation and/or adding new fields. + +If `allowSchemaUnionization`, `allowNewBigQueryFields`, and `allowBigQueryRequiredFieldRelaxation` are all true, then the connector creates or updates tables with a schema whose fields are a union of the existing table schema fields and the fields present in all of the records of the current batch. + +The key difference is that with unionization disabled, new record schemas have to be a superset of the table schema in BigQuery. + +`allowSchemaUnionization` is a useful tool for parsing. +For example, if you'd like to remove fields from data upstream, the updated schemas still work in the connector. It is similarly useful when different tasks see records whose schemas contain different fields that are not in the table. + +However, be aware that if `allowSchemaUnionization` is set to true, and some bad records are in the topic, then the BigQuery schema can be permanently changed. +This presents two issues: + +* Since BigQuery doesn't allow columns to be dropped from tables, they add unnecessary noise to the schema. +* Since BigQuery doesn't allow column types to be modified, they can break downstream pipelines where well-behaved records have schemas whose field names overlap with the accidentally-added columns in the table, but the types don't match. + +| autoCreateBucket +| no +| true +| Whether to automatically create the given bucket if it does not exist. + +| autoCreateTables +| no +| false +| Automatically create BigQuery tables if they don't already exist + +| avroDataCacheSize +| no +| 100 +| The size of the cache to use when converting schemas from Avro to Kafka Connect. + +| batchLoadIntervalSec +| no +| 120 +| The interval, in seconds, in which to attempt to run GCS to BigQuery load jobs. Only relevant if `enableBatchLoad` is configured. + +| bigQueryMessageTimePartitioning +| no +| false +| Whether or not to use the message time when inserting records. Default uses the connector processing time. + +| bigQueryPartitionDecorator +| no +| true +| Whether or not to append partition decorator to BigQuery table name when inserting records. Default is true. Setting this to true appends partition decorator to table name (e.g. table$yyyyMMdd depending on the configuration set for bigQueryPartitionDecorator). Setting this to false bypasses the logic to append the partition decorator and uses raw table name for inserts. + +| bigQueryRetry +| no +| 0 +| The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error. + +| bigQueryRetryWait +| no +| 1000 +| The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error. + +| clusteringPartitionFieldNames +| no +| +| Comma-separated list of fields where data is clustered in BigQuery. + +| convertDoubleSpecialValues +| no +| false +| Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successful delivery to BigQuery. + +| defaultDataset +| yes +| +| The default dataset to be used + +| deleteEnabled +| no +| false +| Enable delete functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. A delete will be performed when a record with a null value (that is–a tombstone record) is read. 
This feature will not work with SMTs that change the name of the topic. + +| enableBatchLoad +| no +| empty +| Beta Feature. Use with caution. The sublist of topics to be batch loaded through GCS. + +| gcsBucketName +| no +| empty +| The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if `enableBatchLoad` is configured. + +| includeKafkaData +| no +| false +| Whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows. + +| intermediateTableSuffix +| no +| `.tmp` +| A suffix that will be appended to the names of destination tables to create the names for the corresponding intermediate tables. Multiple intermediate tables may be created for a single destination table, but their names will always start with the name of the destination table, followed by this suffix, and possibly followed by an additional suffix. + +| kafkaDataFieldName +| no +| +| The Kafka data field name. The default value is null, which means the Kafka Data field will not be included. + +| kafkaKeyFieldName +| no +| +| The Kafka key field name. The default value is null, which means the Kafka Key field will not be included. + +| keyfile +| yes +| +a| Can be either a string representation of the Google credentials file or the path to the Google credentials file itself. + +When using the Astra Streaming UI, the string representation must be used. If using pulsar-admin with Astra Streaming, either the representation or file can be used. + +| keySource +| yes +| `FILE` +a| Determines whether the keyfile configuration is the path to the credentials JSON file or to the JSON itself. Available values are `FILE` and `JSON`. + +When using the Astra Streaming UI, JSON will be the only option. If using pulsar-admin with Astra Streaming, either the representation or file can be used. + +| name +| yes +| +| The name of the connector. Use the same value as Pulsar sink name. + +| mergeIntervalMs +| no +| `60_000L` +| How often (in milliseconds) to perform a merge flush, if upsert/delete is enabled. Can be set to -1 to disable periodic flushing. + +| mergeRecordsThreshold +| no +| `-1` +| How many records to write to an intermediate table before performing a merge flush, if upsert/delete is enabled. If set to `-1`, then record count-based flushing is disabled. + +| project +| yes +| +| The BigQuery project to write to + +| queueSize +| no +| `-1` +| The maximum size (or `-1` for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is triggered or the size of the queue drops under half of the maximum size. + +| sanitizeTopics +| yes +| false +a| Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names. + +The only accepted value is `false`. Providing any other value will result in an error. + +| schemaRetriever +| no +| `com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever` +| A class that can be used for automatically creating tables and/or updating schemas. + +| threadPoolSize +| no +| 10 +| The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery. + +| timePartitioningType +| no +| `DAY` +| The time partitioning type to use when creating tables. 
Existing tables will not be altered to use this partitioning type. Valid Values: (case insensitive) [MONTH, YEAR, HOUR, DAY] + +| timestampPartitionFieldName +| no +| +| The name of the field in the value that contains the timestamp to partition by in BigQuery and enable timestamp partitioning for each table. Leave this configuration blank, to enable ingestion time partitioning for each table. + +| topic2TableMap +| no +| +a| Optional map of topics to tables in the format of comma-separated tuples, such as `:,:,...` + +Because `sanitizeTopicName` must be `true`, any alphanumeric character in topic names are replaced with underscores. +Keep this in mind when creating the mapping to avoid overlapping names. +For example, if the topic name is provided as `persistent://a/b/c-d`, then the mapping topic name would be `persistent___a_b_c_d`. -== What's next? +| topics +| yes +| +| A list of Kafka topics to read from. Use the same name as the Pulsar topic. +Only provide the topic name, not the whole address. -Learn more about Google’s BigQuery features and capabilities on https://cloud.google.com/bigquery[their site]. +| upsertEnabled +| no +| false +| Enable upsert functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. Row-matching will be performed based on the contents of record keys. This feature won't work with SMTs that change the name of the topic. +|=== \ No newline at end of file diff --git a/modules/pulsar-io/pages/connectors/sources/kafka.adoc b/modules/pulsar-io/pages/connectors/sources/kafka.adoc index 9291054..0b10b9e 100644 --- a/modules/pulsar-io/pages/connectors/sources/kafka.adoc +++ b/modules/pulsar-io/pages/connectors/sources/kafka.adoc @@ -30,10 +30,36 @@ There are two sets of parameters that support source connectors. include::example$connectors/sources/astra.csv[] |=== -=== Kafka (configs) +=== Kafka configuration options -These values are provided in the "configs" area. +These values are provided in the `configs` area: -The {product} Kafka source connector supports all configuration properties provided by {pulsar}. +[cols="1,1,1,1,3",options=header] +|=== +|*Name* +|*Type* +|*Required* +|*Default* +|*Description* + +| `bootstrapServers` |String| true | " " (empty string) | A comma-separated list of host and port pairs for establishing the initial connection to the Kafka cluster. +| `groupId` |String| true | " " (empty string) | A unique string that identifies the group of consumer processes to which this consumer belongs. +| `fetchMinBytes` | long|false | 1 | The minimum byte expected for each fetch response. +| `autoCommitEnabled` | boolean |false | true | If set to true, the consumer's offset is periodically committed in the background. + +This committed offset is used when the process fails as the position from which a new consumer begins. +| `autoCommitIntervalMs` | long|false | 5000 | The frequency in milliseconds that the consumer offsets are auto-committed to Kafka if `autoCommitEnabled` is set to true. +| `heartbeatIntervalMs` | long| false | 3000 | The interval between heartbeats to the consumer when using Kafka's group management facilities. + +**Note: `heartbeatIntervalMs` must be smaller than `sessionTimeoutMs`**. +| `sessionTimeoutMs` | long|false | 30000 | The timeout used to detect consumer failures when using Kafka's group management facility. +| `topic` | String|true | " " (empty string)| The Kafka topic that sends messages to {pulsar-short}. 
+| `consumerConfigProperties` | Map| false | " " (empty string) | The consumer configuration properties to be passed to consumers. + +**Note: other properties specified in the connector configuration file take precedence over this configuration**. +| `keyDeserializationClass` | String|false | org.apache.kafka.common.serialization.StringDeserializer | The deserializer class for Kafka consumers to deserialize keys. + +The deserializer is set by a specific implementation of https://github.com/apache/pulsar/blob/master/pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/KafkaAbstractSource.java[`KafkaAbstractSource`]. +| `valueDeserializationClass` | String|false | org.apache.kafka.common.serialization.ByteArrayDeserializer | The deserializer class for Kafka consumers to deserialize values. +| `autoOffsetReset` | String | false | earliest | The default offset reset policy. -Please refer to the https://pulsar.apache.org/docs/io-kafka-source#property[connector properties] for a complete list. +|=== + +The {product} Kafka source connector supports all configuration properties provided by {pulsar}. +For a complete list, see the https://pulsar.apache.org/docs/io-kafka-source#property[Kafka source connector properties]. diff --git a/modules/pulsar-io/pages/connectors/sources/kinesis.adoc b/modules/pulsar-io/pages/connectors/sources/kinesis.adoc index 22633a2..64497a0 100644 --- a/modules/pulsar-io/pages/connectors/sources/kinesis.adoc +++ b/modules/pulsar-io/pages/connectors/sources/kinesis.adoc @@ -30,10 +30,47 @@ There are two sets of parameters that support source connectors. include::example$connectors/sources/astra.csv[] |=== -=== Kinesis (configs) +=== Kinesis configuration options -These values are provided in the "configs" area. +These values are provided in the `configs` area: -The {product} Kinesis source connector supports all configuration properties provided by {pulsar}. +[cols="2,1,1,1,3",options=header] +|=== +|*Name* +|*Type* +|*Required* +|*Default* +|*Description* + +|`initialPositionInStream`|InitialPositionInStream|false|LATEST|The position where the connector starts from. Below are the available options: + +* `AT_TIMESTAMP`: start from the record at or after the specified timestamp. + +* `LATEST`: start after the most recent data record. + +* `TRIM_HORIZON`: start from the oldest available data record. +|`startAtTime`|Date|false|" " (empty string)|If set to `AT_TIMESTAMP`, it specifies the point in time to start consumption. +|`applicationName`|String|false|{pulsar-short} IO connector|The name of the Amazon Kinesis application. + +By default, the application name is included in the user agent string used to make AWS requests. This can assist with troubleshooting, for example, distinguish requests made by separate connector instances. +|`checkpointInterval`|long|false|60000|The frequency of the Kinesis stream checkpoint in milliseconds. +|`backoffTime`|long|false|3000|The amount of time to delay between requests when the connector encounters a throttling exception from AWS Kinesis in milliseconds. +|`numRetries`|int|false|3|The number of re-attempts when the connector encounters an exception while trying to set a checkpoint. +|`receiveQueueSize`|int|false|1000|The maximum number of AWS records that can be buffered inside the connector. + +Once the `receiveQueueSize` is reached, the connector does not consume any messages from Kinesis until some messages in the queue are successfully consumed. 
+|`dynamoEndpoint`|String|false|" " (empty string)|The Dynamo end-point URL, which can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html[here]. +|`cloudwatchEndpoint`|String|false|" " (empty string)|The Cloudwatch end-point URL, which can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html[here]. +|`useEnhancedFanOut`|boolean|false|true|If set to true, it uses Kinesis enhanced fan-out. +If set to false, it uses polling. +|`awsEndpoint`|String|false|" " (empty string)|The Kinesis end-point URL, which can be found at https://docs.aws.amazon.com/general/latest/gr/rande.html[here]. +|`awsRegion`|String|false|" " (empty string)|The AWS region. **Example** `us-west-1`, `us-west-2` +|`awsKinesisStreamName`|String|true|" " (empty string)|The Kinesis stream name. +|`awsCredentialPluginParam`|String |false|" " (empty string)|The JSON parameter to initialize `awsCredentialsProviderPlugin`. +|`awsCredentialPluginName`|String|false|" " (empty string)|The fully-qualified class name of implementation of {@inject: github:AwsCredentialProviderPlugin:/pulsar-io/aws/src/main/java/org/apache/pulsar/io/aws/AwsCredentialProviderPlugin.java} + +`awsCredentialProviderPlugin` has the following built-in plugs: + +`org.apache.pulsar.io.kinesis.AwsDefaultProviderChainPlugin`: this plugin uses the default AWS provider chain. For more information, see https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html#credentials-default[using the default credential provider chain]. + +`org.apache.pulsar.io.kinesis.STSAssumeRoleProviderPlugin`: this plugin takes a configuration via the `awsCredentialPluginParam` that describes a role to assume when running the KCL. + +**JSON configuration example** + +`{"roleArn": "arn...", "roleSessionName": "name"}` + +`awsCredentialPluginName` is a factory class which creates an AWSCredentialsProvider that is used by Kinesis sink. + +If `awsCredentialPluginName` set to empty, the Kinesis sink creates a default AWSCredentialsProvider which accepts json-map of credentials in `awsCredentialPluginParam`. -Please refer to the https://pulsar.apache.org/docs/io-kinesis-source#configuration[connector properties] for a complete list. +|=== + +The {product} Kinesis source connector supports all configuration properties provided by {pulsar}. +For a complete list, see the https://pulsar.apache.org/docs/io-kinesis-source#configuration[Kinesis source connector properties]. 
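+
+As a quick reference, an illustrative `configs` map for a Kinesis source might look like the following sketch.
+All values are placeholders; adjust the stream name, region, and credential plugin settings for your environment.
+
+[source,json]
+----
+{
+  "awsRegion": "us-west-2",
+  "awsKinesisStreamName": "example-stream",
+  "awsEndpoint": "https://kinesis.us-west-2.amazonaws.com",
+  "initialPositionInStream": "TRIM_HORIZON",
+  "applicationName": "example-kinesis-source",
+  "awsCredentialPluginName": "org.apache.pulsar.io.kinesis.AwsDefaultProviderChainPlugin"
+}
+----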
diff --git a/modules/pulsar-io/partials/connectors/sinks/curl-monitor-response.adoc b/modules/pulsar-io/partials/connectors/sinks/curl-monitor-response.adoc deleted file mode 100644 index eccc7a5..0000000 --- a/modules/pulsar-io/partials/connectors/sinks/curl-monitor-response.adoc +++ /dev/null @@ -1,113 +0,0 @@ -[source,json] ----- -{ - "tenant": "string", - "namespace": "string", - "name": "string", - "className": "string", - "sourceSubscriptionName": "string", - "sourceSubscriptionPosition": "Latest", - "inputs": [ - "string" - ], - "topicToSerdeClassName": { - "property1": "string", - "property2": "string" - }, - "topicsPattern": "string", - "topicToSchemaType": { - "property1": "string", - "property2": "string" - }, - "topicToSchemaProperties": { - "property1": "string", - "property2": "string" - }, - "inputSpecs": { - "property1": { - "schemaType": "string", - "serdeClassName": "string", - "schemaProperties": { - "property1": "string", - "property2": "string" - }, - "consumerProperties": { - "property1": "string", - "property2": "string" - }, - "receiverQueueSize": 0, - "cryptoConfig": { - "cryptoKeyReaderClassName": "string", - "cryptoKeyReaderConfig": { - "property1": {}, - "property2": {} - }, - "encryptionKeys": [ - "string" - ], - "producerCryptoFailureAction": "FAIL", - "consumerCryptoFailureAction": "FAIL" - }, - "poolMessages": true, - "regexPattern": true - }, - "property2": { - "schemaType": "string", - "serdeClassName": "string", - "schemaProperties": { - "property1": "string", - "property2": "string" - }, - "consumerProperties": { - "property1": "string", - "property2": "string" - }, - "receiverQueueSize": 0, - "cryptoConfig": { - "cryptoKeyReaderClassName": "string", - "cryptoKeyReaderConfig": { - "property1": {}, - "property2": {} - }, - "encryptionKeys": [ - "string" - ], - "producerCryptoFailureAction": "FAIL", - "consumerCryptoFailureAction": "FAIL" - }, - "poolMessages": true, - "regexPattern": true - } - }, - "maxMessageRetries": 0, - "deadLetterTopic": "string", - "configs": { - "property1": {}, - "property2": {} - }, - "secrets": { - "property1": {}, - "property2": {} - }, - "parallelism": 0, - "processingGuarantees": "ATLEAST_ONCE", - "retainOrdering": true, - "retainKeyOrdering": true, - "resources": { - "cpu": 0, - "ram": 0, - "disk": 0 - }, - "autoAck": true, - "timeoutMs": 0, - "negativeAckRedeliveryDelayMs": 0, - "sinkType": "string", - "archive": "string", - "cleanupSubscription": true, - "runtimeFlags": "string", - "customRuntimeOptions": "string", - "transformFunction": "string", - "transformFunctionClassName": "string", - "transformFunctionConfig": "string" -} ----- \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/curl-status-response.adoc b/modules/pulsar-io/partials/connectors/sinks/curl-status-response.adoc deleted file mode 100644 index bf97099..0000000 --- a/modules/pulsar-io/partials/connectors/sinks/curl-status-response.adoc +++ /dev/null @@ -1,65 +0,0 @@ -Status response for all connector instances -[source,json] ----- -{ - "numInstances": 0, - "numRunning": 0, - "instances": [ - { - "instanceId": 0, - "status": { - "running": true, - "error": "string", - "numRestarts": 0, - "numReadFromPulsar": 0, - "numSystemExceptions": 0, - "latestSystemExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numSinkExceptions": 0, - "latestSinkExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numWrittenToSink": 0, - "lastReceivedTime": 0, - "workerId": "string" - } - 
} - ] -} ----- - -Status response for individual connector instance -[source,json] ----- -{ - "running": true, - "error": "string", - "numRestarts": 0, - "numReadFromPulsar": 0, - "numSystemExceptions": 0, - "latestSystemExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numSinkExceptions": 0, - "latestSinkExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numWrittenToSink": 0, - "lastReceivedTime": 0, - "workerId": "string" -} ----- - diff --git a/modules/pulsar-io/partials/connectors/sinks/curl-tab-prereq.adoc b/modules/pulsar-io/partials/connectors/sinks/curl-tab-prereq.adoc new file mode 100644 index 0000000..6a95ce2 --- /dev/null +++ b/modules/pulsar-io/partials/connectors/sinks/curl-tab-prereq.adoc @@ -0,0 +1,22 @@ +You need a {pulsar-short} token for REST API authentication. +This is different from your {astra-db} application tokens. + +. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. + +. Click your tenant's name, and then click the *Settings* tab. + +. Click *Create Token*. + +. Copy the token, store it securely, and then click *Close*. + +. Click the *Connect* tab, and then copy the *Web Service URL*. + +. Create environment variables for your tenant's token and web service URL: ++ +[source,shell,subs="attributes+"] +---- +export WEB_SERVICE_URL= +export ASTRA_STREAMING_TOKEN= +---- ++ +Refer to the complete https://pulsar.apache.org/sink-rest-api/#tag/sink[{pulsar-short} sinks REST API spec] for all available options. \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/curl-tab.adoc b/modules/pulsar-io/partials/connectors/sinks/curl-tab.adoc deleted file mode 100644 index 1df8789..0000000 --- a/modules/pulsar-io/partials/connectors/sinks/curl-tab.adoc +++ /dev/null @@ -1,60 +0,0 @@ -// tag::curl-prereq[] -You need a {pulsar-short} token for REST API authentication. -This is different from your {astra-db} application tokens. - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. - -. Click your tenant's name, and then click the *Settings* tab. - -. Click *Create Token*. - -. Copy the token, store it securely, and then click *Close*. - -. Click the *Connect* tab, and then copy the *Web Service URL*. - -. Create environment variables for your tenant's token and web service URL: -+ -[source,shell,subs="attributes+"] ----- -export WEB_SERVICE_URL= -export ASTRA_STREAMING_TOKEN= ----- -+ -Refer to the complete https://pulsar.apache.org/sink-rest-api/#tag/sink[{pulsar-short} sinks REST API spec] for all available options. 
-// end::curl-prereq[] -// tag::curl-start[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/curl-start.sh[] ----- -// end::curl-start[] -// tag::curl-stop[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/curl-stop.sh[] ----- -// end::curl-stop[] -// tag::curl-restart[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/curl-restart.sh[] ----- -// end::curl-restart[] -// tag::curl-delete[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/curl-delete.sh[] ----- -// end::curl-delete[] -// tag::curl-info[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/curl-info.sh[] ----- -// end::curl-info[] -// tag::curl-status[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/curl-status.sh[] ----- -// end::curl-status[] \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/get-started.adoc b/modules/pulsar-io/partials/connectors/sinks/get-started.adoc index f661360..55e8dac 100644 --- a/modules/pulsar-io/partials/connectors/sinks/get-started.adoc +++ b/modules/pulsar-io/partials/connectors/sinks/get-started.adoc @@ -1,6 +1,7 @@ // TODO: include details about retrieving a tenant name // TODO: include details about creating a topic -Set the required variables using any of the methods below. +Set the following environment variables using `pulsar-admin` or curl: + [source,shell,subs="attributes+"] ---- export TENANT= @@ -10,11 +11,11 @@ export SINK_NAME={connectorName} ---- [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] [source,shell,subs="attributes+"] ---- @@ -22,10 +23,10 @@ include::example$connectors/sinks/{connectorType}/pulsar-admin-create.sh[] ---- -- -cURL:: +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] [source,shell,subs="attributes+"] ---- @@ -38,4 +39,4 @@ Sample Config Data:: -- include::example$connectors/sinks/{connectorType}/sample-data.adoc[] -- -==== \ No newline at end of file +====== \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/manage.adoc b/modules/pulsar-io/partials/connectors/sinks/manage.adoc index 6b8e91f..30cd89a 100644 --- a/modules/pulsar-io/partials/connectors/sinks/manage.adoc +++ b/modules/pulsar-io/partials/connectors/sinks/manage.adoc @@ -1,66 +1,140 @@ === Start + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq;admin-start] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Start all instances of a connector +./bin/pulsar-admin sinks start \ + --namespace "$NAMESPACE" \ + --name "$SINK_NAME" \ + --tenant "$TENANT" + +# optionally add --instance-id to only start an individual instance +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq;curl-start] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Start all instances of a connector +curl -sS --fail --location --request POST ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/start' \ + --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" + +# Start an individual instance of a connector +curl -X POST 
"$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/start" \ +-H "Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== +====== === Stop + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq;admin-stop] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Stop all instances of a connector +./bin/pulsar-admin sinks stop \ + --namespace "$NAMESPACE" \ + --name "$SINK_NAME" \ + --tenant "$TENANT" + +# optionally add --instance-id to only stop an individual instance +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq;curl-stop] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Stop all instances of a connector +curl -sS --fail --request POST ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/stop' \ + --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" + +# Stop an individual instance of a connector +curl -X POST "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/stop" \ + --H "Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== +====== === Restart + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq;admin-restart] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Restart all instances of a connector +./bin/pulsar-admin sinks restart \ + --namespace "$NAMESPACE" \ + --name "$SINK_NAME" \ + --tenant "$TENANT" + +# optionally add --instance-id to only restart an individual instance +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq;curl-restart] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Restart all instances of a connector +curl -sS --fail --request POST ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/restart' \ + --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" + +# Restart an individual instance of a connector +curl -X POST "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/restart" \ +-H "Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== +====== === Update [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- include::example$connectors/sinks/{connectorType}/pulsar-admin-update.sh[] ---- -- -cURL:: +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- include::example$connectors/sinks/{connectorType}/curl-update.sh[] @@ -72,19 +146,37 @@ Sample Config Data:: -- include::example$connectors/sinks/{connectorType}/sample-data.adoc[] -- -==== +====== === Delete + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq;admin-delete] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + +[source,shell] +---- +# Delete all instances of a connector +./bin/pulsar-admin sinks delete \ + --namespace "$NAMESPACE" \ + --name "$SINK_NAME" \ + --tenant "$TENANT" +---- -- -cURL:: + +curl:: + -- 
-include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq;curl-delete] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Delete all instances of a connector +curl -sS --fail --location --request DELETE ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'' \ + --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" +---- -- -==== \ No newline at end of file +====== \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/monitoring.adoc b/modules/pulsar-io/partials/connectors/sinks/monitoring.adoc index e843cfc..0dab3e4 100644 --- a/modules/pulsar-io/partials/connectors/sinks/monitoring.adoc +++ b/modules/pulsar-io/partials/connectors/sinks/monitoring.adoc @@ -1,64 +1,263 @@ === Info [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- -include::example$connectors/sinks/pulsar-admin-info.sh[] +# Get information about a connector +./bin/pulsar-admin sinks get \ + --namespace "$NAMESPACE" \ + --name "$SINK_NAME" \ + --tenant "$TENANT" ---- -- -cURL:: +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- -include::example$connectors/sinks/curl-info.sh[] +# Get information about a connector +curl -sS --fail --location ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'' \ + --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" ---- --- - -Response:: + --- -include::partial$connectors/sinks/curl-monitor-response.adoc[] --- +.Result +[%collapsible] ==== +[source,json] +---- +{ + "tenant": "string", + "namespace": "string", + "name": "string", + "className": "string", + "sourceSubscriptionName": "string", + "sourceSubscriptionPosition": "Latest", + "inputs": [ + "string" + ], + "topicToSerdeClassName": { + "property1": "string", + "property2": "string" + }, + "topicsPattern": "string", + "topicToSchemaType": { + "property1": "string", + "property2": "string" + }, + "topicToSchemaProperties": { + "property1": "string", + "property2": "string" + }, + "inputSpecs": { + "property1": { + "schemaType": "string", + "serdeClassName": "string", + "schemaProperties": { + "property1": "string", + "property2": "string" + }, + "consumerProperties": { + "property1": "string", + "property2": "string" + }, + "receiverQueueSize": 0, + "cryptoConfig": { + "cryptoKeyReaderClassName": "string", + "cryptoKeyReaderConfig": { + "property1": {}, + "property2": {} + }, + "encryptionKeys": [ + "string" + ], + "producerCryptoFailureAction": "FAIL", + "consumerCryptoFailureAction": "FAIL" + }, + "poolMessages": true, + "regexPattern": true + }, + "property2": { + "schemaType": "string", + "serdeClassName": "string", + "schemaProperties": { + "property1": "string", + "property2": "string" + }, + "consumerProperties": { + "property1": "string", + "property2": "string" + }, + "receiverQueueSize": 0, + "cryptoConfig": { + "cryptoKeyReaderClassName": "string", + "cryptoKeyReaderConfig": { + "property1": {}, + "property2": {} + }, + "encryptionKeys": [ + "string" + ], + "producerCryptoFailureAction": "FAIL", + "consumerCryptoFailureAction": "FAIL" + }, + "poolMessages": true, + "regexPattern": true + } + }, + "maxMessageRetries": 0, + "deadLetterTopic": "string", + "configs": { + "property1": {}, + 
"property2": {} + }, + "secrets": { + "property1": {}, + "property2": {} + }, + "parallelism": 0, + "processingGuarantees": "ATLEAST_ONCE", + "retainOrdering": true, + "retainKeyOrdering": true, + "resources": { + "cpu": 0, + "ram": 0, + "disk": 0 + }, + "autoAck": true, + "timeoutMs": 0, + "negativeAckRedeliveryDelayMs": 0, + "sinkType": "string", + "archive": "string", + "cleanupSubscription": true, + "runtimeFlags": "string", + "customRuntimeOptions": "string", + "transformFunction": "string", + "transformFunctionClassName": "string", + "transformFunctionConfig": "string" +} +---- +==== +-- +====== === Health [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sinks/pulsar-admin-tab.adoc[tags=admin-prereq] +include::partial$connectors/sinks/pulsar-admin-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- -include::example$connectors/sinks/pulsar-admin-status.sh[] +# Check connector status +./bin/pulsar-admin sinks status \ + --instance-id "$SINK_INSTANCEID" \ + --namespace "$NAMESPACE" \ + --name "$SINK_NAME" \ + --tenant "$TENANT" ---- -- -cURL:: +curl:: + -- -include::partial$connectors/sinks/curl-tab.adoc[tags=curl-prereq] +include::partial$connectors/sinks/curl-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- -include::example$connectors/sinks/curl-status.sh[] +# Get the status of all connector instances +curl -sS --fail --location ''$WEB_SERVICE_URL'/admin/v3/sinks/'$TENANT'/'$NAMESPACE'/'$SINK_NAME'/status' \ + --header "Authorization: Bearer $ASTRA_STREAMING_TOKEN" + +# Get the status of an individual connector instance +curl "$WEB_SERVICE_URL/admin/v3/sinks/$TENANT/$NAMESPACE/$SINK_NAME/$SINK_INSTANCEID/status" \ + -H "accept: application/json" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" ---- --- -Response:: -+ --- -include::partial$connectors/sinks/curl-status-response.adoc[] --- +.Result +[%collapsible] +==== +Status response for all connector instances: + +[source,json] +---- +{ + "numInstances": 0, + "numRunning": 0, + "instances": [ + { + "instanceId": 0, + "status": { + "running": true, + "error": "string", + "numRestarts": 0, + "numReadFromPulsar": 0, + "numSystemExceptions": 0, + "latestSystemExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numSinkExceptions": 0, + "latestSinkExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numWrittenToSink": 0, + "lastReceivedTime": 0, + "workerId": "string" + } + } + ] +} +---- + +Status response for individual connector instance: + +[source,json] +---- +{ + "running": true, + "error": "string", + "numRestarts": 0, + "numReadFromPulsar": 0, + "numSystemExceptions": 0, + "latestSystemExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numSinkExceptions": 0, + "latestSinkExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numWrittenToSink": 0, + "lastReceivedTime": 0, + "workerId": "string" +} +---- ==== +-- +====== === Metrics diff --git a/modules/pulsar-io/partials/connectors/sinks/pulsar-admin-tab-prereq.adoc b/modules/pulsar-io/partials/connectors/sinks/pulsar-admin-tab-prereq.adoc new file mode 100644 index 0000000..b211cea --- /dev/null +++ b/modules/pulsar-io/partials/connectors/sinks/pulsar-admin-tab-prereq.adoc @@ -0,0 +1,3 @@ +Refer to the complete https://pulsar.apache.org/reference/#/{pulsar-version}.x/pulsar-admin/sinks[pulsar-admin sinks spec] for all available options. 
+ +Assuming you have downloaded `client.conf` to the `{pulsar-short}` folder: \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/pulsar-admin-tab.adoc b/modules/pulsar-io/partials/connectors/sinks/pulsar-admin-tab.adoc deleted file mode 100644 index 71a48f0..0000000 --- a/modules/pulsar-io/partials/connectors/sinks/pulsar-admin-tab.adoc +++ /dev/null @@ -1,43 +0,0 @@ -// TODO: include an explanation or link to retrieving the client.conf from AS UI - -// tag::admin-prereq[] -Refer to the complete https://pulsar.apache.org/tools/pulsar-admin/{pulsar-version}.0-SNAPSHOT/#sinks[pulsar-admin sinks spec] for all available options. - -Assuming you have downloaded client.conf to the {pulsar-short} folder: -// end::admin-prereq[] -// tag::admin-start[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/pulsar-admin-start.sh[] ----- -// end::admin-start[] -// tag::admin-stop[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/pulsar-admin-stop.sh[] ----- -// end::admin-stop[] -// tag::admin-restart[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/pulsar-admin-restart.sh[] ----- -// end::admin-restart[] -// tag::admin-delete[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/pulsar-admin-delete.sh[] ----- -// end::admin-delete[] -// tag::admin-info[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/pulsar-admin-info.sh[] ----- -// end::admin-info[] -// tag::admin-status[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sinks/pulsar-admin-status.sh[] ----- -// end::admin-status[] \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sinks/sink-connectors.adoc b/modules/pulsar-io/partials/connectors/sinks/sink-connectors.adoc deleted file mode 100644 index efc3471..0000000 --- a/modules/pulsar-io/partials/connectors/sinks/sink-connectors.adoc +++ /dev/null @@ -1,116 +0,0 @@ -// tag::production[] -[#astradb-sink] -=== AstraDB sink - -The AstraDB sink connector reads messages from {pulsar} topics and writes them to AstraDB systems. - -xref:connectors/sinks/astra-db.adoc[AstraDB sink documentation]. - -[#cloudstorage-sink] -=== Cloud Storage sink - -The Cloud Storage sink connector reads messages from {pulsar} topics and writes them to Cloud Storage systems. - -xref:connectors/sinks/cloud-storage.adoc[Cloud Storage sink documentation]. - -[#elasticsearch-sink] -=== ElasticSearch sink - -The Elasticsearch sink connector reads messages from {pulsar} topics and writes them to Elasticsearch systems. - -xref:connectors/sinks/elastic-search.adoc[Elasticsearch sink documentation]. - -[#bigquery-sink] -=== Google BigQuery sink - -The Google BigQuery sink connector reads messages from {pulsar} topics and writes them to BigQuery systems. - -xref:connectors/sinks/google-bigquery.adoc[Google BigQuery sink documentation]. - -[#jdbc-clickhouse-sink] -=== JDBC-Clickhouse sink - -The JDBC-ClickHouse sink connector reads messages from {pulsar} topics and writes them to JDBC-ClickHouse systems. - -xref:connectors/sinks/jdbc-clickhouse.adoc[JDBC ClickHouse sink documentation]. - -[#jdbc-mariadb-sink] -=== JDBC-MariaDB sink - -The JDBC-MariaDB sink connector reads messages from {pulsar} topics and writes them to JDBC-MariaDB systems. - -xref:connectors/sinks/jdbc-mariadb.adoc[JDBC MariaDB sink documentation]. 
- -[#jdbc-postgres-sink] -=== JDBC-PostgreSQL sink - -The JDBC-PostgreSQL sink connector reads messages from {pulsar} topics and writes them to JDBC-PostgreSQL systems. - -xref:connectors/sinks/jdbc-postgres.adoc[JDBC PostgreSQL sink documentation]. - -[#jdbc-sqlite-sink] -=== *JDBC-SQLite* - -The JDBC-SQLite sink connector reads messages from {pulsar} topics and writes them to JDBC-SQLite systems. - -xref:connectors/sinks/jdbc-sqllite.adoc[JDBC SQLite sink documentation]. - -[#kafka-sink] -=== *Kafka* - -The Kafka sink connector reads messages from {pulsar} topics and writes them to Kafka systems. - -xref:connectors/sinks/kafka.adoc[Kafka sink documentation]. - -[#kinesis-sink] -=== Kinesis - -The Kinesis sink connector reads messages from {pulsar} topics and writes them to Kinesis systems. - -xref:connectors/sinks/kinesis.adoc[Kinesis sink documentation]. - -[#snowflake-sink] -=== Snowflake - -The Snowflake sink connector reads messages from {pulsar} topics and writes them to Snowflake systems. - -xref:connectors/sinks/snowflake.adoc[Snowflake sink documentation]. -// end::production[] - -// tag::sink-experimental[] -Kinetica + -Aerospike + -Azure DocumentDB + -Azure Data Explorer (Kusto) + -Batch Data Generator + -CoAP + -Couchbase + -DataDog + -Diffusion + -Flume + -Apache Geode + -Hazelcast + -Apache HBase + -HDFS 2 + -HDFS 3 + -Humio + -InfluxDB + -JMS + -Apache Kudu + -MarkLogic + -MongoDB + -MQTT + -Neo4J + -New Relic + -OrientDB + -Apache Phoenix + -PLC4X + -RabbitMQ + -Redis + -SAP HANA + -SingleStore + -Apache Solr + -Splunk + -XTDB + -Zeebe + -// end::sink-experimental[] \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/curl-monitor-response.adoc b/modules/pulsar-io/partials/connectors/sources/curl-monitor-response.adoc deleted file mode 100644 index dc9e9e4..0000000 --- a/modules/pulsar-io/partials/connectors/sources/curl-monitor-response.adoc +++ /dev/null @@ -1,56 +0,0 @@ -[source,json] ----- - { - "tenant": "string", - "namespace": "string", - "name": "string", - "className": "string", - "topicName": "string", - "producerConfig": { - "maxPendingMessages": 0, - "maxPendingMessagesAcrossPartitions": 0, - "useThreadLocalProducers": true, - "cryptoConfig": { - "cryptoKeyReaderClassName": "string", - "cryptoKeyReaderConfig": { - "property1": {}, - "property2": {} - }, - "encryptionKeys": [ - "string" - ], - "producerCryptoFailureAction": "FAIL", - "consumerCryptoFailureAction": "FAIL" - }, - "batchBuilder": "string" - }, - "serdeClassName": "string", - "schemaType": "string", - "configs": { - "property1": {}, - "property2": {} - }, - "secrets": { - "property1": {}, - "property2": {} - }, - "parallelism": 0, - "processingGuarantees": "ATLEAST_ONCE", - "resources": { - "cpu": 0, - "ram": 0, - "disk": 0 - }, - "archive": "string", - "runtimeFlags": "string", - "customRuntimeOptions": "string", - "batchSourceConfig": { - "discoveryTriggererClassName": "string", - "discoveryTriggererConfig": { - "property1": {}, - "property2": {} - } - }, - "batchBuilder": "string" -} ----- \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/curl-status-response.adoc b/modules/pulsar-io/partials/connectors/sources/curl-status-response.adoc deleted file mode 100644 index 7bb61b1..0000000 --- a/modules/pulsar-io/partials/connectors/sources/curl-status-response.adoc +++ /dev/null @@ -1,64 +0,0 @@ -Status response for all connector instances -[source,json] ----- -{ - "numInstances": 0, - "numRunning": 0, - "instances": [ - { - 
"instanceId": 0, - "status": { - "running": true, - "error": "string", - "numRestarts": 0, - "numReceivedFromSource": 0, - "numSystemExceptions": 0, - "latestSystemExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numSourceExceptions": 0, - "latestSourceExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numWritten": 0, - "lastReceivedTime": 0, - "workerId": "string" - } - } - ] -} ----- - -Status response for individual connector instance -[source,json] ----- -{ - "running": true, - "error": "string", - "numRestarts": 0, - "numReceivedFromSource": 0, - "numSystemExceptions": 0, - "latestSystemExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numSourceExceptions": 0, - "latestSourceExceptions": [ - { - "exceptionString": "string", - "timestampMs": 0 - } - ], - "numWritten": 0, - "lastReceivedTime": 0, - "workerId": "string" -} ----- \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/curl-tab-prereq.adoc b/modules/pulsar-io/partials/connectors/sources/curl-tab-prereq.adoc new file mode 100644 index 0000000..66e59b5 --- /dev/null +++ b/modules/pulsar-io/partials/connectors/sources/curl-tab-prereq.adoc @@ -0,0 +1,23 @@ +You need a {pulsar-short} token for REST API authentication. +This is different from your {astra-db} application tokens. + +. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. + +. Click your tenant's name, and then click the *Settings* tab. + +. Click *Create Token*. + +. Copy the token, store it securely, and then click *Close*. + +. Click the *Connect* tab, and then copy the *Web Service URL*. + +. Create environment variables for your tenant's token and web service URL: ++ +[source,shell,subs="attributes+"] +---- +export WEB_SERVICE_URL= +export ASTRA_STREAMING_TOKEN= +---- ++ +Refer to the complete https://pulsar.apache.org/source-rest-api/#tag/sources[{pulsar-short} sources REST API spec], +for all available options. \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/curl-tab.adoc b/modules/pulsar-io/partials/connectors/sources/curl-tab.adoc deleted file mode 100644 index 06c1556..0000000 --- a/modules/pulsar-io/partials/connectors/sources/curl-tab.adoc +++ /dev/null @@ -1,61 +0,0 @@ -// tag::curl-prereq[] -You need a {pulsar-short} token for REST API authentication. -This is different from your {astra-db} application tokens. - -. In the {astra-ui-link} header, click icon:grip[name="Applications"], and then select *Streaming*. - -. Click your tenant's name, and then click the *Settings* tab. - -. Click *Create Token*. - -. Copy the token, store it securely, and then click *Close*. - -. Click the *Connect* tab, and then copy the *Web Service URL*. - -. Create environment variables for your tenant's token and web service URL: -+ -[source,shell,subs="attributes+"] ----- -export WEB_SERVICE_URL= -export ASTRA_STREAMING_TOKEN= ----- -+ -Refer to the complete https://pulsar.apache.org/source-rest-api/#tag/sources[{pulsar-short} sources REST API spec], -for all available options. 
-// end::curl-prereq[] -// tag::curl-start[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/curl-start.sh[] ----- -// end::curl-start[] -// tag::curl-stop[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/curl-stop.sh[] ----- -// end::curl-stop[] -// tag::curl-restart[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/curl-restart.sh[] ----- -// end::curl-restart[] -// tag::curl-delete[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/curl-delete.sh[] ----- -// end::curl-delete[] -// tag::curl-info[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/curl-info.sh[] ----- -// end::curl-info[] -// tag::curl-status[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/curl-status.sh[] ----- -// end::curl-status[] \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/get-started.adoc b/modules/pulsar-io/partials/connectors/sources/get-started.adoc index ae15e78..dc8ebae 100644 --- a/modules/pulsar-io/partials/connectors/sources/get-started.adoc +++ b/modules/pulsar-io/partials/connectors/sources/get-started.adoc @@ -1,4 +1,4 @@ -Set the required variables using any of the methods below. +Set the following environment variables using `pulsar-admin` or curl: [source,shell,subs="attributes+"] ---- @@ -9,11 +9,11 @@ export SOURCE_NAME={connectorName} ---- [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sources/pulsar-admin-tab.adoc[tags=admin-prereq] +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] [source,shell,subs="attributes+"] ---- @@ -21,10 +21,10 @@ include::example$connectors/sources/{connectorType}/pulsar-admin-create.sh[] ---- -- -cURL:: +curl:: + -- -include::partial$connectors/sources/curl-tab.adoc[tags=curl-prereq] +include::partial$connectors/sources/curl-tab-prereq.adoc[] [source,shell,subs="attributes+"] ---- @@ -37,4 +37,4 @@ Sample Config Data:: -- include::example$connectors/sources/{connectorType}/sample-data.adoc[] -- -==== \ No newline at end of file +====== \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/manage.adoc b/modules/pulsar-io/partials/connectors/sources/manage.adoc index 26ebe68..8f239d7 100644 --- a/modules/pulsar-io/partials/connectors/sources/manage.adoc +++ b/modules/pulsar-io/partials/connectors/sources/manage.adoc @@ -1,90 +1,247 @@ === Start + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sources/pulsar-admin-tab.adoc[tags=admin-prereq;admin-start] +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] + +[source,shell] +---- +# Start all instances of a connector +./bin/pulsar-admin sources start \ + --namespace "$NAMESPACE" \ + --name "$SOURCE_NAME" \ + --tenant "$TENANT" + +# optionally add --instance-id to only start an individual instance +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sources/curl-tab.adoc[tags=curl-prereq;curl-start] +include::partial$connectors/sources/curl-tab-prereq.adoc[] + +Start all instances of a connector: + +[source,shell] +---- +curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/start" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" +---- + +Start an individual instance of a connector: + +[source,shell] +---- +curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/start" \ + -H 
"Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== +====== === Stop + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sources/pulsar-admin-tab.adoc[tags=admin-prereq;admin-stop] +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] + +[source,shell] +---- +# Stop all instances of a connector +./bin/pulsar-admin sources stop \ + --namespace "$NAMESPACE" \ + --name "$SOURCE_NAME" \ + --tenant "$TENANT" + +# optionally add --instance-id to only stop an individual instance +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sources/curl-tab.adoc[tags=curl-prereq;curl-stop] +include::partial$connectors/sources/curl-tab-prereq.adoc[] + +Stop all instances of a connector: + +[source,shell] +---- +curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/stop" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" +---- + +Stop an individual instance of a connector: + +[source,shell] +---- +curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/stop" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== +====== === Restart + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sources/pulsar-admin-tab.adoc[tags=admin-prereq;admin-restart] +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] + +[source,shell,subs="attributes+"] +---- +# Restart all instances of a connector +./bin/pulsar-admin sources restart \ + --namespace "$NAMESPACE" \ + --name "$SOURCE_NAME" \ + --tenant "$TENANT" + +# optionally add --instance-id to only restart an individual instance +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sources/curl-tab.adoc[tags=curl-prereq;curl-restart] +include::partial$connectors/sources/curl-tab-prereq.adoc[] + +[source,shell] +---- +# Restart all instances of a connector +curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/restart" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" + +# Restart an individual instance of a connector +curl -sS --fail -X POST "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/restart" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== +====== === Update [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sources/pulsar-admin-tab.adoc[tags=admin-prereq] +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- include::example$connectors/sources/{connectorType}/pulsar-admin-update.sh[] ---- -- -cURL:: +curl:: + -- -include::partial$connectors/sources/curl-tab.adoc[tags=curl-prereq] +include::partial$connectors/sources/curl-tab-prereq.adoc[] + [source,shell,subs="attributes+"] ---- include::example$connectors/sources/{connectorType}/curl-update.sh[] ---- --- -Response:: -+ --- -include::partial$connectors/sources/curl-monitor-response.adoc[] --- +.Result +[%collapsible] +==== +[source,json] +---- + { + "tenant": "string", + "namespace": "string", + "name": "string", + "className": "string", + "topicName": "string", + "producerConfig": { + "maxPendingMessages": 0, + "maxPendingMessagesAcrossPartitions": 0, + "useThreadLocalProducers": true, + "cryptoConfig": { + "cryptoKeyReaderClassName": "string", + "cryptoKeyReaderConfig": { + "property1": {}, + "property2": {} + }, + "encryptionKeys": [ + "string" + ], + "producerCryptoFailureAction": "FAIL", + "consumerCryptoFailureAction": "FAIL" + }, + "batchBuilder": 
"string" + }, + "serdeClassName": "string", + "schemaType": "string", + "configs": { + "property1": {}, + "property2": {} + }, + "secrets": { + "property1": {}, + "property2": {} + }, + "parallelism": 0, + "processingGuarantees": "ATLEAST_ONCE", + "resources": { + "cpu": 0, + "ram": 0, + "disk": 0 + }, + "archive": "string", + "runtimeFlags": "string", + "customRuntimeOptions": "string", + "batchSourceConfig": { + "discoveryTriggererClassName": "string", + "discoveryTriggererConfig": { + "property1": {}, + "property2": {} + } + }, + "batchBuilder": "string" +} +---- ==== +-- +====== === Delete + [tabs] -==== +====== {pulsar-short} Admin:: + -- -include::partial$connectors/sources/pulsar-admin-tab.adoc[tags=admin-prereq;admin-delete] +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] + +[source,shell] +---- +# Delete all instances of a connector +./bin/pulsar-admin sources delete \ + --namespace "$NAMESPACE" \ + --name "$SOURCE_NAME" \ + --tenant "$TENANT" +---- -- -cURL:: + +curl:: + -- -include::partial$connectors/sources/curl-tab.adoc[tags=curl-prereq;curl-delete] +include::partial$connectors/sources/curl-tab-prereq.adoc[] + +[source,shell] +---- +# Delete all instances of a connector +curl -sS --fail -X DELETE "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" +---- -- -==== \ No newline at end of file +====== \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/monitoring.adoc b/modules/pulsar-io/partials/connectors/sources/monitoring.adoc index 5b893b5..c9e4d67 100644 --- a/modules/pulsar-io/partials/connectors/sources/monitoring.adoc +++ b/modules/pulsar-io/partials/connectors/sources/monitoring.adoc @@ -1,19 +1,23 @@ === Info [tabs] -==== +====== {pulsar-short} Admin:: + -- -Assuming you have downloaded client.conf to the {pulsar-short} folder: +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] -[source,shell,subs="attributes+"] +[source,shell] ---- -include::example$connectors/sources/pulsar-admin-info.sh[] +# Get information about connector +./bin/pulsar-admin sources get \ + --namespace "$NAMESPACE" \ + --name "$SOURCE_NAME" \ + --tenant "$TENANT" ---- -- -cURL:: +curl:: + -- You need a {pulsar-short} token for REST API authentication. @@ -37,11 +41,14 @@ export WEB_SERVICE_URL= export ASTRA_STREAMING_TOKEN= ---- -. Use these values to form curl commands to the REST API: +. Use these values to form curl commands to the REST API, for example: + [source,shell,subs="attributes+"] ---- -include::example$connectors/sources/curl-info.sh[] +# Get a connector's information +curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME" \ + -H "accept: application/json" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" ---- -- @@ -50,24 +57,29 @@ Sample Config Data:: -- include::example$connectors/sources/{connectorType}/sample-data.adoc[] -- -==== +====== === Health [tabs] -==== +====== {pulsar-short} Admin:: + -- -Assuming you have downloaded the client.conf to the pulsar folder. +include::partial$connectors/sources/pulsar-admin-tab-prereq.adoc[] -[source,shell,subs="attributes+"] +[source,shell] ---- -include::example$connectors/sources/pulsar-admin-status.sh[] +# Check connector status +./bin/pulsar-admin sources status \ + --instance-id "$SOURCE_INSTANCEID" \ + --namespace "$NAMESPACE" \ + --name "$SOURCE_NAME" \ + --tenant "$TENANT" ---- -- -cURL:: +curl:: + -- You need a {pulsar-short} token for REST API authentication. 
@@ -91,20 +103,93 @@ export WEB_SERVICE_URL= export ASTRA_STREAMING_TOKEN= ---- -. Use these values to form curl commands to the REST API: +. Use these values to form curl commands to the REST API, for example: + [source,shell,subs="attributes+"] ---- -include::example$connectors/sources/curl-status.sh[] +# Get the status of all connector instances +curl -sS --fail "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/status" \ + -H "accept: application/json" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" + +# Get the status of an individual connector instance +curl "$WEB_SERVICE_URL/admin/v3/sources/$TENANT/$NAMESPACE/$SOURCE_NAME/$SOURCE_INSTANCEID/status" \ + -H "accept: application/json" \ + -H "Authorization: $ASTRA_STREAMING_TOKEN" +---- + +.Result +[%collapsible] +==== +Status response for all connector instances: + +[source,json] +---- +{ + "numInstances": 0, + "numRunning": 0, + "instances": [ + { + "instanceId": 0, + "status": { + "running": true, + "error": "string", + "numRestarts": 0, + "numReceivedFromSource": 0, + "numSystemExceptions": 0, + "latestSystemExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numSourceExceptions": 0, + "latestSourceExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numWritten": 0, + "lastReceivedTime": 0, + "workerId": "string" + } + } + ] +} +---- + +Status response for individual connector instance: + +[source,json] +---- +{ + "running": true, + "error": "string", + "numRestarts": 0, + "numReceivedFromSource": 0, + "numSystemExceptions": 0, + "latestSystemExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numSourceExceptions": 0, + "latestSourceExceptions": [ + { + "exceptionString": "string", + "timestampMs": 0 + } + ], + "numWritten": 0, + "lastReceivedTime": 0, + "workerId": "string" +} ---- --- -+ -Response:: -+ --- -include::partial$connectors/sources/curl-status-response.adoc[] --- ==== +-- +====== === Metrics diff --git a/modules/pulsar-io/partials/connectors/sources/pulsar-admin-tab-prereq.adoc b/modules/pulsar-io/partials/connectors/sources/pulsar-admin-tab-prereq.adoc new file mode 100644 index 0000000..9687e7c --- /dev/null +++ b/modules/pulsar-io/partials/connectors/sources/pulsar-admin-tab-prereq.adoc @@ -0,0 +1,3 @@ +Refer to the complete https://pulsar.apache.org/reference/#/{pulsar-version}.x/pulsar-admin/sources[pulsar-admin sources spec] for all available options. + +Assuming you have downloaded `client.conf` to the `{pulsar-short}` folder: \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/pulsar-admin-tab.adoc b/modules/pulsar-io/partials/connectors/sources/pulsar-admin-tab.adoc deleted file mode 100644 index 574b6b1..0000000 --- a/modules/pulsar-io/partials/connectors/sources/pulsar-admin-tab.adoc +++ /dev/null @@ -1,43 +0,0 @@ -// TODO: include an explanation or link to retrieving the client.conf from AS UI - -// tag::admin-prereq[] -Refer to the complete https://pulsar.apache.org/tools/pulsar-admin/{pulsar-version}.0-SNAPSHOT/#sources[pulsar-admin sources spec] for all available options. 
- -Assuming you have downloaded client.conf to the {pulsar-short} folder: -// end::admin-prereq[] -// tag::admin-start[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/pulsar-admin-start.sh[] ----- -// end::admin-start[] -// tag::admin-stop[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/pulsar-admin-stop.sh[] ----- -// end::admin-stop[] -// tag::admin-restart[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/pulsar-admin-restart.sh[] ----- -// end::admin-restart[] -// tag::admin-delete[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/pulsar-admin-delete.sh[] ----- -// end::admin-delete[] -// tag::admin-info[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/pulsar-admin-info.sh[] ----- -// end::admin-info[] -// tag::admin-status[] -[source,shell,subs="attributes+"] ----- -include::example$connectors/sources/pulsar-admin-status.sh[] ----- -// end::admin-status[] \ No newline at end of file diff --git a/modules/pulsar-io/partials/connectors/sources/source-connectors.adoc b/modules/pulsar-io/partials/connectors/sources/source-connectors.adoc deleted file mode 100644 index 93eeb4e..0000000 --- a/modules/pulsar-io/partials/connectors/sources/source-connectors.adoc +++ /dev/null @@ -1,95 +0,0 @@ -// tag::production[] -[#datagenerator-source] -=== Data Generator source - -The Data generator source connector produces messages for testing and persists the messages to {pulsar-short} topics. - -xref:connectors/sources/data-generator.adoc[Data Generator source documentation] - -[#debezium-mongodb-source] -=== Debezium MongoDB source - -The Debezium MongoDB source connector reads data from Debezium MongoDB systems and produces data to {pulsar-short} topics. - -xref:connectors/sources/debezium-mongodb.adoc[Debezium MongoDB source documentation] - -[#debezium-mysql-source] -=== Debezium MySQL source - -The Debezium MySQL source connector reads data from Debezium MySQL systems and produces data to {pulsar-short} topics. - -xref:connectors/sources/debezium-mysql.adoc[Debezium MySQL source documentation] - -[#debezium-oracle-source] -=== Debezium Oracle source - -The Debezium Oracle source connector reads data from Debezium Oracle systems and produces data to {pulsar-short} topics. - -xref:connectors/sources/debezium-oracle.adoc[Debezium Oracle source documentation] - -[#debezium-postgres-source] -=== Debezium Postgres source - -The Debezium PostgreSQL source connector reads data from Debezium PostgreSQL systems and produces data to {pulsar-short} topics. - -xref:connectors/sources/debezium-postgres.adoc[Debezium PostgreSQL source documentation] - -[#debezium-sql-server-source] -=== Debezium SQL Server source - -The Debezium SQL Server source connector reads data from Debezium SQL Server systems and produces data to {pulsar-short} topics. - -xref:connectors/sources/debezium-sqlserver.adoc[Debezium SQL Server source documentation] - -[#kafka-source] -=== Kafka source - -The Kafka source connector reads data from Kafka systems and produces data to {pulsar-short} topics. - -xref:connectors/sources/kafka.adoc[Kafka source connector documentation] - -[#kinesis-source] -=== AWS Kinesis source - -The AWS Kinesis source connector reads data from Kinesis systems and produces data to {pulsar-short} topics. 
- -xref:connectors/sources/kinesis.adoc[Kinesis source connector documentation] - -// end::production[] -// tag::source-experimental[] -{cass-short} Source + -Kinetica + -Azure DocumentDB + -Batch Data Generator + -Big Query + -canal + -CoAP + -Couchbase + -datadog + -diffusion + -DynamoDB + -file + -flume + -Apache Geode + -Hazelcast + -Humio + -JMS + -Apache Kudu + -MarkLogic + -MongoDB + -MQTT + -Neo4J + -New Relic + -NSQ + -OrientDB + -Apache Phoenix + -PLC4X + -RabbitMQ + -Redis + -SAP HANA + -SingleStore + -Splunk + -Twitter + -XTDB + -Zeebe + -// end::source-experimental[] \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc b/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc index ccf8783..a9dba46 100644 --- a/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc +++ b/modules/use-cases-architectures/pages/change-data-capture/consuming-change-data.adoc @@ -1,6 +1,7 @@ = Consuming change data with {pulsar-reg} :navtitle: Consuming change data :description: This article describes how to consume change data with {pulsar-reg}. +:csharp: C# [NOTE] ==== @@ -13,46 +14,25 @@ Each client handles message consumption a little differently but there is one ov Below are example implementations for each runtime consuming messages from the CDC data topic. -While these examples are in the “astra-streaming-examples” repository, they are not {product}-specific. You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters. +While these examples are in the `astra-streaming-examples` repository, they are not {product}-specific. +You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters. 
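+Before working through the full examples, the following minimal sketch shows the basic shape they all share: subscribe to the CDC data topic, receive each record, and acknowledge it.
+The sketch uses the Python client with placeholder connection values and an illustrative subscription name (`cdc-data-sub`); the complete examples linked below also deserialize the Avro-encoded key and value.
+
+[source,python]
+----
+import pulsar
+
+# Placeholders: use your own service URL, token, and CDC data topic.
+service_url = "<broker-service-url>"
+token = "<pulsar-token>"
+topic = "persistent://<tenant>/<namespace>/<cdc-data-topic>"
+
+client = pulsar.Client(service_url, authentication=pulsar.AuthenticationToken(token))
+consumer = client.subscribe(topic, subscription_name="cdc-data-sub")
+
+try:
+    while True:
+        # Wait up to 10 seconds for the next CDC record.
+        msg = consumer.receive(timeout_millis=10000)
+        # The payload is the Avro-encoded change record; see the linked examples
+        # for full deserialization against the table's schema.
+        print("Received CDC record ({} bytes)".format(len(msg.data())))
+        consumer.acknowledge(msg)
+except Exception:
+    # receive() raises once the timeout expires with no new records.
+    pass
+finally:
+    client.close()
+----
+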
-[cols="^1,^1,^1,^1,^1", grid=none,frame=none] -|=== -| https://github.com/datastax/astra-streaming-examples/blob/master/csharp/astra-cdc/Program.cs[image:csharp-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/csharp/astra-cdc/Program.cs[C#] -| https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[image:golang-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang] -| https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/consumers/CDCConsumer.java[image:java-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/consumers/CDCConsumer.java[Java] -| https://github.com/datastax/astra-streaming-examples/blob/master/nodejs/astra-cdc/consumer.js[image:node-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/nodejs/astra-cdc/consumer.js[Node.js] -| https://github.com/datastax/astra-streaming-examples/blob/master/python/astra-cdc/cdc_consumer.py[image:python-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/python/astra-cdc/cdc_consumer.py[Python] -|=== +* image:csharp-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/csharp/astra-cdc/Program.cs[{csharp} CDC project example] +* image:golang-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example] +* image:java-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/consumers/CDCConsumer.java[Java CDC consumer example] +* image:node-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/nodejs/astra-cdc/consumer.js[Node.js CDC consumer example] +* image:python-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/python/astra-cdc/cdc_consumer.py[Python CDC consumer example] == {pulsar-short} functions It is very common to have a function consuming the CDC data. Functions usually perform additional processing on the data and pass it to another topic. Similar to a client consumer, it will need to deserialize the message data. Below are examples of different functions consuming messages from the CDC data topic. -While these examples are in the “astra-streaming-examples” repository, they are not {product}-specific. You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters. - -[cols="^1,^1,^1", grid=none,frame=none] -|=== -| https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[image:golang-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang] -| https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/functions/CDCFunction.java[image:java-icon.png[]] - -https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/functions/CDCFunction.java[Java] -| https://github.com/datastax/astra-streaming-examples/blob/master/python/cdc-in-pulsar-function/deschemaer.py[image:python-icon.png[]] +While these examples are in the `astra-streaming-examples` repository, they are not {product}-specific. You can use these examples to consume CDC data topics in your own {cass-short}/{pulsar-short} clusters. 
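+As a point of reference, a skeletal Python function on a CDC topic can be as small as the following sketch.
+The class name is illustrative, and a production function would deserialize the Avro-encoded record before transforming it, as the examples linked below do.
+
+[source,python]
+----
+from pulsar import Function
+
+class CdcPassthroughFunction(Function):
+    def process(self, input, context):
+        # 'input' is the still-serialized CDC record; deserialize it with the
+        # table's Avro schema before applying any real business logic.
+        context.get_logger().info("Received CDC record of {} bytes".format(len(input)))
+        # Returning a value publishes it to the function's configured output topic.
+        return input
+----
+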
-https://github.com/datastax/astra-streaming-examples/blob/master/python/cdc-in-pulsar-function/deschemaer.py[Python] -|=== +* image:golang-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/go/astra-cdc/main/main.go[Golang CDC project example] +* image:java-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/java/astra-cdc/javaexamples/functions/CDCFunction.java[Java CDC function example] +* image:python-icon.png[] https://github.com/datastax/astra-streaming-examples/blob/master/python/cdc-in-pulsar-function/deschemaer.py[Python CDC function example] -== Next +== See also -You're ready to tackle CDC like a pro! Use our xref:use-cases-architectures:change-data-capture/questions-and-patterns.adoc[] as reference as you near production. \ No newline at end of file +* xref:use-cases-architectures:change-data-capture/questions-and-patterns.adoc[] \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/starlight/jms/index.adoc b/modules/use-cases-architectures/pages/starlight/jms/index.adoc index 3d28594..e925b54 100644 --- a/modules/use-cases-architectures/pages/starlight/jms/index.adoc +++ b/modules/use-cases-architectures/pages/starlight/jms/index.adoc @@ -52,13 +52,19 @@ image:pulsar-client-settings.png[] This example uses Maven for the project structure. If you prefer Gradle or another tool, this code should still be a good fit. -TIP: Visit our https://github.com/datastax/astra-streaming-examples[examples repo] to see the complete source of this example. +For complete source code examples, see the https://github.com/datastax/astra-streaming-examples[{product} examples repository]. . Create a new Maven project. + [source,shell] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-jms/create-project.sh[] +mvn archetype:generate \ + -DgroupId=org.example \ + -DartifactId=StarlightForJMSClient \ + -DarchetypeArtifactId=maven-archetype-quickstart \ + -DinteractiveMode=false + +cd StarlightForJMSClient ---- . Open the new project in your favorite IDE or text editor and add the jms dependency to "pom.xml". @@ -72,51 +78,97 @@ include::{astra-streaming-examples-repo}/java/starlight-for-jms/create-project.s ---- -. Open the file "src/main/java/org/example/App.java" and replace the entire contents with the below code. Notice there are class variables that need replacing. Apply the values previously retrieved in {product}. +. Open the file `src/main/java/org/example/App.java`, and then enter the following contents. +If you cloned the example repository, replace the entire contents of the file with the following code. +Your editor will report an error because this isn't a complete script yet. ++ +Replace placeholders with the values you previously retrieved from {product}. 
+ [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-jms/StarlightForJMSClient/src/main/java/org/example/App.java[tag=init-app] +package org.example; + +import com.datastax.oss.pulsar.jms.PulsarConnectionFactory; + +import javax.jms.JMSContext; +import javax.jms.Message; +import javax.jms.MessageListener; +import javax.jms.Queue; +import java.util.HashMap; +import java.util.Map; + +public class App +{ + private static String webServiceUrl = ""; + private static String brokerServiceUrl = ""; + private static String pulsarToken = ""; + private static String tenantName = ""; + private static final String namespace = ""; + private static final String topicName = ""; + private static final String topic = String.format("persistent://%s/%s/%s", tenantName,namespace,topicName); + public static void main( String[] args ) throws Exception + { ---- -+ -NOTE: Don't worry if your editor shows errors, this isn't a complete program... yet. -. Add the following code to build the configuration that will be used by both the producer and consumer. +. Add the following code to build the configuration that will be used by both the producer and consumer: + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-jms/StarlightForJMSClient/src/main/java/org/example/App.java[tag=build-config] + Map properties = new HashMap<>(); + properties.put("webServiceUrl",webServiceUrl); + properties.put("brokerServiceUrl",brokerServiceUrl); + properties.put("authPlugin","org.apache.pulsar.client.impl.auth.AuthenticationToken"); + properties.put("authParams",pulsarToken); ---- -. Add the following code into the file. -This is a very simple 'PulsarConnectionFactory' that first creates a JMS queue using the full {pulsar-short} topic address, then creates a message listener callback function that watches the queue. -Finally, it produces a single message on the queue. +. Add the following code that defines a simple 'PulsarConnectionFactory' that creates a JMS queue using the full {pulsar-short} topic address, then creates a message listener callback function that watches the queue, and then produces a single message on the queue. + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-jms/StarlightForJMSClient/src/main/java/org/example/App.java[tag=build-factory] + try (PulsarConnectionFactory factory = new PulsarConnectionFactory(properties); ){ + JMSContext context = factory.createContext(); + Queue queue = context.createQueue(topic); + + context.createConsumer(queue).setMessageListener(new MessageListener() { + @Override + public void onMessage(Message message) { + try { + System.out.println("Received: " + message.getBody(String.class)); + } catch (Exception err) { + err.printStackTrace(); + } + } + }); + + String message = "Hello there!"; + System.out.println("Sending: "+message); + context.createProducer().send(queue, message); + + Thread.sleep(4000); //wait for the message to be consumed + } + } +} ---- -. You now have a complete program, so let's see it in action! Build and run the jar with the following terminal commands. +. Build and run a JAR file for this program: + [source,shell] ---- mvn clean package assembly:single java -jar target/StarlightForJMSClient-1.0-SNAPSHOT-jar-with-dependencies.jar ---- - -. If all goes as it should, your output will be similar to this: + -[source,shell] +.Result +[%collapsible] +==== +[source,console] ---- Sending: Hello there! Received: Hello there! ---- +==== -See how easy that was? 
You're already an app modernization ninja! + -Keep building those skills with the guides in the next section. - -== What's next? +== Next steps * xref:starlight-for-jms:examples:pulsar-jms-implementation.adoc[] * xref:starlight-for-jms:reference:pulsar-jms-mappings.adoc[] diff --git a/modules/use-cases-architectures/pages/starlight/kafka/index.adoc b/modules/use-cases-architectures/pages/starlight/kafka/index.adoc index 4d8ea97..e7a18cd 100644 --- a/modules/use-cases-architectures/pages/starlight/kafka/index.adoc +++ b/modules/use-cases-architectures/pages/starlight/kafka/index.adoc @@ -71,7 +71,7 @@ TIP: Click the clipboard icon to copy the Kafka connection values, as well as a === Produce and consume a message [tabs] -==== +====== Kafka CLI:: + -- @@ -119,25 +119,30 @@ This is my first S4K message. ---- . Press 'Ctrl-C' to exit the consumer shell. - -Wow, you did it! A Kafka producer and consumer with a {pulsar-short} cluster. How about trying the Java client now? -- + Kafka Client (Java):: + -- This example uses Maven for the project structure. If you prefer Gradle or another tool, this code should still be a good fit. -TIP: Visit our https://github.com/datastax/astra-streaming-examples[examples repo] to see the complete source for this example. +For complete source code examples, see the https://github.com/datastax/astra-streaming-examples[{product} examples repository]. . Create a new Maven project. + [source,shell] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-kafka/kafka-client/create-project.sh[] +mvn archetype:generate \ + -DgroupId=org.example \ + -DartifactId=StarlightForKafkaClient \ + -DarchetypeArtifactId=maven-archetype-quickstart \ + -DinteractiveMode=false + +cd StarlightForKafkaClient ---- -. Open the new project in your favorite IDE or text editor and add the Kafka client dependency to "pom.xml". +. Open the new project in your IDE or text editor, and then add the Kafka client dependency to `pom.xml`: + [source,xml] ---- @@ -148,46 +153,109 @@ include::{astra-streaming-examples-repo}/java/starlight-for-kafka/kafka-client/c ---- -. Open the file "src/main/java/org/example/App.java" and replace the entire contents with the below code. Notice there are class variables that need replacing. Apply the values previously retrieved in {product}. +. Open the file `src/main/java/org/example/App.java`, and then enter the following code. +If you cloned the example repo, replace the entire contents of `App.java` with the following code. +Your editor will report an error because this isn't a complete script yet. ++ +Replace placeholders with the values you previously retrieved from {product}. 
+ [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-kafka/kafka-client/StarlightForKafkaClient/src/main/java/org/example/App.java[tag=init-app] +package org.example; + +import org.apache.kafka.clients.consumer.ConsumerConfig; +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.clients.consumer.ConsumerRecords; +import org.apache.kafka.clients.consumer.KafkaConsumer; +import org.apache.kafka.clients.producer.*; +import org.apache.kafka.common.serialization.LongSerializer; +import org.apache.kafka.common.serialization.StringDeserializer; +import org.apache.kafka.common.serialization.StringSerializer; + +import java.time.Duration; +import java.util.Collections; +import java.util.Properties; + +public class App { + private static String bootstrapServers = ""; + private static String pulsarToken = ""; + private static String tenantName = ""; + private static final String namespace = ""; + private static final String topicName = ""; + private static final String topic = String.format("persistent://%s/%s/%s", tenantName,namespace,topicName); + + public static void main(String[] args) { ---- -+ -NOTE: Don't worry if your editor shows errors, this isn't a complete program... yet. -. Bring in the following code to build the configuration that will be used by both the producer and consumer. +. Add the following code that builds the configuration that will be used by both the producer and consumer: + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-kafka/kafka-client/StarlightForKafkaClient/src/main/java/org/example/App.java[tag=build-config] + Properties config = new Properties(); + config.put("bootstrap.servers",bootstrapServers); + config.put("security.protocol","SASL_SSL"); + config.put("sasl.jaas.config", String.format("org.apache.kafka.common.security.plain.PlainLoginModule required username='%s' password='token:%s';", tenantName, pulsarToken)); + config.put("sasl.mechanism","PLAIN"); + config.put("session.timeout.ms","45000"); + config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName()); + config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName()); + config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName()); + config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName()); + config.put("group.id", "my-consumer-group"); ---- -. Now paste the producer code into the file. This is a very simple flow that sends a single message and awaits acknowledgment. +. Add the producer code, which is a simple flow that sends a single message and awaits acknowledgment: + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-kafka/kafka-client/StarlightForKafkaClient/src/main/java/org/example/App.java[tag=build-producer] + KafkaProducer producer = new KafkaProducer<>(config); + + final ProducerRecord producerRecord = new ProducerRecord<>(topic, System.currentTimeMillis(), "Hello World"); + producer.send(producerRecord, new Callback() { + public void onCompletion(RecordMetadata metadata, Exception e) { + if (e != null) + System.out.println(String.format("Send failed for record, %s. \nRecord data: %s",e.getMessage(), producerRecord)); + else + System.out.println("Successfully sent message"); + } + }); + + producer.flush(); + producer.close(); ---- -. Paste the consumer code into the file. This creates a basic subscription and retrieves the latest messages on the topic. +. 
Add the consumer code, which creates a basic subscription and retrieves the latest messages on the topic: + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-kafka/kafka-client/StarlightForKafkaClient/src/main/java/org/example/App.java[tag=build-consumer] + final KafkaConsumer consumer = new KafkaConsumer(config); + + consumer.subscribe(Collections.singletonList(topic)); + ConsumerRecords consumerRecords = consumer.poll(Duration.ofMillis(5000)); + + System.out.println(String.format("Found %d total record(s)", consumerRecords.count())); + + for (ConsumerRecord consumerRecord : consumerRecords) { + System.out.println(consumerRecord); + } + + consumer.commitSync(); + consumer.close(); + } +} ---- -. Now you should have a complete program. Let's see it in action! Build and run the jar with the following terminal commands. +. Build and run a JAR file for the complete program: + [source,shell] ---- mvn clean package assembly:single java -jar target/StarlightForKafkaClient-1.0-SNAPSHOT-jar-with-dependencies.jar ---- - -. If all goes as it should, your output will be similar to this: + +.Result +[%collapsible] +==== [source,shell] ---- Successfully sent message @@ -195,14 +263,13 @@ Successfully sent message Found 1 total record(s) ConsumerRecord(topic = persistent://my-tenant-007/my-namespace/my-topic, partition = 0, leaderEpoch = null, offset = 22, CreateTime = 1673545962124, serialized key size = 8, serialized value size = 11, headers = RecordHeaders(headers = [], isReadOnly = false), key = xxxxx, value = Hello World) ---- - -Congrats! You have just used the Kafka client to send and receive messages in {pulsar-short}. Next stop is the moon! --- ==== +-- +====== -The {kafka-for-astra} documentation provides more specifics about the below topics and more. Visit those for more detail. +== See also * xref:starlight-for-kafka:operations:starlight-kafka-kstreams.adoc[] * xref:starlight-for-kafka:operations:starlight-kafka-implementation.adoc[] -* xref:starlight-for-kafka:operations:starlight-kafka-monitor.adoc[Monitoring] +* xref:starlight-for-kafka:operations:starlight-kafka-monitor.adoc[] * xref:starlight-for-kafka:operations:starlight-kafka-security.adoc[] \ No newline at end of file diff --git a/modules/use-cases-architectures/pages/starlight/rabbitmq/index.adoc b/modules/use-cases-architectures/pages/starlight/rabbitmq/index.adoc index 29b8ba9..fa704ec 100644 --- a/modules/use-cases-architectures/pages/starlight/rabbitmq/index.adoc +++ b/modules/use-cases-architectures/pages/starlight/rabbitmq/index.adoc @@ -70,24 +70,25 @@ TIP: Click the clipboard icon to copy the RabbitMQ connection values, as well as === Produce and consume a message -[tabs] -==== -RabbitMQ Client (Java):: -+ --- -This example uses Maven for the project structure. +This example uses Maven for the project structure for a Rabbit MQ Java client. If you prefer Gradle or another tool, this code should still be a good fit. -TIP: Visit our https://github.com/datastax/astra-streaming-examples[examples repo] to see the complete source for this example. +For complete source code examples, see the https://github.com/datastax/astra-streaming-examples[{product} examples repository]. . Create a new Maven project. 
+ [source,shell] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-rabbitmq/rabbitmq-client/create-project.sh[] +mvn archetype:generate \ + -DgroupId=org.example \ + -DartifactId=StarlightForRabbitMqClient \ + -DarchetypeArtifactId=maven-archetype-quickstart \ + -DinteractiveMode=false + +cd StarlightForRabbitMqClient ---- -. Open the new project in your favorite IDE or text editor and add the RabbitMQ client dependency to "pom.xml". +. Open the new project in your IDE or text editor, and then add the RabbitMQ client dependency to `pom.xml`: + [source,xml] ---- @@ -98,63 +99,108 @@ include::{astra-streaming-examples-repo}/java/starlight-for-rabbitmq/rabbitmq-cl ---- -. Open the file "src/main/java/org/example/App.java" and replace the entire contents with the below code. -Notice there are class variables that need replacing. -Apply the values previously retrieved in {product}. +. Open the file `src/main/java/org/example/App.java`, and then enter the following code. +If you cloned the example repo, replace the entire contents with the following code. +Your editor will report errors because this isn't a complete program yet. ++ +Replace placeholders with the values you previously retrieved from {product}. + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-rabbitmq/rabbitmq-client/StarlightForRabbitMqClient/src/main/java/org/example/App.java[tag=init-app] +package org.example; + +import com.rabbitmq.client.*; + +import java.io.IOException; +import java.net.URISyntaxException; +import java.nio.charset.StandardCharsets; +import java.security.KeyManagementException; +import java.security.NoSuchAlgorithmException; +import java.util.concurrent.TimeoutException; + +public class App { + private static final String username = ""; + private static final String password = ""; + private static final String host = ""; + private static final int port = 5671; + private static final String virtual_host = "/"; //The "rabbitmq" namespace should have been created when you enabled S4R + private static final String queueName = ""; //This will get created automatically + private static final String amqp_URI = String.format("amqps://%s:%s@%s:%d/%s", username, password, host, port, virtual_host.replace("/","%2f")); + + public static void main(String[] args) throws IOException, TimeoutException, URISyntaxException, NoSuchAlgorithmException, KeyManagementException, InterruptedException { ---- -+ -NOTE: Don't worry if your editor shows errors, this isn't a complete program... yet. -. Add the following code to create a valid connection, a channel, and a queue that will be used by both the producer and consumer. +. Add the code to create a connection, channel, and queue that will be used by both the producer and consumer: + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-rabbitmq/rabbitmq-client/StarlightForRabbitMqClient/src/main/java/org/example/App.java[tag=create-queue] + ConnectionFactory factory = new ConnectionFactory(); + factory.setUri(amqp_URI); + + /* + You could also set each value individually + factory.setHost(host); + factory.setPort(port); + factory.setUsername(username); + factory.setPassword(password); + factory.setVirtualHost(virtual_host); + factory.useSslProtocol(); + */ + + Connection connection = factory.newConnection(); + Channel channel = connection.createChannel(); + + channel.queueDeclare(queueName, false, false, false, null); ---- -. Add the producer code to the file. 
-This is a very simple flow that sends a single message and awaits acknowledgment. +. Add the producer code, which is a simple flow that sends a single message and awaits acknowledgment: + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-rabbitmq/rabbitmq-client/StarlightForRabbitMqClient/src/main/java/org/example/App.java[tag=build-producer] + String publishMessage = "Hello World!"; + channel.basicPublish("", queueName, null, publishMessage.getBytes()); + System.out.println(" Sent '" + publishMessage + "'"); ---- -. Add the consumer code to the file. -This creates a basic consumer with callback on message receipt. -Because the consumer isn't a blocking thread, we'll sleep for a few seconds and let things process. +. Add the consumer code, which creates a basic consumer with callback on message receipt. +Because the consumer isn't a blocking thread, the `sleep` allows time for messages to be received and processed. + [source,java] ---- -include::{astra-streaming-examples-repo}/java/starlight-for-rabbitmq/rabbitmq-client/StarlightForRabbitMqClient/src/main/java/org/example/App.java[tag=build-consumer] + DeliverCallback deliverCallback = (consumerTag, delivery) -> { + String consumeMessage = new String(delivery.getBody(), StandardCharsets.UTF_8); + System.out.println(" Received '" + consumeMessage + "'"); + }; + + channel.basicConsume(queueName, true, deliverCallback, consumerTag -> { }); + + Thread.sleep(4000); // wait a bit for messages to be received + + channel.close(); + connection.close(); + } +} ---- -. You now have a complete program, so let's see it in action! -Build and run the jar with the following terminal commands. +. Build and run the JAR file for the complete program: + [source,shell] ---- mvn clean package assembly:single java -jar target/StarlightForRabbitMqClient-1.0-SNAPSHOT-jar-with-dependencies.jar ---- - -. If all goes as it should, your output will be similar to this: + +.Result +[%collapsible] +==== [source,shell] ---- Sent 'Hello World!' Received 'Hello World!' ---- - -Congrats! You used the RabbitMQ client to send and receive messages in {pulsar-short}. --- ==== -== What's next? +== Next steps * xref:starlight-for-rabbitmq:ROOT:index.adoc[{starlight-rabbitmq} documentation] * xref:luna-streaming:components:starlight-for-rabbitmq.adoc[] \ No newline at end of file