Merge branch 'dev' into multi-table-sink-hbase
BruceWong96 committed Aug 30, 2024
2 parents f50654f + df210ea commit 6f9b35b
Showing 115 changed files with 5,028 additions and 415 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/labeler/label-scope-conf.yml
@@ -257,6 +257,12 @@ activemq:
- changed-files:
- any-glob-to-any-file: seatunnel-connectors-v2/connector-activemq/**
- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(activemq)/**'
typesense:
- all:
- changed-files:
- any-glob-to-any-file: seatunnel-connectors-v2/connector-typesense/**
- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(typesense)/**'

Zeta Rest API:
- changed-files:
- any-glob-to-any-file: seatunnel-engine/**/server/rest/**
2 changes: 1 addition & 1 deletion config/plugin_config
@@ -88,5 +88,5 @@ connector-web3j
connector-milvus
connector-activemq
connector-sls
connector-typesense
connector-cdc-opengauss
--end--
39 changes: 39 additions & 0 deletions docs/en/connector-v2/sink/Rabbitmq.md
@@ -57,6 +57,21 @@ convenience method for setting the fields in an AMQP URI: host, port, username,

the queue to write the message to

### durable [boolean]

true: The queue will survive a server restart.
false: The queue will be deleted on server restart.

### exclusive [boolean]

true: The queue is used only by the current connection and will be deleted when the connection closes.
false: The queue can be used by multiple connections.

### auto_delete [boolean]

true: The queue will be deleted automatically when the last consumer unsubscribes.
false: The queue will not be automatically deleted.

### schema [Config]

#### fields [Config]
@@ -112,6 +127,30 @@ sink {
}
```

### Example 2

A queue configured with `durable`, `exclusive`, and `auto_delete`:

```hocon
sink {
RabbitMQ {
host = "rabbitmq-e2e"
port = 5672
virtual_host = "/"
username = "guest"
password = "guest"
queue_name = "test1"
durable = "true"
exclusive = "false"
auto_delete = "false"
rabbitmq.config = {
requested-heartbeat = 10
connection-timeout = 10
}
}
}
```

## Changelog

### next version
93 changes: 93 additions & 0 deletions docs/en/connector-v2/sink/Typesense.md
@@ -0,0 +1,93 @@
# Typesense

## Description

Outputs data to `Typesense`.

## Key Features

- [ ] [Exactly Once](../../concept/connector-v2-features.md)
- [x] [CDC](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default Value |
|------------------|--------|----------|------------------------------|
| hosts | array | Yes | - |
| collection | string | Yes | - |
| schema_save_mode | string | Yes | CREATE_SCHEMA_WHEN_NOT_EXIST |
| data_save_mode | string | Yes | APPEND_DATA |
| primary_keys | array | No | |
| key_delimiter | string | No | `_` |
| api_key | string | No | |
| max_retry_count | int | No | 3 |
| max_batch_size | int | No | 10 |
| common-options | | No | - |

### hosts [array]

The access address for Typesense, formatted as `host:port`, e.g., `["typesense-01:8108"]`.

### collection [string]

The name of the collection to write to, e.g., "seatunnel".

### primary_keys [array]

Primary key fields used to generate the document `id`.

### key_delimiter [string]

Sets the delimiter used to join the primary key field values into the document id (default is `_`).

### api_key [string]

The `api_key` for secure access to Typesense.

### max_retry_count [int]

The maximum number of retry attempts for batch requests.

### max_batch_size [int]

The maximum size of document batches.

### common options

Common parameters for Sink plugins. Refer to [Sink Common Options](../sink-common-options.md) for more details.

### schema_save_mode

Choose how to handle the target-side schema before starting the synchronization task:
- `RECREATE_SCHEMA`: Creates the table if it doesn’t exist, and deletes and recreates it if it does.
- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Creates the table if it doesn’t exist, skips creation if it does.
- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Throws an error if the table doesn’t exist.

### data_save_mode

Choose how to handle existing data on the target side before starting the synchronization task (an illustrative configuration follows this list):
- `DROP_DATA`: Retains the database structure but deletes the data.
- `APPEND_DATA`: Retains both the database structure and the data.
- `ERROR_WHEN_DATA_EXISTS`: Throws an error if data exists.
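
As an illustration only (not taken from the upstream example), a truncate-and-load style job could keep the collection schema but clear existing documents by pairing `CREATE_SCHEMA_WHEN_NOT_EXIST` with `DROP_DATA`; the collection name below is a placeholder:

```hocon
sink {
  Typesense {
    hosts = ["localhost:8108"]
    api_key = "xyz"
    collection = "typesense_truncate_and_load" # placeholder collection name
    # Create the collection only if it is missing, then clear existing data before writing
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "DROP_DATA"
  }
}
```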

## Example

Simple example:

```hocon
sink {
Typesense {
source_table_name = "typesense_test_table"
hosts = ["localhost:8108"]
collection = "typesense_to_typesense_sink_with_query"
max_retry_count = 3
max_batch_size = 10
api_key = "xyz"
primary_keys = ["num_employees","id"]
key_delimiter = "="
schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
data_save_mode = "APPEND_DATA"
}
}
```

25 changes: 24 additions & 1 deletion docs/en/connector-v2/source/Jdbc.md
@@ -39,7 +39,7 @@ supports query SQL and can achieve projection effect.

## Options

| name | type | required | default value | description |
|--------------------------------------------|---------|----------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| url | String | Yes | - | The URL of the JDBC connection. Refer to a case: jdbc:postgresql://localhost/test |
| driver | String | Yes | - | The jdbc class name used to connect to the remote data source, if you use MySQL the value is `com.mysql.cj.jdbc.Driver`. |
@@ -52,6 +52,7 @@ supports query SQL and can achieve projection effect.
| partition_upper_bound | Long | No | - | The maximum value of partition_column for the scan; if not set, SeaTunnel will query the database to get the max value. |
| partition_lower_bound | Long | No | - | The minimum value of partition_column for the scan; if not set, SeaTunnel will query the database to get the min value. |
| partition_num | Int | No | job parallelism | Not recommended. The correct approach is to control the number of splits through `split.size`.<br/> The number of splits to create; only positive integers are supported. The default value is the job parallelism. |
| decimal_type_narrowing | Boolean | No | true | Decimal type narrowing. If true, decimal types are narrowed to int or long when this causes no loss of precision. Currently only supported for Oracle. Please refer to `decimal_type_narrowing` below. |
| use_select_count | Boolean | No | false | Use `select count` for the table count rather than other methods in the dynamic chunk split stage. This is currently only available for jdbc-oracle. In this scenario, `select count` is used directly when it is faster than updating statistics by analyzing the table. |
| skip_analyze | Boolean | No | false | Skip analyzing the table count in the dynamic chunk split stage. This is currently only available for jdbc-oracle. Use it when you schedule analyze-table SQL to update table statistics periodically, or when your table data does not change frequently. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means use the JDBC default value. |
@@ -66,6 +67,28 @@ supports query SQL and can achieve projection effect.
| split.inverse-sampling.rate | Int | No | 1000 | The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000. |
| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |

### decimal_type_narrowing

Decimal type narrowing: if true, decimal types are narrowed to int or long when this causes no loss of precision. Currently only supported for Oracle. An illustrative source configuration follows the tables below.

For example, with `decimal_type_narrowing = true`:

| Oracle | SeaTunnel |
|---------------|-----------|
| NUMBER(1, 0) | Boolean |
| NUMBER(6, 0) | INT |
| NUMBER(10, 0) | BIGINT |

With `decimal_type_narrowing = false`:

| Oracle | SeaTunnel |
|---------------|----------------|
| NUMBER(1, 0) | Decimal(1, 0) |
| NUMBER(6, 0) | Decimal(6, 0) |
| NUMBER(10, 0) | Decimal(10, 0) |
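
A minimal source sketch showing where this option sits; the connection string, credentials, and query are placeholders and assume an Oracle JDBC setup:

```hocon
source {
  Jdbc {
    url = "jdbc:oracle:thin:@localhost:1521/XE" # placeholder connection string
    driver = "oracle.jdbc.OracleDriver"
    user = "scott"      # placeholder credentials
    password = "tiger"
    query = "SELECT ID, STATUS, AMOUNT FROM TEST_TABLE"
    # Map NUMBER(p, 0) columns to Boolean/INT/BIGINT as described in the tables above
    decimal_type_narrowing = true
  }
}
```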

## Parallel Reader

The JDBC Source connector supports parallel reading of data from tables. SeaTunnel will use certain rules to split the data in the table, which will be handed over to readers for reading. The number of readers is determined by the `parallelism` option.
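
A hedged sketch of a parallel read, assuming a table with a numeric primary key column named `id` (connection details and the table are placeholders; `split.size` controls how many rows go into each split):

```hocon
source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test" # placeholder connection string
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"       # placeholder credentials
    password = "123456"
    query = "SELECT * FROM orders"
    # Split the table on the numeric id column so multiple readers can work in parallel
    partition_column = "id"
    split.size = 8096
  }
}
```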