Merge branch 'dev' into multi-table-sink-hbase
BruceWong96 committed Aug 30, 2024
2 parents f50654f + df210ea commit 6f9b35b
Showing 115 changed files with 5,028 additions and 415 deletions.
6 changes: 6 additions & 0 deletions .github/workflows/labeler/label-scope-conf.yml
@@ -257,6 +257,12 @@ activemq:
- changed-files:
- any-glob-to-any-file: seatunnel-connectors-v2/connector-activemq/**
- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(activemq)/**'
typesense:
- all:
- changed-files:
- any-glob-to-any-file: seatunnel-connectors-v2/connector-typesense/**
- all-globs-to-all-files: '!seatunnel-connectors-v2/connector-!(typesense)/**'

Zeta Rest API:
- changed-files:
- any-glob-to-any-file: seatunnel-engine/**/server/rest/**
2 changes: 1 addition & 1 deletion config/plugin_config
@@ -88,5 +88,5 @@ connector-web3j
connector-milvus
connector-activemq
connector-sls
connector-typesense
connector-cdc-opengauss
--end--
39 changes: 39 additions & 0 deletions docs/en/connector-v2/sink/Rabbitmq.md
@@ -57,6 +57,21 @@ convenience method for setting the fields in an AMQP URI: host, port, username,

the queue to write the message to

### durable [boolean]

true: The queue will survive a server restart.
false: The queue will be deleted on server restart.

### exclusive [boolean]

true: The queue is used only by the current connection and will be deleted when the connection closes.
false: The queue can be used by multiple connections.

### auto_delete [boolean]

true: The queue will be deleted automatically when the last consumer unsubscribes.
false: The queue will not be automatically deleted.

### schema [Config]

#### fields [Config]
@@ -112,6 +127,30 @@ sink {
}
```

### Example 2

A queue configured with `durable`, `exclusive`, and `auto_delete`:

```hocon
sink {
RabbitMQ {
host = "rabbitmq-e2e"
port = 5672
virtual_host = "/"
username = "guest"
password = "guest"
queue_name = "test1"
durable = "true"
exclusive = "false"
auto_delete = "false"
rabbitmq.config = {
requested-heartbeat = 10
connection-timeout = 10
}
}
}
```

## Changelog

### next version
93 changes: 93 additions & 0 deletions docs/en/connector-v2/sink/Typesense.md
@@ -0,0 +1,93 @@
# Typesense

## Description

Outputs data to `Typesense`.

## Key Features

- [ ] [Exactly Once](../../concept/connector-v2-features.md)
- [x] [CDC](../../concept/connector-v2-features.md)

## Options

| Name | Type | Required | Default Value |
|------------------|--------|----------|------------------------------|
| hosts | array | Yes | - |
| collection | string | Yes | - |
| schema_save_mode | string | Yes | CREATE_SCHEMA_WHEN_NOT_EXIST |
| data_save_mode | string | Yes | APPEND_DATA |
| primary_keys | array | No | |
| key_delimiter | string | No | `_` |
| api_key | string | No | |
| max_retry_count | int | No | 3 |
| max_batch_size | int | No | 10 |
| common-options | | No | - |

### hosts [array]

The access address for Typesense, formatted as `host:port`, e.g., `["typesense-01:8108"]`.

### collection [string]

The name of the collection to write to, e.g., "seatunnel".

### primary_keys [array]

Primary key fields used to generate the document `id`.

### key_delimiter [string]

Sets the delimiter used to join the primary key field values into the document id (default is `_`).

### api_key [string]

The `api_key` for secure access to Typesense.

### max_retry_count [int]

The maximum number of retry attempts for batch requests.

### max_batch_size [int]

The maximum size of document batches.

### common options

Common parameters for Sink plugins. Refer to [Sink Common Options](../sink-common-options.md) for more details.

### schema_save_mode

Choose how to handle the target-side schema before starting the synchronization task:
- `RECREATE_SCHEMA`: Creates the table if it doesn’t exist, and deletes and recreates it if it does.
- `CREATE_SCHEMA_WHEN_NOT_EXIST`: Creates the table if it doesn’t exist, skips creation if it does.
- `ERROR_WHEN_SCHEMA_NOT_EXIST`: Throws an error if the table doesn’t exist.

### data_save_mode

Choose how to handle existing data on the target side before starting the synchronization task (an illustrative configuration follows this list):
- `DROP_DATA`: Retains the database structure but deletes the data.
- `APPEND_DATA`: Retains both the database structure and the data.
- `ERROR_WHEN_DATA_EXISTS`: Throws an error if data exists.
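
As an illustration only (not taken from the upstream example), a truncate-and-load style job could keep the collection schema but clear existing documents by pairing `CREATE_SCHEMA_WHEN_NOT_EXIST` with `DROP_DATA`; the collection name below is a placeholder:

```hocon
sink {
  Typesense {
    hosts = ["localhost:8108"]
    api_key = "xyz"
    collection = "typesense_truncate_and_load" # placeholder collection name
    # Create the collection only if it is missing, then clear existing data before writing
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "DROP_DATA"
  }
}
```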

## Example

Simple example:

```hocon
sink {
Typesense {
source_table_name = "typesense_test_table"
hosts = ["localhost:8108"]
collection = "typesense_to_typesense_sink_with_query"
max_retry_count = 3
max_batch_size = 10
api_key = "xyz"
primary_keys = ["num_employees","id"]
key_delimiter = "="
schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
data_save_mode = "APPEND_DATA"
}
}
```

25 changes: 24 additions & 1 deletion docs/en/connector-v2/source/Jdbc.md
@@ -39,7 +39,7 @@ supports query SQL and can achieve projection effect.

## Options

| name | type | required | default value | description |
|--------------------------------------------|---------|----------|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| url | String | Yes | - | The URL of the JDBC connection. Refer to a case: jdbc:postgresql://localhost/test |
| driver | String | Yes | - | The jdbc class name used to connect to the remote data source, if you use MySQL the value is `com.mysql.cj.jdbc.Driver`. |
@@ -52,6 +52,7 @@ supports query SQL and can achieve projection effect.
| partition_upper_bound | Long | No | - | The maximum value of partition_column for the scan; if not set, SeaTunnel will query the database to get the max value. |
| partition_lower_bound | Long | No | - | The minimum value of partition_column for the scan; if not set, SeaTunnel will query the database to get the min value. |
| partition_num | Int | No | job parallelism | Not recommended. The correct approach is to control the number of splits through `split.size`.<br/> The number of splits to create; only positive integers are supported. The default value is the job parallelism. |
| decimal_type_narrowing | Boolean | No | true | Decimal type narrowing. If true, decimal types are narrowed to int or long when this causes no loss of precision. Currently only supported for Oracle. Please refer to `decimal_type_narrowing` below. |
| use_select_count | Boolean | No | false | Use `select count` for the table count rather than other methods in the dynamic chunk split stage. This is currently only available for jdbc-oracle. In this scenario, `select count` is used directly when it is faster than updating statistics by analyzing the table. |
| skip_analyze | Boolean | No | false | Skip analyzing the table count in the dynamic chunk split stage. This is currently only available for jdbc-oracle. Use it when you schedule analyze-table SQL to update table statistics periodically, or when your table data does not change frequently. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means use the JDBC default value. |
@@ -66,6 +67,28 @@ supports query SQL and can achieve projection effect.
| split.inverse-sampling.rate | Int | No | 1000 | The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000. |
| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |

### decimal_type_narrowing

Decimal type narrowing: if true, decimal types are narrowed to int or long when this causes no loss of precision. Currently only supported for Oracle. An illustrative source configuration follows the tables below.

For example, with `decimal_type_narrowing = true`:

| Oracle | SeaTunnel |
|---------------|-----------|
| NUMBER(1, 0) | Boolean |
| NUMBER(6, 0) | INT |
| NUMBER(10, 0) | BIGINT |

With `decimal_type_narrowing = false`:

| Oracle | SeaTunnel |
|---------------|----------------|
| NUMBER(1, 0) | Decimal(1, 0) |
| NUMBER(6, 0) | Decimal(6, 0) |
| NUMBER(10, 0) | Decimal(10, 0) |
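
A minimal source sketch showing where this option sits; the connection string, credentials, and query are placeholders and assume an Oracle JDBC setup:

```hocon
source {
  Jdbc {
    url = "jdbc:oracle:thin:@localhost:1521/XE" # placeholder connection string
    driver = "oracle.jdbc.OracleDriver"
    user = "scott"      # placeholder credentials
    password = "tiger"
    query = "SELECT ID, STATUS, AMOUNT FROM TEST_TABLE"
    # Map NUMBER(p, 0) columns to Boolean/INT/BIGINT as described in the tables above
    decimal_type_narrowing = true
  }
}
```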

## Parallel Reader

The JDBC Source connector supports parallel reading of data from tables. SeaTunnel will use certain rules to split the data in the table, which will be handed over to readers for reading. The number of readers is determined by the `parallelism` option.
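
A hedged sketch of a parallel read, assuming a table with a numeric primary key column named `id` (connection details and the table are placeholders; `split.size` controls how many rows go into each split):

```hocon
source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test" # placeholder connection string
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"       # placeholder credentials
    password = "123456"
    query = "SELECT * FROM orders"
    # Split the table on the numeric id column so multiple readers can work in parallel
    partition_column = "id"
    split.size = 8096
  }
}
```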