Request
The current CLP package flow doesn't handle the default dataset properly.
- On the compression end, we set the dataset to `default` if it's not set: `dataset = CLP_DEFAULT_DATASET_NAME if dataset is None else dataset`
- In the native compression script, it can take an optional dataset without further checking:
  `args_parser.add_argument(`
- In the compression job config, the dataset is also nullable: `dataset: str | None = None`
- In the compression job executor, the dataset is nullable but it is not checked:
clp/components/job-orchestration/job_orchestration/executor/compress/compression_task.py
Line 399 in bfc474f
`archive_output_dir = archive_output_dir / dataset`
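Since `pathlib` refuses to join a `None` path segment, an unchecked null dataset would surface at that line as a `TypeError` rather than being handled deliberately. A minimal reproduction (the paths and values below are hypothetical):

```python
from pathlib import Path

# Hypothetical values; the real ones come from the job config and executor.
archive_output_dir = Path("/var/data/archives")
dataset = None  # nullable per the job config, but not checked in the executor

try:
    # Mirrors the executor's `archive_output_dir = archive_output_dir / dataset`
    archive_output_dir = archive_output_dir / dataset
except TypeError as exc:
    print(f"Unhandled null dataset: {exc}")
```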
The flow works well if everything is submitted through the compression script, since it ensures the dataset will always be `default`. However, this doesn't work if we submit compression jobs directly to the CLP DB.
We should make the dataset handling consistent with the config definition.
Possible implementation
- Allow the dataset to be nullable.
- Don't set it to `default` in the compression script.
- Handle the dataset in the compression job.
Metadata
Assignees
Labels
enhancement (New feature or request)