[Feature][tools] X2SeaTunnel #9507

@Adamyuanyuan

Search before asking

  • I have searched the existing feature requests and found no similar feature requirement.

Description

X2SeaTunnel Design Document

Background Overview

X2SeaTunnel is a universal configuration conversion tool designed to convert configuration files from various data integration tools (such as DataX, Sqoop, etc.) into SeaTunnel's HOCON or JSON configuration files, helping users migrate smoothly to the SeaTunnel platform.

Design Philosophy

Core Principles

  • Simple and Lightweight: Keep the tool efficient and focused on configuration file format conversion
  • Unified Framework: Build a common framework that supports configuration conversion from multiple data integration tools
  • Extensibility: Adopt a plugin-based design for easy extension to support more tools in the future
  • Usability: Provide both an SDK and a command-line interface, with single-script and batch processing, to meet the requirements of different scenarios

Conversion Process

Source Tool Config (DataX JSON) → Parsing → Unified Model → Mapping Conversion → Generate SeaTunnel Config
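
The mapping and generation stages of this pipeline could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `ConversionPipelineSketch`, `UnifiedJob`, and both rule tables are hypothetical names, the parsing stage is replaced by a hand-built model so the example needs no JSON library, and the single rule shown (DataX `mysqlreader` to a SeaTunnel `Jdbc` source, with two renamed keys) is only a guess at what a mapping rule might look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the unified model -> mapping -> generation stages. */
final class ConversionPipelineSketch {

    /** Tool-neutral intermediate model (hypothetical): a reader type plus flat parameters. */
    static final class UnifiedJob {
        final String readerType;
        final Map<String, String> readerParams;

        UnifiedJob(String readerType, Map<String, String> readerParams) {
            this.readerType = readerType;
            this.readerParams = readerParams;
        }
    }

    /** Assumed rule table: source-tool plugin name -> SeaTunnel connector name. */
    static final Map<String, String> READER_TO_SOURCE = Map.of("mysqlreader", "Jdbc");

    /** Assumed rule table: source-tool parameter key -> SeaTunnel option key. */
    static final Map<String, String> PARAM_KEYS = Map.of("jdbcUrl", "url", "username", "user");

    /** Mapping stage: rename parameter keys; keys without a rule pass through unchanged. */
    static Map<String, String> mapParams(UnifiedJob job) {
        Map<String, String> mapped = new LinkedHashMap<>();
        job.readerParams.forEach((k, v) -> mapped.put(PARAM_KEYS.getOrDefault(k, k), v));
        return mapped;
    }

    /** Generation stage: render a HOCON-style source block from the mapped model. */
    static String generate(UnifiedJob job) {
        String connector = READER_TO_SOURCE.get(job.readerType);
        if (connector == null) {
            throw new IllegalArgumentException("No mapping rule for reader: " + job.readerType);
        }
        StringBuilder sb = new StringBuilder("source {\n  " + connector + " {\n");
        mapParams(job).forEach((k, v) ->
                sb.append("    ").append(k).append(" = \"").append(v).append("\"\n"));
        return sb.append("  }\n}\n").toString();
    }
}
```

Keeping the rule tables as data rather than code is one way the custom mapping rules mentioned in this proposal could be loaded from a file; the real rule format is not specified here.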

Usage Methods

Simple Command Line Interface

# Basic usage
sh bin/x2seatunnel.sh -t datax -i /path/to/config.json -o /path/to/output.conf

# Specify tool type, input/output and format
sh bin/x2seatunnel.sh -t datax -i input.json -o output.conf -f hocon

# Batch conversion
sh bin/x2seatunnel.sh -t datax -d /input/dir/ -o /output/dir/
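
As a rough sketch of how these flags might be handled on the Java side, the class below parses them into key/value pairs. Only the flag letters (`-t`, `-i`, `-o`, `-f`, `-d`) come from the examples above; the class name, the internal key names, the rule that `-i` and `-d` are mutually exclusive, and the HOCON default when `-f` is omitted are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: parse the command-line flags shown above into named values. */
final class CliArgsSketch {

    /** Flag letters from the usage examples; the value names are this sketch's own labels. */
    static final Map<String, String> KNOWN_FLAGS = Map.of(
            "-t", "tool", "-i", "input", "-o", "output", "-f", "format", "-d", "directory");

    static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        // Every flag is expected to be followed by exactly one value.
        for (int i = 0; i < args.length; i += 2) {
            String name = KNOWN_FLAGS.get(args[i]);
            if (name == null) {
                throw new IllegalArgumentException("Unknown flag: " + args[i]);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            parsed.put(name, args[i + 1]);
        }
        // Assumption: -i (single file) and -d (batch directory) are mutually exclusive.
        if (parsed.containsKey("input") == parsed.containsKey("directory")) {
            throw new IllegalArgumentException("Specify exactly one of -i or -d");
        }
        // Assumption: HOCON is the default output format when -f is omitted.
        parsed.putIfAbsent("format", "hocon");
        return parsed;
    }
}
```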

YAML Command Line Interface

# Using YAML configuration file
sh bin/x2seatunnel.sh --config conversion.yaml

YAML Configuration File Example

# X2SeaTunnel configuration file
metadata:
  # Configuration file format version
  configVersion: "1.0"
  # Description (optional)
  description: "DataX to SeaTunnel conversion configuration"

# Tool configuration
tool:
  # Source tool type: datax, sqoop, etc.
  sourceType: "datax"
  sourceVersion: "2.1.2"
  # Target SeaTunnel version
  targetVersion: "2.3.11"

# Input configuration
input:
  # Source configuration path (file or directory)
  path: "/path/to/configs"
  # Whether to process subdirectories recursively
  recursive: true
  # File matching pattern
  pattern: "*.json"

# Output configuration
output:
  # Output path
  path: "/path/to/output"
  # Output format: hocon or json
  format: "hocon" 
  # Filename conversion rule
  namePattern: "${filename}_seatunnel.conf"

# Mapping configuration
mapping:
  # Custom mapping rules path (optional)
  rulesPath: "/path/to/custom/rules.json"
  
# Validation configuration
validation:
  # Whether to enable validation
  enabled: true
  # Validation failure behavior: warn, error, ignore
  
# Logging configuration
logging:
  # Log level: debug, info, warn, error
  level: "info"
  # Log output path
  path: "./logs"
  # Log filename pattern
  filePattern: "x2seatunnel-%d{yyyy-MM-dd}.log"
  # Whether to output to console simultaneously
  console: true
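
To make the batch settings concrete, the sketch below shows one way `input.pattern` and `output.namePattern` from the file above could be applied to a single input file. It is illustrative only: `BatchNamingSketch` and its methods are hypothetical, the pattern is assumed to be a standard glob matched against the file name, and `${filename}` is assumed to mean the input file name without its extension.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

/** Hypothetical sketch of input file matching and output file naming for batch conversion. */
final class BatchNamingSketch {

    /** Returns true if the file name matches a glob such as "*.json" (input.pattern). */
    static boolean matches(String glob, Path file) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(file.getFileName());
    }

    /** Expands ${filename} in output.namePattern using the input name without its extension. */
    static String outputName(String namePattern, Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return namePattern.replace("${filename}", base);
    }
}
```

Under these assumptions, an input file named `orders.json` matched by `*.json` would be written as `orders_seatunnel.conf`.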

SDK Integration Method

// Create specific tool converter
X2SeaTunnelConverter converter = X2SeaTunnelFactory.createConverter("datax");

// Configure conversion options
ConversionOptions options = new ConversionOptions.Builder()
    .outputFormat("hocon")
    .targetVersion("2.3.11")
    .build();
    
// Execute conversion
String seatunnelConfig = converter.convert(sourceConfigContent, options);
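
One possible shape for the types used in this snippet, combined with the plugin-based design described earlier, is sketched below. Only the type names and call signatures (`X2SeaTunnelFactory.createConverter`, `ConversionOptions.Builder`, `convert`) come from the example above. The `SdkShapeSketch` wrapper, the `register` method, the registry, and the default output format are assumptions added for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical shapes for the SDK types; nested in one class to keep the sketch self-contained. */
final class SdkShapeSketch {

    /** Immutable options object created through a builder, as in the SDK example. */
    static final class ConversionOptions {
        final String outputFormat;
        final String targetVersion;

        private ConversionOptions(Builder b) {
            this.outputFormat = b.outputFormat;
            this.targetVersion = b.targetVersion;
        }

        static final class Builder {
            private String outputFormat = "hocon"; // assumed default
            private String targetVersion;

            Builder outputFormat(String format) { this.outputFormat = format; return this; }
            Builder targetVersion(String version) { this.targetVersion = version; return this; }
            ConversionOptions build() { return new ConversionOptions(this); }
        }
    }

    /** One implementation per source tool (DataX, Sqoop, ...). */
    interface X2SeaTunnelConverter {
        String convert(String sourceConfigContent, ConversionOptions options);
    }

    static final class X2SeaTunnelFactory {
        private static final Map<String, Supplier<X2SeaTunnelConverter>> REGISTRY =
                new ConcurrentHashMap<>();

        /** Assumed plugin hook: each source tool registers a supplier under its type name. */
        static void register(String sourceType, Supplier<X2SeaTunnelConverter> supplier) {
            REGISTRY.put(sourceType, supplier);
        }

        /** Looks up the converter for a source tool and fails fast for unsupported tools. */
        static X2SeaTunnelConverter createConverter(String sourceType) {
            Supplier<X2SeaTunnelConverter> supplier = REGISTRY.get(sourceType);
            if (supplier == null) {
                throw new IllegalArgumentException("Unsupported source tool: " + sourceType);
            }
            return supplier.get();
        }
    }
}
```

With a registry like this, supporting a new tool would mean registering one more converter rather than changing the factory, which is the kind of extension the plugin-based design aims for.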

Implementation Roadmap

  1. Phase One: Basic framework and DataX support, with the MySQL data source usable end to end
    • Core interface design
    • Support for common DataX connectors (MySQL, Hive)
    • Basic command line tools
    • Batch processing capability
    • Unit tests and e2e tests
    • A summary of the AI prompts used to implement different connectors
  2. Phase Two: Enhance support for more DataX data sources
    • Extend DataX connector support (PostgreSQL, ES, Kafka, etc.)
    • Version adaptation functionality
  3. Phase Three: Expand support for other tools and continuous optimization
    • Sqoop support implementation
    • More advanced features

Summary

The X2SeaTunnel tool adopts a unified framework design, supporting configuration conversion from multiple data integration tools to SeaTunnel. Through its plugin-based architecture, it ensures both lightweight efficiency and good extensibility. This tool helps users migrate smoothly to the SeaTunnel platform by reducing migration costs, thereby improving data integration efficiency.

The tool provides both command-line and SDK usage methods to meet requirements in different scenarios. The core design focuses on the accuracy and generality of configuration mapping, with the goal that generated SeaTunnel configurations can be used directly. The overall architecture leaves room for adding conversion support for more data integration tools in the future.


Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
