
Apache Flink + Apache Paimon + MinIO Integration

A complete Docker-based setup demonstrating how to use Apache Flink to write data to MinIO (S3-compatible storage) with Apache Paimon as the lakehouse storage format. This project solves the dependency hell that often occurs when integrating these components.

🚀 What's Inside

  • Apache Flink 1.19.3 - Stream processing framework
  • Apache Paimon 1.2.0 - Lakehouse storage format with ACID transactions
  • MinIO (latest) - S3-compatible object storage
  • Custom Docker Image - Pre-built with all required JARs to avoid dependency conflicts

🎯 Why This Approach Works

Previous attempts to volume-mount the JARs often failed due to classpath and dependency resolution issues. This project instead builds a custom image from a Dockerfile that extends flink:1.19.3-java17 and downloads three critical JARs directly into /opt/flink/lib/:

  • paimon-flink-1.19-1.2.0.jar - Main Paimon connector for Flink
  • paimon-s3-1.2.0.jar - Paimon's S3 implementation
  • flink-shaded-hadoop-2-uber-2.8.3-10.0.jar - Required Hadoop classes

The key insight: Paimon internally requires Hadoop classes regardless of which S3 approach you use, but Flink's base images don't include them. The custom Docker image ensures all dependencies are available in a single, consistent environment - eliminating the dependency hell that plagued earlier versions of this integration.
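For reference, the Dockerfile boils down to roughly the following. This is a sketch rather than the repository's exact file, and the Maven Central URLs are the standard coordinates for these artifacts - check them against the real Dockerfile before relying on them:

FROM flink:1.19.3-java17

# Fetch the three JARs straight into Flink's lib directory so they land on the
# classpath of every JobManager and TaskManager process
RUN wget -P /opt/flink/lib/ \
      https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.19/1.2.0/paimon-flink-1.19-1.2.0.jar && \
    wget -P /opt/flink/lib/ \
      https://repo1.maven.org/maven2/org/apache/paimon/paimon-s3/1.2.0/paimon-s3-1.2.0.jar && \
    wget -P /opt/flink/lib/ \
      https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar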

🛠️ Quick Start

1. Build and Start the Services

# Build the custom Flink image with embedded JARs
docker compose build --no-cache

# Start all services (MinIO, Flink JobManager, Flink TaskManager)
docker compose up -d
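For orientation, the compose file behind these commands is shaped roughly like this. It is a sketch, not the repository's exact file, but the service names, credentials, and container_name match what the rest of this README assumes:

services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password123
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
  jobmanager:
    build: .                           # custom image with the Paimon JARs baked in
    container_name: flink-jobmanager   # referenced by the sql-client command below
    command: jobmanager
    ports:
      - "8081:8081"                    # Flink web UI
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
  taskmanager:
    build: .
    command: taskmanager
    depends_on:
      - jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2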

2. Verify Everything is Running
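A quick sanity check before moving on - 8081 and 9001 are the default published ports here; adjust if your compose file maps them differently:

# All three containers should show as running
docker compose ps

# Flink's REST API should respond (the web UI lives at http://localhost:8081)
curl -s http://localhost:8081/overview

# MinIO console: http://localhost:9001 (log in with admin / password123)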

3. Connect to Flink SQL Client

docker exec -it flink-jobmanager /opt/flink/bin/sql-client.sh embedded

4. Create Your First Paimon Table

Once in the Flink SQL client, run these commands:

-- Create the Paimon catalog pointing to MinIO
CREATE CATALOG paimon_catalog WITH (
   'type' = 'paimon',
   'warehouse' = 's3://warehouse/paimon/',
   's3.endpoint' = 'http://minio:9000',
   's3.access-key' = 'admin',
   's3.secret-key' = 'password123',
   's3.path.style.access' = 'true'
);

-- Switch to the Paimon catalog
USE CATALOG paimon_catalog;

-- Create a database
CREATE DATABASE test_db;
USE test_db;

-- Create a table with primary key
CREATE TABLE user_events (
  user_id BIGINT,
  event_type STRING,
  timestamp_val TIMESTAMP(3),
  PRIMARY KEY (user_id) NOT ENFORCED
);

-- Insert some test data
INSERT INTO user_events VALUES
  (1001, 'login', TIMESTAMP '2024-01-01 10:00:00'),
  (1002, 'purchase', TIMESTAMP '2024-01-01 10:15:00');

-- Query the data
SELECT * FROM user_events;
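Because user_id is declared as a primary key, Paimon applies its default deduplicate merge engine: re-inserting an existing key replaces the earlier row instead of appending a duplicate. A quick way to see this upsert behavior:

-- Re-insert user 1001 with a new event; the earlier 'login' row is replaced
INSERT INTO user_events VALUES
  (1001, 'logout', TIMESTAMP '2024-01-01 11:00:00');

-- Still two rows: 1001 now shows 'logout', 1002 is unchanged
SELECT * FROM user_events;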

📊 What You'll See

  • Your INSERT job will appear in the Flink Web UI and complete successfully
  • Data files will be created in MinIO under the /warehouse/paimon/ path
  • Paimon maintains full ACID properties with snapshots, manifests, and schema evolution support

🌩️ Using with Real AWS S3

To use this setup with actual AWS S3 instead of MinIO, adjust the catalog configuration - note that s3.endpoint is omitted, so the default AWS endpoint is used:

CREATE CATALOG paimon_catalog WITH (
   'type' = 'paimon',
   'warehouse' = 's3://your-bucket/paimon/',
   's3.access-key' = 'your-access-key',
   's3.secret-key' = 'your-secret-key',
   's3.path.style.access' = 'false'  -- Use virtual-hosted style for AWS
);

The same JARs work unchanged with real AWS S3 - only the catalog options differ.

🧹 Cleanup

# Stop all services
docker compose down

# Remove volumes (if you want to start fresh)
docker compose down -v

📝 Notes

  • This is an updated version of a project I originally started two years ago
  • MinIO credentials are admin / password123 for local development
  • The setup automatically creates the required buckets (warehouse and checkpoints) - see the sketch after this list
  • All data is persisted in Docker volumes between restarts
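Bucket creation is typically wired up as a one-shot container that runs MinIO's mc client against the server once it is up. A minimal sketch of those commands (the alias name local is arbitrary):

# Point mc at the local MinIO server, then create the buckets if missing
mc alias set local http://minio:9000 admin password123
mc mb --ignore-existing local/warehouse
mc mb --ignore-existing local/checkpoints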

🎉 Success

If everything works correctly, you should see:

  • Flink jobs complete successfully in the Web UI
  • Data files appear in MinIO storage with proper Paimon structure
  • No JAR conflicts or classpath issues in the logs

This integration finally makes Flink + Paimon + S3 storage work reliably together! 🚀
