A complete Docker-based setup demonstrating how to use Apache Flink to write data to MinIO (S3-compatible storage) using Apache Paimon as a lakehouse storage format. This project solves the dependency hell that often occurs when trying to integrate these components.
- Apache Flink 1.19.3 - Stream processing framework
- Apache Paimon 1.2.0 - Lakehouse storage format with ACID transactions
- MinIO (latest) - S3-compatible object storage
- Custom Docker Image - Pre-built with all required JARs to avoid dependency conflicts
Previous attempts that relied on volume-mounting the JARs often failed due to classpath and dependency resolution issues. This project instead builds a custom Docker image from a Dockerfile that extends `flink:1.19.3-java17` and downloads three critical JARs directly into `/opt/flink/lib/`:

- `paimon-flink-1.19-1.2.0.jar` - Main Paimon connector for Flink
- `paimon-s3-1.2.0.jar` - Paimon's S3 implementation
- `flink-shaded-hadoop-2-uber-2.8.3-10.0.jar` - Required Hadoop classes
The key insight: Paimon internally requires Hadoop classes regardless of which S3 approach you use, but Flink's base images don't include them. The custom Docker image ensures all dependencies are available in a single, consistent environment - eliminating the dependency hell that plagued earlier versions of this integration.
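The project's Dockerfile isn't reproduced here, so below is only a minimal sketch of the approach; the Maven Central download URLs follow the standard repository layout and are assumptions rather than lines copied from the repo:

```dockerfile
FROM flink:1.19.3-java17

# Make sure wget is available (harmless if the base image already ships it)
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget && \
    rm -rf /var/lib/apt/lists/*

# Pull the three JARs straight into Flink's lib/ so every JobManager,
# TaskManager, and SQL client process sees them on the classpath
RUN wget -P /opt/flink/lib/ \
      https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.19/1.2.0/paimon-flink-1.19-1.2.0.jar \
      https://repo1.maven.org/maven2/org/apache/paimon/paimon-s3/1.2.0/paimon-s3-1.2.0.jar \
      https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar && \
    chown flink:flink /opt/flink/lib/*.jar
```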
```bash
# Build the custom Flink image with embedded JARs
docker compose build --no-cache

# Start all services (MinIO, Flink JobManager, Flink TaskManager)
docker compose up -d
```
- Flink Web UI: http://localhost:8081
- MinIO Console: http://localhost:9001 (admin/password123)
- MinIO API: http://localhost:9000
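For orientation, a stripped-down `docker-compose.yml` for this topology could look roughly like the sketch below. The service and volume names and the exact Flink settings are illustrative; only the ports, container name, and credentials come from this README:

```yaml
services:
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password123
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # Web console
    volumes:
      - minio-data:/data

  jobmanager:
    build: .          # the custom Flink + Paimon image described above
    container_name: flink-jobmanager
    command: jobmanager
    ports:
      - "8081:8081"   # Flink Web UI
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager

  taskmanager:
    build: .
    command: taskmanager
    depends_on:
      - jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2

volumes:
  minio-data:
```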
```bash
docker exec -it flink-jobmanager /opt/flink/bin/sql-client.sh embedded
```
Once in the Flink SQL client, run these commands:
```sql
-- Create the Paimon catalog pointing to MinIO
CREATE CATALOG paimon_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://warehouse/paimon/',
    's3.endpoint' = 'http://minio:9000',
    's3.access-key' = 'admin',
    's3.secret-key' = 'password123',
    's3.path.style.access' = 'true'
);

-- Switch to the Paimon catalog
USE CATALOG paimon_catalog;

-- Create a database
CREATE DATABASE test_db;
USE test_db;

-- Create a table with primary key
CREATE TABLE user_events (
    user_id BIGINT,
    event_type STRING,
    timestamp_val TIMESTAMP(3),
    PRIMARY KEY (user_id) NOT ENFORCED
);

-- Insert some test data
INSERT INTO user_events VALUES
    (1001, 'login', TIMESTAMP '2024-01-01 10:00:00'),
    (1002, 'purchase', TIMESTAMP '2024-01-01 10:15:00');

-- Query the data
SELECT * FROM user_events;
```
- Your INSERT job will appear in the Flink Web UI and complete successfully
- Data files will be created in MinIO under the `warehouse/paimon/` path
- Paimon maintains full ACID properties with snapshots, manifests, and schema evolution support (see the system-table queries below)
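A quick way to peek at that metadata is through Paimon's system tables, which expose snapshots and data files as queryable tables alongside the main table. The exact columns vary by Paimon version, so this sketch just selects everything:

```sql
-- One row per committed snapshot (each successful INSERT adds one)
SELECT * FROM `user_events$snapshots`;

-- The data files backing the table
SELECT * FROM `user_events$files`;
```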
To use this setup with actual AWS S3 instead of MinIO, simply modify the catalog configuration:
```sql
CREATE CATALOG paimon_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 's3://your-bucket/paimon/',
    's3.access-key' = 'your-access-key',
    's3.secret-key' = 'your-secret-key',
    's3.path.style.access' = 'false'  -- Use virtual-hosted style for AWS
);
```
The same JARs work perfectly with real AWS S3!
```bash
# Stop all services
docker compose down

# Remove volumes (if you want to start fresh)
docker compose down -v
```
- This is an updated version of a project I originally started two years ago
- MinIO credentials are `admin` / `password123` for local development
- The setup automatically creates the required buckets (`warehouse` and `checkpoints`); see the `mc` sketch after this list
- All data is persisted in Docker volumes between restarts
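Bucket creation is normally scripted (for example, with an init step that runs the MinIO client). The equivalent manual commands with `mc`, assuming the port mapping above, look like this; the alias name `local` is arbitrary:

```bash
# Point mc at the local MinIO instance
mc alias set local http://localhost:9000 admin password123

# Create the buckets if they don't already exist
mc mb --ignore-existing local/warehouse local/checkpoints
```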
If everything works correctly, you should see:
- Flink jobs complete successfully in the Web UI
- Data files appear in MinIO storage with proper Paimon structure (see the check below)
- No JAR conflicts or classpath issues in the logs
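To double-check the MinIO side from a shell, you can list the table directory with `mc` (reusing the `local` alias from the notes above); Paimon lays tables out as `<warehouse>/<database>.db/<table>/`:

```bash
# Expect snapshot/, schema/, manifest/ and bucket-*/ entries for a healthy table
mc ls --recursive local/warehouse/paimon/test_db.db/user_events/
```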
This integration finally makes Flink + Paimon + S3 storage work reliably together! 🚀