Engaging, interactive visualizations crafted with Streamlit, seamlessly powered by Apache Flink in batch mode to reveal deep insights from data.

Supercharge Your Streamlit Visualizations: Batch Processing Iceberg Data with Apache Flink

First, have all your fun running the Java-based Flink application DataGeneratorApp to kickstart the data pipeline. This app powers up your Kafka topics—airline.skyone and airline.sunset—by generating sample records that fuel the rest of the process. Once the data flows into Kafka, launch the FlightImporterApp Flink application. This crucial step reads the enriched data from Kafka and writes it into the apache_kickstarter.airlines.flight Apache Iceberg table, seamlessly preparing your data for advanced analytics and insight generation.

[Diagram: DataGeneratorApp feeding the FlightImporterApp pipeline]

Get ready to see the magic of Flink in action! Now, it's time to share those insights with the world! One fantastic way to do that is with Streamlit, which allows you to easily create interactive visualizations. Streamlit is intuitive, powerful, and designed with Python developers in mind, making it a breeze to turn raw data into captivating dashboards. 😉

[Drawing: Apache Iceberg, Apache Flink, and Streamlit working together]

Table of Contents

1.0 Power up the Apache Flink Docker containers
2.0 Supercharge Your Streamlit Visualizations
2.1 Did you notice we prepended uv run to streamlit run?
3.0 Local Integration: How This App Harnesses Apache Flink
4.0 Resources

1.0 Power up the Apache Flink Docker containers

Prerequisite

Before you can run the scripts/run-flink-locally.sh Bash script, you need to install the aws2-wrap utility. If you are on a Mac, run this command from your Terminal:

brew install aws2-wrap

If you are not using a Mac, make sure you have Python 3.x installed on your machine, and run this command from your Terminal:

pip install aws2-wrap

This section guides you through the local setup (on one machine, but in separate containers) of an Apache Flink cluster in Session mode, using Docker containers with support for Apache Iceberg. Run the Bash script below to start the Apache Flink cluster in Session mode on your machine:

scripts/run-flink-locally.sh <DOCKER_SWITCH> --profile=<AWS_SSO_PROFILE_NAME>
                                             --chip=<CHIP>
                                             --aws-s3-bucket=<AWS_S3_BUCKET_NAME>
| Argument placeholder | Replace with |
| --- | --- |
| <DOCKER_SWITCH> | on to start up your very own local Apache Flink cluster running in Docker containers; otherwise, off to stop the Docker containers. |
| <AWS_SSO_PROFILE_NAME> | your AWS SSO profile name for the AWS infrastructure that hosts your AWS Secrets Manager. |
| <CHIP> | arm64 if you are running on a Mac with an M1, M2, or M3 chip; otherwise, amd64. |
| <AWS_S3_BUCKET_NAME> | the name of the AWS S3 bucket used to store the Apache Iceberg files. |
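
For example, a hypothetical invocation on an Apple Silicon Mac (substitute your own AWS SSO profile and S3 bucket names) would look like this:

scripts/run-flink-locally.sh on --profile=my-sso-profile --chip=arm64 --aws-s3-bucket=my-iceberg-bucket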

To learn more about this script, click here.

2.0 Supercharge Your Streamlit Visualizations

To access the Flink JobManager (supercharge_streamlit-apache_flink-jobmanager-1) container, open the interactive shell by running:

docker exec -it -w /opt/flink/python_apps supercharge_streamlit-apache_flink-jobmanager-1 /bin/bash

Jump right into the container and take charge! You’ll have full control to run commands, explore the file system, and tackle any tasks you need. You’ll land directly in the /opt/flink/python_apps directory—this is the headquarters for the Python script in the repo.

To illustrate, I created a Streamlit script that queries the apache_kickstarter.airlines.flight Apache Iceberg Table, harnessing Flink SQL to extract valuable insights. These insights are then brought to life through a Streamlit dashboard, transforming raw data into an accessible, visual experience.
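
To give you a feel for what such a script can look like, here is a minimal sketch, not the repo's actual app.py: it assumes the Flink Iceberg runtime jars are already on the classpath (the Docker setup takes care of the repo's real dependencies), a Hadoop-style Iceberg catalog on S3, and an illustrative airline column in the flight table:

import argparse
import streamlit as st
from pyflink.table import EnvironmentSettings, TableEnvironment

# Pick up the arguments Streamlit forwards after the bare `--`.
parser = argparse.ArgumentParser()
parser.add_argument("--aws-s3-bucket", required=True)
parser.add_argument("--aws-region", required=True)
args, _ = parser.parse_known_args()

# Batch mode fits a dashboard query: a bounded scan over the Iceberg table.
tbl_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Register the Iceberg catalog (catalog type and warehouse layout are assumptions).
tbl_env.execute_sql(f"""
    CREATE CATALOG apache_kickstarter WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 's3a://{args.aws_s3_bucket}/warehouse'
    )
""")

# Run Flink SQL against the flight table and pull the result into pandas.
flights_df = tbl_env.sql_query("""
    SELECT airline, COUNT(*) AS flight_count
    FROM apache_kickstarter.airlines.flight
    GROUP BY airline
""").to_pandas()

# Turn the raw numbers into an interactive Streamlit chart.
st.title("Flight Insights from Apache Iceberg")
st.bar_chart(flights_df.set_index("airline"))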

Here you go, run this at the Docker container's command line:

uv run streamlit run app.py -- --aws-s3-bucket <AWS_S3_BUCKET> --aws-region <AWS_REGION_NAME>

Notice the extra -- between streamlit run app.py and the actual script arguments. This is necessary to pass arguments to the Streamlit script without causing conflicts with Streamlit's own CLI options.

When you run the script, it produces output like the following:

[Screenshot: streamlit run output in the terminal]

Open the web browser on your host machine, enter the local URL, http://localhost:8501, and in a few moments this web page will be displayed:

[Screenshot: the Streamlit dashboard]

"After many years in this industry, I’m still amazed by what we can achieve today! The possibilities are endless—enjoy the ride!"

---J3

2.1 Did you notice we prepended uv run to streamlit run?

You may be asking yourself why. Well, uv is an incredibly fast Python package installer and dependency resolver, written in Rust, and designed to seamlessly replace pip, pipx, poetry, pyenv, twine, virtualenv, and more in your workflows. By prefixing a command with uv run, you're ensuring that the command runs in an optimal Python environment.

Now, let's go a little deeper into the magic behind uv run:

  • When you use it with a file ending in .py or an HTTP(S) URL, uv treats it as a script and runs it with a Python interpreter. In other words, uv run file.py is equivalent to uv run python file.py. If you're working with a URL, uv even downloads it temporarily to execute it. Any inline dependency metadata is installed into an isolated, temporary environment—meaning zero leftover mess (see the sketch after this list)! When used with -, the input will be read from stdin and treated as a Python script.
  • If used in a project directory, uv will automatically create or update the project environment before running the command.
  • Outside of a project, if there's a virtual environment present in your current directory (or any parent directory), uv runs the command in that environment. If no environment is found, it uses the interpreter's environment.
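
To make that inline-dependency point concrete, here is a minimal sketch of a self-contained script using inline metadata (the script name and the requests dependency are just examples):

# /// script
# dependencies = ["requests"]
# ///
import requests

# uv reads the metadata block above, installs requests into a throwaway
# environment, and runs the script; there is no virtualenv to create or clean up.
print(requests.get("https://api.github.com").status_code)

Running uv run fetch_status.py (a hypothetical filename), or piping the file to uv run -, takes care of the environment automatically.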

So what does this mean when we put uv run before streamlit run? It means uv takes care of all the setup—fast and seamless—right in your local Docker container. If you think AI/ML is magic, the work the folks at Astral have done with uv is pure wizardry!

Curious to learn more about Astral's uv? Check out the uv documentation.

3.0 Local Integration: How This App Harnesses Apache Flink

The app.py Python script streamlines Apache Flink integration by leveraging PyFlink directly. Running the Flink job in the same Python process as the Streamlit app offers an intuitive setup for local testing and debugging. It's an ideal solution for quickly iterating on data processing logic without additional infrastructure.

However, a more advanced setup is recommended for production environments where scalability is critical. Running Flink jobs in a separate process and enabling communication through Kafka or a REST API provides greater scalability, albeit at the cost of added complexity.
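
To make the trade-off concrete, here is a minimal sketch of the decoupled pattern. Everything below is hypothetical and not part of this repo: a standalone batch job, run on a schedule outside the Streamlit process, precomputes an aggregate into a flight_summary table that the dashboard can read cheaply:

# flink_summary_job.py (hypothetical): run outside the Streamlit process.
from pyflink.table import EnvironmentSettings, TableEnvironment

tbl_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Register the Iceberg catalog, as in the section 2.0 sketch.
tbl_env.execute_sql("""
    CREATE CATALOG apache_kickstarter WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 's3a://<AWS_S3_BUCKET_NAME>/warehouse'
    )
""")

# Precompute the aggregate so the dashboard reads ready-made results
# instead of bootstrapping Flink on every page load.
# (Assumes the flight_summary table has been created ahead of time.)
tbl_env.execute_sql("""
    INSERT INTO apache_kickstarter.airlines.flight_summary
    SELECT airline, COUNT(*) AS flight_count
    FROM apache_kickstarter.airlines.flight
    GROUP BY airline
""").wait()

The Streamlit app then simply reads flight_summary (or listens to a Kafka topic, or calls a REST endpoint the job exposes), which keeps the dashboard responsive while Flink scales on its own.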

The simplicity of this app’s current approach makes it perfect for prototyping and development while offering a foundation to explore more sophisticated integration methods as your requirements grow.

4.0 Resources

Flink Python Docs

PyFlink API Reference

Apache Iceberg in Action with Apache Flink using Python

Streamlit Documentation