Welcome to the "Trino Kafka Demo" project! This guide will walk you through setting up and using this demonstration, showcasing how to query Kafka topics using SQL with Trino. We'll cover everything from installation to viewing your Kafka messages.
Before diving into the setup, let's get familiar with the key components of our project:
- Trino: The distributed SQL query engine we'll use to query data directly from Kafka.
- Kafka: Our data streaming platform, handling the flow of social media posts.
- Zookeeper: Works behind the scenes to manage Kafka's cluster state and configurations.
- Redpanda Console: A web-based UI for visualizing and managing our Kafka data streams.
- PostgreSQL: Our relational database, used to demonstrate data storage and retrieval.
To follow along, you'll need:

- Docker and Docker Compose
- Python 3.x
The project is organized as follows:

```text
trino-kafka-demo/
├── docker-compose.yml      # Docker Compose file that sets up and runs Kafka, Trino, and PostgreSQL.
├── kafka_producer/
│   ├── __init__.py
│   └── main.py             # The main Python script for the Kafka producer.
├── setup.py                # Setup script for the Kafka producer package, useful for installing it.
└── trino/                  # Trino configuration files and SQL scripts.
    └── etc/
        ├── catalog/
        │   ├── kafka.properties     # Configuration for the Kafka connector in Trino.
        │   └── postgres.properties  # Configuration for the PostgreSQL connector in Trino.
        ├── config.properties        # General Trino server configuration.
        ├── jvm.config               # JVM configuration for Trino.
        └── kafka/
            └── demo.fake_social_media.json  # JSON file defining the structure of the Kafka topic for Trino.
```
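For orientation, a Kafka catalog file for Trino typically looks something like the sketch below. The specific values shown (broker address, table-description directory) are assumptions for illustration; defer to the repository's actual `trino/etc/catalog/kafka.properties`:

```properties
# Hypothetical sketch of a Trino Kafka catalog; values are illustrative.
connector.name=kafka
kafka.nodes=kafka:9092
kafka.table-names=demo.fake_social_media
kafka.table-description-dir=/etc/trino/kafka
kafka.hide-internal-columns=false
```

The `kafka.table-names` entry is what makes `demo.fake_social_media` queryable, which is why the producer must publish to exactly that topic.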
- Clone the Repository:
  Open your terminal and clone the project repository:

  ```shell
  git clone https://github.com/sorieux/trino-kafka-demo.git
  ```

- Install Dependencies:
  Ensure Python 3 is installed, then run:

  ```shell
  pip install .
  ```

  This command installs all required Python libraries, including the Kafka client and Faker for data generation.

- Launch with Docker Compose:
  Bring the environment to life using Docker:

  ```shell
  docker-compose up -d
  ```

  This starts all the necessary services: Kafka, Zookeeper, Trino, Redpanda Console, and PostgreSQL.
Now, let's generate some data for a fictional social network. Our `main.py` script uses Faker to create simulated social media posts.

- Single Batch: To send a fixed number of posts to Kafka, run:

  ```shell
  kafka-producer demo.fake_social_media --count 20
  ```

- Continuous Mode: Mimic a live stream by sending posts continuously:

  ```shell
  kafka-producer demo.fake_social_media --continuous
  ```
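As a rough sketch of what the producer does under the hood, the snippet below builds posts with the same fields the queries use later. Note this is an approximation, not the repository's actual code: the real `main.py` uses Faker and a Kafka client, while this stand-in uses only the standard library so it stays dependency-free.

```python
import json
import random
import time

# Illustrative stand-in data; the real producer generates richer values with Faker.
USERNAMES = ["alice", "bob", "carol", "dave"]

def fake_post():
    """Build one simulated social media post as a dict.

    The field names mirror the columns queried through Trino:
    username, post_content, likes, comments, shares, timestamp.
    """
    return {
        "username": random.choice(USERNAMES),
        "post_content": "Lorem ipsum dolor sit amet.",
        "likes": random.randint(0, 500),
        "comments": random.randint(0, 100),
        "shares": random.randint(0, 50),
        "timestamp": int(time.time()),
    }

def produce(count):
    """Serialize `count` posts to JSON strings, as they would be sent to Kafka."""
    return [json.dumps(fake_post()) for _ in range(count)]

if __name__ == "__main__":
    for message in produce(3):
        print(message)
```

In the real producer, each JSON string would be sent to the `demo.fake_social_media` topic via a Kafka client instead of printed.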
Note: It's crucial to use the topic name `demo.fake_social_media`, as this is the topic Trino is configured to read. The relevant settings live in `trino/etc/kafka/demo.fake_social_media.json` and `trino/etc/catalog/kafka.properties`.
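For reference, the topic description file is what maps Kafka message fields to Trino columns. A hedged sketch of what `demo.fake_social_media.json` might contain follows; the field names match the columns used in the queries, but the exact mappings and types here are assumptions, so defer to the file in the repository:

```json
{
  "tableName": "fake_social_media",
  "schemaName": "demo",
  "topicName": "demo.fake_social_media",
  "message": {
    "dataFormat": "json",
    "fields": [
      {"name": "username", "mapping": "username", "type": "VARCHAR"},
      {"name": "post_content", "mapping": "post_content", "type": "VARCHAR"},
      {"name": "likes", "mapping": "likes", "type": "BIGINT"},
      {"name": "comments", "mapping": "comments", "type": "BIGINT"},
      {"name": "shares", "mapping": "shares", "type": "BIGINT"},
      {"name": "timestamp", "mapping": "timestamp", "type": "BIGINT"}
    ]
  }
}
```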
With data flowing into Kafka, let's see it in action:

- Open Redpanda Console: Head to http://localhost:8000 in your browser.
- Explore Kafka Topics: Navigate to the topic you've been publishing to and observe the incoming messages.
Time to query our data using Trino:

- Access Trino: Use the Trino CLI or a SQL client with the following configuration:
  - JDBC URL: `jdbc:trino://localhost:8080`
  - Username: `admin`
  - Password: none is required for this demo; leave the field blank.
  - Driver Name: Trino. The driver can be downloaded from Trino's Driver Maven Repository; for more details, refer to the Trino JDBC Documentation.

For assistance configuring DBeaver with Trino, consult the DBeaver Configuration Guide. If you prefer the Trino CLI, detailed information is available in the Trino CLI Documentation.
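If you'd rather script queries than use a GUI, Trino also exposes an HTTP API: a query is submitted by POSTing the SQL text to `/v1/statement` with an `X-Trino-User` header. The helper below only builds the request, without sending it, since that requires the stack to be running; in practice you'd more likely use the official `trino` Python client, which wraps this protocol for you.

```python
import urllib.request

def build_trino_request(sql, host="localhost", port=8080, user="admin"):
    """Build (but do not send) the HTTP request that submits a query to Trino.

    Trino's client protocol takes the SQL text as the POST body of
    /v1/statement, with the user passed in the X-Trino-User header.
    """
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Trino-User": user},
        method="POST",
    )

req = build_trino_request("SELECT 1")
print(req.full_url)      # http://localhost:8080/v1/statement
print(req.get_method())  # POST
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON response containing a `nextUri` to poll for result pages.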
- Run a Basic Query: Start with a simple SELECT statement (`timestamp` is a reserved word in Trino, so the column name is quoted):

  ```sql
  SELECT
    username,
    post_content,
    likes,
    comments,
    shares,
    "timestamp"
  FROM
    kafka.demo.fake_social_media
  ```
You can explore the details of your query in the Trino UI at http://localhost:8080 in your browser.
- Data Transfer to PostgreSQL: Execute a CREATE TABLE AS query to copy the data into PostgreSQL:

  ```sql
  CREATE TABLE postgres.public.social_media AS
  SELECT
    username,
    post_content,
    likes,
    comments,
    shares,
    "timestamp"
  FROM
    kafka.demo.fake_social_media
  ```
- Check the data in the PostgreSQL table: for example, run:

  ```sql
  SELECT * FROM postgres.public.social_media
  ```
Congratulations! You've just set up a fully functional demo showcasing the integration of Kafka, Trino, and PostgreSQL. Now, you're ready to explore further, perhaps customizing the data streams or queries.
In the journey of creating this demo, several resources have been invaluable. Whether you're looking to deepen your understanding or troubleshoot issues, the following links can be of great assistance:
- Trino Documentation: for comprehensive details on Trino's capabilities and configuration.
- Kafka and Zookeeper: to learn more about data streaming and cluster management.
- Faker Library for Python: for generating fake data for your applications.
Interested in contributing or have suggestions? Great! Feel free to contact me for any queries or submit a Pull Request with your improvements.
This project is open-source under the MIT License.