astronomer/airflow-llm-demo

Sissy-G ~= Syzygy /ˈsɪzɪdʒi/ ASTRONOMY the alignment of three or more celestial bodies.

Overview

This demonstration shows an Airflow integration with Weaviate and OpenAI. Sissy-G Toys is an online retailer for toys and games. The GroundTruth customer analytics application gives marketing, sales, and product managers a one-stop shop for analytics.

This workflow includes:

All of the above are presented in a Streamlit application.

Project Contents

Your Astro project contains the following files and folders:

  • dags: This folder contains the Python files for the Airflow DAG.

  • Dockerfile: This file contains a versioned Astro Runtime Docker image that provides a differentiated Airflow experience. If you want to execute other commands or overrides at runtime, specify them here.

  • include: This folder contains additional directories for the services that will be built in the demo. Services included in this demo include:

    • minio: Object storage which is used for ingest staging as well as stateful backups for other services.
    • mlflow: A platform for the machine learning lifecycle including model registry and experiment tracking.
    • weaviate: A vector database.
    • streamlit: A web application framework for building data-centric apps.
  • packages.txt: Install OS-level packages needed for the project.

  • requirements.txt: Install Python packages needed for the project.

  • plugins: Add custom or community plugins for your project to this folder. It is empty by default.

  • airflow_settings.yaml: Use this local-only file to specify Airflow Connections, Variables, and Pools instead of entering them in the Airflow UI as you develop DAGs in this project.

Deploy Your Project Locally

Prerequisites:

  • Docker Desktop or similar Docker services running locally.
  • An OpenAI account or trial account.

  1. Install the Astronomer CLI. The Astro CLI is a command-line interface for data orchestration. It lets you get started with Apache Airflow quickly and can be used with all Astronomer products. It provides a local instance of Airflow if you don't have an existing service.

For macOS

brew install astro

For Linux

curl -sSL install.astronomer.io | sudo bash -s
  1. Clone this repository.
git clone https://github.com/astronomer/airflow-llm-demo
cd airflow-llm-demo
  1. The data for this demo has been pre-embedded, so the DAG will run without requiring an OpenAI token. However, the Streamlit app uses the Weaviate Q&A and near text modules, and an OpenAI key is required to generate embeddings for the user's question or search term.

If you would like to run the Streamlit application, you will need to add your OpenAI API key to the AIRFLOW_CONN_WEAVIATE_DEFAULT variable in the .env file.
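
As a rough illustration of the step above, the Python sketch below builds a JSON value for AIRFLOW_CONN_WEAVIATE_DEFAULT. Airflow accepts JSON-serialized connections in AIRFLOW_CONN_* environment variables, but the field layout, the host address, and the helper name weaviate_conn_json are assumptions made for this example. The .env file shipped with the repository is the authoritative template, so check the result against the line that is already there.

```python
import json


def weaviate_conn_json(openai_api_key, host="http://weaviate:8081"):
    """Return a JSON string describing a Weaviate connection for Airflow.

    Hypothetical helper: the layout below is an assumption, not the
    project's confirmed format. Compare it with the existing .env entry.
    """
    conn = {
        "conn_type": "weaviate",
        "host": host,  # assumed address of the Weaviate service
        "extra": {
            # Assumed: the OpenAI key is passed to Weaviate as a request header.
            "additional_headers": {"X-OpenAI-Api-Key": openai_api_key},
        },
    }
    return json.dumps(conn)


if __name__ == "__main__":
    # Paste the printed value after AIRFLOW_CONN_WEAVIATE_DEFAULT= in .env,
    # keeping whatever quoting the existing line uses.
    print(weaviate_conn_json("<your-openai-api-key>"))
```

Generating the value with json.dumps avoids hand-editing nested quotes, which is the most common way this kind of variable gets broken.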

  1. Start Airflow, MinIO, Weaviate, Streamlit, and MLflow.
astro dev start
  1. Unpause and trigger the Airflow DAG with the Astro CLI.
astro dev run dags unpause customer_analytics
astro dev run dags trigger customer_analytics

Follow the status of the DAG run in the Airflow UI (username: admin, password: admin)
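
If you prefer to check the run from a script instead of the UI, the sketch below queries the Airflow 2 stable REST API for the most recent run of the DAG. It assumes the local webserver is reachable at http://localhost:8080 and that basic authentication is enabled for the API with the admin/admin credentials mentioned above. Neither assumption is confirmed by this project, and the function names are hypothetical.

```python
import base64
import json
import urllib.request


def latest_run_request(dag_id, base_url="http://localhost:8080",
                       user="admin", password="admin"):
    """Build a GET request for the most recent run of dag_id.

    base_url and the credentials are assumptions about the local setup.
    """
    url = (f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
           "?order_by=-execution_date&limit=1")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"})


def latest_run_state(dag_id):
    """Return the state of the latest run (e.g. "running"), or None if no runs exist."""
    with urllib.request.urlopen(latest_run_request(dag_id), timeout=10) as resp:
        runs = json.load(resp)["dag_runs"]
    return runs[0]["state"] if runs else None
```

With Airflow running, calling latest_run_state("customer_analytics") should return the state of the run you just triggered. If the API rejects the request, the auth backend in your environment is probably configured differently from what this sketch assumes.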

  1. After the DAG completes, open the customer analytics dashboard in Streamlit.
    Streamlit has been installed alongside the Airflow UI in the webserver container.

Connect to the webserver container with the Astro CLI

astro dev bash -w

Start Streamlit

cd include/streamlit/src
python -m streamlit run ./streamlit_app.py

Open the Streamlit application in a browser.

Other service UIs are available at the following:
