Feature/wren ai service/demo lab (#128)
* setup duckdb demo dataset testing

* allow demo to use duckdb demo dataset

* remove debug message

* remove unused command

* update wren-ai-service demo setup and README

* reformat sql, prevent duplicating deployment, update sampledata mdl

* update logger setup and fix preview sql issue

* remove psycopg2

* refine demo setup flow

* revert

* remove unused file

* remove unused dep

* change url

* revert

* add TODO for improvement

* update
cyyeh authored Apr 18, 2024
1 parent 9ff21b6 commit cf02536
Showing 18 changed files with 2,355 additions and 150 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -17,6 +17,7 @@ wren-ai-service/poetry.lock
 wren-ai-service/assertion.log
 wren-ai-service/demo/spider
 wren-ai-service/demo/poetry.lock
+wren-ai-service/demo/custom_dataset

 # cache
 __pycache__
1 change: 1 addition & 0 deletions docker/docker-compose-dev.yaml
@@ -53,6 +53,7 @@ services:
       - wren
     depends_on:
       - qdrant
+      - wren-engine

   qdrant:
     image: qdrant/qdrant:v1.7.4
2 changes: 1 addition & 1 deletion wren-ai-service/.env.dev.example
@@ -1,6 +1,6 @@
 # fastapi related
 WREN_AI_SERVICE_HOST=127.0.0.1
-WREN_AI_SERVICE_PORT=5555
+WREN_AI_SERVICE_PORT=5556

 # app related
 QDRANT_HOST=localhost
19 changes: 13 additions & 6 deletions wren-ai-service/README.md
@@ -32,11 +32,18 @@

## Demo

- you should stop all services first before running the demo
- prerequisites
- install and run the docker service, and you should stop all WrenAI services first before running the demo
- go to the `../docker` folder and prepare the `.env.local` file
- make sure the node version is v16.19.0
- if you are using Python 3.12+, please also install `setuptools` in order to successfully install the dependencies of the wren-ui service
- go to the `demo` folder and run `poetry install` to install the dependencies
- start the docker service
- in the `demo` folder, run `make prepare` in one terminal, and `make run` in another terminal to start the demo and go to `http://localhost:8501` to see the demo
- `make prepare` will run three other services: qdrant, wren-engine, and wren-ai-service
- in the `demo` folder, open three terminals
- in the first terminal, run `make prepare` to start the docker containers and `make run` to start the demo service
- in the second terminal, run `make ui` to start the wren-ui service
- in the third terminal, run `make ai` to start the wren-ai service
- ports of the services:
- wren-engine: ports should be 8080
- wren-ai-service: port should be 5556
- wren-ui: port should be 3000
- qdrant: ports should be 6333, 6334
- wren-engine: ports should be 8080, 7342
- wren-ai-service: port should be 5555
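The README changes above re-document which port each demo service listens on (wren-engine 8080, wren-ai-service 5556, wren-ui 3000). As an illustrative sanity check only — this helper is not part of the commit, and the host/port values are simply copied from the updated README — the documented ports can be probed like this:

```python
import socket

# Ports documented in the updated demo README (values taken from the diff above)
EXPECTED_PORTS = {
    "wren-engine": 8080,
    "wren-ai-service": 5556,
    "wren-ui": 3000,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each demo service name to whether its documented port is open."""
    return {name: is_listening(host, port) for name, port in EXPECTED_PORTS.items()}
```

Running this before `make run` makes it obvious which of the three services is not up yet; a service mapped to `False` simply has nothing listening on its documented port.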
13 changes: 12 additions & 1 deletion wren-ai-service/demo/Makefile
@@ -1,5 +1,16 @@
 prepare:
-	cd .. && make run-all && make start
+	cd ../../docker; docker compose --env-file .env.local -f docker-compose-dev.yaml up -d
+	cd ..; make run-qdrant
+
+stop:
+	cd ../../docker; docker compose --env-file .env.local -f docker-compose-dev.yaml down
+	cd ..; make stop-qdrant
+
+ui:
+	cd ../../wren-ui; export DB_TYPE=sqlite export SQLITE_FILE=db.sqlite3 yarn && yarn rollback --all && yarn migrate && yarn dev
+
+ai:
+	cd ..; make start

 run:
 	poetry run streamlit run app.py
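The new `prepare`/`stop` targets chain a `docker compose` invocation with a qdrant helper target in the parent directory. As a sketch only — the demo itself uses `make` directly, and the `run_target` helper plus the `&&`-joined shell lines below are hypothetical paraphrases of the Makefile recipes above, not part of the commit — the same sequencing could be driven from Python:

```python
import subprocess

# Shell lines paraphrased from the new Makefile targets (run from demo/,
# so `..` is wren-ai-service/ and `../../docker` is the compose folder).
MAKE_TARGETS = {
    "prepare": "cd ../../docker && docker compose --env-file .env.local"
               " -f docker-compose-dev.yaml up -d && cd ../wren-ai-service && make run-qdrant",
    "stop": "cd ../../docker && docker compose --env-file .env.local"
            " -f docker-compose-dev.yaml down && cd ../wren-ai-service && make stop-qdrant",
}


def run_target(name: str, dry_run: bool = True):
    """With dry_run, just return the shell line; otherwise execute it."""
    cmd = MAKE_TARGETS[name]
    if dry_run:
        return cmd
    return subprocess.run(cmd, shell=True, check=True)
```

Note the design choice mirrored here: `prepare` only brings up the containers and qdrant, while wren-ui and wren-ai-service are started separately (`make ui`, `make ai`) so each gets its own terminal and log stream.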
191 changes: 129 additions & 62 deletions wren-ai-service/demo/app.py
@@ -5,9 +5,12 @@
from utils import (
ask,
ask_details,
get_current_manifest,
get_datasets,
get_mdl_json,
get_new_mdl_json,
is_current_manifest_available,
prepare_duckdb,
prepare_semantics,
rerun_wren_engine,
save_mdl_json_file,
@@ -22,7 +25,9 @@
if "deployment_id" not in st.session_state:
st.session_state["deployment_id"] = str(uuid.uuid4())
if "chosen_dataset" not in st.session_state:
st.session_state["chosen_dataset"] = None
st.session_state["chosen_dataset"] = "music"
if "dataset_type" not in st.session_state:
st.session_state["dataset_type"] = "duckdb"
if "chosen_models" not in st.session_state:
st.session_state["chosen_models"] = None
if "mdl_json" not in st.session_state:
@@ -45,78 +50,140 @@
st.session_state["query_history"] = None


def onchane_demo_dataset():
st.session_state["chosen_dataset"] = st.session_state["choose_demo_dataset"]


def onchange_spider_dataset():
st.session_state["chosen_dataset"] = st.session_state["choose_spider_dataset"]


if __name__ == "__main__":
datasets = get_datasets()

col1, col2 = st.columns([2, 4])

with col1:
uploaded_file = st.file_uploader(
"Upload an MDL json file, and the file name must be [xxx]_mdl.json",
type="json",
)
st.markdown("or")
chosen_dataset = st.selectbox(
"Select a database from the Spider dataset",
options=datasets,
index=datasets.index("college_3"), # default dataset
)
if uploaded_file is not None:
if "_mdl.json" not in uploaded_file.name:
st.error("File name must be [xxx]_mdl.json")
st.stop()
st.session_state["chosen_dataset"] = uploaded_file.name.split("_mdl.json")[
0
]
st.session_state["mdl_json"] = json.loads(
uploaded_file.getvalue().decode("utf-8")
with st.expander("Current Deployed Model"):
manifest_name, models, relationships = get_current_manifest()
st.markdown(f"Current Deployed Model: {manifest_name}")
show_er_diagram(models, relationships)
with st.expander("Deploy New Model"):
uploaded_file = st.file_uploader(
# "Upload an MDL json file, and the file name must be [xxx]_bigquery_mdl.json or [xxx]_duckdb_mdl.json",
"Upload an MDL json file, and the file name must be [xxx]_duckdb_mdl.json",
type="json",
)
save_mdl_json_file(uploaded_file.name, st.session_state["mdl_json"])
elif chosen_dataset and st.session_state["chosen_dataset"] != chosen_dataset:
st.session_state["chosen_dataset"] = chosen_dataset
st.session_state["mdl_json"] = get_mdl_json(chosen_dataset)

st.markdown("---")

chosen_models = st.multiselect(
"Select data models for AI to generate MDL metadata",
[model["name"] for model in st.session_state["mdl_json"]["models"]],
)
if chosen_models and st.session_state["chosen_models"] != chosen_models:
st.session_state["chosen_models"] = chosen_models
st.session_state["mdl_json"] = get_mdl_json(chosen_dataset)

ai_generate_metadata_ok = st.button(
"AI Generate MDL Metadata",
disabled=not chosen_models,
)
if ai_generate_metadata_ok:
st.session_state["mdl_json"] = get_new_mdl_json(chosen_models=chosen_models)

# Display the model using the selected database
st.markdown("MDL Model")
st.json(
body=st.session_state["mdl_json"],
expanded=False,
)

show_er_diagram()

deploy_ok = st.button(
"Deploy the MDL model using the selected database",
type="primary",
)
# Semantics preparation
if deploy_ok:
rerun_wren_engine(
st.session_state["chosen_dataset"],
st.session_state["mdl_json"],
st.markdown("or")
chosen_demo_dataset = st.selectbox(
"Select a demo dataset",
key="choose_demo_dataset",
options=["music", "nba", "ecommerce"],
index=0,
on_change=onchane_demo_dataset,
)
# st.markdown("or")
# chosen_spider_dataset = st.selectbox(
# "Select a database from the Spider dataset",
# key='choose_spider_dataset',
# options=datasets,
# index=datasets.index("college_3"), # default dataset
# on_change=onchange_spider_dataset,
# )

if uploaded_file is not None:
# if "_bigquery_mdl.json" not in uploaded_file.name and "_duckdb_mdl.json" not in uploaded_file.name:
# st.error("File name must be [xxx]_bigquery_mdl.json or [xxx]_duckdb_mdl.json")
# st.stop()

if "_duckdb_mdl.json" not in uploaded_file.name:
st.error("File name must be [xxx]_duckdb_mdl.json")
st.stop()

if "_duckdb_mdl.json" in uploaded_file.name:
st.session_state["chosen_dataset"] = uploaded_file.name.split(
"_duckdb_mdl.json"
)[0]
st.session_state["dataset_type"] = "duckdb"
st.session_state["mdl_json"] = json.loads(
uploaded_file.getvalue().decode("utf-8")
)
save_mdl_json_file(uploaded_file.name, st.session_state["mdl_json"])
# elif "_bigquery_mdl.json" in uploaded_file.name:
# st.session_state["chosen_dataset"] = uploaded_file.name.split("_bigquery_mdl.json")[
# 0
# ]
# st.session_state["dataset_type"] = "bigquery"
# st.session_state["mdl_json"] = json.loads(
# uploaded_file.getvalue().decode("utf-8")
# )
# save_mdl_json_file(uploaded_file.name, st.session_state["mdl_json"])
# elif chosen_spider_dataset and st.session_state["chosen_dataset"] == chosen_spider_dataset:
# st.session_state["chosen_dataset"] = chosen_spider_dataset
# st.session_state["dataset_type"] = "bigquery"
# st.session_state["mdl_json"] = get_mdl_json(chosen_spider_dataset, type='spider')
elif (
chosen_demo_dataset
and st.session_state["chosen_dataset"] == chosen_demo_dataset
):
st.session_state["chosen_dataset"] = chosen_demo_dataset
st.session_state["dataset_type"] = "duckdb"
st.session_state["mdl_json"] = get_mdl_json(
chosen_demo_dataset, type="demo"
)

st.markdown("---")

chosen_models = st.multiselect(
"Select data models for AI to generate MDL metadata",
[model["name"] for model in st.session_state["mdl_json"]["models"]],
)
prepare_semantics(st.session_state["mdl_json"])
if chosen_models and st.session_state["chosen_models"] != chosen_models:
st.session_state["chosen_models"] = chosen_models
type = (
"demo" if st.session_state["dataset_type"] == "duckdb" else "spider"
)
st.session_state["mdl_json"] = get_mdl_json(
st.session_state["chosen_dataset"], type=type
)

ai_generate_metadata_ok = st.button(
"AI Generate MDL Metadata",
disabled=not chosen_models,
)
if ai_generate_metadata_ok:
st.session_state["mdl_json"] = get_new_mdl_json(
chosen_models=chosen_models
)

# Display the model using the selected database
st.markdown("MDL Model")
st.json(
body=st.session_state["mdl_json"],
expanded=False,
)

show_er_diagram(
st.session_state["mdl_json"]["models"],
st.session_state["mdl_json"]["relationships"],
)

deploy_ok = st.button(
"Deploy the MDL model using the selected database",
type="primary",
)
# Semantics preparation
if deploy_ok:
if st.session_state["dataset_type"] == "duckdb":
prepare_duckdb(st.session_state["chosen_dataset"])

rerun_wren_engine(st.session_state["mdl_json"])
prepare_semantics(st.session_state["mdl_json"])

query = st.chat_input(
"Ask a question about the database",
disabled=st.session_state["semantics_preparation_status"] != "finished",
disabled=(not is_current_manifest_available())
and st.session_state["semantics_preparation_status"] != "finished",
)

with col2:
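The upload branch in the app.py diff derives both the dataset name and the dataset type from the file-name suffix (`[xxx]_duckdb_mdl.json`), with the bigquery variant left commented out for now. A standalone sketch of that convention — the function name is hypothetical, and `endswith` is used here where the app checks substring membership and calls `split("_duckdb_mdl.json")[0]`:

```python
def parse_mdl_filename(filename: str):
    """Derive (dataset_name, dataset_type) from an uploaded MDL file name,
    mirroring the suffix convention in the demo app; None if invalid."""
    # The commit only enables duckdb; the bigquery suffix is commented out upstream.
    suffixes = {"_duckdb_mdl.json": "duckdb"}
    for suffix, dataset_type in suffixes.items():
        if filename.endswith(suffix):
            return filename[: -len(suffix)], dataset_type
    return None  # caller should surface an error, as the app does with st.error
```

Keeping the suffix table in one place makes re-enabling the `_bigquery_mdl.json` branch a one-line change instead of another copy of the split-and-slice logic.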
1 change: 0 additions & 1 deletion wren-ai-service/demo/pyproject.toml
@@ -9,7 +9,6 @@ readme = "README.md"
 [tool.poetry.dependencies]
 python = "^3.12"
 gdown = "^5.1.0"
-psycopg2-binary = "^2.9.9"
 requests = "^2.31.0"
 sqlparse = "^0.4.4"
 streamlit = "^1.32.2"
