Commit b151a91

added train and evaluation framework, updated documentation, squished some bugs
1 parent e1477df commit b151a91

33 files changed, +9104 -119 lines changed

backend/modules/llm.py

Lines changed: 2 additions & 2 deletions
@@ -207,8 +207,8 @@ def create_vector_store(
     if config["testing_flag"]:
         # subset the data for testing
         if config["test_subset_2000"] == True:
-            print("[INFO] Subsetting the data to 2000 rows.")
-            documents = documents[:2000]
+            print("[INFO] Subsetting the data to 100 rows.")
+            documents = documents[:100]
     unique_docs, unique_ids = generate_unique_documents(documents, db)
 
     print(
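
Note on the hunk above: the subset size drops from 2000 to 100 rows, but the flag is still named `test_subset_2000`, and the size stays hard-coded in two modules. A minimal sketch of one way to keep the size in the config instead follows. It assumes a hypothetical `test_subset_size` key and a hypothetical helper `subset_for_testing`, neither of which exists in this commit.

```python
from typing import Any


def subset_for_testing(items: Any, config: dict) -> Any:
    """Return the first N items when testing is enabled, otherwise all items.

    `test_subset_size` is a hypothetical config key used only for illustration;
    the commit itself hard-codes the size and checks `test_subset_2000`.
    """
    if not config.get("testing_flag", False):
        return items
    size = config.get("test_subset_size")
    if size is None:
        return items
    print(f"[INFO] Subsetting the data to {size} rows.")
    # Slicing with [:size] works for lists and also selects rows of a DataFrame.
    return items[:size]
```

With this shape, the log message and the slice cannot drift apart, and changing the subset size becomes a config edit rather than a code change in both `llm.py` and `metadata_utils.py`.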

backend/modules/metadata_utils.py

Lines changed: 2 additions & 2 deletions
@@ -136,8 +136,8 @@ def get_all_metadata_from_openml(config: dict) -> Tuple[pd.DataFrame, Sequence[i
 
     # subset the data for testing
     if config["test_subset_2000"] == True:
-        print("[INFO] Subsetting the data to 2000 rows.")
-        all_objects = all_objects[:2000]
+        print("[INFO] Subsetting the data to 100 rows.")
+        all_objects = all_objects[:100]
 
     data_id = [int(all_objects.iloc[i]["did"]) for i in range(len(all_objects))]

docs/developer tutorials/change model.ipynb

Lines changed: 16 additions & 1 deletion
@@ -1,5 +1,13 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tutorial on changing models\n",
+    "- How would you use a different embedding and llm model?"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -25,6 +33,13 @@
     "from modules.llm import setup_vector_db_and_qa"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initial config"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -108,7 +123,7 @@
    "metadata": {},
    "source": [
     "# IMPORTANT\n",
-    "- Do NOT forget to add the models to ollama/get_ollama.sh"
+    "- Do NOT forget to change the model to the best model in ollama/get_ollama.sh"
    ]
   }
  ],

docs/developer tutorials/create vectordb.ipynb

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tutorial on creating a vector database with openml objects\n",
+    "- How would you use the API to create a vector database with openml objects (datasets, flows etc)"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,

docs/developer tutorials/get an llm summary.ipynb

Lines changed: 10 additions & 1 deletion
@@ -1,5 +1,13 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Getting an LLM summary using the API\n",
+    "- How would you use the API and an LLM model + prompt to generate a summary of the results obtained from the RAG pipeline?"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -53,7 +61,8 @@
    "metadata": {},
    "source": [
     "# Get LLM summary of a string\n",
-    "- Ensure that Ollama is running before this works ```bash ollama/.get_ollama.sh``` (or use the desktop Ollama app for testing)"
+    "- Ensure that Ollama is running before this works ```bash ollama/.get_ollama.sh``` (or use the desktop Ollama app for testing)\n",
+    "- As you can tell, the data needs to be a string. To then get the results from a bunch of langchain documents, you must first concatenate the text you care about into a single string."
    ]
   },
   {
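
Note on the notebook text added above: it says the text from several LangChain documents must be concatenated into a single string before it is summarized. A minimal sketch of that step follows. `Doc` is a hypothetical stand-in that models only the `page_content` and `metadata` attributes of a LangChain `Document`, and `documents_to_string` is an illustrative helper, not a function from this repository.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def documents_to_string(docs: Iterable[Doc], separator: str = "\n\n") -> str:
    """Join the non-empty text of each document into one string."""
    texts = (doc.page_content.strip() for doc in docs)
    return separator.join(text for text in texts if text)


# Usage: build one string from retrieved documents, then pass it to the summarizer.
combined = documents_to_string([Doc("First dataset description."), Doc("Second one.")])
```

Because the helper only reads `page_content`, it should work unchanged on real retrieval results, assuming they expose that attribute.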

docs/developer tutorials/index.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Developer Tutorials
+
+- Hello there, future OpenML contributor! It is nice meeting you here. This page is a collection of tutorials that will help you get started with contributing to the OpenML RAG pipeline.
+- The tutorials show you how to perform common tasks and should make it a lot easier to get started with contributing to this project.
+- Note that you would have had to setup the project before you begin. If you missed this step, please refer to [index](../index.md)

docs/developer tutorials/load vectordb and get results.ipynb

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Load the Chroma Db and get retrieval results for a given query\n",
+    "- How would you load the Chroma Db and get retrieval results for a given query?"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,

docs/developer tutorials/run multiple queries and aggregate.ipynb

Lines changed: 11 additions & 0 deletions
@@ -523,6 +523,17 @@
      "execution_count": 9,
      "metadata": {},
      "output_type": "execute_result"
+    },
+    {
+     "ename": "",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n",
+      "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n",
+      "\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. \n",
+      "\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
+     ]
     }
    ],
    "source": [
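
Note on the hunk above: it commits a kernel-crash error output into the tutorial notebook. A minimal sketch of clearing code-cell outputs before committing follows, assuming the notebook uses the nbformat 4 JSON layout shown in this diff. `strip_outputs` is an illustrative helper, not part of this repository.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs cleared."""
    # A JSON round trip is a simple deep copy for plain JSON data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Usage on an in-memory notebook; a real file would be read with json.load first.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {},
            "outputs": [{"output_type": "error", "ename": "", "evalue": "", "traceback": []}],
            "source": [],
        },
    ]
}
cleaned_example = strip_outputs(example)
```

The input dict is left untouched, so the caller decides whether to write the cleaned copy back to disk.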
