Ragas evaluation - on optimization 2, ensemble retriever with bm25
hillaryke committed Jul 29, 2024
1 parent 3a3b88c commit 6b8b0e4
Showing 2 changed files with 239 additions and 389 deletions.
239 changes: 239 additions & 0 deletions notebooks/optimization_techniques/ensembel_retrievers.ipynb
@@ -0,0 +1,239 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import modules"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd\n",
"\n",
"os.chdir(\"../../\")\n",
"\n",
"from datasets import load_dataset\n",
"from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.chains import RetrievalQA\n",
"from langchain_community.document_loaders import HuggingFaceDatasetLoader\n",
"from langchain.embeddings import HuggingFaceEmbeddings\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from src.rag_pipeline import chunk_by_recursive_split, RAGSystem\n",
"from src.env_loader import load_api_keys\n",
"from src.ragas.ragas_pipeline import run_ragas_evaluation\n",
"from src import display_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load API keys"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"openai_api_key = load_api_keys(\"OPENAI_API_KEY\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize embeddings and RAG system"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# embeddings=HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')\n",
"embeddings = OpenAIEmbeddings(api_key=openai_api_key, model='text-embedding-ada-002')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize RAG system with ensemble_retriever with BM25 retriever"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"optimization_name = \"ensemble_retriever_with_bm25\"\n",
"optimization_no = 2"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"\n",
"rag_system_ensemble = RAGSystem(\n",
" model_name = \"gpt-4o\",\n",
" existing_vectorstore = False,\n",
" use_ensemble_retriever = True,\n",
" embeddings=embeddings\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--Split 1000 documents into 5030 chunks.--\n"
]
}
],
"source": [
"rag_system_ensemble.initialize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check the RAG system\n",
"TODO - Write a test to check if RAG system is working properly - asserts for the output"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What event is Rory McIlroy preparing for after the WGC-Cadillac Championship?',\n",
" 'answer': 'Rory McIlroy is preparing for the U.S. Masters at Augusta after the WGC-Cadillac Championship.',\n",
" 'contexts': ['(CNN)Jordan Spieth has Rory McIlroy and the world No.1 spot firmly in his sights after winning the Valspar Championship on Sunday. Spieth won a three-way play-off with a 28-foot birdie on the third extra hole to become only the fourth player since 1940 to win twice on the PGA Tour before turning 22. It is a feat that not even McIlroy mastered with Tiger Woods, Sergio Garcia and Robert Gamez the only players to have achieved that particular accolade in the past 75 years. But it is the Northern Irishman that is within Spieth\\'s focus heading towards Augusta. \"I like studying the game, being a historian of the game,\" Spieth told the PGA Tour website. \"It\\'s really cool to have my name go alongside those. \"But right now currently what I\\'m really focused on is Rory McIlroy and the No.1 in the world. That\\'s who everyone is trying to chase. \"That\\'s our ultimate goal to eventually be the best in the world and this is a great, great stepping stone. But going into the four majors of the year, to',\n",
" 'was good fun.\" Not that opportunity knocked for McIlroy when he chose the 3-iron to play his third shot to the 18th and final hole of the tournament and promptly found the water again. The Northern Irishman feigned to repeat his earlier antics, before placing it back in the bag. His mistake led to a double bogey six and left him tied for ninth at one-under-par, eight shots behind winner Dustin Johnson. McIlroy had promised to return the club to Trump after the round and was as good as his word. \"We\\'re thinking about auctioning it for charity or doing a trophy case for Doral, putting it on a beautiful mount,\" Trump said. Johnson is looking set to be one of McIlroy\\'s main rivals in the first major of the season, the U.S. Masters at Augusta, next month and his victory completed a triumphant comeback to the PGA Tour. The 30-year-old American took a six-month break from the Tour last July to cope with \"personal problems\" and returned earlier this year. Johnson finished with a three-under',\n",
" 'It raises money for two Florida hospitals named for the seven-time major winner and his late wife Winnie. \"I am so proud of what has been accomplished at the hospitals over the past 25 years. It is always a privilege to know that we are making a difference in the lives of families throughout the community,\" said Palmer after his medical center was named one of the best for children in the U.S. for 2014-15. He hurt his shoulder in December after tripping on carpet when he was about to make a speech at a PGA Tour father/son event. World No. 1 Rory McIlroy will make his first appearance at Palmer\\'s March 19-22 tournament, which features a restricted field, while top-five players Bubba Watson, Henrik Stenson, Adam Scott and Jason Day will also take part. Like us on Facebook .',\n",
" '(CNN)With a little bit of help from Donald Trump, Rory McIlroy was re-united with the golf club he famously threw into the lake at Doral -- but probably wished the golf-loving tycoon had not bothered. Never one to miss a media opportunity, Trump, the owner of the Blue Monster course in Florida, got a scuba diver to retrieve the 3-iron club which world No. 1 McIlroy had thought he had seen the last of during Friday\\'s second round at the WGC-Cadillac Championship. The 68-year-old American entrepreneur presented it to McIlroy before his final round Sunday, telling him that it was unlucky to continue playing with 13 clubs as against the usual 14 allowed under golf\\'s rules. \"He\\'s never one to miss an opportunity,\" McIlroy told the official PGA Tour website after his round. \"It was fine. It was good fun.\" Not that opportunity knocked for McIlroy when he chose the 3-iron to play his third shot to the 18th and final hole of the tournament and promptly found the water again. The Northern',\n",
" '(CNN)It was an act of frustration perhaps more commonly associated with golf\\'s fictional anti-hero Happy Gilmore than the world\\'s reigning No 1. player. But when Rory McIlroy pulled his second shot on the eighth hole of the WGC Cadillac Championship into a lake Friday, he might as well have been channeling the much loved Adam Sandler character. Before continuing his round with a dropped ball, the four-time major winner launched the 3-iron used to play the offending shot into the water as well. \"(It) felt good at the time,\" a rueful McIlroy later said of the incident in comments carried by the PGA Tour website. \"I just let frustration get the better of me. It was heat of the moment, and I mean, if it had of been any other club I probably wouldn\\'t have but I didn\\'t need a 3‑iron for the rest of the round so I thought, why not.\" The club \"must have went a good 60, 70 yards,\" he joked. McIlroy composed himself to finish with a second round of 70, leaving him one-under for the tournament']}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What event is Rory McIlroy preparing for after the WGC-Cadillac Championship?\"\n",
"result = rag_system_ensemble.rag_chain.invoke(question)\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## RAGAS Pipeline testing the rag_chain"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--LOADING EVALUATION DATA--\n",
"--GETTING CONTEXT AND ANSWERS--\n",
"--EVALUATING LOCALLY--\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a6274de6f8804585badb7a081b4ac730",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Evaluating: 0%| | 0/76 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--EVALUATION COMPLETE--\n"
]
}
],
"source": [
"rag_results = run_ragas_evaluation(rag_system_ensemble.rag_chain)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Save results to csv\n",
"rag_results.to_csv(f\"data/evaluation_results/bm_{optimization_no}_{optimization_name}.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "rag-optimization-cnn-dailymail-hiPg4Kip-py3.10",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
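
The notebook's "Check the RAG system" cell carries a TODO for a test that asserts on the output. One possible shape for that check is sketched below. It assumes only what the cell output above shows: that `rag_chain.invoke(question)` returns a dict with `question`, `answer`, and `contexts` keys. The helper name `check_rag_output` and the sample data are hypothetical and are not part of `src.rag_pipeline`.

```python
from typing import Any, Mapping

# Keys observed in the notebook's cell output; assumed to be the chain's contract.
EXPECTED_KEYS = ("question", "answer", "contexts")


def check_rag_output(result: Mapping[str, Any], question: str) -> None:
    """Raise AssertionError if `result` does not look like a rag_chain response."""
    missing = [key for key in EXPECTED_KEYS if key not in result]
    assert not missing, f"missing keys: {missing}"

    # The chain is expected to echo the question it was asked.
    assert result["question"] == question, "question was not echoed back"

    # The answer must be a non-blank string.
    answer = result["answer"]
    assert isinstance(answer, str) and answer.strip(), "empty answer"

    # At least one retrieved chunk, and every chunk must be non-blank text.
    contexts = result["contexts"]
    assert isinstance(contexts, list) and contexts, "no contexts retrieved"
    assert all(isinstance(c, str) and c.strip() for c in contexts), "blank context chunk"


# Hand-written stand-in for rag_system_ensemble.rag_chain.invoke(question).
# The context is an abbreviated excerpt, not the full retrieved chunk.
sample_question = (
    "What event is Rory McIlroy preparing for after the WGC-Cadillac Championship?"
)
sample_result = {
    "question": sample_question,
    "answer": (
        "Rory McIlroy is preparing for the U.S. Masters at Augusta "
        "after the WGC-Cadillac Championship."
    ),
    "contexts": ["the first major of the season, the U.S. Masters at Augusta, next month"],
}

check_rag_output(sample_result, sample_question)
```

In the notebook, the same helper could be called on the real `result` right after `rag_chain.invoke(question)`. It checks structure only; answer quality is left to the RAGAS evaluation.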