From 72f542a76c915d0705ff2df5e9611d12426ce7e8 Mon Sep 17 00:00:00 2001
From: Emmanuel Leroy
Date: Wed, 16 Oct 2024 02:55:13 -0700
Subject: [PATCH] WMS 3427 OpenSearch Conversational Search improvements (#427)

* Opensearch Conversational Search Demo app stack lab
* more image alt text
* fix image
* more specific file names to avoid confusion
* typo
* WMS 3427 OpenSearch Conversational Search, fix typos and more details.
* Improvement to lab 7 WMS 3427 OpenSearch Conversational Search
* add copy button
---
 .../conversational-with-rag-demo-stack.md |  4 +-
 .../conversational-with-rag.md            | 62 ++++++++++++++++++-
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/oci-opensearch/conversational-with-rag/conversational-with-rag-demo-stack.md b/oci-opensearch/conversational-with-rag/conversational-with-rag-demo-stack.md
index 334723036..789460ceb 100644
--- a/oci-opensearch/conversational-with-rag/conversational-with-rag-demo-stack.md
+++ b/oci-opensearch/conversational-with-rag/conversational-with-rag-demo-stack.md
@@ -165,9 +165,9 @@ To learn more about how this works, you can now proceed to the next lab, which g
 There may be auth error 401 trying to authenticate to the cluster: this indicates that wrong credentials were provided. You can open the stack and look at variables. There you can choose to Edit variables and update the credentials, then re-apply the stack.
- There may be 409 errors with rate limit exceeded. This is normal as long as the 500 dem data records get ingested. However if the startup process never finished, and the last logs don't indicate the app was started, something went wrong during ingestion.
+ There may be 409 errors with rate limit exceeded. This is normal as long as the 500 demo data records get ingested. However if the startup process never finishes, and the last logs don't indicate the app was started, something went wrong during ingestion.
- Timeout during ingestion may indicate the VCN configuration is wrong.
If the VN was created with the wizard, it should have a public nad private subnets, with an Internet Gateway on the public subnet, and a Service Gateway and NAT gateway in the private subnet. You should also have created a security list to open port 9200 for OpenSearch API.
+ Timeout during ingestion may indicate the VCN configuration is wrong. If the VCN was created with the wizard, it should have public and private subnets, with an Internet Gateway on the public subnet, and a Service Gateway and NAT gateway in the private subnet. You should also have created a security list to open port 9200 for the OpenSearch API.
 If you can't figure what may be wrong, feel free to contact us with the contact button, and please provide the full app log in your email.
diff --git a/oci-opensearch/conversational-with-rag/conversational-with-rag.md b/oci-opensearch/conversational-with-rag/conversational-with-rag.md
index 5987cdccb..6cfdcbdd7 100644
--- a/oci-opensearch/conversational-with-rag/conversational-with-rag.md
+++ b/oci-opensearch/conversational-with-rag/conversational-with-rag.md
@@ -716,10 +716,68 @@ The user's input query and prompt fine-tuning. The user's previous conversation history based on the specified conversation ID. You can control how many previous conversation contexts to consider using the interaction_size parameter in the API call. You can also use the context_size to control how many of the retrieved top documents you want to parse to the LLM as context to augment the knowledge.
+## Step 11: Further Improvements
+There is a caveat to using vector search on the user input in a conversation setting: if the user prompt is a follow-up question, the LLM may have the context from the conversation history, but the retriever does not, because it only receives the user prompt.
+
+In the demo app from Lab 7a, where we are querying about operations on files from object storage, the user may ask a question about a given file, and then follow up with a question like 'who deleted it?'. Without the context of what 'it' is, the retriever will focus on the 'delete' operation part of the query, and the chance of retrieving the document related to the file in question is roughly the number of documents retrieved divided by the total number of delete operations. The more files and operations there are, the less likely it is to retrieve the right context for the LLM to answer the user's question.
+
+To get around this, an additional step can be inserted before the RAG pipeline.
+
+Instead of passing the user prompt to the RAG pipeline directly, we first make a simple LLM call that passes the chat history as context and asks the LLM to rephrase the question, filling in the missing information (in the example, the file name). The rephrased question is then used as the input prompt to the RAG pipeline.
+
+In this case, the body of our request looks like:
+
+```json
+{
+  "size" : 100,
+  "query": {
+    "match": {
+      "text": "None"
+    }
+  },
+  "ext": {
+    "generative_qa_parameters": {
+      "llm_model": "oci_genai/cohere.command-r-plus",
+      "llm_question": f"""You are a helpful entity extractor who extracts the entity from chat history when needed.
+      ## Instructions:
+      - If there is any ambiguity in the question regarding the context use chat history to resolve that ambiguity.
+      - If there is any reference to "it", "this", "that" in query, replace the word with relevant entity e.g. file name extracted from Chat History.
+      - If there is no ambiguity in subject or object of the question return the question as is
+      - return the question only with replaced entity
+      - refined_question is the Question with word "it", "this", "that" replaced with extracted entity from last index of Chat History
+      - never add any extra sentences to refined_question
+      - If there is no ambiguity in subject or object of the Question then refined_question is same as Question
+      - The response should be in following format:
+      ### Response:
+
+      refined_question
+
+      ## Question:
+      Who deleted it?
+
+      ## Chat History:
+      When was file patient_1062 created?
+      File patient_1062 was created on August 20th 2023
+      """,
+      "context_size": 100,
+      "interaction_size": 2,
+      "timeout": 60
+    }
+  }
+}
+```
+
+Note a few things:
+
+- The query uses "None" for the text, meaning we're not matching documents, just running the LLM call.
+- The user question and the chat history need to be filled into the prompt. This is something you need to manage when writing your code with the SDK of your choice.
+- We do not pass the conversationId in this call, as it would add the exchange to the LLM conversation history and might confuse it. This is an extra call made outside of the conversation with the user.
+
+With this extra step, you now have a system that handles human-like interactions much more robustly.
 ## Acknowledgements
-* **Author** - Landry Kezebou Yankam
-* **Last Updated By/Date** - George Csaba, June 2024
+* **Author** - Landry Kezebou Yankam, George Csaba, Emmanuel Leroy
+* **Last Updated By/Date** - Emmanuel Leroy, October 2024
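
The rephrasing step added under Step 11 leaves the prompt assembly to the caller. A minimal Python sketch of that assembly might look like the following. It only builds the request body and does not call the cluster. The helper name `build_rephrase_body`, the `REPHRASE_TEMPLATE` constant, and the shortened prompt wording are assumptions made for illustration; the field names and values are taken from the request body in the patch.

```python
import json

# Hypothetical, shortened version of the rephrasing prompt from the lab.
REPHRASE_TEMPLATE = """You are a helpful entity extractor who extracts the entity from chat history when needed.
## Instructions:
- If the question refers to "it", "this" or "that", replace the word with the relevant entity (e.g. a file name) from the Chat History.
- If there is no ambiguity, return the question as is.
- Return only the refined question.

## Question:
{question}

## Chat History:
{history}
"""


def build_rephrase_body(question, history, model="oci_genai/cohere.command-r-plus"):
    """Build the body of the extra LLM call made outside the user conversation."""
    prompt = REPHRASE_TEMPLATE.format(question=question, history="\n".join(history))
    return {
        "size": 100,
        # "None" is the placeholder text used in the lab for this call.
        "query": {"match": {"text": "None"}},
        "ext": {
            "generative_qa_parameters": {
                "llm_model": model,
                "llm_question": prompt,
                "context_size": 100,
                "interaction_size": 2,
                "timeout": 60,
                # No conversation ID on purpose: this exchange should not be
                # added to the conversation history.
            }
        },
    }


# Chat history is tracked by the caller (here: a plain list of strings).
history = [
    "When was file patient_1062 created?",
    "File patient_1062 was created on August 20th 2023",
]
body = build_rephrase_body("Who deleted it?", history)
print(json.dumps(body, indent=2))
```

The refined question returned by this call would then be sent as the prompt of the regular RAG request, this time with the conversation ID so that the exchange is recorded in the history.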