diff --git a/fern/pages/tutorials/cohere-azure-ai-foundry.mdx b/fern/pages/tutorials/cohere-azure-ai-foundry.mdx index c6b373466..13e6887cc 100644 --- a/fern/pages/tutorials/cohere-azure-ai-foundry.mdx +++ b/fern/pages/tutorials/cohere-azure-ai-foundry.mdx @@ -52,6 +52,6 @@ Models that are offered by Cohere are billed through the Azure Marketplace. For ## Conclusion -This chapter introduced Azure AI Foundry, a fully managed service by Azure that you can deploy Cohere's models on. We also went through the steps to get set up with Azure AI Foundry and deploy a Cohere model. +This page introduces Azure AI Foundry, a fully managed service by Azure that you can deploy Cohere's models on. We also went through the steps to get set up with Azure AI Foundry and deploy a Cohere model. -In the coming sections, we will go through the various use cases of using Cohere's Command, Embed, and Rerank models on Azure AI Foundry. \ No newline at end of file +In the next sections, we will go through the various use cases of using Cohere's Command, Embed, and Rerank models on Azure AI Foundry. \ No newline at end of file diff --git a/fern/pages/v2/tutorials/cohere-azure-ai-foundry.mdx b/fern/pages/v2/tutorials/cohere-azure-ai-foundry.mdx new file mode 100644 index 000000000..d74ed8eaa --- /dev/null +++ b/fern/pages/v2/tutorials/cohere-azure-ai-foundry.mdx @@ -0,0 +1,57 @@ +--- +title: Introduction to Cohere on Azure AI Foundry +slug: /v2/docs/cohere-on-azure/cohere-on-azure-ai-foundry + +description: "An introduction to Cohere on Azure AI Foundry, a fully managed service by Azure (API v2)." +image: "../../../assets/images/f1cc130-cohere_meta_image.jpg" +keywords: "Cohere, Command models, Embed models, Rerank models, Azure AI Foundry" +--- + +## What is Azure AI Foundry + +Azure AI Foundry is a trusted platform that empowers developers to build and deploy innovative, responsible AI applications. It offers an enterprise-grade environment with cutting-edge tools and models, ensuring a safe and secure development process. + +The platform facilitates collaboration, allowing teams to work together on the full lifecycle of application development. With Azure AI Foundry, developers can explore a wide range of models, services, and capabilities to build AI applications that meet their specific goals. + +Hubs are the primary top-level Azure resource for AI Foundry. They provide a central way for a team to govern security, connectivity, and computing resources across playgrounds and projects. Once a hub is created, developers can create projects from it and access shared company resources without needing an IT administrator's repeated help. + +Your new project will be added under your current hub, which provides security, governance controls, and shared configurations that all projects can use. Project workspaces that are created using a hub inherit the same security settings and shared resource access. Teams can create project workspaces as needed to organize their work, isolate data, and/or restrict access. + +## Azure AI Foundry Features + +- Build generative AI applications on an enterprise-grade platform. +- Explore, build, test, and deploy using cutting-edge AI tools and ML models, grounded in responsible AI practices. +- Collaborate with a team for the full life-cycle of application development. +- Improve your application's performance using tools like tracing to debug your application or compare evaluations to hone in on how you want your application to behave. 
+- Safeguard every layer with trustworthy AI from the start and protect against any risks. + +With AI Foundry, you can explore a wide variety of models, services, and capabilities, and start building AI applications that best serve your goals. + +## Cohere Models on Azure AI Foundry + +To get the most updated list of available models, visit the [Azure AI Foundry documentation here](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-cohere-command?tabs=cohere-command-r-plus-08-2024&pivots=programming-language-python). + +## Pricing Mechanisms + +Cohere models can be deployed to serverless API endpoints with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. + +For pricing details, visit the Cohere model listings on the [Azure Marketplace here](https://azuremarketplace.microsoft.com/en-us/marketplace/apps?page=1&search=cohere). + +## Deploying Cohere's Models on Azure AI Foundry + +To deploy Cohere's models on Azure AI Foundry, follow the steps described in the [Azure AI Foundry documentation here](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio). + +In summary, you will need to: + +1. Set up an AI Foundry hub and a project +2. Find your model and model ID in the model catalog +3. Subscribe your project to the model offering +4. Deploy the model to a serverless API endpoint + +Models that are offered by Cohere are billed through the Azure Marketplace. For such models, you're required to subscribe your project to the particular model offering. + +## Conclusion + +This page introduces Azure AI Foundry, a fully managed service by Azure that you can deploy Cohere's models on. We also went through the steps to get set up with Azure AI Foundry and deploy a Cohere model. + +In the next sections, we will go through the various use cases of using Cohere's Command, Embed, and Rerank models on Azure AI Foundry. \ No newline at end of file diff --git a/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-rag.mdx b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-rag.mdx new file mode 100644 index 000000000..d7c47653c --- /dev/null +++ b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-rag.mdx @@ -0,0 +1,467 @@ +--- +title: Retrieval Augmented Generation (RAG) +slug: /v2/docs/cohere-on-azure/azure-ai-rag + +description: "A guide for performing retrieval augmented generation (RAG) with Cohere's Command models on Azure AI Foundry (API v2)." +image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg" +keywords: "Cohere, RAG, retrieval augmented generation, chatbot, Command models, Azure AI Foundry" +--- +[Open in GitHub](https://github.com/cohere-ai/cohere-developer-experience/blob/main/notebooks/guides/cohere-on-azure/v2/azure-ai-rag.ipynb) + +Large Language Models (LLMs) excel at generating text and maintaining conversational context in chat applications. However, LLMs can sometimes hallucinate - producing responses that are factually incorrect. This is particularly important to mitigate in enterprise environments where organizations work with proprietary information that wasn't part of the model's training data. + +Retrieval-augmented generation (RAG) addresses this limitation by enabling LLMs to incorporate external knowledge sources into their response generation process.
By grounding responses in retrieved facts, RAG significantly reduces hallucinations and improves the accuracy and reliability of the model's outputs. + +In this tutorial, we'll cover: +- Setting up the Cohere client +- Building a RAG application by combining retrieval and chat capabilities +- Managing chat history and maintaining conversational context +- Handling direct responses vs responses requiring retrieval +- Generating citations for retrieved information + +In the next tutorial, we'll explore how to leverage Cohere's tool use features to build agentic applications. + +We'll use Cohere's Command, Embed, and Rerank models deployed on Azure. + +## Setup + +First, you will need to deploy the Command, Embed, and Rerank models on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio). + +Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client. + +To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model. + + +```python PYTHON +# %pip install cohere hnswlib unstructured + +import cohere + +co_chat = cohere.ClientV2( + api_key="AZURE_API_KEY_CHAT", + base_url="AZURE_ENDPOINT_CHAT" # example: "https://cohere-command-r-plus-08-2024-xyz.eastus.models.ai.azure.com/" +) + +co_embed = cohere.ClientV2( + api_key="AZURE_API_KEY_EMBED", + base_url="AZURE_ENDPOINT_EMBED" # example: "https://cohere-embed-v3-multilingual-xyz.eastus.models.ai.azure.com/" +) + +co_rerank = cohere.ClientV2( + api_key="AZURE_API_KEY_RERANK", + base_url="AZURE_ENDPOINT_RERANK" # example: "https://cohere-rerank-v3-multilingual-xyz.eastus.models.ai.azure.com/" +) +``` + +## A quick example + +Let's begin with a simple example to explore how RAG works. + +The foundation of RAG is having a set of documents for the LLM to reference. Below, we'll work with a small collection of basic documents. While RAG systems usually involve retrieving relevant documents based on the user's query (which we'll explore later), for now we'll keep it simple and use this entire small set of documents as context for the LLM. + +We have seen how to use the Chat endpoint in the text generation tutorial. To use the RAG feature, we simply need to add one additional parameter, `documents`, to the endpoint call. These are the documents we want to provide as the context for the model to use in its response. + + +```python PYTHON +documents = [ + { + "title": "Tall penguins", + "text": "Emperor penguins are the tallest.", + }, + { + "title": "Penguin habitats", + "text": "Emperor penguins only live in Antarctica.", + }, + { + "title": "What are animals?", + "text": "Animals are different from plants.", + }, +] +``` + +Let's see how the model responds to the question "What are the tallest living penguins?" + +The model leverages the provided documents as context for its response. Specifically, when mentioning that Emperor penguins are the tallest species, it references `doc_0` - the document which states that "Emperor penguins are the tallest." + + +```python PYTHON +message = "What are the tallest living penguins?"
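+ +# Note: each item passed to the `documents` parameter below wraps the raw +# document fields in a {"data": ...} envelope - one of the input shapes the +# v2 Chat API accepts - which lets the model ground its answer in these +# fields and cite them in its response.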
+ +response = co_chat.chat( + model="model", # Pass a dummy string + messages=[{"role": "user", "content": message}], + documents=[{"data": doc} for doc in documents] +) + +print("\nRESPONSE:\n") +print(response.message.content[0].text) + +if response.message.citations: + print("\nCITATIONS:\n") + for citation in response.message.citations: + print(citation) +``` + +```mdx +RESPONSE: + +The tallest living penguins are the Emperor penguins. They only live in Antarctica. + +CITATIONS: + +start=36 end=53 text='Emperor penguins.' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type=None +start=59 end=83 text='only live in Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type=None +``` + +## A more comprehensive example + +Now that we’ve covered a basic RAG implementation, let’s look at a more comprehensive example of RAG that includes: + +- Creating a retrieval system that converts documents into text embeddings and stores them in an index +- Building a query generation system that transforms user messages into optimized search queries +- Implementing a chat interface to handle LLM interactions with users +- Designing a response generation system capable of handling various query types + +First, let’s import the necessary libraries for this project. This includes `hnswlib` for the vector library and `unstructured` for chunking the documents (more details on these later). + + + +```python PYTHON +import uuid +import yaml +import hnswlib +from typing import List, Dict +from unstructured.partition.html import partition_html +from unstructured.chunking.title import chunk_by_title +``` + +## Define documents + +Next, we’ll define the documents we’ll use for RAG. We’ll use a few pages from the Cohere documentation that discuss prompt engineering. Each entry is identified by its title and URL. + + +```python PYTHON +raw_documents = [ + { + "title": "Crafting Effective Prompts", + "url": "https://docs.cohere.com/docs/crafting-effective-prompts", + }, + { + "title": "Advanced Prompt Engineering Techniques", + "url": "https://docs.cohere.com/docs/advanced-prompt-engineering-techniques", + }, + { + "title": "Prompt Truncation", + "url": "https://docs.cohere.com/docs/prompt-truncation", + }, + { + "title": "Preambles", + "url": "https://docs.cohere.com/docs/preambles", + }, +] +``` + +## Create vectorstore + +The Vectorstore class handles the ingestion of documents into embeddings (or vectors) and the retrieval of relevant documents given a query. + +It includes a few methods: + +- `load_and_chunk`: Loads the raw documents from the URL and breaks them into smaller chunks +- `embed`: Generates embeddings of the chunked documents +- `index`: Indexes the document chunk embeddings to ensure efficient similarity search during retrieval +- `retrieve`: Uses semantic search to retrieve relevant document chunks from the index, given a query. It involves two steps: first, dense retrieval from the index via the Embed endpoint, and second, a reranking via the Rerank endpoint to boost the search results further. 
+ + +```python PYTHON +class Vectorstore: + + def __init__(self, raw_documents: List[Dict[str, str]]): + self.raw_documents = raw_documents + self.docs = [] + self.docs_embs = [] + self.retrieve_top_k = 10 + self.rerank_top_k = 3 + self.load_and_chunk() + self.embed() + self.index() + + + def load_and_chunk(self) -> None: + """ + Loads the text from the sources and chunks the HTML content. + """ + print("Loading documents...") + + for raw_document in self.raw_documents: + elements = partition_html(url=raw_document["url"]) + chunks = chunk_by_title(elements) + for chunk in chunks: + self.docs.append( + { + "data": { + "title": raw_document["title"], + "text": str(chunk), + "url": raw_document["url"], + } + } + ) + + def embed(self) -> None: + """ + Embeds the document chunks using the Cohere API. + """ + print("Embedding document chunks...") + + batch_size = 90 + self.docs_len = len(self.docs) + for i in range(0, self.docs_len, batch_size): + batch = self.docs[i : min(i + batch_size, self.docs_len)] + texts = [item["data"]["text"] for item in batch] + docs_embs_batch = co_embed.embed( + texts=texts, + model="embed-multilingual-v3.0", + input_type="search_document", + embedding_types=["float"] + ).embeddings.float + self.docs_embs.extend(docs_embs_batch) + + def index(self) -> None: + """ + Indexes the document chunks for efficient retrieval. + """ + print("Indexing document chunks...") + + self.idx = hnswlib.Index(space="ip", dim=1024) + self.idx.init_index(max_elements=self.docs_len, ef_construction=512, M=64) + self.idx.add_items(self.docs_embs, list(range(len(self.docs_embs)))) + + print(f"Indexing complete with {self.idx.get_current_count()} document chunks.") + + def retrieve(self, query: str) -> List[Dict[str, str]]: + """ + Retrieves document chunks based on the given query. + + Parameters: + query (str): The query to retrieve document chunks for. + + Returns: + List[Dict[str, str]]: A list of dictionaries representing the retrieved document chunks, with 'title', 'text', and 'url' keys. + """ + + # Dense retrieval + query_emb = co_embed.embed( + texts=[query], + model="embed-multilingual-v3.0", + input_type="search_query", + embedding_types=["float"] + ).embeddings.float + + doc_ids = self.idx.knn_query(query_emb, k=self.retrieve_top_k)[0][0] + + # Reranking + docs_to_rerank = [self.docs[doc_id]["data"] for doc_id in doc_ids] + yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in docs_to_rerank] + rerank_results = co_rerank.rerank( + query=query, + documents=yaml_docs, + model="model", # Pass a dummy string + top_n=self.rerank_top_k + ) + + doc_ids_reranked = [doc_ids[result.index] for result in rerank_results.results] + + docs_retrieved = [] + for doc_id in doc_ids_reranked: + docs_retrieved.append(self.docs[doc_id]["data"]) + + return docs_retrieved +``` + +## Process documents + +With the Vectorstore set up, we can process the documents, which will involve chunking, embedding, and indexing. + + +```python PYTHON +# Create an instance of the Vectorstore class with the given sources +vectorstore = Vectorstore(raw_documents) +``` +```mdx +Loading documents... +Embedding document chunks... +Indexing document chunks... +Indexing complete with 137 document chunks. +``` + +We can test if the retrieval is working by entering a search query. 
+ + +```python PYTHON +vectorstore.retrieve("Prompting by giving examples") +``` + +```mdx +[{'title': 'Advanced Prompt Engineering Techniques', + 'text': 'Few-shot Prompting\n\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.', + 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'}, + {'title': 'Crafting Effective Prompts', + 'text': 'Incorporating Example Outputs\n\nLLMs respond well when they have specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.', + 'url': 'https://docs.cohere.com/docs/crafting-effective-prompts'}, + {'title': 'Advanced Prompt Engineering Techniques', + 'text': 'In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. Ordering the examples can also be important; if there are patterns that could be picked up on that are not relevant to the correctness of the question, the model may incorrectly pick up on those instead of the semantics of the question itself.', + 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'}] +``` + + +## Run chatbot + +We can now run the chatbot. For this, we create a `run_chatbot` function that accepts the user message and the history of the conversation, if available. + +```python PYTHON +def run_chatbot(query, messages=None): + if messages is None: + messages = [] + + messages.append({"role": "user", "content": query}) + + # Retrieve document chunks and format + documents = vectorstore.retrieve(query) + documents_formatted = [] + for doc in documents: + documents_formatted.append({ + "data": doc + }) + + # Use document chunks to respond + response = co_chat.chat( + model="model", # Pass a dummy string + messages=messages, + documents=documents_formatted + ) + + # Print the chatbot response, citations, and documents + print("\nRESPONSE:\n") + print(response.message.content[0].text) + + if response.message.citations: + print("\nCITATIONS:\n") + for citation in response.message.citations: + print("-"*20) + print("start:", citation.start, "end:", citation.end, "text:", citation.text) + print("SOURCES:") + print(citation.sources) + + # Add assistant response to messages + messages.append({ + "role": "assistant", + "content": response.message.content[0].text + }) + + return messages +``` + +Here is a sample conversation consisting of a few turns. + + +```python PYTHON +messages = run_chatbot("Hello, I have a question") +``` + +```mdx +RESPONSE: + +Hello there! How can I help you today? +``` + + +```python PYTHON +messages = run_chatbot("How to provide examples in prompts", messages) +``` +```mdx +RESPONSE: + +There are a few ways to provide examples in prompts. + +One way is to provide a few relevant and diverse examples in the prompt. This can help steer the LLM towards a high-quality solution. Good examples condition the model to the expected response type and style. + +Another way is to provide specific examples to work from.
For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like. + +In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. + +CITATIONS: + +-------------------- +start: 68 end: 126 text: provide a few relevant and diverse examples in the prompt. +SOURCES: +[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Few-shot Prompting\n\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})] +-------------------- +start: 136 end: 187 text: help steer the LLM towards a high-quality solution. +SOURCES: +[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Few-shot Prompting\n\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})] +-------------------- +start: 188 end: 262 text: Good examples condition the model to the expected response type and style. +SOURCES: +[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Few-shot Prompting\n\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})] +-------------------- +start: 282 end: 321 text: provide specific examples to work from. +SOURCES: +[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Incorporating Example Outputs\n\nLLMs respond well when they have specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.', 'title': 'Crafting Effective Prompts', 'url': 'https://docs.cohere.com/docs/crafting-effective-prompts'})] +-------------------- +start: 335 end: 485 text: instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like. +SOURCES: +[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Incorporating Example Outputs\n\nLLMs respond well when they have specific examples to work from. 
For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.', 'title': 'Crafting Effective Prompts', 'url': 'https://docs.cohere.com/docs/crafting-effective-prompts'})] +-------------------- +start: 527 end: 679 text: including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. +SOURCES: +[DocumentSource(type='document', id='doc:2', document={'id': 'doc:2', 'text': 'In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. Ordering the examples can also be important; if there are patterns that could be picked up on that are not relevant to the correctness of the question, the model may incorrectly pick up on those instead of the semantics of the question itself.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})] +``` + + +```python PYTHON +messages = run_chatbot("What do you know about 5G networks?", messages) +``` +```mdx +RESPONSE: + +I'm sorry, I could not find any information about 5G networks. +``` + + +```python PYTHON +for message in messages: + print(message, "\n") +``` +```mdx +{'role': 'user', 'content': 'Hello, I have a question'} + +{'role': 'assistant', 'content': 'Hello! How can I help you today?'} + +{'role': 'user', 'content': 'How to provide examples in prompts'} + +{'role': 'assistant', 'content': 'There are a few ways to provide examples in prompts.\n\nOne way is to provide a few relevant and diverse examples in the prompt. This can help steer the LLM towards a high-quality solution. Good examples condition the model to the expected response type and style.\n\nAnother way is to provide specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.\n\nIn addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses.'} + +{'role': 'user', 'content': 'What do you know about 5G networks?'} + +{'role': 'assistant', 'content': "I'm sorry, I could not find any information about 5G networks."} + +``` + +There are a few observations worth pointing out: + +- Direct response: For user messages that don’t require retrieval (“Hello, I have a question”), the chatbot responds directly without using the retrieved documents. +- Citation generation: For responses that do require retrieval ("How to provide examples in prompts"), the endpoint returns the response together with the citations. These are fine-grained citations, which means they refer to specific spans of the generated text. +- Response synthesis: The model can decide if none of the retrieved documents provide the necessary information to answer a user message. For example, when asked the question, “What do you know about 5G networks”, the chatbot retrieves external information from the index. However, it doesn’t use any of the information in its response as none of it is relevant to the question.
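+ +If your application needs to tell these cases apart programmatically, a simple heuristic is to check whether the response carries any citations. Below is a minimal sketch of that idea, reusing the client and vector store from above; the `is_grounded` helper is our own naming for this example, not part of the Cohere SDK. + +```python PYTHON +def is_grounded(response) -> bool: + # Treat a response as grounded only if the model attached + # at least one fine-grained citation to its text. + return bool(response.message.citations) + +query = "What do you know about 5G networks?" +response = co_chat.chat( + model="model", # Pass a dummy string + messages=[{"role": "user", "content": query}], + documents=[{"data": doc} for doc in vectorstore.retrieve(query)], +) + +print(is_grounded(response)) # False here, since none of the documents are relevant +```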
+ + +## Conclusion + +In this tutorial, we learned about: +- How to set up the Cohere client to use the Command model deployed on Azure AI Foundry for chat +- How to build a RAG application by combining retrieval and chat capabilities +- How to manage chat history and maintain conversational context +- How to handle direct responses vs responses requiring retrieval +- How citations are automatically generated for retrieved information + +In the next tutorial, we'll explore how to leverage Cohere's tool use features to build agentic applications. + + diff --git a/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-reranking.mdx b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-reranking.mdx new file mode 100644 index 000000000..0774dbfa5 --- /dev/null +++ b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-reranking.mdx @@ -0,0 +1,227 @@ +--- +title: Reranking +slug: /v2/docs/cohere-on-azure/azure-ai-reranking + +description: "A guide for performing reranking with Cohere's Rerank models on Azure AI Foundry (API v2)." +image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg" +keywords: "Cohere, reranking, semantic search, Rerank models, Azure AI Foundry" +--- +[Open in GitHub](https://github.com/cohere-ai/cohere-developer-experience/blob/main/notebooks/guides/cohere-on-azure/v2/azure-ai-reranking.ipynb) + +In this tutorial, we'll explore reranking using Cohere's Rerank model on Azure AI Foundry. + +Reranking is a crucial technique used in information retrieval systems, particularly for large-scale search applications. The process involves taking an initial set of retrieved documents and reordering them based on how relevant they are to the user's search query. + +One of the most compelling aspects of reranking is its ease of implementation - despite providing substantial improvements to search results, Cohere's Rerank models can be integrated into any existing search system with just a single line of code, regardless of whether it uses semantic or traditional keyword-based search approaches. + +In this tutorial, we'll cover: +- Setting up the Cohere client +- Retrieving documents +- Reranking documents +- Reranking semi-structured data + +We'll use Cohere's Rerank model deployed on Azure to demonstrate these capabilities and help you understand how to effectively implement reranking in your applications. + +## Setup + +First, you will need to deploy the Rerank model on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio). + +In the example below, we are deploying the Rerank Multilingual v3 model. + +Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client. + +To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model.
+ + +```python PYTHON +# %pip install cohere + +import cohere + +co = cohere.ClientV2( + api_key="AZURE_API_KEY_RERANK", + base_url="AZURE_ENDPOINT_RERANK" # example: "https://cohere-rerank-v3-multilingual-xyz.eastus.models.ai.azure.com/" +) + +``` + +## Retrieve documents + +For this example, we'll work with documents that have already been retrieved through an initial search stage (which could be semantic search, keyword matching, or another retrieval method). + +Below is a list of nine documents representing the initial search results. Each document contains email data structured as a dictionary with two fields - Title and Content. This semi-structured format allows the Rerank endpoint to effectively process and reorder the results based on relevance. + + +```python PYTHON +documents = [ + { + "Title": "Incorrect Password", + "Content": "Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?", + }, + { + "Title": "Confirmation Email Missed", + "Content": "Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?", + }, + { + "Title": "Questions about Return Policy", + "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.", + }, + { + "Title": "Customer Support is Busy", + "Content": "Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?", + }, + { + "Title": "Received Wrong Item", + "Content": "Hi, I have a question about my recent order. I received the wrong item and I need to return it.", + }, + { + "Title": "Customer Service is Unavailable", + "Content": "Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?", + }, + { + "Title": "Return Policy for Defective Product", + "Content": "Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.", + }, + { + "Title": "Wrong Item Received", + "Content": "Good morning, I have a question about my recent order. I received the wrong item and I need to return it.", + }, + { + "Title": "Return Defective Product", + "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.", + }, +] +``` + +## Rerank documents + +Adding a reranking component is simple with Cohere Rerank. It takes just one line of code to implement. + +Calling the Rerank endpoint requires the following arguments: + +- `documents`: The list of documents, which we defined in the previous section +- `query`: The user query; we’ll use 'What emails have been about refunds?' as an example +- `top_n`: The number of documents we want to be returned, sorted from the most to the least relevant document + +When passing documents that contain multiple fields like in this case, for best performance we recommend formatting them as YAML strings. + + +```python PYTHON +import yaml + +yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in documents] + +query = 'What emails have been about refunds?' + +results = co.rerank( + model="model", # Pass a dummy string + documents=yaml_docs, + query=query, + top_n=3 +) +``` + +Since we set `top_n=3`, the response will return the three documents most relevant to our query. 
Each result includes both the document's original position (index) in our input list and a score indicating how well it matches the query. + +Let's examine the reranked results below. + +```python PYTHON +def return_results(results, documents): + for idx, result in enumerate(results.results): + print(f"Rank: {idx+1}") + print(f"Score: {result.relevance_score}") + print(f"Document: {documents[result.index]}\n") + +return_results(results, documents) +``` +```mdx +Rank: 1 +Score: 8.547617e-05 +Document: {'Title': 'Return Defective Product', 'Content': 'Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.'} + +Rank: 2 +Score: 5.1442214e-05 +Document: {'Title': 'Questions about Return Policy', 'Content': 'Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.'} + +Rank: 3 +Score: 3.591301e-05 +Document: {'Title': 'Return Policy for Defective Product', 'Content': 'Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.'} +``` + + +The search query was looking for emails about refunds. But none of the documents mention the word “refunds” specifically. + +However, the Rerank model was able to retrieve the right documents. Some of the documents mentioned the word “return”, which has a very similar meaning to "refunds." + +## Rerank semi-structured data + +The Rerank 3 model supports multi-aspect and semi-structured data like emails, invoices, JSON documents, code, and tables. Since we pass each document as a formatted string, we can control which fields the model considers by including only those fields when formatting the documents (here, as YAML). + +In the following example, we'll work with emails - semi-structured data that contains a number of fields – from, to, date, subject, and text. + +The model will rerank based on the order of the fields passed. + +```python PYTHON +# Define the documents +emails = [ + { + "from": "hr@co1t.com", + "to": "david@co1t.com", + "date": "2024-06-24", + "subject": "A Warm Welcome to Co1t!", + "text": "We are delighted to welcome you to the team! As you embark on your journey with us, you'll find attached an agenda to guide you through your first week.", + }, + { + "from": "it@co1t.com", + "to": "david@co1t.com", + "date": "2024-06-24", + "subject": "Setting Up Your IT Needs", + "text": "Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.", + }, + { + "from": "john@co1t.com", + "to": "david@co1t.com", + "date": "2024-06-24", + "subject": "First Week Check-In", + "text": "Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!", + }, +] + +yaml_emails = [yaml.dump(doc, sort_keys=False) for doc in emails] +``` + + +```python PYTHON +# Add the user query +query = "Any email about check ins?" + +# Rerank the documents +results = co.rerank( + model="model", # Pass a dummy string + query=query, + documents=yaml_emails, + top_n=2, +) + +return_results(results, emails) +``` +```mdx +Rank: 1 +Score: 0.13477592 +Document: {'from': 'john@co1t.com', 'to': 'david@co1t.com', 'date': '2024-06-24', 'subject': 'First Week Check-In', 'text': "Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going.
Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!"} + +Rank: 2 +Score: 0.0010083435 +Document: {'from': 'it@co1t.com', 'to': 'david@co1t.com', 'date': '2024-06-24', 'subject': 'Setting Up Your IT Needs', 'text': 'Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.'} +``` + + +## Summary + +In this tutorial, we learned about: +- How to set up the Cohere client to use the Rerank model deployed on Azure AI Foundry +- How to retrieve documents +- How to rerank documents +- How to rerank semi-structured data + +In the next tutorial, we'll learn how to build RAG applications by leveraging the models that we've looked at in the previous tutorials - Command, Embed, and Rerank. \ No newline at end of file diff --git a/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-sem-search.mdx b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-sem-search.mdx new file mode 100644 index 000000000..265aef0d0 --- /dev/null +++ b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-sem-search.mdx @@ -0,0 +1,226 @@ +--- +title: Semantic Search +slug: /v2/docs/cohere-on-azure/azure-ai-sem-search + +description: "A guide for performing text semantic search with Cohere's Embed models on Azure AI Foundry (API v2)." +image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg" +keywords: "Cohere, semantic search, text embeddings, Embed models, Azure AI Foundry" +--- +[Open in GitHub](https://github.com/cohere-ai/cohere-developer-experience/blob/main/notebooks/guides/cohere-on-azure/v2/azure-ai-sem-search.ipynb) + +In this tutorial, we'll explore semantic search using Cohere's Embed model on Azure AI Foundry. + +Semantic search enables search systems to capture the meaning and context of search queries, going beyond simple keyword matching to find relevant results based on semantic similarity. + +With the Embed model, you can do this across languages. This is particularly powerful for multilingual applications where the same meaning can be expressed in different languages. + +In this tutorial, we'll cover: +- Setting up the Cohere client +- Embedding text data +- Building a search index +- Performing semantic search queries + +We'll use Cohere's Embed model deployed on Azure to demonstrate these capabilities and help you understand how to effectively implement semantic search in your applications. + + +## Setup + +First, you will need to deploy the Embed model on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio). + +In the example below, we are deploying the Embed Multilingual v3 model. + +Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client. + +To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model.
+ + +```python PYTHON +# %pip install cohere hnswlib + +import pandas as pd +import hnswlib +import re +import cohere + +co = cohere.ClientV2( + api_key="AZURE_API_KEY_EMBED", + base_url="AZURE_ENDPOINT_EMBED" # example: "https://cohere-embed-v3-multilingual-xyz.eastus.models.ai.azure.com/" +) +``` + +## Download dataset + +For this example, we'll be using [MultiFIN](https://aclanthology.org/2023.findings-eacl.66.pdf) - an open-source dataset of financial article headlines in 15 different languages (including English, Turkish, Danish, Spanish, Polish, Greek, Finnish, Hebrew, Japanese, Hungarian, Norwegian, Russian, Italian, Icelandic, and Swedish). + +We've prepared a CSV version of the MultiFIN dataset that includes an additional column containing English translations. While we won't use these translations for the model itself, they'll help us understand the results when we encounter headlines in Danish or Spanish. We'll load this CSV file into a pandas dataframe. + + +```python PYTHON +url = "https://raw.githubusercontent.com/cohere-ai/cohere-aws/main/notebooks/bedrock/multiFIN_train.csv" +df = pd.read_csv(url) + +# Inspect dataset +df.head(5) +``` + +## Pre-Process Dataset + +For this example, we'll work with a subset focusing on English, Spanish, and Danish content. + +We'll perform several pre-processing steps: removing any duplicate entries, filtering to keep only our three target languages, and selecting the 80 longest articles as our working dataset. + + +```python PYTHON +# Ensure there is no duplicated text in the headers +def remove_duplicates(text): + return re.sub( + r"((\b\w+\b.{1,2}\w+\b)+).+\1", r"\1", text, flags=re.I + ) + + +df["text"] = df["text"].apply(remove_duplicates) + +# Keep only selected languages +languages = ["English", "Spanish", "Danish"] +df = df.loc[df["lang"].isin(languages)] + +# Pick the top 80 longest articles +df["text_length"] = df["text"].str.len() +df.sort_values(by=["text_length"], ascending=False, inplace=True) +top_80_df = df[:80] + +# Language distribution +top_80_df["lang"].value_counts() +``` + + +```mdx +lang +Spanish 33 +English 29 +Danish 18 +Name: count, dtype: int64 +``` + + +## Embed and index documents + +Let's embed our documents and store the embeddings. These embeddings are high-dimensional vectors (1,024 dimensions) that capture the semantic meaning of each document. We'll use Cohere's embed-multilingual-v3.0 model that we have defined in the client setup. + +The v3.0 embedding models require us to specify an `input_type` parameter that indicates what we're embedding. For semantic search, we use `search_document` for the documents we want to search through, and `search_query` for the search queries we'll make later. + +We'll also keep track of information about each document's language and translation to provide richer search results. + +Finally, we'll build a search index with the `hnswlib` vector library to store these embeddings efficiently, enabling faster document searches.
+ + +```python PYTHON +# Embed documents +docs = top_80_df['text'].to_list() +docs_lang = top_80_df['lang'].to_list() +translated_docs = top_80_df['translation'].to_list() # for reference when returning non-English results +doc_embs = co.embed( + model="embed-multilingual-v3.0", + texts=docs, + input_type='search_document', + embedding_types=["float"] +).embeddings.float + +# Create a search index +index = hnswlib.Index(space='ip', dim=1024) +index.init_index(max_elements=len(doc_embs), ef_construction=512, M=64) +index.add_items(doc_embs, list(range(len(doc_embs)))) +``` + +## Send Query and Retrieve Documents + +Next, we build a function that takes a query as input, embeds it, and finds the three documents that are the most similar to the query. + + +```python PYTHON +# Retrieval of 3 closest docs to query +def retrieval(query): + # Embed query and retrieve results + query_emb = co.embed( + model="embed-multilingual-v3.0", + texts=[query], + input_type='search_query', + embedding_types=["float"] + ).embeddings.float + + doc_ids = index.knn_query(query_emb, k=3)[0][0] # we will retrieve 3 closest neighbors + + # Print and append results + print(f"QUERY: {query.upper()} \n") + retrieved_docs, translated_retrieved_docs = [], [] + + for doc_id in doc_ids: + # Append results + retrieved_docs.append(docs[doc_id]) + translated_retrieved_docs.append(translated_docs[doc_id]) + + # Print results + print(f"ORIGINAL ({docs_lang[doc_id]}): {docs[doc_id]}") + if docs_lang[doc_id] != "English": + print(f"TRANSLATION: {translated_docs[doc_id]} \n----") + else: + print("----") + print("END OF RESULTS \n\n") + return retrieved_docs, translated_retrieved_docs +``` + +Let’s now try to query the index with a couple of examples, one each in English and Danish. + + +```python PYTHON +queries = [ + "Can data science help meet sustainability goals?", # English example + "Hvor kan jeg finde den seneste danske boligplan?", # Danish example - "Where can I find the latest Danish property plan?" +] + +for query in queries: + retrieval(query) +``` +```mdx +QUERY: CAN DATA SCIENCE HELP MEET SUSTAINABILITY GOALS? + +ORIGINAL (English): Using AI to better manage the environment could reduce greenhouse gas emissions, boost global GDP by up to 38m jobs by 2030 +---- +ORIGINAL (English): Quality of business reporting on the Sustainable Development Goals improves, but has a long way to go to meet and drive targets. +---- +ORIGINAL (English): Only 10 years to achieve Sustainable Development Goals but businesses remain on starting blocks for integration and progress +---- +END OF RESULTS + + +QUERY: HVOR KAN JEG FINDE DEN SENESTE DANSKE BOLIGPLAN?
+ +ORIGINAL (Danish): Nyt fra CFOdirect: Ny PP&E-guide, FAQs om den nye leasingstandard, podcast om udfordringerne ved implementering af leasingstandarden og meget mere +TRANSLATION: New from CFOdirect: New PP&E guide, FAQs on the new leasing standard, podcast on the challenges of implementing the leasing standard and much more +---- +ORIGINAL (Danish): Lovforslag fremlagt om rentefri lån, udskudt frist for lønsumsafgift, førtidig udbetaling af skattekredit og loft på indestående på skattekontoen +TRANSLATION: Bills presented on interest-free loans, deferred deadline for payroll tax, early payment of tax credit and ceiling on the balance in the tax account +---- +ORIGINAL (Danish): Nyt fra CFOdirect: Shareholder-spørgsmål til ledelsen, SEC cybersikkerhedsguide, den amerikanske skattereform og meget mere +TRANSLATION: New from CFOdirect: Shareholder questions for management, the SEC cybersecurity guide, US tax reform and more +---- +END OF RESULTS +``` + + + +With the first example, notice how the retrieval system was able to surface documents similar in meaning, for example, surfacing documents related to AI when given a query about data science. This is something that keyword-based search will not be able to capture. + +As for the second example, this demonstrates the multilingual nature of the model. You can use the same model across different languages. The model can also perform cross-lingual search, as in the first retrieved document, where “PP&E guide” is an English term that stands for “property, plant, and equipment.” + +## Summary + +In this tutorial, we learned about: +- How to set up the Cohere client to use the Embed model deployed on Azure AI Foundry +- How to embed text data +- How to build a search index +- How to perform multilingual semantic search + +In the next tutorial, we'll explore how to use the Rerank model for reranking search results. + diff --git a/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-text-generation.mdx b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-text-generation.mdx new file mode 100644 index 000000000..6b88bab21 --- /dev/null +++ b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-text-generation.mdx @@ -0,0 +1,345 @@ +--- +title: Text Generation +slug: /v2/docs/cohere-on-azure/azure-ai-text-generation + +description: "A guide for performing text generation with Cohere's Command models on Azure AI Foundry (API v2)." +image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg" +keywords: "Cohere, text generation, chatbot, Command models, Azure AI Foundry" +--- + +[Open in GitHub](https://github.com/cohere-ai/cohere-developer-experience/blob/main/notebooks/guides/cohere-on-azure/v2/azure-ai-text-generation.ipynb) + +In this tutorial, we'll explore text generation using Cohere's Command model on Azure AI Foundry. + +Text generation is a fundamental capability that enables LLMs to generate text for various applications, such as providing detailed responses to questions, helping with writing and editing tasks, creating conversational responses, and assisting with code generation and documentation. + +In this tutorial, we'll cover: +- Setting up the Cohere client +- Basic text generation +- Other typical use cases +- Building a chatbot + +We'll use Cohere's Command model deployed on Azure to demonstrate these capabilities and help you understand how to effectively use text generation in your applications. + +## Setup + +First, you will need to deploy the Command model on Azure via Azure AI Foundry.
The deployment will create a serverless API with pay-as-you-go token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio). + +In the example below, we are deploying the Command R+ (August 2024) model. + +Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client. + +To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model. + + +```python PYTHON +# %pip install cohere +import cohere + +co = cohere.ClientV2( + api_key="AZURE_API_KEY_CHAT", + base_url="AZURE_ENDPOINT_CHAT" # example: "https://cohere-command-r-plus-08-2024-xyz.eastus.models.ai.azure.com/" +) +``` + +## Creating some contextual information + +Before we begin, let's create some context to use in our text generation tasks. In this example, we'll use a set of technical support frequently asked questions (FAQs) as our context. + + +```python PYTHON +# Technical support FAQ +faq_tech_support = """- Question: How do I set up my new smartphone with my mobile plan? +- Answer: + - Insert your SIM card into the device. + - Turn on your phone and follow the on-screen setup instructions. + - Connect to your mobile network and enter your account details when prompted. + - Download and install any necessary apps or updates. + - Contact customer support if you need further assistance. + +- Question: My internet connection is slow. How can I improve my mobile data speed? +- Answer: + - Check your signal strength and move to an area with better coverage. + - Restart your device and try connecting again. + - Ensure your data plan is active and has sufficient data. + - Consider upgrading your plan for faster speeds. + +- Question: I can't connect to my mobile network. What should I do? +- Answer: + - Check your SIM card is inserted correctly and not damaged. + - Restart your device and try connecting again. + - Ensure your account is active and not suspended. + - Check for any network outages in your area. + - Contact customer support for further assistance. + +- Question: How do I set up my voicemail? +- Answer: + - Dial your voicemail access number (usually provided by your carrier). + - Follow the prompts to set up your voicemail greeting and password. + - Record your voicemail greeting and save it. + - Test your voicemail by calling your number and leaving a message. + +- Question: I'm having trouble sending text messages. What could be the issue? +- Answer: + - Check your signal strength and move to an area with better coverage. + - Ensure your account has sufficient credit or an active plan. + - Restart your device and try sending a message again. + - Check your message settings and ensure they are correct. + - Contact customer support if the issue persists.""" +``` + +## Helper function to generate text + +Now, let's define a function to generate text using the Command R+ model on Azure AI Foundry. We'll use this function a few times throughout. + +This function takes a user message and generates the response via the chat endpoint. Note that we don't need to specify the model here, as it is already determined by the Azure endpoint configured in the client; the `model` parameter only needs a placeholder string.
+ +```python PYTHON +def generate_text(message): + response = co.chat( + model="model", # Pass a dummy string + messages=[{"role": "user", "content": message}]) + return response +``` + +## Text generation + +Let's explore basic text generation as our first use case. The model takes a prompt as input and produces a relevant response as output. + +Consider a scenario where a customer support agent uses an LLM to help draft responses to customer inquiries. The agent provides technical support FAQs as context along with the customer's question. The prompt is structured to include three components: the instruction, the context (FAQs), and the specific customer inquiry. + +After passing this prompt to our `generate_text` function, we receive a response object. The actual generated text can be accessed via `response.message.content[0].text`. + + +```python PYTHON +inquiry = "I've noticed some fluctuations in my mobile network's performance recently. The connection seems stable most of the time, but every now and then, I experience brief periods of slow data speeds. It happens a few times a day and is quite inconvenient." + +prompt = f"""Use the FAQs below to provide a concise response to this customer inquiry. + +# Customer inquiry +{inquiry} + +# FAQs +{faq_tech_support}""" + +response = generate_text(prompt) + +print(response.message.content[0].text) +``` +```mdx +It's quite common to experience occasional fluctuations in mobile network performance, and there are a few steps you can take to address this issue. + +First, check your signal strength and consider moving to a different location with better coverage. Sometimes, even a small change in position can make a difference. If you find that you're in an area with low signal strength, this could be the primary reason for the slow data speeds. + +Next, try restarting your device. A simple restart can often resolve temporary glitches and improve your connection. After restarting, ensure that your data plan is active and has enough data allocated for your usage. If you're close to reaching your data limit, this could also impact your speeds. + +If the issue persists, it might be worth checking for any network outages in your area. Occasionally, temporary network issues can cause intermittent slowdowns. Contact your mobile network's customer support to inquire about any known issues and to receive further guidance. + +Additionally, consider the age and condition of your device. Older devices or those with outdated software might struggle to maintain consistent data speeds. Ensuring your device is up-to-date and well-maintained can contribute to a better overall network experience. + +If the problem continues, you may want to explore the option of upgrading your data plan. Higher-tier plans often offer faster speeds and more reliable connections, especially during peak usage times. Contact your mobile provider to discuss the available options and find a plan that better suits your needs. +``` + +## Text summarization + +Another type of use case is text summarization. Now, let's summarize the customer inquiry into a single sentence. We add an instruction to the prompt and then pass the inquiry to the prompt. + + +```python PYTHON +prompt = f"""Summarize this customer inquiry into one short sentence. + +Inquiry: {inquiry}""" + +response = generate_text(prompt) + +print(response.message.content[0].text) +``` +```mdx +A customer is experiencing intermittent slow data speeds on their mobile network several times a day.
+
+## Text rewriting
+
+Text rewriting is a powerful capability that allows us to adapt content for different purposes while preserving the core message. This involves transforming the style, tone, or format of text to better suit the target audience or medium.
+
+Let's look at an example where we convert a customer support chat response into a formal email. We'll construct the prompt by first stating our goal to rewrite the text, then providing the original chat response as context.
+
+
+```python PYTHON
+prompt = f"""Rewrite this customer support agent response into an email format, ready to send to the customer.
+
+If you're experiencing brief periods of slow data speeds or difficulty sending text messages and connecting to your mobile network, here are some troubleshooting steps you can follow:
+
+1. Check your signal strength - Move to an area with better coverage.
+2. Restart your device and try connecting again.
+3. Ensure your account is active and not suspended.
+4. Contact customer support for further assistance. (This can include updating your plan for better network performance.)
+
+Did these steps help resolve the issue? Let me know if you need further assistance."""
+
+response = generate_text(prompt)
+
+print(response.message.content[0].text)
+```
+```mdx
+Subject: Troubleshooting Slow Data Speeds and Network Connection Issues
+
+Dear [Customer's Name],
+
+I hope this email finds you well. I understand that you may be facing some challenges with your mobile network, including slow data speeds and difficulties sending text messages. Here are some recommended troubleshooting steps to help resolve these issues:
+
+- Signal Strength: Check the signal strength on your device and move to a different location if the signal is weak. Moving to an area with better coverage can often improve your connection.
+
+- Restart Your Device: Sometimes, a simple restart can resolve temporary glitches. Please restart your device and then try connecting to the network again.
+
+- Account Status: Verify that your account is active and in good standing. In some cases, service providers may temporarily suspend accounts due to various reasons, which can impact your network access.
+
+- Contact Customer Support: If the issue persists, please reach out to our customer support team for further assistance. Our team can help troubleshoot and provide additional guidance. We can also discuss your current plan and explore options to enhance your network performance if needed.
+
+I hope these steps will help resolve the issue promptly. Please feel free to reply to this email if you have any further questions or if the problem continues. We are committed to ensuring your satisfaction and providing a seamless network experience.
+
+Best regards,
+[Your Name]
+[Customer Support Agent]
+[Company Name]
+```
+
+## Build a Chatbot
+
+While our previous examples were single-turn interactions, the Chat endpoint enables us to create chatbots that maintain memory of past conversation turns. This capability allows developers to build conversational applications that preserve context throughout the dialogue.
+
+Below, we implement a basic customer support chatbot that acts as a helpful service agent. We'll create a function called `run_chatbot` that handles the conversation flow and displays messages and events. The function can take an optional chat history parameter to maintain conversational context across multiple turns.
+
+For this, we introduce two additional ingredients in the endpoint call:
+
+- A system message: the first message in the list contains instructions that help steer the chatbot’s response toward specific characteristics, such as a persona, style, or format. Here, we use a simple one: “You are a helpful customer support agent that assists customers of a mobile network service.”
+- The `messages` list: we store the history of the conversation between the user and the chatbot as a list, append every new turn to it, and pass the full list to each subsequent endpoint call.
+
+
+```python PYTHON
+# Define a system message
+system_message = """## Task and Context
+You are a helpful customer support agent that assists customers of a mobile network service."""
+
+# Run the chatbot
+def run_chatbot(message, messages=None):
+    if messages is None:
+        messages = []
+
+    if "system" not in {m.get("role") for m in messages}:
+        messages.append({"role": "system", "content": system_message})
+
+    messages.append({"role": "user", "content": message})
+
+    response = co.chat(
+        model="model",  # Pass a dummy string
+        messages=messages,
+    )
+
+    messages.append({"role": "assistant", "content": response.message.content[0].text})
+
+    print(response.message.content[0].text)
+
+    return messages
+```
+
+
+```python PYTHON
+messages = run_chatbot("Hi. I've noticed some fluctuations in my mobile network's performance recently.")
+```
+```mdx
+Hello there! I'd be happy to assist you with this issue. Network performance fluctuations can be concerning, and it's important to identify the cause to ensure you have a smooth experience.
+
+Can you tell me more about the problems you've been experiencing? Are there specific times or locations where the network seems to perform poorly? Any details you can provide will help me understand the situation better and offer potential solutions.
+```
+
+
+```python PYTHON
+messages = run_chatbot("At times, the data speed is very poor. What should I do?", messages)
+```
+```mdx
+I'm sorry to hear that you're experiencing slow data speeds. Here are some troubleshooting steps and tips to help improve your network performance:
+
+- **Check Network Coverage:** First, ensure that you are in an area with good network coverage. You can check the coverage map provided by your mobile network service on their website. If you're in a location with known weak signal strength, moving to a different area might improve your data speed.
+
+- **Restart Your Device:** Sometimes, a simple restart of your mobile device can help refresh the network connection. Power off your device, wait for a few moments, and then turn it back on.
+
+- **Check for Network Updates:** Make sure your device is running the latest software and carrier settings. Updates often include improvements and optimizations for network performance. You can check for updates in your device's settings.
+
+- **Manage Network Settings:**
+    - *Network Mode:* Try switching to a different network mode (e.g., 4G/LTE, 3G) to see if a specific network band provides better speed.
+    - *Airplane Mode:* Toggle Airplane mode on and off to reconnect to the network.
+    - *Network Reset:* If the issue persists, you can try resetting your network settings, but note that this will erase saved Wi-Fi passwords.
+
+- **Contact Customer Support:** If the problem continues, it might be beneficial to contact your mobile network's customer support team. They can check for any known issues in your area and provide further assistance. They might also guide you through advanced troubleshooting steps.
+
+- **Consider Network Congestion:** Slow data speeds can sometimes occur during peak usage hours when the network is congested. Try using data-intensive apps during off-peak hours to see if that makes a difference.
+
+- **Check Background Apps:** Certain apps running in the background can consume data and impact speed. Close any unnecessary apps to free up resources.
+
+If the slow data speed persists despite these measures, it's advisable to reach out to your mobile network provider for further investigation and assistance. They can provide network-specific solutions and ensure you're getting the service you expect.
+```
+
+
+```python PYTHON
+messages = run_chatbot("Thanks. What else can I check?", messages)
+```
+```mdx
+You're welcome! Here are some additional steps and factors to consider:
+
+- **Device Health:** Ensure your device is in good working condition. An older device or one with hardware issues might struggle to maintain a fast data connection. Consider checking for any pending system updates that could optimize your device's performance.
+
+- **SIM Card:** Try removing and reinserting your SIM card to ensure it is properly seated. A loose connection can impact network performance. If the issue persists, it might be worth asking your network provider for a SIM replacement.
+
+- **Network Congestion at Specific Times:** Network speed can vary depending on the time of day. If possible, monitor your data speed during different parts of the day to identify any patterns. This can help determine if network congestion during peak hours is the primary cause.
+
+- **Data Plan and Throttling:** Check your mobile data plan to ensure you haven't exceeded any data limits, which could result in reduced speeds. Some providers throttle speeds after a certain data threshold is reached.
+
+- **Background Updates and Downloads:** Certain apps might be set to update or download content in the background, consuming data and potentially slowing down your connection. Review your app settings and consider disabling automatic updates or background data usage for apps that don't require real-time updates.
+
+- **Network Diagnostics Tools:** Some mobile devices have built-in network diagnostics tools that can provide insights into your connection. These tools can help identify issues with signal strength, network latency, and more.
+
+- **Wi-Fi Calling and Data Usage:** If your device supports Wi-Fi calling, ensure it is enabled. This can offload some data usage from the cellular network, potentially improving speeds.
+
+- **Network Provider's App:** Download and install your mobile network provider's official app, if available. These apps often provide real-time network status updates and allow you to report issues directly.
+
+If you've gone through these checks and the problem persists, contacting your network provider's technical support team is the next best step. They can provide further guidance based on your specific situation.
+```
+
+### View the chat history
+
+Here's what is contained in the chat history after a few turns.
+
+
+```python PYTHON
+print("Chat history:")
+for message in messages:
+    print(message, "\n")
+```
+```mdx
+Chat history:
+{'role': 'system', 'content': '## Task and Context\nYou are a helpful customer support agent that assists customers of a mobile network service.'}
+
+{'role': 'user', 'content': "Hi. 
I've noticed some fluctuations in my mobile network's performance recently."} + +{'role': 'assistant', 'content': "Hello there! I'd be happy to assist you with this issue. Network performance fluctuations can be concerning, and it's important to identify the cause to ensure you have a smooth experience. \n\nCan you tell me more about the problems you've been experiencing? Are there specific times or locations where the network seems to perform poorly? Any details you can provide will help me understand the situation better and offer potential solutions."} + +{'role': 'user', 'content': 'At times, the data speed is very poor. What should I do?'} + +{'role': 'assistant', 'content': "I'm sorry to hear that you're experiencing slow data speeds. Here are some troubleshooting steps and tips to help improve your network performance:\n\n- **Check Network Coverage:** First, ensure that you are in an area with good network coverage. You can check the coverage map provided by your mobile network service on their website. If you're in a location with known weak signal strength, moving to a different area might improve your data speed.\n\n- **Restart Your Device:** Sometimes, a simple restart of your mobile device can help refresh the network connection. Power off your device, wait for a few moments, and then turn it back on.\n\n- **Check for Network Updates:** Make sure your device is running the latest software and carrier settings. Updates often include improvements and optimizations for network performance. You can check for updates in your device's settings.\n\n- **Manage Network Settings:**\n - *Network Mode:* Try switching to a different network mode (e.g., 4G/LTE, 3G) to see if a specific network band provides better speed.\n - *Airplane Mode:* Toggle Airplane mode on and off to reconnect to the network.\n - *Network Reset:* If the issue persists, you can try resetting your network settings, but note that this will erase saved Wi-Fi passwords.\n\n- **Contact Customer Support:** If the problem continues, it might be beneficial to contact your mobile network's customer support team. They can check for any known issues in your area and provide further assistance. They might also guide you through advanced troubleshooting steps.\n\n- **Consider Network Congestion:** Slow data speeds can sometimes occur during peak usage hours when the network is congested. Try using data-intensive apps during off-peak hours to see if that makes a difference.\n\n- **Check Background Apps:** Certain apps running in the background can consume data and impact speed. Close any unnecessary apps to free up resources.\n\nIf the slow data speed persists despite these measures, it's advisable to reach out to your mobile network provider for further investigation and assistance. They can provide network-specific solutions and ensure you're getting the service you expect."} + +{'role': 'user', 'content': 'Thanks. What else can I check?'} + +{'role': 'assistant', 'content': "You're welcome! Here are some additional steps and factors to consider:\n\n- **Device Health:** Ensure your device is in good working condition. An older device or one with hardware issues might struggle to maintain a fast data connection. Consider checking for any pending system updates that could optimize your device's performance.\n\n- **SIM Card:** Try removing and reinserting your SIM card to ensure it is properly seated. A loose connection can impact network performance. 
If the issue persists, it might be worth asking your network provider for a SIM replacement.\n\n- **Network Congestion at Specific Times:** Network speed can vary depending on the time of day. If possible, monitor your data speed during different parts of the day to identify any patterns. This can help determine if network congestion during peak hours is the primary cause.\n\n- **Data Plan and Throttling:** Check your mobile data plan to ensure you haven't exceeded any data limits, which could result in reduced speeds. Some providers throttle speeds after a certain data threshold is reached.\n\n- **Background Updates and Downloads:** Certain apps might be set to update or download content in the background, consuming data and potentially slowing down your connection. Review your app settings and consider disabling automatic updates or background data usage for apps that don't require real-time updates.\n\n- **Network Diagnostics Tools:** Some mobile devices have built-in network diagnostics tools that can provide insights into your connection. These tools can help identify issues with signal strength, network latency, and more.\n\n- **Wi-Fi Calling and Data Usage:** If your device supports Wi-Fi calling, ensure it is enabled. This can offload some data usage from the cellular network, potentially improving speeds.\n\n- **Network Provider's App:** Download and install your mobile network provider's official app, if available. These apps often provide real-time network status updates and allow you to report issues directly.\n\nIf you've gone through these checks and the problem persists, contacting your network provider's technical support team is the next best step. They can provide further guidance based on your specific situation."}
+```
+
+
+## Summary
+
+In this tutorial, we learned about:
+- How to set up the Cohere client to use the Command model deployed on Azure AI Foundry
+- How to perform basic text generation
+- How to use the model for other use cases, such as summarization and rewriting
+- How to build a chatbot using the Chat endpoint
+
+In the next tutorial, we'll explore how to use the Embed model in semantic search applications.
\ No newline at end of file
diff --git a/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-tool-use.mdx b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-tool-use.mdx
new file mode 100644
index 000000000..f37e14264
--- /dev/null
+++ b/fern/pages/v2/tutorials/cohere-on-azure/azure-ai-tool-use.mdx
@@ -0,0 +1,311 @@
+---
+title: Tool Use & Agents
+slug: /v2/docs/cohere-on-azure/azure-ai-tool-use
+
+description: "A guide for using tool use and building agents with Cohere's Command models on Azure AI Foundry (API v2)."
+image: "../../../../assets/images/f1cc130-cohere_meta_image.jpg"
+keywords: "Cohere, tool use, agents, chatbot, Command models, Azure AI Foundry"
+---
+[Open in GitHub](https://github.com/cohere-ai/cohere-developer-experience/blob/main/notebooks/guides/cohere-on-azure/v2/azure-ai-tool-use.ipynb)
+
+Tool use enhances retrieval-augmented generation (RAG) capabilities by enabling applications to both answer questions and automate tasks.
+
+Tools provide broader access to external systems than traditional RAG. This approach leverages LLMs' inherent ability to reason and make decisions. By incorporating tools, developers can create agent-like applications that interact with external systems through both read and write operations.
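+
+To make the read/write distinction concrete, here is a minimal sketch with hypothetical tools (the function names and return values are illustrative, not part of Cohere's API): a retrieval-style tool only reads data, while an action-style tool changes state in an external system.
+
+```python PYTHON
+# Hypothetical tools illustrating read vs. write operations
+def search_knowledge_base(query: str) -> list:
+    # Read: fetches data from an external system, no side effects
+    return [{"title": "Refund policy", "text": "Refunds are processed within 5 days."}]
+
+
+def create_support_ticket(subject: str, body: str) -> dict:
+    # Write: performs an action that changes state in an external system
+    return {"is_success": True, "ticket_id": "T-0042"}
+```
+
+An agent equipped with both kinds of tools can look up an answer and then act on it within the same conversation.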
+
+In this tutorial, we'll build exactly this kind of agentic application: an agent that can answer questions and automate tasks, enabled by a number of tools.
+
+## Setup
+
+First, you will need to deploy the Command model on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go, token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio).
+
+In the example below, we are deploying the Command R+ (August 2024) model.
+
+Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client.
+
+To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model.
+
+
+```python PYTHON
+# %pip install cohere
+import cohere
+
+co = cohere.ClientV2(
+    api_key="AZURE_API_KEY_CHAT",
+    base_url="AZURE_ENDPOINT_CHAT"  # example: "https://cohere-command-r-plus-08-2024-xyz.eastus.models.ai.azure.com/"
+)
+```
+
+## Create tools
+
+Before we can run a tool use workflow, we first need to set up the tools. Let's create three:
+
+- `search_faqs`: A tool for searching a company's FAQs. For simplicity, we won't implement any retrieval logic; we'll simply return a list of three predefined documents. In practice, we would set up a retrieval system as we did in the semantic search, reranking, and RAG tutorials.
+- `search_emails`: A tool for searching emails. Same as above, we'll simply return a list of predefined emails.
+- `create_calendar_event`: A tool for creating new calendar events. Again, for simplicity, we'll only return mock successful event creations without an actual implementation. In practice, we would connect to a calendar service API and implement the necessary logic here.
+
+Here, we are defining a Python function for each tool, but more broadly, a tool can be any function or service that can receive and send objects.
+
+
+```python PYTHON
+def search_faqs(query):
+    faqs = [
+        {
+            "text": "Submitting Travel Expenses:\nSubmit your expenses through our user-friendly finance tool."
+        },
+        {
+            "text": "Side Projects Policy:\nWe encourage you to explore your passions! Just ensure there's no conflict of interest with our business."
+        },
+        {
+            "text": "Wellness Benefits:\nTo promote a healthy lifestyle, we provide gym memberships, on-site yoga classes, and health insurance."
+        },
+    ]
+    return faqs
+
+
+def search_emails(query):
+    emails = [
+        {
+            "from": "hr@co1t.com",
+            "to": "david@co1t.com",
+            "date": "2024-06-24",
+            "subject": "A Warm Welcome to Co1t, David!",
+            "text": "We are delighted to have you on board. Please find attached your first week's agenda.",
+        },
+        {
+            "from": "it@co1t.com",
+            "to": "david@co1t.com",
+            "date": "2024-06-24",
+            "subject": "Instructions for IT Setup",
+            "text": "Welcome, David! To get you started, please follow the attached guide to set up your work accounts.",
+        },
+        {
+            "from": "john@co1t.com",
+            "to": "david@co1t.com",
+            "date": "2024-06-24",
+            "subject": "First Week Check-In",
+            "text": "Hi David, let's chat briefly tomorrow to discuss your first week. Also, come join us for lunch this Thursday at noon to meet everyone!",
+        },
+    ]
+    return emails
+
+
+def create_calendar_event(date: str, time: str, duration: int):
+    # You can implement any logic here
+    return {
+        "is_success": True,
+        "message": f"Created a {duration} hour long event at {time} on {date}",
+    }
+
+
+functions_map = {
+    "search_faqs": search_faqs,
+    "search_emails": search_emails,
+    "create_calendar_event": create_calendar_event,
+}
+```
+
+## Define tool schemas
+
+The next step is to define the tool schemas in a format that can be accepted by the Chat endpoint. Each schema sets `type` to `function` and provides a `function` object containing the tool's `name`, `description`, and `parameters` (a JSON Schema describing the inputs the tool accepts).
+
+This schema informs the LLM about what the tool does, and the LLM decides whether to use a particular tool based on it. Therefore, the more descriptive and specific the schema, the more likely the LLM will make the right tool call decisions.
+
+
+```python PYTHON
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "search_faqs",
+            "description": "Given a user query, searches a company's frequently asked questions (FAQs) list and returns the most relevant matches to the query.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "The query from the user",
+                    }
+                },
+                "required": ["query"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "search_emails",
+            "description": "Given a user query, searches a person's emails and returns the most relevant matches to the query.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "The query from the user",
+                    }
+                },
+                "required": ["query"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "create_calendar_event",
+            "description": "Creates a new calendar event of the specified duration at the specified time and date. A new event cannot be created on the same time as an existing event.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "date": {
+                        "type": "string",
+                        "description": "the date on which the event starts, formatted as mm/dd/yy",
+                    },
+                    "time": {
+                        "type": "string",
+                        "description": "the time of the event, formatted using 24h military time formatting",
+                    },
+                    "duration": {
+                        "type": "number",
+                        "description": "the number of hours the event lasts for",
+                    },
+                },
+                "required": ["date", "time", "duration"],
+            },
+        },
+    },
+]
+```
+
+## Run agent
+
+Now, let's set up the agent using Cohere's tool use feature. We can think of a tool use system as consisting of four components:
+
+- The user
+- The application
+- The LLM
+- The tools
+
+At its most basic, these four components interact in a workflow through four steps:
+
+- Step 1: Get user message. The LLM gets the user message (via the application).
+- Step 2: Generate tool calls. The LLM decides which tools to call (if any) and generates the tool calls.
+- Step 3: Get tool results. The application executes the tools and sends the tool results to the LLM.
+- Step 4: Generate response and citations. The LLM generates the response and citations and sends them back to the user.
+
+Let's create a function called `run_assistant` to implement these steps and print out the key events and messages along the way. This function also optionally accepts the chat history as an argument to keep the state in a multi-turn conversation.
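+
+Before looking at the full implementation, it may help to see the shape of the messages exchanged in Steps 2 and 3. This is only an illustrative sketch with made-up values; the field names match the v2 message format used in the function below.
+
+```python PYTHON
+# Illustrative message shapes in a single tool use turn (values are made up)
+assistant_turn = {
+    "role": "assistant",
+    "tool_plan": "I will search the user's emails for IT setup instructions.",
+    "tool_calls": [],  # tool calls generated by the model: each carries an ID,
+    # a function name, and JSON-encoded arguments
+}
+
+tool_turn = {
+    "role": "tool",
+    "tool_call_id": "example_id_0",  # matches the ID of the corresponding tool call
+    "content": [
+        {"type": "document", "document": {"data": '{"text": "..."}'}}
+    ],
+}
+```
+
+With these shapes in mind, here is the full `run_assistant` implementation.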
+ + +```python PYTHON +import json + +system_message="""## Task and Context +You are an assistant who assists new employees of Co1t with their first week. You respond to their questions and assist them with their needs. Today is Monday, June 24, 2024""" + +def run_assistant(query, messages=None): + if messages is None: + messages = [] + + if "system" not in {m.get("role") for m in messages}: + messages.append({"role": "system", "content": system_message}) + + # Step 1: get user message + print(f"Question:\n{query}") + print("="*50) + + messages.append({"role": "user", "content": query}) + + # Step 2: Generate tool calls (if any) + response = co.chat( + model="model", # Pass a dummy string + messages=messages, + tools=tools + ) + + while response.message.tool_calls: + + print("Tool plan:") + print(response.message.tool_plan,"\n") + print("Tool calls:") + for tc in response.message.tool_calls: + print(f"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}") + print("="*50) + + messages.append({"role": "assistant", "tool_calls": response.message.tool_calls, "tool_plan": response.message.tool_plan}) + + # Step 3: Get tool results + for idx, tc in enumerate(response.message.tool_calls): + tool_result = functions_map[tc.function.name]( + **json.loads(tc.function.arguments) + ) + tool_content = [] + for data in tool_result: + tool_content.append({"type": "document", "document": {"data": json.dumps(data)}}) + # Optional: add an "id" field in the "document" object, otherwise IDs are auto-generated + messages.append( + {"role": "tool", "tool_call_id": tc.id, "content": tool_content} + ) + + # Step 4: Generate response and citations + response = co.chat( + model="model", # Pass a dummy string + messages=messages, + tools=tools + ) + + messages.append({"role": "assistant", "content": response.message.content[0].text}) + + # Print final response + print("Response:") + print(response.message.content[0].text) + print("="*50) + + # Print citations (if any) + if response.message.citations: + print("\nCITATIONS:") + for citation in response.message.citations: + print(citation, "\n") + + return messages +``` + +Let’s now run the agent. We'll use an example of a new hire asking about IT access and the travel expense process. + +Given three tools to choose from, the model is able to pick the right tools (in this case, `search_faqs` and `search_emails`) based on what the user is asking for. + +Also, notice that the model first generates a plan about what it should do ("I will ...") before actually generating the tool call(s). + +Additionally, the model also generates fine-grained citations in tool use mode based on the tool results it receives, the same way we saw with RAG. + + +```python PYTHON +messages = run_assistant("Any doc on how do I submit travel expenses? Also, any emails about setting up IT access?") +``` +```mdx +Question: +Any doc on how do I submit travel expenses? Also, any emails about setting up IT access? +================================================== +Tool plan: +I will search for a document on how to submit travel expenses, and also search for emails about setting up IT access. + +Tool calls: +Tool name: search_faqs | Parameters: {"query":"how to submit travel expenses"} +Tool name: search_emails | Parameters: {"query":"setting up IT access"} +================================================== +Response: +You can submit your travel expenses through the user-friendly finance tool. 
+ +You should have received an email from it@co1t.com with instructions for setting up your IT access. +================================================== + +CITATIONS: +start=48 end=75 text='user-friendly finance tool.' sources=[ToolSource(type='tool', id='search_faqs_wkfggn2680c4:0', tool_output={'text': 'Submitting Travel Expenses:\nSubmit your expenses through our user-friendly finance tool.'})] type='TEXT_CONTENT' + +start=105 end=176 text='email from it@co1t.com with instructions for setting up your IT access.' sources=[ToolSource(type='tool', id='search_emails_8n0cvsh5xknt:1', tool_output={'date': '2024-06-24', 'from': 'it@co1t.com', 'subject': 'Instructions for IT Setup', 'text': 'Welcome, David! To get you started, please follow the attached guide to set up your work accounts.', 'to': 'david@co1t.com'})] type='TEXT_CONTENT' +``` + +## Conclusion + +In this tutorial, we learned about: +- How to set up tools with parameter definitions for the Cohere chat API +- How to define tools for building agentic applications +- How to set up the agent +- How to run a tool use workflow involving the user, the application, the LLM, and the tools \ No newline at end of file diff --git a/fern/v2.yml b/fern/v2.yml index 19b92c3dc..e56fdc0df 100644 --- a/fern/v2.yml +++ b/fern/v2.yml @@ -289,6 +289,19 @@ navigation: path: pages/v2/tutorials/agentic-rag/querying-structured-data-tables.mdx - page: Querying Structured Data (SQL) path: pages/v2/tutorials/agentic-rag/querying-structured-data-sql.mdx + - section: Cohere on Azure + path: pages/v2/tutorials/cohere-azure-ai-foundry.mdx + contents: + - page: Text Generation + path: pages/v2/tutorials/cohere-on-azure/azure-ai-text-generation.mdx + - page: Semantic Search + path: pages/v2/tutorials/cohere-on-azure/azure-ai-sem-search.mdx + - page: Reranking + path: pages/v2/tutorials/cohere-on-azure/azure-ai-reranking.mdx + - page: Retrieval Augmented Generation (RAG) + path: pages/v2/tutorials/cohere-on-azure/azure-ai-rag.mdx + - page: Tool Use & Agents + path: pages/v2/tutorials/cohere-on-azure/azure-ai-tool-use.mdx - section: Responsible Use contents: - link: Security diff --git a/notebooks/guides/cohere-on-azure/v2/azure-ai-rag.ipynb b/notebooks/guides/cohere-on-azure/v2/azure-ai-rag.ipynb new file mode 100644 index 000000000..973968fae --- /dev/null +++ b/notebooks/guides/cohere-on-azure/v2/azure-ai-rag.ipynb @@ -0,0 +1,682 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Retrieval Augmented Generation (RAG)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Large Language Models (LLMs) excel at generating text and maintaining conversational context in chat applications. However, LLMs can sometimes hallucinate - producing responses that are factually incorrect. This is particularly important to mitigate in enterprise environments where organizations work with proprietary information that wasn't part of the model's training data.\n", + "\n", + "Retrieval-augmented generation (RAG) addresses this limitation by enabling LLMs to incorporate external knowledge sources into their response generation process. 
By grounding responses in retrieved facts, RAG significantly reduces hallucinations and improves the accuracy and reliability of the model's outputs.\n",
+    "\n",
+    "In this tutorial, we'll cover:\n",
+    "- Setting up the Cohere client\n",
+    "- Building a RAG application by combining retrieval and chat capabilities\n",
+    "- Managing chat history and maintaining conversational context\n",
+    "- Handling direct responses vs responses requiring retrieval\n",
+    "- Generating citations for retrieved information\n",
+    "\n",
+    "In the next tutorial, we'll explore how to leverage Cohere's tool use features to build agentic applications.\n",
+    "\n",
+    "We'll use Cohere's Command, Embed, and Rerank models deployed on Azure."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, you will need to deploy the Command, Embed, and Rerank models on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go, token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio).\n",
+    "\n",
+    "Once the models are deployed, you can access them via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client.\n",
+    "\n",
+    "To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# %pip install cohere hnswlib unstructured\n",
+    "\n",
+    "import cohere\n",
+    "\n",
+    "co_chat = cohere.ClientV2(\n",
+    "    api_key=\"AZURE_API_KEY_CHAT\",\n",
+    "    base_url=\"AZURE_ENDPOINT_CHAT\"  # example: \"https://cohere-command-r-plus-08-2024-xyz.eastus.models.ai.azure.com/\"\n",
+    ")\n",
+    "\n",
+    "co_embed = cohere.ClientV2(\n",
+    "    api_key=\"AZURE_API_KEY_EMBED\",\n",
+    "    base_url=\"AZURE_ENDPOINT_EMBED\"  # example: \"https://cohere-embed-v3-multilingual-xyz.eastus.models.ai.azure.com/\"\n",
+    ")\n",
+    "\n",
+    "co_rerank = cohere.ClientV2(\n",
+    "    api_key=\"AZURE_API_KEY_RERANK\",\n",
+    "    base_url=\"AZURE_ENDPOINT_RERANK\"  # example: \"https://cohere-rerank-v3-multilingual-xyz.eastus.models.ai.azure.com/\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## A quick example"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's begin with a simple example to explore how RAG works.\n",
+    "\n",
+    "The foundation of RAG is having a set of documents for the LLM to reference. Below, we'll work with a small collection of basic documents. While RAG systems usually involve retrieving relevant documents based on the user's query (which we'll explore later), for now we'll keep it simple and use this entire small set of documents as context for the LLM.\n",
+    "\n",
+    "We have seen how to use the Chat endpoint in the text generation tutorial. To use the RAG feature, we simply need to add one additional parameter, `documents`, to the endpoint call. These are the documents we want to provide as the context for the model to use in its response."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "documents = [\n", + " {\n", + " \"title\": \"Tall penguins\",\n", + " \"text\": \"Emperor penguins are the tallest.\"},\n", + " {\n", + " \"title\": \"Penguin habitats\",\n", + " \"text\": \"Emperor penguins only live in Antarctica.\"},\n", + " {\n", + " \"title\": \"What are animals?\",\n", + " \"text\": \"Animals are different from plants.\"}\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's see how the model responds to the question \"What are the tallest living penguins?\"\n", + "\n", + "The model leverages the provided documents as context for its response. Specifically, when mentioning that Emperor penguins are the tallest species, it references `doc_0` - the document which states that \"Emperor penguins are the tallest.\"" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RESPONSE:\n", + "\n", + "The tallest living penguins are the Emperor penguins. They only live in Antarctica.\n", + "\n", + "CITATIONS:\n", + "\n", + "start=36 end=53 text='Emperor penguins.' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Emperor penguins are the tallest.', 'title': 'Tall penguins'})] type=None\n", + "start=59 end=83 text='only live in Antarctica.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Emperor penguins only live in Antarctica.', 'title': 'Penguin habitats'})] type=None\n" + ] + } + ], + "source": [ + "message = \"What are the tallest living penguins?\"\n", + "\n", + "response = co_chat.chat(\n", + " model=\"model\", # Pass a dummy string\n", + " messages=[{\"role\": \"user\", \"content\": message}],\n", + " documents=[{\"data\": doc} for doc in documents]\n", + ")\n", + "\n", + "print(\"\\nRESPONSE:\\n\")\n", + "print(response.message.content[0].text)\n", + "\n", + "if response.message.citations:\n", + " print(\"\\nCITATIONS:\\n\") \n", + " for citation in response.message.citations:\n", + " print(citation)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A more comprehensive example" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we’ve covered a basic RAG implementation, let’s look at a more comprehensive example of RAG that includes:\n", + "\n", + "- Creating a retrieval system that converts documents into text embeddings and stores them in an index\n", + "- Building a query generation system that transforms user messages into optimized search queries\n", + "- Implementing a chat interface to handle LLM interactions with users\n", + "- Designing a response generation system capable of handling various query types\n", + "\n", + "First, let’s import the necessary libraries for this project. 
This includes `hnswlib` for the vector library and `unstructured` for chunking the documents (more details on these later).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "import yaml\n", + "import hnswlib\n", + "from typing import List, Dict\n", + "from unstructured.partition.html import partition_html\n", + "from unstructured.chunking.title import chunk_by_title" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we’ll define the documents we’ll use for RAG. We’ll use a few pages from the Cohere documentation that discuss prompt engineering. Each entry is identified by its title and URL." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "raw_documents = [\n", + " {\n", + " \"title\": \"Crafting Effective Prompts\",\n", + " \"url\": \"https://docs.cohere.com/docs/crafting-effective-prompts\"},\n", + " {\n", + " \"title\": \"Advanced Prompt Engineering Techniques\",\n", + " \"url\": \"https://docs.cohere.com/docs/advanced-prompt-engineering-techniques\"},\n", + " {\n", + " \"title\": \"Prompt Truncation\",\n", + " \"url\": \"https://docs.cohere.com/docs/prompt-truncation\"},\n", + " {\n", + " \"title\": \"Preambles\",\n", + " \"url\": \"https://docs.cohere.com/docs/preambles\"}\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create vectorstore" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Vectorstore class handles the ingestion of documents into embeddings (or vectors) and the retrieval of relevant documents given a query.\n", + "\n", + "It includes a few methods:\n", + "\n", + "- `load_and_chunk`: Loads the raw documents from the URL and breaks them into smaller chunks\n", + "- `embed`: Generates embeddings of the chunked documents\n", + "- `index`: Indexes the document chunk embeddings to ensure efficient similarity search during retrieval\n", + "- `retrieve`: Uses semantic search to retrieve relevant document chunks from the index, given a query. It involves two steps: first, dense retrieval from the index via the Embed endpoint, and second, a reranking via the Rerank endpoint to boost the search results further." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "class Vectorstore:\n", + "\n", + " def __init__(self, raw_documents: List[Dict[str, str]]):\n", + " self.raw_documents = raw_documents\n", + " self.docs = []\n", + " self.docs_embs = []\n", + " self.retrieve_top_k = 10\n", + " self.rerank_top_k = 3\n", + " self.load_and_chunk()\n", + " self.embed()\n", + " self.index()\n", + "\n", + "\n", + " def load_and_chunk(self) -> None:\n", + " \"\"\"\n", + " Loads the text from the sources and chunks the HTML content.\n", + " \"\"\"\n", + " print(\"Loading documents...\")\n", + "\n", + " for raw_document in self.raw_documents:\n", + " elements = partition_html(url=raw_document[\"url\"])\n", + " chunks = chunk_by_title(elements)\n", + " for chunk in chunks:\n", + " self.docs.append(\n", + " {\n", + " \"data\": {\n", + " \"title\": raw_document[\"title\"],\n", + " \"text\": str(chunk),\n", + " \"url\": raw_document[\"url\"],\n", + " }\n", + " }\n", + " )\n", + "\n", + " def embed(self) -> None:\n", + " \"\"\"\n", + " Embeds the document chunks using the Cohere API.\n", + " \"\"\"\n", + " print(\"Embedding document chunks...\")\n", + "\n", + " batch_size = 90\n", + " self.docs_len = len(self.docs)\n", + " for i in range(0, self.docs_len, batch_size):\n", + " batch = self.docs[i : min(i + batch_size, self.docs_len)]\n", + " texts = [item[\"data\"][\"text\"] for item in batch]\n", + " docs_embs_batch = co_embed.embed(\n", + " texts=texts,\n", + " model=\"embed-multilingual-v3.0\",\n", + " input_type=\"search_document\",\n", + " embedding_types=[\"float\"]\n", + " ).embeddings.float\n", + " self.docs_embs.extend(docs_embs_batch)\n", + "\n", + " def index(self) -> None:\n", + " \"\"\"\n", + " Indexes the document chunks for efficient retrieval.\n", + " \"\"\"\n", + " print(\"Indexing document chunks...\")\n", + "\n", + " self.idx = hnswlib.Index(space=\"ip\", dim=1024)\n", + " self.idx.init_index(max_elements=self.docs_len, ef_construction=512, M=64)\n", + " self.idx.add_items(self.docs_embs, list(range(len(self.docs_embs))))\n", + "\n", + " print(f\"Indexing complete with {self.idx.get_current_count()} document chunks.\")\n", + "\n", + " def retrieve(self, query: str) -> List[Dict[str, str]]:\n", + " \"\"\"\n", + " Retrieves document chunks based on the given query.\n", + "\n", + " Parameters:\n", + " query (str): The query to retrieve document chunks for.\n", + "\n", + " Returns:\n", + " List[Dict[str, str]]: A list of dictionaries representing the retrieved document chunks, with 'title', 'text', and 'url' keys.\n", + " \"\"\"\n", + "\n", + " # Dense retrieval\n", + " query_emb = co_embed.embed(\n", + " texts=[query],\n", + " model=\"embed-multilingual-v3.0\",\n", + " input_type=\"search_query\",\n", + " embedding_types=[\"float\"]\n", + " ).embeddings.float\n", + " \n", + " doc_ids = self.idx.knn_query(query_emb, k=self.retrieve_top_k)[0][0]\n", + "\n", + " # Reranking\n", + " docs_to_rerank = [self.docs[doc_id][\"data\"] for doc_id in doc_ids]\n", + " yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in docs_to_rerank] \n", + " rerank_results = co_rerank.rerank(\n", + " query=query,\n", + " documents=yaml_docs,\n", + " model=\"model\", # Pass a dummy string\n", + " top_n=self.rerank_top_k\n", + " )\n", + "\n", + " doc_ids_reranked = [doc_ids[result.index] for result in rerank_results.results]\n", + "\n", + " docs_retrieved = []\n", + " for doc_id in doc_ids_reranked:\n", + " docs_retrieved.append(self.docs[doc_id][\"data\"])\n", + " \n", 
+ " return docs_retrieved" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Process documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With the Vectorstore set up, we can process the documents, which will involve chunking, embedding, and indexing." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loading documents...\n", + "Embedding document chunks...\n", + "Indexing document chunks...\n", + "Indexing complete with 137 document chunks.\n" + ] + } + ], + "source": [ + "# Create an instance of the Vectorstore class with the given sources\n", + "vectorstore = Vectorstore(raw_documents)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can test if the retrieval is working by entering a search query." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'title': 'Advanced Prompt Engineering Techniques',\n", + " 'text': 'Few-shot Prompting\\n\\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.',\n", + " 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'},\n", + " {'title': 'Crafting Effective Prompts',\n", + " 'text': 'Incorporating Example Outputs\\n\\nLLMs respond well when they have specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.',\n", + " 'url': 'https://docs.cohere.com/docs/crafting-effective-prompts'},\n", + " {'title': 'Advanced Prompt Engineering Techniques',\n", + " 'text': 'In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. Ordering the examples can also be important; if there are patterns that could be picked up on that are not relevant to the correctness of the question, the model may incorrectly pick up on those instead of the semantics of the question itself.',\n", + " 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'}]" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vectorstore.retrieve(\"Prompting by giving examples\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run chatbot" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now run the chatbot. 
For this, we create a `run_chatbot` function that accepts the user message and the `messages` list containing the conversation history, if available.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [], + "source": [ + "def run_chatbot(query, messages=None):\n", + " if messages is None:\n", + " messages = []\n", + "\n", + " messages.append({\"role\": \"user\", \"content\": query})\n", + "\n", + " # Retrieve document chunks and format\n", + " documents = vectorstore.retrieve(query)\n", + " documents_formatted = []\n", + " for doc in documents:\n", + " documents_formatted.append({\n", + " \"data\": doc\n", + " })\n", + "\n", + " # Use document chunks to respond\n", + " response = co_chat.chat(\n", + " model=\"model\", # Pass a dummy string\n", + " messages=messages,\n", + " documents=documents_formatted\n", + " )\n", + " \n", + " # Print the chatbot response, citations, and documents\n", + " print(\"\\nRESPONSE:\\n\")\n", + " print(response.message.content[0].text)\n", + " \n", + " if response.message.citations:\n", + " print(\"\\nCITATIONS:\\n\") \n", + " for citation in response.message.citations:\n", + " print(\"-\"*20)\n", + " print(\"start:\", citation.start, \"end:\", citation.end, \"text:\", citation.text)\n", + " print(\"SOURCES:\")\n", + " print(citation.sources)\n", + " \n", + " # Add assistant response to messages\n", + " messages.append({\n", + " \"role\": \"assistant\",\n", + " \"content\": response.message.content[0].text\n", + " })\n", + "\n", + " return messages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is a sample conversation consisting of a few turns." + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RESPONSE:\n", + "\n", + "Hello! How can I help you today?\n" + ] + } + ], + "source": [ + "messages = run_chatbot(\"Hello, I have a question\")" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RESPONSE:\n", + "\n", + "There are a few ways to provide examples in prompts.\n", + "\n", + "One way is to provide a few relevant and diverse examples in the prompt. This can help steer the LLM towards a high-quality solution. Good examples condition the model to the expected response type and style.\n", + "\n", + "Another way is to provide specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.\n", + "\n", + "In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses.\n", + "\n", + "CITATIONS:\n", + "\n", + "--------------------\n", + "start: 68 end: 126 text: provide a few relevant and diverse examples in the prompt.\n", + "SOURCES:\n", + "[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Few-shot Prompting\\n\\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. 
Good examples condition the model to the expected response type and style.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})]\n", + "--------------------\n", + "start: 136 end: 187 text: help steer the LLM towards a high-quality solution.\n", + "SOURCES:\n", + "[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Few-shot Prompting\\n\\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})]\n", + "--------------------\n", + "start: 188 end: 262 text: Good examples condition the model to the expected response type and style.\n", + "SOURCES:\n", + "[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Few-shot Prompting\\n\\nUnlike the zero-shot examples above, few-shot prompting is a technique that provides a model with examples of the task being performed before asking the specific question to be answered. We can steer the LLM toward a high-quality solution by providing a few relevant and diverse examples in the prompt. Good examples condition the model to the expected response type and style.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})]\n", + "--------------------\n", + "start: 282 end: 321 text: provide specific examples to work from.\n", + "SOURCES:\n", + "[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Incorporating Example Outputs\\n\\nLLMs respond well when they have specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.', 'title': 'Crafting Effective Prompts', 'url': 'https://docs.cohere.com/docs/crafting-effective-prompts'})]\n", + "--------------------\n", + "start: 335 end: 485 text: instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.\n", + "SOURCES:\n", + "[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Incorporating Example Outputs\\n\\nLLMs respond well when they have specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.', 'title': 'Crafting Effective Prompts', 'url': 'https://docs.cohere.com/docs/crafting-effective-prompts'})]\n", + "--------------------\n", + "start: 527 end: 679 text: including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses.\n", + "SOURCES:\n", + "[DocumentSource(type='document', id='doc:2', document={'id': 'doc:2', 'text': 'In addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses. 
Ordering the examples can also be important; if there are patterns that could be picked up on that are not relevant to the correctness of the question, the model may incorrectly pick up on those instead of the semantics of the question itself.', 'title': 'Advanced Prompt Engineering Techniques', 'url': 'https://docs.cohere.com/docs/advanced-prompt-engineering-techniques'})]\n"
+     ]
+    }
+   ],
+   "source": [
+    "messages = run_chatbot(\"How to provide examples in prompts\", messages)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "RESPONSE:\n",
+      "\n",
+      "I'm sorry, I could not find any information about 5G networks.\n"
+     ]
+    }
+   ],
+   "source": [
+    "messages = run_chatbot(\"What do you know about 5G networks?\", messages)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 63,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'role': 'user', 'content': 'Hello, I have a question'} \n",
+      "\n",
+      "{'role': 'assistant', 'content': 'Hello! How can I help you today?'} \n",
+      "\n",
+      "{'role': 'user', 'content': 'How to provide examples in prompts'} \n",
+      "\n",
+      "{'role': 'assistant', 'content': 'There are a few ways to provide examples in prompts.\\n\\nOne way is to provide a few relevant and diverse examples in the prompt. This can help steer the LLM towards a high-quality solution. Good examples condition the model to the expected response type and style.\\n\\nAnother way is to provide specific examples to work from. For example, instead of asking for the salient points of the text and using bullet points “where appropriate”, give an example of what the output should look like.\\n\\nIn addition to giving correct examples, including negative examples with a clear indication of why they are wrong can help the LLM learn to distinguish between correct and incorrect responses.'} \n",
+      "\n",
+      "{'role': 'user', 'content': 'What do you know about 5G networks?'} \n",
+      "\n",
+      "{'role': 'assistant', 'content': \"I'm sorry, I could not find any information about 5G networks.\"} \n",
+      "\n",
+      "==================================================\n"
+     ]
+    }
+   ],
+   "source": [
+    "for message in messages:\n",
+    "    print(message, \"\\n\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There are a few observations worth pointing out:\n",
+    "\n",
+    "- Direct response: For user messages that don’t require retrieval (“Hello, I have a question”), the chatbot responds directly, without any retrieval step.\n",
+    "- Citation generation: For responses that do require retrieval (“How to provide examples in prompts”), the endpoint returns the response together with citations. These are fine-grained citations, which means they refer to specific spans of the generated text.\n",
+    "- Response synthesis: The model can decide if none of the retrieved documents provide the necessary information to answer a user message. For example, when asked the question, “What do you know about 5G networks”, the chatbot retrieves external information from the index. However, it doesn’t use any of the information in its response as none of it is relevant to the question.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this tutorial, we learned about:\n",
+    "- How to set up the Cohere client to use the Command model deployed on Azure AI Foundry for chat\n",
+    "- How to build a RAG application by combining retrieval and chat capabilities\n",
+    "- How to manage chat history and maintain conversational context\n",
+    "- How to handle direct responses vs responses requiring retrieval\n",
+    "- How citations are automatically generated for retrieved information\n",
+    "\n",
+    "In the next tutorial, we'll explore how to leverage Cohere's tool use features to build agentic applications."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "base",
+   "language": "python",
+   "name": "base"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/notebooks/guides/cohere-on-azure/v2/azure-ai-reranking.ipynb b/notebooks/guides/cohere-on-azure/v2/azure-ai-reranking.ipynb
new file mode 100644
index 000000000..8d8a64e3e
--- /dev/null
+++ b/notebooks/guides/cohere-on-azure/v2/azure-ai-reranking.ipynb
@@ -0,0 +1,322 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Reranking"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Reranking is a crucial technique used in information retrieval systems, particularly for large-scale search applications. The process involves taking an initial set of retrieved documents and reordering them based on how relevant they are to the user's search query.\n",
+    "\n",
+    "One of the most compelling aspects of reranking is its ease of implementation - despite providing substantial improvements to search results, Cohere's Rerank models can be integrated into any existing search system with just a single line of code, regardless of whether it uses semantic or traditional keyword-based search approaches.\n",
+    "\n",
+    "In this tutorial, we'll cover:\n",
+    "- Setting up the Cohere client\n",
+    "- Retrieving documents\n",
+    "- Reranking documents\n",
+    "- Reranking semi-structured data\n",
+    "\n",
+    "We'll use Cohere's Rerank model deployed on Azure to demonstrate these capabilities and help you understand how to effectively implement reranking in your applications."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, you will need to deploy the Rerank model on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go, token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio).\n",
+    "\n",
+    "In the example below, we are deploying the Rerank Multilingual v3 model.\n",
+    "\n",
+    "Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client.\n",
Let's now install the Cohere SDK and set up our client.\n", + "\n", + "To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install cohere\n", + "\n", + "import cohere\n", + "\n", + "co = cohere.ClientV2(\n", + " api_key=\"AZURE_API_KEY_RERANK\",\n", + " base_url=\"AZURE_ENDPOINT_RERANK\" # example: \"https://cohere-rerank-v3-multilingual-xyz.eastus.models.ai.azure.com/\"\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Retrieve documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this example, we'll work with documents that have already been retrieved through an initial search stage (which could be semantic search, keyword matching, or another retrieval method); an illustrative sketch of such a first stage appears after the document list below.\n", + "\n", + "Below is a list of nine documents representing the initial search results. Each document contains email data structured as a dictionary with two fields - Title and Content. This semi-structured format allows the Rerank endpoint to effectively process and reorder the results based on relevance." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "documents = [\n", + " {\"Title\":\"Incorrect Password\",\"Content\":\"Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?\"},\n", + " {\"Title\":\"Confirmation Email Missed\",\"Content\":\"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?\"},\n", + " {\"Title\":\"Questions about Return Policy\",\"Content\":\"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.\"},\n", + " {\"Title\":\"Customer Support is Busy\",\"Content\":\"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?\"},\n", + " {\"Title\":\"Received Wrong Item\",\"Content\":\"Hi, I have a question about my recent order. I received the wrong item and I need to return it.\"},\n", + " {\"Title\":\"Customer Service is Unavailable\",\"Content\":\"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?\"},\n", + " {\"Title\":\"Return Policy for Defective Product\",\"Content\":\"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.\"},\n", + " {\"Title\":\"Wrong Item Received\",\"Content\":\"Good morning, I have a question about my recent order. I received the wrong item and I need to return it.\"},\n", + " {\"Title\":\"Return Defective Product\",\"Content\":\"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.\"}\n", + "]" + ] + },
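+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "As an illustrative sketch only (this toy `keyword_prefilter` function is an assumption, not part of the original notebook), a naive keyword-matching first stage could have produced a candidate list like the one above before reranking. Real systems would typically use BM25 or a vector index instead:\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Sketch: a naive keyword-matching first stage that produces candidates\n",
+     "# for reranking. keyword_prefilter is a hypothetical helper for illustration.\n",
+     "def keyword_prefilter(query, docs):\n",
+     "    query_terms = set(query.lower().split())\n",
+     "    # Keep any document whose title or content shares a term with the query\n",
+     "    return [\n",
+     "        doc for doc in docs\n",
+     "        if query_terms & set((doc[\"Title\"] + \" \" + doc[\"Content\"]).lower().split())\n",
+     "    ]\n",
+     "\n",
+     "candidates = keyword_prefilter(\"question about my recent order\", documents)\n",
+     "print(f\"{len(candidates)} candidate documents retrieved\")"
+    ]
+   },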
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Rerank documents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adding a reranking component is simple with Cohere Rerank. It takes just one line of code to implement.\n", + "\n", + "Calling the Rerank endpoint requires the following arguments:\n", + "\n", + "- `documents`: The list of documents, which we defined in the previous section\n", + "- `query`: The user query; we’ll use 'What emails have been about refunds?' as an example\n", + "- `top_n`: The number of documents we want to be returned, sorted from the most to the least relevant document\n", + "\n", + "When passing documents that contain multiple fields, as in this case, we recommend formatting them as YAML strings for best performance.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "import yaml\n", + "\n", + "yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in documents] \n", + "\n", + "query = 'What emails have been about refunds?'\n", + "\n", + "results = co.rerank(\n", + " model=\"model\", # Pass a dummy string\n", + " documents=yaml_docs,\n", + " query=query,\n", + " top_n=3\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since we set `top_n=3`, the response will return the three documents most relevant to our query. Each result includes both the document's original position (index) in our input list and a score indicating how well it matches the query.\n", + "\n", + "Let's examine the reranked results below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Rank: 1\n", + "Score: 8.481104e-05\n", + "Document: {'Title': 'Return Defective Product', 'Content': 'Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.'}\n", + "\n", + "Rank: 2\n", + "Score: 5.1442214e-05\n", + "Document: {'Title': 'Questions about Return Policy', 'Content': 'Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.'}\n", + "\n", + "Rank: 3\n", + "Score: 3.591301e-05\n", + "Document: {'Title': 'Return Policy for Defective Product', 'Content': 'Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.'}\n", + "\n" + ] + } + ], + "source": [ + "def return_results(results, documents): \n", + " for idx, result in enumerate(results.results):\n", + " print(f\"Rank: {idx+1}\") \n", + " print(f\"Score: {result.relevance_score}\")\n", + " print(f\"Document: {documents[result.index]}\\n\")\n", + " \n", + "return_results(results, documents)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The search query was looking for emails about refunds, but none of the documents mentions the word “refunds” specifically.\n", + "\n", + "The Rerank model was nevertheless able to retrieve the right documents: some of them mention the word “return”, which has a very similar meaning to \"refunds.\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Rerank semi-structured data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Rerank 3 model supports multi-aspect and semi-structured data like emails, invoices, JSON documents, code, and tables. By choosing which fields to include when formatting each document as a string, you can select which fields the model should consider for reranking.\n", + "\n", + "In the following example, we’ll use an email dataset. 
This is semi-structured data containing a number of fields: from, to, date, subject, and text.\n", + "\n", + "The model will rerank based on the order of the fields passed; an illustrative field-selection sketch follows the example below." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the documents\n", + "emails = [\n", + " {\n", + " \"from\": \"hr@co1t.com\",\n", + " \"to\": \"david@co1t.com\",\n", + " \"date\": \"2024-06-24\",\n", + " \"subject\": \"A Warm Welcome to Co1t!\",\n", + " \"text\": \"We are delighted to welcome you to the team! As you embark on your journey with us, you'll find attached an agenda to guide you through your first week.\",\n", + " },\n", + " {\n", + " \"from\": \"it@co1t.com\",\n", + " \"to\": \"david@co1t.com\",\n", + " \"date\": \"2024-06-24\",\n", + " \"subject\": \"Setting Up Your IT Needs\",\n", + " \"text\": \"Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.\",\n", + " },\n", + " {\n", + " \"from\": \"john@co1t.com\",\n", + " \"to\": \"david@co1t.com\",\n", + " \"date\": \"2024-06-24\",\n", + " \"subject\": \"First Week Check-In\",\n", + " \"text\": \"Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!\",\n", + " },\n", + "]\n", + "\n", + "yaml_emails = [yaml.dump(doc, sort_keys=False) for doc in emails]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Rank: 1\n", + "Score: 0.13477592\n", + "Document: {'from': 'john@co1t.com', 'to': 'david@co1t.com', 'date': '2024-06-24', 'subject': 'First Week Check-In', 'text': \"Hello! I hope you're settling in well. Let's connect briefly tomorrow to discuss how your first week has been going. Also, make sure to join us for a welcoming lunch this Thursday at noon—it's a great opportunity to get to know your colleagues!\"}\n", + "\n", + "Rank: 2\n", + "Score: 0.0010083435\n", + "Document: {'from': 'it@co1t.com', 'to': 'david@co1t.com', 'date': '2024-06-24', 'subject': 'Setting Up Your IT Needs', 'text': 'Greetings! To ensure a seamless start, please refer to the attached comprehensive guide, which will assist you in setting up all your work accounts.'}\n", + "\n" + ] + } + ], + "source": [ + "# Add the user query\n", + "query = \"Any email about check ins?\"\n", + "\n", + "# Rerank the documents\n", + "results = co.rerank(\n", + " model=\"model\", # Pass a dummy string\n", + " query=query,\n", + " documents=yaml_emails,\n", + " top_n=2,\n", + ")\n", + "\n", + "return_results(results, emails)" + ] + },
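+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "As a hedged sketch (an assumption for illustration, not part of the original notebook): because the model only sees the string you pass, you can focus the reranker on particular fields, or change their order, when serializing each email to YAML:\n"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Sketch: include only the subject and text fields so that metadata such as\n",
+     "# sender and date does not influence the relevance scores.\n",
+     "focused_yaml_emails = [\n",
+     "    yaml.dump({\"subject\": doc[\"subject\"], \"text\": doc[\"text\"]}, sort_keys=False)\n",
+     "    for doc in emails\n",
+     "]\n",
+     "\n",
+     "focused_results = co.rerank(\n",
+     "    model=\"model\",  # Pass a dummy string\n",
+     "    query=\"Any email about check ins?\",\n",
+     "    documents=focused_yaml_emails,\n",
+     "    top_n=2,\n",
+     ")\n",
+     "\n",
+     "return_results(focused_results, emails)"
+    ]
+   },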
+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this tutorial, we learned about:\n", + "- How to set up the Cohere client to use the Rerank model deployed on Azure AI Foundry\n", + "- How to retrieve documents\n", + "- How to rerank documents\n", + "- How to rerank semi-structured data\n", + "\n", + "In the next tutorial, we'll learn how to build RAG applications by leveraging the models that we've looked at in the previous tutorials - Command, Embed, and Rerank." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "base" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/guides/cohere-on-azure/v2/azure-ai-sem-search.ipynb b/notebooks/guides/cohere-on-azure/v2/azure-ai-sem-search.ipynb new file mode 100644 index 000000000..ab68ef2cd --- /dev/null +++ b/notebooks/guides/cohere-on-azure/v2/azure-ai-sem-search.ipynb @@ -0,0 +1,451 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Semantic Search" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this tutorial, we'll explore semantic search using Cohere's Embed model. Semantic search enables search systems to capture the meaning and context of search queries, going beyond simple keyword matching to find relevant results based on semantic similarity. \n", + "\n", + "With the Embed model, you can do this across languages. This is particularly powerful for multilingual applications where the same meaning can be expressed in different languages.\n", + "\n", + "In this tutorial, we'll cover:\n", + "- Setting up the Cohere client\n", + "- Embedding text data\n", + "- Building a search index\n", + "- Performing semantic search queries\n", + "\n", + "We'll use Cohere's Embed model deployed on Azure to demonstrate these capabilities and help you understand how to effectively implement semantic search in your applications.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, you will need to deploy the Embed model on Azure via Azure AI Foundry. The deployment will create a serverless API with pay-as-you-go, token-based billing. You can find more information on how to deploy models in the [Azure documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio).\n", + "\n", + "In the example below, we are deploying the Embed Multilingual v3 model.\n", + "\n", + "Once the model is deployed, you can access it via Cohere's Python SDK. Let's now install the Cohere SDK and set up our client.\n", + "\n", + "To create a client, you need to provide the API key and the model's base URL for the Azure endpoint. You can get this information from the Azure AI Foundry platform where you deployed the model."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# %pip install cohere hnswlib\n", + "\n", + "import pandas as pd\n", + "import hnswlib\n", + "import re\n", + "import cohere\n", + "\n", + "co = cohere.ClientV2(\n", + " api_key=\"AZURE_API_KEY_EMBED\",\n", + " base_url=\"AZURE_ENDPOINT_EMBED\" # example: \"https://cohere-embed-v3-multilingual-xyz.eastus.models.ai.azure.com/\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this example, we'll be using [MultiFIN](https://aclanthology.org/2023.findings-eacl.66.pdf) - an open-source dataset of financial article headlines in 15 different languages (including English, Turkish, Danish, Spanish, Polish, Greek, Finnish, Hebrew, Japanese, Hungarian, Norwegian, Russian, Italian, Icelandic, and Swedish).\n", + "\n", + "We've prepared a CSV version of the MultiFIN dataset that includes an additional column containing English translations. While we won't use these translations for the model itself, they'll help us understand the results when we encounter headlines in Danish or Spanish. We'll load this CSV file into a pandas dataframe." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
|   | text | labels | lang | id | translation |\n",
+        "|---|---|---|---|---|---|\n",
+        "| 0 | Revenue Recognition | ['Accounting & Assurance'] | English | Israel-4145 | Revenue Recognition |\n",
+        "| 1 | Más de la mitad de las empresas españolas fuer... | ['Financial Crime'] | Spanish | Spain-2044 | More than half of the Spanish companies were v... |\n",
+        "| 2 | Wynagrodzenie netto w Polsce to średnio 71% pe... | ['Human Resource'] | Polish | Poland-1567 | The net salary in Poland is an average of 71% ... |\n",
+        "| 3 | Time to talk: What has to change for women at ... | ['Human Resource'] | English | Turkey-5447 | Time to talk: What has to change for women at ... |\n",
+        "| 4 | Total Retail 2017 | ['Retail & Consumers'] | English | Spain-1981 | Total Retail 2017 |\n",