Update genai_cookbook site content with new agent sample app

Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
databricks · Oct 1, 2024 · 7bf4332
1 parent 2ed98a5
commit 7bf4332

Showing 15 changed files with 44 additions and 53 deletions.
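
Most of the hunks below swap the folder name `rag_app_sample_code` for `agent_app_sample_code` in links. A rename spread across this many files is easy to leave incomplete, so a quick scan for leftovers can help during review. The following is a minimal sketch, not part of this commit: the function name `find_stale_references` and the set of file suffixes are assumptions chosen for illustration.

```python
from pathlib import Path

# Folder name that this commit replaces in links and paths.
OLD_NAME = "rag_app_sample_code"

# Assumption: only these text formats are worth scanning in this repo.
TEXT_SUFFIXES = {".md", ".yml", ".yaml", ".ipynb", ".py"}


def find_stale_references(root, needle=OLD_NAME):
    """Return (relative_path, line_number, stripped_line) for every line under
    `root` that still contains `needle`."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and files that are not in the assumed text formats.
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Calling `find_stale_references("genai_cookbook")` from the repository root would list any remaining occurrences; an empty list suggests the rename is complete for the scanned file types.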
diff --git a/README.md b/README.md
@@ -2,14 +2,14 @@
 
 Please visit http://ai-cookbook.io for the accompanying documentation for this repo.
 
-This repo provides [learning materials](https://ai-cookbook.io/) and [production-ready code](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code) to build a **high-quality RAG application** using Databricks. The [Mosaic Generative AI Cookbook](https://ai-cookbook.io/) provides:
+This repo provides [learning materials](https://ai-cookbook.io/) and [production-ready code](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code) to build a **high-quality RAG application** using Databricks. The [Mosaic Generative AI Cookbook](https://ai-cookbook.io/) provides:
   - A conceptual overview and deep dive into various Generative AI design patterns, such as Prompt Engineering, Agents, RAG, and Fine Tuning
   - An overview of Evaluation-Driven development
   - The theory of every parameter/knob that impacts quality
   - How to root cause quality issues and determine which knobs are relevant to experiment with for your use case
   - Best practices for how to experiment with each knob
 
-The [provided code](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code) is intended for use with the Databricks platform.  Specifically:
+The [provided code](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code) is intended for use with the Databricks platform.  Specifically:
 - [Mosaic AI Agent Framework](https://docs.databricks.com/en/generative-ai/retrieval-augmented-generation.html) which provides a fast developer workflow with enterprise-ready LLMops & governance
 - [Mosaic AI Agent Evaluation](https://docs.databricks.com/en/generative-ai/agent-evaluation/index.html) which provides reliable, quality measurement using proprietary AI-assisted LLM judges to measure quality metrics that are powered by human feedback collected through an intuitive web-based chat UI
 

diff --git a/genai_cookbook/10-min-demo/Mosaic-AI-Agents-10-Minute-Demo.ipynb b/genai_cookbook/10-min-demo/Mosaic-AI-Agents-10-Minute-Demo.ipynb
@@ -677,7 +677,7 @@
     "\n",
     "## Browse the code samples\n",
     "\n",
-    "Open the `./genai-cookbook/rag_app_sample_code` folder that was synced to your Workspace by this notebook.  Documentation [here](https://ai-cookbook.io/nbs/6-implement-overview.html).\n",
+    "Open the `./genai-cookbook/agent_app_sample_code` folder that was synced to your Workspace by this notebook.  Documentation [here](https://ai-cookbook.io/nbs/6-implement-overview.html).\n",
     "\n",
     "## Read the [Generative AI Cookbook](https://ai-cookbook.io)!\n",
     "\n",
@@ -706,6 +706,9 @@
    },
    "notebookName": "Mosaic-AI-Agents-10-Minute-Demo",
    "widgets": {}
+  },
+  "language_info": {
+   "name": "python"
   }
  },
  "nbformat": 4,

diff --git a/genai_cookbook/_config.yml b/genai_cookbook/_config.yml
@@ -12,7 +12,7 @@ execute:
 
 # Information about where the book exists on the web
 repository:
-  url: https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code
+  url: https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code
   path_to_book: ../genai_cookbook  # Optional path to your book, relative to the repository root
   branch: main  # Which branch of the repository should be used when creating links (optional)
 

diff --git a/genai_cookbook/nbs/5-hands-on-build-poc.md b/genai_cookbook/nbs/5-hands-on-build-poc.md
@@ -11,9 +11,9 @@
 1. Completed [start here](./6-implement-overview.md) steps
 2. Data from your [requirements](/nbs/5-hands-on-requirements.md#requirements-questions) is available in your [Lakehouse](https://www.databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html) inside a Unity Catalog [volume](https://docs.databricks.com/en/connect/unity-catalog/volumes.html) <!-- or [Delta Table](https://docs.databricks.com/en/delta/index.html)-->
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 **Expected outcome**
@@ -62,49 +62,35 @@ By default, the POC uses the open source models available on [Mosaic AI Foundati
 
 
 
-1. **Open the POC code folder within [`A_POC_app`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/A_POC_app) based on your type of data:**
+1. **Open the POC code folder within [`agent_app_sample_code`](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code) for data in pdf, docx, or html.**
 
-   <br/>
-
-   | File type                  | Source                 | POC application folder |
-   |----------------------------|------------------------|------------------------|
-   | PDF files                  | UC Volume              |   [`pdf_uc_volume`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/A_POC_app/pdf_uc_volume)                     |
-   | Powerpoint files           | UC Volume              |        [`pptx_uc_volume`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/A_POC_app/pptx_uc_volume)                |
-   | DOCX files                 | UC Volume              |        [`docx_uc_volume`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/A_POC_app/docx_uc_volume)                |
-   | JSON files w/ text/markdown/HTML content & metadata | UC Volume  |              [`json_uc_volume`](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code/A_POC_app/html_uc_volume)          |  
-   <!--| HTML content               | Delta Table            |                        |
-   | Markdown or regular text   | Delta Table            |                        | -->
-
-   If your data doesn't meet one of the above requirements, you can customize the parsing function (`parser_udf`) within `02_poc_data_pipeline` in the above POC directories to work with your file types.
+   If your data doesn't meet one of the above requirements, you can customize the parsing function (`file_parser`) within `02_data_pipeline` in the above directory to work with your file types.
 
    Inside the POC folder, you will see the following notebooks:
 
+<!-- TODO (prithvi): update this -->
 ```{image} ../images/5-hands-on/6_img.png
 :align: center
 ```
 
 ```{tip}
-The notebooks referenced below are relative to the specific POC you've chosen. For example, if you see a reference to `00_config` and you've chosen `pdf_uc_volume`, you'll find the relevant `00_config` notebook at [`A_POC_app/pdf_uc_volume/00_config`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/A_POC_app/pdf_uc_volume/00_config.py).
+The notebooks referenced below are relative to the specific POC you've chosen. For example, if you see a reference to `00_config` and you've chosen `pdf_uc_volume`, you'll find the relevant `00_global_config` notebook at [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/main/agent_app_sample_code/00_global_config.py).
 ```
 
 <br/>
 
 2. **Optionally, review the default parameters**
 
-   Open the `00_config` Notebook within the POC directory you chose above to view the POC's applications default parameters for the data pipeline and RAG chain.
+   Open the `00_global_config` Notebook within the directory to view the POC's applications default parameters for the data pipeline and RAG chain.
 
 
    ```{note}
    **Important:** our recommended default parameters are by no means perfect, nor are they intended to be. Rather, they are a place to start from - the next steps of our workflow guide you through iterating on these parameters.
    ```
 
-3. **Validate the configuration**
-
-   Run the `01_validate_config` to check that your configuration is valid and all resources are available. You will see an `rag_chain_config.yaml` file appear in your directory - we will use this in step 4 to deploy the application.
-
-4. **Run the data pipeline**
+3. **Run the data pipeline**
 
-   The POC data pipeline is a Databricks Notebook based on Apache Spark. Open the `02_poc_data_pipeline` Notebook and press Run All to execute the pipeline. The pipeline will:
+   The POC data pipeline is a Databricks Notebook based on Apache Spark. Open the `02_data_pipeline` Notebook and press Run All to execute the pipeline. The pipeline will:
 
    1. Load the raw documents from the UC Volume
    2. Parse each document, saving the results to a Delta Table
@@ -142,7 +128,7 @@ The notebooks referenced below are relative to the specific POC you've chosen. F
    The POC Chain uses MLflow code-based logging. To understand more about code-based logging, visit the [docs](https://docs.databricks.com/generative-ai/create-log-agent.html#code-based-vs-serialization-based-logging).
    ```
 
-   1. Open the `03_deploy_poc_to_review_app` Notebook
+   1. Open the `03_agent_proof_of_concept` Notebook
 
    2. Run each cell of the Notebook.
 
@@ -155,7 +141,7 @@ The notebooks referenced below are relative to the specific POC you've chosen. F
    4. Modify the default instructions to be relevant to your use case.  These are displayed in the Review App.
 
       ```python
-         instructions_to_reviewer = f"""## Instructions for Testing the {RAG_APP_NAME}'s Initial Proof of Concept (PoC)
+         instructions_to_reviewer = f"""## Instructions for Testing the {AGENT_NAME}'s Initial Proof of Concept (PoC)
 
          Your inputs are invaluable for the development team. By providing detailed feedback and corrections, you help us fix issues and improve the overall quality of the application. We rely on your expertise to identify any gaps or areas needing enhancement.
 
@@ -170,7 +156,7 @@ The notebooks referenced below are relative to the specific POC you've chosen. F
             - Carefully review each document that the system returns in response to your question.
             - Use the thumbs up/down feature to indicate whether the document was relevant to the question asked. A thumbs up signifies relevance, while a thumbs down indicates the document was not useful.
 
-         Thank you for your time and effort in testing {RAG_APP_NAME}. Your contributions are essential to delivering a high-quality product to our end users."""
+         Thank you for your time and effort in testing {AGENT_NAME}. Your contributions are essential to delivering a high-quality product to our end users."""
 
          print(instructions_to_reviewer)
       ```
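
The updated text in this file says the parsing function (`file_parser`) in `02_data_pipeline` can be customized for other file types. The diff does not show that function, so the following is only a rough sketch of one way such a customization could be organized: a lookup table from file extension to a parser. Everything here is hypothetical, including the name `parse_document_sketch`, its `(raw, file_name)` signature, and the `doc_content` / `parser_status` keys; the real notebook may use a different shape.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def parse_html(raw: bytes) -> str:
    extractor = _TextExtractor()
    extractor.feed(raw.decode("utf-8", errors="replace"))
    extractor.close()
    return "\n".join(extractor.parts)


def parse_plain_text(raw: bytes) -> str:
    return raw.decode("utf-8", errors="replace")


# Hypothetical registry: add an entry here to support another file type.
PARSERS = {
    ".html": parse_html,
    ".htm": parse_html,
    ".txt": parse_plain_text,
    ".md": parse_plain_text,
}


def parse_document_sketch(raw: bytes, file_name: str) -> dict:
    """Pick a parser by extension and report success or failure (assumed output shape)."""
    suffix = PurePosixPath(file_name).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        return {"doc_content": "", "parser_status": f"ERROR: unsupported file type {suffix!r}"}
    try:
        return {"doc_content": parser(raw), "parser_status": "SUCCESS"}
    except Exception as exc:  # keep one bad file from failing a whole batch
        return {"doc_content": "", "parser_status": f"ERROR: {exc}"}
```

Returning a status string instead of raising lets a batch job record which documents failed and continue with the rest, which is a design choice of this sketch rather than something the commit specifies.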

diff --git a/genai_cookbook/nbs/5-hands-on-curate-eval-set.md b/genai_cookbook/nbs/5-hands-on-curate-eval-set.md
@@ -10,9 +10,9 @@
 
 *Time varies based on the quality of the responses provided by your stakeholders.  If the responses are messy or contain lots of irrelevant queries, you will need to spend more time filtering and cleaning the data.*
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 #### **Overview & expected outcome**
@@ -49,6 +49,6 @@ Databricks recommends that your Evaluation Set contain at least 30 questions to
 
 2. Inspect the Evaluation Set to understand the data that is included. You need to validate that your Evaluation Set contains a representative and challenging set of questions. Adjust the Evaluation Set as required.
 
-3. By default, your evaluation set is saved to the Delta Table configured in `EVALUATION_SET_FQN` in the [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/main/rag_app_sample_code/00_global_config.py) Notebook.
+3. By default, your evaluation set is saved to the Delta Table configured in `EVALUATION_SET_FQN` in the [`00_global_config`](https://github.com/databricks/genai-cookbook/blob/main/agent_app_sample_code/00_global_config.py) Notebook.
 
 > **Next step:** Now that you have an evaluation set, use it to [evaluate the POC app's](./5-hands-on-evaluate-poc.md) quality/cost/latency.
diff --git a/genai_cookbook/nbs/5-hands-on-evaluate-poc.md b/genai_cookbook/nbs/5-hands-on-evaluate-poc.md
@@ -10,9 +10,9 @@
 
 *Time varies based on the number of questions in your evaluation set.  For 100 questions, evaluation will take approximately 5 minutes.*
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 ### **Overview & expected outcome**

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-generation.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-generation.md
@@ -1,3 +1,4 @@
+<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 #### Debugging generation quality
 
 ##### Debugging generation quality

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-retrieval.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-1-retrieval.md
@@ -1,3 +1,4 @@
+<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 #### Debugging retrieval quality
 
 ##### How to debug retrieval quality

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-1.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-1.md
@@ -1,3 +1,4 @@
+<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 ### **Step 5:** Identify the root cause of quality issues
 
 ```{image} ../images/5-hands-on/workflow_iterate.png
@@ -13,9 +14,9 @@
     - If you followed the previous step, this will be the case!
 2. All requirements from previous steps
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 #### **Overview**

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-2-data-pipeline.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-2-data-pipeline.md
@@ -1,3 +1,4 @@
+<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 # **![Data pipeline](../images/5-hands-on/data_pipeline.png)** Implement data pipeline fixes
 
 Follow these steps to modify your data pipeline and run it to:

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality-step-2.md b/genai_cookbook/nbs/5-hands-on-improve-quality-step-2.md
@@ -1,3 +1,4 @@
+<!-- TODO (prithvi): move this into the 5-hands-on-evaluate-poc -->
 ### **Step 6:** Iteratively implement & evaluate quality fixes
 
 ```{image} ../images/5-hands-on/workflow_iterate.png
@@ -10,9 +11,9 @@
 1. Based on your [root cause analysis](./5-hands-on-improve-quality-step-1.md), you have identified potential fixes to either [retrieval](./5-hands-on-improve-quality-step-1-retrieval.md) or [generation](./5-hands-on-improve-quality-step-1-generation.md) to implement and evaluate
 2. Your POC application (or another baseline chain) is logged to an MLflow Run with an Agent Evaluation evaluation stored in the same Run
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 #### Expected outcome

diff --git a/genai_cookbook/nbs/5-hands-on-improve-quality.md b/genai_cookbook/nbs/5-hands-on-improve-quality.md
@@ -12,9 +12,9 @@ While a basic RAG chain is relatively straightforward to implement, refining it
 
 Simply vectorizing a set of documents, retrieving them via semantic search, and passing the retrieved documents to an LLM is not sufficient to guarantee optimal results. To yield high-quality outputs, you need to consider factors such as (but not limited to) chunking strategy of documents, choice of LLM and model parameters, or whether to include a query understanding step. As a result, ensuring high quality RAG outputs will generally involve iterating over both the data pipeline (e.g., chunking) and the RAG chain itself (e.g., choice of LLM).
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 This step is divided into 2 sub-steps:

diff --git a/genai_cookbook/nbs/5-hands-on-requirements.md b/genai_cookbook/nbs/5-hands-on-requirements.md
@@ -6,9 +6,9 @@
 <br/>
 Defining clear and comprehensive use case requirements is a critical first step in developing a successful RAG application. These requirements serve two primary purposes. Firstly, they help determine whether RAG is the most suitable approach for the given use case. If RAG is indeed a good fit, these requirements guide solution design, implementation, and evaluation decisions. Investing time at the outset of a project to gather detailed requirements can prevent significant challenges and setbacks later in the development process, and ensures that the resulting solution meets the needs of end-users and stakeholders. Well-defined requirements provide the foundation for the subsequent stages of the development lifecycle we'll walk through.
 
-```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code)
+```{admonition} [Code Repository](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code)
 :class: tip
-You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/rag_app_sample_code).
+You can find all of the sample code referenced throughout this section [here](https://github.com/databricks/genai-cookbook/tree/main/agent_app_sample_code).
 ```
 
 ### Is the use case a good fit for RAG?