Commit

* minor: use case docs
* concepts & update links
* run locally
* cr
* rename frontend
* move into backend
* more updates
* modify
* run locally
* deployment
* fix env example file
* cr
* nit and pushup proposal for modify structure
* improved modification structure
* add concepts guide for record manager
* embeddings section
* cr
* cr

1 parent bd4376a · commit 9f951bd · 49 changed files with 718 additions and 60 deletions

@@ -1,2 +1,2 @@
chat-langchain/
frontend/
assets/

@@ -0,0 +1,9 @@
OPENAI_API_KEY: your_secret_key_here
LANGCHAIN_TRACING_V2: "true"
LANGCHAIN_PROJECT: langserve-launch-example
LANGCHAIN_API_KEY: your_secret_key_here
FIREWORKS_API_KEY: your_secret_here
WEAVIATE_API_KEY: your_secret_key_here
WEAVIATE_URL: https://your-weaviate-instance.com
WEAVIATE_INDEX_NAME: your_index_name
RECORD_MANAGER_DB_URL: your_db_url

@@ -0,0 +1,77 @@
# Concepts

In this doc we'll go over the different concepts implemented in Chat LangChain. By the end, you'll have a conceptual understanding of how Chat LangChain works and its different architectural components. We'll start with the vector store, the basis of the entire system.

## Vector Store

Vector stores, fundamentally, are specialized databases designed to efficiently store and manage vectors, which are high-dimensional arrays of numbers. These vectors are not arbitrary; they are the product of sophisticated text embedding models, such as those provided by [OpenAI's `text-embedding`](https://python.langchain.com/docs/integrations/text_embedding) API.

In the context of our application, vector stores play a pivotal role in enhancing the capabilities of our language model. Here's a deeper dive into the process:

1. **Vector Generation**: Whenever new content related to LangChain is introduced or existing content is updated, we use text embedding models to convert this textual information into vectors. Each vector acts as a unique fingerprint of its corresponding text, encapsulating its meaning in a high-dimensional space.

2. **Similarity Searches**: The core utility of storing these vectors comes into play when we need to find information relevant to a user's query. By converting the user's question into a vector using the same embedding model, we can perform a similarity search across our vector store. This search identifies vectors (and thus, documents) whose meanings are closest to the query, based on the distance between vectors in the embedding space.

3. **Context Retrieval and Enhancement**: The documents retrieved through similarity searches are relevant pieces of information that aid the language model in generating relevant answers. By providing this context, we enable the language model to generate responses that are not only accurate but also informed by the most relevant and up-to-date information available in our database.
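
To make this concrete, here's a minimal sketch of the embed-and-search loop. It uses an in-memory FAISS store and OpenAI embeddings purely for illustration (Chat LangChain itself uses Weaviate), and the document texts are made up:

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# 1. Vector generation: embed some example texts into a vector store.
embeddings = OpenAIEmbeddings()
store = FAISS.from_texts(
    [
        "LCEL lets you compose runnables with the | operator.",
        "A record manager prevents re-indexing unchanged documents.",
    ],
    embeddings,
)

# 2. Similarity search: the query is embedded with the same model, and the
#    documents closest to it in the embedding space are returned.
results = store.similarity_search("How do I compose chains?", k=1)

# 3. Context retrieval: these documents are passed to the LLM as context.
print(results[0].page_content)
```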

## Indexing

Indexing your documents is a vital part of any production RAG application. In short, indexing lets your documents be stored and searched in a way that keeps duplicate documents out of the store. This is important for a few reasons:

1. **Duplicate Results**: Say you update your vector store without using an indexing API. Now you may have two identical documents in your store. When you then perform a semantic search, instead of getting k distinct results, you'll get duplicates, since the semantic search simply returns whichever documents are semantically closest to the query.

2. **Performance**: Indexing your documents allows for faster ingestion. With indexing you don't have to generate embeddings for every document on every ingestion run; you only need to generate embeddings for new or changed documents.

To help with this, we use the LangChain indexing API, which contains all the features required for robust indexing in your application. Indexing is done in three main steps:

1. **Ingestion**: Ingestion is where you pull in all the documents you want to add to your vector store. This could be every document available to you, or just a couple of new ones.
2. **Hashing**: Once the indexing API is passed your documents, it creates a unique hash for each one, containing some metadata like the date it was ingested. These hashes are stored in what we call the "Record Manager".
3. **Insertion**: Finally, once the documents are hashed and confirmed through the Record Manager to not already exist, they are inserted into the vector store.

The indexing API uses the Record Manager to store the records of previously indexed documents between ingestion runs. Because the manager keeps each document's hash and ingestion time, the indexing API can ingest only new documents and skip duplicates.
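
As a rough sketch of what a call to the indexing API looks like (using a local SQLite record manager and a FAISS store in place of the Postgres/Weaviate pair this repo uses in production):

```python
from langchain.indexes import SQLRecordManager, index
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

vectorstore = FAISS.from_texts(["placeholder"], OpenAIEmbeddings())

# The record manager persists hashes of everything previously indexed.
record_manager = SQLRecordManager(
    "faiss/langchain",  # namespace, typically "<store>/<index name>"
    db_url="sqlite:///record_manager.sql",  # stand-in for RECORD_MANAGER_DB_URL
)
record_manager.create_schema()

docs = [
    Document(page_content="LCEL composes runnables.", metadata={"source": "docs/lcel.md"}),
]

# Re-running this with unchanged docs reports them as skipped, not re-added.
stats = index(docs, record_manager, vectorstore, cleanup="full", source_id_key="source")
print(stats)  # e.g. {'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
```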

### Record Manager

The LangChain Record Manager API provides an interface for managing records in a database that tracks upserted documents before they are ingested into a vector store for LLM usage. It allows you to efficiently insert, update, delete, and query records.

**Key Concepts**

- **Namespace**: Each Record Manager is associated with a namespace. This allows logically separating records for different use cases.
- **Keys**: Each record is uniquely identified by a key within the namespace.
- **Group IDs**: Records can optionally be associated with group IDs to allow filtering and batch operations on related records.
- **Timestamps**: Each record has an `updated_at` timestamp tracking the last time it was upserted. This enables querying records within time ranges.

Using the Record Manager API lets you efficiently track which documents need to be added to or updated in the vector store, making the ingestion process more robust and avoiding unnecessary duplicate work.
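
The same interface can be driven directly. A short illustrative sketch, reusing the `record_manager` from the indexing sketch above (the keys and group IDs here are made up):

```python
# Upsert two records under a shared group ID.
record_manager.update(["doc-1", "doc-2"], group_ids=["guides", "guides"])

print(record_manager.exists(["doc-1", "doc-3"]))       # [True, False]
print(record_manager.list_keys(group_ids=["guides"]))  # ['doc-1', 'doc-2']

record_manager.delete_keys(["doc-2"])
```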

## Query Analysis

Finally, we perform query analysis on follow-up questions in a conversation. It is important to note that we only do this for follow-ups, not for the initial question. Let's break down the reasoning here:

Users are not always the best prompters: they can easily omit context or phrase their question poorly, whereas we can be reasonably confident an LLM rephrasing the question won't make those mistakes. Additionally, given a chat history (which is always passed as context to the model), some parts of the question may become redundant, or, conversely, may need additional clarification.

Doing all this produces better-formed questions for the model, without relying on the user to write them.

We don't perform this step on the initial question for two main reasons:

1. **Speed**: Although models are getting faster and faster, they still take longer than we'd like to return a response. This matters most for the first question: the chat bot hasn't proved its usefulness to the user yet, and you don't want to lose them to slowness before they've even started.
2. **Context**: Without a chat history, the model lacks important context around the user's question.

Most users won't format their queries perfectly for LLMs, and that's okay! To account for this, we add an extra step before the final generation which takes the user's query and rephrases it to be more suitable for the LLM.

The prompt is quite simple:

```python
REPHRASE_TEMPLATE = """\
Given the following conversation and a follow up question, rephrase the follow up \
question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone Question:"""
```

In doing this, the language model takes the user's question along with the full chat history, which contains other questions, answers, and context, and generates a more well-formed standalone question. Using this rephrased question, we can then perform a similarity search on the vector store, and we often get back better results, since the question is semantically more similar to the previous questions/answers (the content in the database).
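
As a sketch of how this template might be wired up as an LCEL chain (the exact chain in the repo may differ, and the model choice here is an assumption):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

condense_question_chain = (
    PromptTemplate.from_template(REPHRASE_TEMPLATE)
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
).with_config(run_name="CondenseQuestion")

standalone = condense_question_chain.invoke({
    "chat_history": "Human: What is LCEL?\nAI: LangChain Expression Language is ...",
    "question": "How do I stream with it?",
})
# e.g. 'How do I stream with LangChain Expression Language (LCEL)?'
```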

@@ -0,0 +1,91 @@
# Deployment

When deploying Chat LangChain, we recommend using Vercel for the frontend, GCP Cloud Run for the backend API, and a GitHub Action for the recurring ingestion task. This setup provides a simple and effective way to deploy and manage your application.

## Prerequisites

First, fork [chat-langchain](https://github.com/langchain-ai/chat-langchain) to your GitHub account.

## Weaviate (Vector Store)

We'll use Weaviate for our vector store. You can sign up for an account [here](https://console.weaviate.cloud/).

After creating an account, click "Create Cluster" and follow the steps to create a new cluster. Then wait for the cluster to be created; this may take a few minutes.

Once your cluster has been created you should see a few sections on the page. The first is the cluster URL. Save this as your `WEAVIATE_URL` environment variable.

Next, click "API Keys" and save the API key as the `WEAVIATE_API_KEY` environment variable.

The final Weaviate environment variable is `WEAVIATE_INDEX_NAME`. This is the name of the index you want to use. You can name it whatever you want, but for this example we'll use "langchain".

After this your vector store will be set up, and we can move on to the record manager.
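
As a quick sanity check that these credentials work, you can ping the cluster (a sketch assuming the v3 `weaviate-client` Python package):

```python
import os

import weaviate

client = weaviate.Client(
    url=os.environ["WEAVIATE_URL"],
    auth_client_secret=weaviate.AuthApiKey(api_key=os.environ["WEAVIATE_API_KEY"]),
)
print(client.is_ready())  # True if the cluster is reachable and the key is valid
```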

## Supabase (Record Manager)

Visit Supabase to create an account [here](https://supabase.com/dashboard).

Once you've created an account, click "New project" on the dashboard page. Follow the steps, saving the database password after creating it; we'll need it later.

Once your project is set up (this also takes a few minutes), navigate to the "Settings" tab, then select "Database" under "Configuration".

Here you should see a "Connection string" section. Copy this string and insert the database password you saved earlier. This is your `RECORD_MANAGER_DB_URL` environment variable.

That's all you need to do for the record manager. The LangChain RecordManager API will handle creating the tables for you.
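
For example, this one-time snippet creates the schema in your Supabase database (a sketch; the namespace is an assumption and should match how you index):

```python
import os

from langchain.indexes import SQLRecordManager

record_manager = SQLRecordManager(
    "weaviate/langchain",  # assumption: "<store>/<index name>" namespace
    db_url=os.environ["RECORD_MANAGER_DB_URL"],
)
record_manager.create_schema()  # creates the tracking table if it doesn't exist
```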

## Vercel (Frontend)

Create a Vercel account for hosting [here](https://vercel.com/signup).

Once you've created your Vercel account, navigate to [your dashboard](https://vercel.com/) and click the "Add New..." button in the top right. This will open a dropdown; from there, select "Project".

On the next screen, search for "chat-langchain" (if you did not modify the repo name when forking). Once it shows up, click "Import".

Finally, click "Deploy" and your frontend will be deployed!

## GitHub Action (Recurring Ingestion)

In order for your vector store to be updated with new data, you'll need to set up a recurring ingestion task (this will also populate the vector store for the first time).

Go to your forked repository and navigate to the "Settings" tab.

Select "Environments" from the left-hand menu and click "New environment". Enter the name "Indexing" and click "Configure environment".

When configuring, click "Add secret" and add the following secrets:

```
OPENAI_API_KEY=
RECORD_MANAGER_DB_URL=
WEAVIATE_API_KEY=
WEAVIATE_INDEX_NAME=langchain
WEAVIATE_URL=
```

These should be the same secrets as were added to Vercel.
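
For reference, a scheduled ingestion workflow consuming these secrets might look roughly like the following. This is a hypothetical sketch: the forked repo ships its own workflow file, and the schedule, dependency setup, and script path here are assumptions.

```yaml
name: Update index
on:
  schedule:
    - cron: "0 13 * * *"  # assumption: run once a day
  workflow_dispatch: {}   # enables the manual "Run workflow" button

jobs:
  ingest:
    runs-on: ubuntu-latest
    environment: Indexing  # pulls the secrets configured above
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt  # assumption: dependency install
      - run: python backend/ingest.py         # assumption: ingestion entry point
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          RECORD_MANAGER_DB_URL: ${{ secrets.RECORD_MANAGER_DB_URL }}
          WEAVIATE_API_KEY: ${{ secrets.WEAVIATE_API_KEY }}
          WEAVIATE_INDEX_NAME: ${{ secrets.WEAVIATE_INDEX_NAME }}
          WEAVIATE_URL: ${{ secrets.WEAVIATE_URL }}
```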

Next, navigate to the "Actions" tab, confirm you understand your workflows, and enable them.

Then click on the "Update index" workflow and click "Enable workflow". Finally, open the "Run workflow" dropdown and click "Run workflow".

Once this has finished, you can visit your production URL from Vercel and start using the app!

## Backend API via Cloud Run

First, build the frontend:

```shell
cd frontend
yarn
yarn build
```

Then, deploy the backend to Google Cloud Run. First create a `.env.gcp.yaml` file with the contents of [`.env.gcp.yaml.example`](.env.gcp.yaml.example) and fill in the values. Then run:

```shell
gcloud run deploy chat-langchain --source . --port 8000 --env-vars-file .env.gcp.yaml --allow-unauthenticated --region us-central1 --min-instances 1
```

Finally, go back to Vercel and add an environment variable `NEXT_PUBLIC_API_BASE_URL` set to your Cloud Run URL.

@@ -0,0 +1,74 @@
# LangSmith

Observability and evaluations are pivotal for any LLM application that aims to reach production and improve beyond its initial deployment. For this we use LangSmith, a tool that encapsulates all the components necessary to monitor and improve your LLM applications. In addition to these two development tools, LangSmith also offers a feature for managing feedback from users. Real user feedback can be invaluable for improving your LLM application, grounding changes in facts from actual users rather than assumptions and theories.

## Observability

Observability is simple when using LangChain as your LLM framework. In its simplest form, all you need to do is set two environment variables:

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=...
```

LangSmith tracing is already set up for Chat LangChain in an optimized way, and only needs extra configuration if you're extending the application in a way that's not covered by the default tracing.

You may see this further customization throughout the repo, mainly in the form of adding run names to runs:

```python
.with_config(
    run_name="CondenseQuestion",
)
```

You can call `.with_config` on any [LangChain Runnable](https://python.langchain.com/docs/expression_language/) and apply settings like the `run_name` seen above.
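
For instance, a named chain like this toy example will show up in LangSmith as "ExampleChain" rather than under an auto-generated name:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_template("Answer briefly: {question}")
    | ChatOpenAI()
    | StrOutputParser()
).with_config(run_name="ExampleChain")

chain.invoke({"question": "What is LCEL?"})  # traced as "ExampleChain"
```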

When running queries through Chat LangChain, you can expect to see LangSmith traces like this show up on your project:



For more detailed information on LangSmith traces, visit the [LangSmith documentation](https://docs.smith.langchain.com/tracing/).

## Evaluations

Evals are a great way to discover issues with your LLM app, find areas where it does not perform well, and track regressions. LangSmith has a whole suite of tools to aid you with this.

For in-depth walkthroughs and explanations of LangSmith evaluations, visit the [LangSmith documentation](https://docs.smith.langchain.com/evaluation). This doc will only cover setting up and running evals on Chat LangChain.

### Datasets

For Chat LangChain, the team at LangChain has already put together a dataset for evaluating the app. You can find the dataset [here](https://smith.langchain.com/public/452ccafc-18e1-4314-885b-edd735f17b9d/d).

The first step is to install the LangSmith Python SDK:

```shell
pip install langsmith
```

Then, you'll want to define some custom criteria to evaluate your dataset on. Some examples are:

- **Semantic similarity**: How similar your generated response is to the ground truth (the dataset's answers).
- **LLM as a judge**: Use an LLM to judge and assign a score to your generated response.

Finally, configure your evaluation criteria and use the [`run_on_dataset`](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset) function to evaluate your dataset.
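
A rough sketch of what that can look like. The chain here is a stand-in for the real Chat LangChain chain, and the dataset name is an assumption (use the name of your copy of the public dataset):

```python
from langchain.smith import RunEvalConfig, run_on_dataset
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langsmith import Client

def chain_factory():
    # Stand-in chain; in practice you'd evaluate your real chain.
    prompt = ChatPromptTemplate.from_template("Answer the question: {question}")
    return prompt | ChatOpenAI(temperature=0) | StrOutputParser()

eval_config = RunEvalConfig(
    # "qa" is an LLM-as-judge evaluator; "embedding_distance" scores semantic
    # similarity against the dataset's reference answers.
    evaluators=["qa", "embedding_distance"],
)

run_on_dataset(
    client=Client(),
    dataset_name="chat-langchain-eval",  # assumption: your dataset's name
    llm_or_chain_factory=chain_factory,
    evaluation=eval_config,
)
```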

Once completed, you'll be able to view the results of your evaluation in the LangSmith dashboard. Using these results, you can improve and tweak your LLM app.

## Feedback

Gathering feedback from users is a great way to collect human-curated data on what works, what doesn't, and how you can improve your LLM application. LangSmith makes tracking and gathering feedback as easy as pie.

Currently, Chat LangChain supports gathering a simple 👍 or 👎, which is translated into a binary score and saved to each run in LangSmith. This feedback is then stored in (you guessed it) the feedback tab of the LangSmith trace:



Inside LangSmith, you can then use this data to visualize and understand your users' feedback, as well as curate datasets by feedback for evaluations.

### Go further

In addition to binary feedback scores, LangSmith also allows attaching comments to feedback. This lets you gather more detailed and nuanced feedback from users, further fueling your human-curated dataset for improving your LLM application.
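
Programmatically, recording such feedback against a traced run can look like this (a sketch using the `langsmith` client; the run ID and feedback key are placeholders):

```python
from langsmith import Client

client = Client()
run_id = "..."  # UUID of the traced run that produced the answer

# The 👍/👎 becomes a binary score; a comment adds richer detail.
client.create_feedback(run_id, key="user_score", score=0, comment="Cited the wrong docs")
```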