🦜️🌐 WebLangChain

This repo is an example of performing retrieval using the entire internet as a document store.

Try it live: weblangchain.vercel.app

✅ Running locally

By default, WebLangChain uses Tavily to fetch content from webpages. You can get an API key by signing up with Tavily. If you'd like to swap in a different base retriever (e.g. if you want to use your own data source), you can modify the get_base_retriever() method in main.py.

Install backend dependencies: poetry install.
Make sure to set your environment variables to configure the application:

export OPENAI_API_KEY=
export TAVILY_API_KEY=

# for tracing
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY=
export LANGCHAIN_PROJECT=

Start the Python backend with poetry run make start.
Install frontend dependencies by running cd nextjs, then yarn.
Run the frontend with yarn dev.
Open localhost:3000 in your browser.
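
If the backend fails on startup, an unset key is a likely cause. The snippet below is a minimal pre-flight check, not part of this repo: it only reads the variable names listed above. Treating the four LANGCHAIN_* tracing variables as optional is an assumption, and missing_vars and report are hypothetical helper names.

```python
import os

# Variable names come from the setup step above.
# Assumption: the app needs the first two; the tracing ones are optional.
REQUIRED_VARS = ["OPENAI_API_KEY", "TAVILY_API_KEY"]
TRACING_VARS = [
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
]


def missing_vars(names, env=None):
    """Return the names that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


def report(env=None):
    """Summarize the configuration state as a one-line message."""
    missing = missing_vars(REQUIRED_VARS, env)
    if missing:
        return "Missing required variables: " + ", ".join(missing)
    unset = missing_vars(TRACING_VARS, env)
    if unset:
        return "Required variables set; tracing variables unset: " + ", ".join(unset)
    return "Environment looks complete."


print(report())
```

Run it in the same shell where you exported the variables, before poetry run make start.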

⚙️ How it works

The general retrieval flow looks like this:

Pull in raw content related to the user's initial query using a retriever that wraps Tavily's Search API.
- For subsequent conversation turns, we also rephrase the original query into a "standalone query" free of references to previous chat history.
Because the raw documents usually exceed the model's maximum context window, we perform additional contextual compression steps to filter what we pass to the model.
- First, we split retrieved documents using a text splitter.
- Then we use an embeddings filter to remove any chunks that do not meet a similarity threshold with the initial query.
The retrieved context, the chat history, and the original question are passed to the LLM as context for the final generation.
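
The compression steps above can be sketched in plain Python. This is a toy illustration, not the code in main.py: the character-window splitter and word-count "embedding" stand in for the real text splitter and embedding model, the threshold value is arbitrary, and every function name here is hypothetical. The query-rephrasing step is left out because it needs an LLM call.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=20):
    """Split text into overlapping character windows (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: lowercase word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_chunks(chunks, query, threshold=0.2):
    """Keep only chunks whose similarity to the query meets the threshold."""
    q = embed(query)
    return [c for c in chunks if cosine(embed(c), q) >= threshold]


def build_prompt(context_chunks, chat_history, question):
    """Combine retrieved context, chat history, and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    history = "\n".join(f"{role}: {msg}" for role, msg in chat_history)
    return f"Context:\n{context}\n\nChat history:\n{history}\n\nQuestion: {question}"


docs = [
    "LangServe exposes LangChain runnables as REST endpoints.",
    "Bananas are rich in potassium and grow in tropical climates.",
]
query = "How does LangServe expose runnables?"
chunks = [c for d in docs for c in split_text(d, chunk_size=80, overlap=10)]
kept = filter_chunks(chunks, query, threshold=0.2)  # drops the unrelated banana chunk
print(build_prompt(kept, [("human", "What is LangServe?")], query))
```

The filtering step is what keeps the prompt within the context window: only chunks that score at or above the threshold against the query reach the model.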

Here's a LangSmith trace illustrating the above:

https://smith.langchain.com/public/f4493d9c-218b-404a-a890-31c15c56fff3/r

It's built using:

Tavily as a retriever
LangChain for orchestration
LangServe to directly expose LangChain runnables as endpoints
FastAPI
Next.js for the frontend

🚀 Deployment

The live version is hosted on Fly.dev and Vercel. The backend Python logic is found in main.py, and the frontend Next.js app is under nextjs/.
