You can configure SearXNG in LibreChat within the UI or through `librechat.yaml`.
6. **The Web Search badge should now be enabled, meaning your queries can now utilize the web search functionality**
![Web search badge confirmation](/images/web-search/search_badge_confirm.png)

## Setting up Crawl4AI

Crawl4AI is a fast, fully open-source web crawler that produces AI-ready output for large language models, AI agents, and data pipelines. It is flexible, built for real-time performance, and straightforward to deploy, which makes it well suited to self-hosting for LibreChat's web search.

Here are the steps to set up your own Crawl4AI instance for use with LibreChat:

### Using Docker Desktop

1. **Search for the official Crawl4AI image**
- Open Docker Desktop
- Search for `unclecode/crawl4ai` in the **Images** tab
   - Click **Run** on the official image to pull it and start a container

2. **Running the container**
   - Expand the **Optional Settings** dropdown in the panel that appears once the download completes
- Set your desired configuration details (port number, container name, etc.)
- Click **Run** to start the container
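
If you prefer the Docker CLI over Docker Desktop, the steps above can be sketched as follows. The host port `11235` and the container name `crawl4ai` are illustrative choices, not requirements; `11235` is the port the Crawl4AI server listens on inside the container:

```shell
# Choose where the Crawl4AI API will be exposed on the host.
CRAWL4AI_PORT=11235

# Pull the official image and start a container in the background.
docker pull unclecode/crawl4ai
docker run -d \
  --name crawl4ai \
  -p "${CRAWL4AI_PORT}:11235" \
  unclecode/crawl4ai

echo "Crawl4AI should now be reachable at http://localhost:${CRAWL4AI_PORT}"
```

The resulting URL is what `CRAWL4AI_API_URL` should point at when configuring LibreChat.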

### Configuring LibreChat to use Crawl4AI

You can configure LibreChat to use Crawl4AI within the UI or through `librechat.yaml` as follows:

```yaml
webSearch:
  scraperProvider: "crawl4ai"
  crawl4aiApiUrl: "${CRAWL4AI_API_URL}"
  crawl4aiOptions:
    fitStrategy: "fit" # or "raw"
```

Currently, the only option exposed for `crawl4ai` is `fitStrategy`, which controls whether the scraper returns raw markdown (`"raw"`) or markdown filtered by Crawl4AI's internal pruner (`"fit"`).
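
The `${CRAWL4AI_API_URL}` reference above is resolved from your environment, so the variable must be defined, for example in your `.env` file. A minimal sketch, assuming the instance runs locally on port 11235 (adjust the URL to wherever your container is actually exposed):

```bash
# .env
CRAWL4AI_API_URL=http://localhost:11235
# Only needed if your instance enforces authentication:
# CRAWL4AI_API_KEY=your_crawl4ai_api_key
```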
---

`pages/docs/features/web_search.mdx`
To get started with web search, you'll need to configure API keys for all three components:

```bash
FIRECRAWL_API_URL=your_firecrawl_api_url
# Optional: Firecrawl API version (v0 or v1)
# FIRECRAWL_VERSION=v1
# or
# CRAWL4AI_API_URL=your_crawl4ai_instance_url
# CRAWL4AI_API_KEY=your_crawl4ai_api_key # optional

# Reranker (Required)
JINA_API_KEY=your_jina_api_key
```
Each component of the web search feature requires its own API key. Here's how to obtain them:
5. Set it in your environment variables or provide it through the UI
6. (Optional) If you're using a custom Firecrawl instance, you'll also need to set the API URL

### Scraper: Crawl4AI
1. Follow the setup instructions in the [Web Search Configuration](/docs/configuration/librechat_yaml/object_structure/web_search#setting-up-crawl4ai) documentation
2. Set `CRAWL4AI_API_URL` to your instance URL
3. Optionally set `CRAWL4AI_API_KEY` if your instance requires authentication
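
Before testing searches in LibreChat, it can help to confirm the instance is reachable at the URL you configured. A quick sanity check, assuming a local instance on port 11235 (the `/health` endpoint shown here is an assumption based on the Crawl4AI Docker server; check your version's docs if it differs):

```shell
CRAWL4AI_API_URL="http://localhost:11235"

# -f makes curl fail on HTTP errors, so a non-zero exit status means the
# server is missing or unhealthy; expect a small JSON status payload on success.
curl -fsS "${CRAWL4AI_API_URL}/health"
```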

### Rerankers

#### Jina
Scrapers extract the actual content from web pages returned by the search providers:
- **Firecrawl**: A powerful web scraping service that extracts content from web pages
- Get your API key from [Firecrawl.dev](https://docs.firecrawl.dev/introduction#api-key)
- API URL is optional (defaults to Firecrawl's hosted service)
- **Crawl4AI**: An open-source crawler that turns web pages into clean, LLM-ready Markdown for RAG, agents, and data pipelines; fast, controllable, and widely used
- Self-host your own instance

**Planned Scrapers:**
- **Local Firecrawl**: Self-hosted version of Firecrawl
```yaml
webSearch:
  # Scraper Configuration
  firecrawlApiKey: "${CUSTOM_FIRECRAWL_API_KEY}"
  firecrawlApiUrl: "${CUSTOM_FIRECRAWL_API_URL}"
  # or
  crawl4aiApiUrl: "${CUSTOM_CRAWL4AI_API_URL}"
  crawl4aiApiKey: "${CUSTOM_CRAWL4AI_API_KEY}"
  # firecrawlApiKey: "fc-123..." # ❌ Wrong: Never put actual API keys here
  # firecrawlApiUrl: "https://..." # ❌ Wrong: Never put actual URLs here
```