Awesome WebLLM

This page contains a curated list of examples, tutorials, and blog posts about WebLLM use cases. Please send a pull request if you find something that belongs here.

Example Projects

Note that all examples below run in-browser and use WebGPU as a backend.

Project List

  • get-started: a minimal getting-started example using chat completion (a sketch follows this list).

    Open in JSFiddle Open in Codepen

  • simple-chat-js: a minimal and complete chatbot app in vanilla JavaScript.

    Open in JSFiddle Open in Codepen

  • simple-chat-ts: a minimal and complete chatbot app in TypeScript.

  • get-started-web-worker: same as get-started, but runs the engine in a web worker (a sketch follows this list).

  • next-simple-chat: a minimal and complete chatbot app built with Next.js.

  • multi-round-chat: while the APIs are functional, we internally optimize so that multi-round chat reuses the KV cache

  • text-completion: demonstrates engine.completions.create(), pure text completion with no conversation, as opposed to engine.chat.completions.create() (a sketch follows this list)

  • embeddings: demonstrates engine.embeddings.create(), integration with Langchain.js's EmbeddingsInterface and MemoryVectorStore, and RAG with Langchain.js using WebLLM for both the LLM and the embeddings in a single engine (a sketch follows this list)

  • multi-models: demonstrates loading multiple models into a single engine concurrently (a sketch follows this list)
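
A minimal sketch of the get-started flow. The model ID is illustrative and must match an entry in WebLLM's prebuilt model list:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Download (or load from cache) and initialize the model.
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
    initProgressCallback: (p) => console.log(p.text),
  });

  // OpenAI-style chat completion.
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain WebGPU in one sentence." },
    ],
  });
  console.log(reply.choices[0].message.content);
}

main();
```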
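
For get-started-web-worker, the engine lives in a web worker so inference does not block the UI thread. A rough sketch using WebLLM's worker handler and factory (file names and model ID are illustrative):

```ts
// worker.ts: hosts the engine inside the web worker.
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);
```

```ts
// main.ts: talks to the worker through the same OpenAI-like API.
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateWebWorkerMLCEngine(
  new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
  "Llama-3.1-8B-Instruct-q4f32_1-MLC", // illustrative model ID
);
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello from the main thread!" }],
});
console.log(reply.choices[0].message.content);
```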
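
The text-completion example hits the completion endpoint instead of the chat one; the sketch below assumes an OpenAI-style response with choices[0].text and an illustrative model ID:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// Pure text completion: a raw prompt is continued with no chat template applied.
const completion = await engine.completions.create({
  prompt: "The three laws of robotics are",
  max_tokens: 64,
});
console.log(completion.choices[0].text);
```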
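
For embeddings, a sketch assuming an embedding model from the prebuilt list (the exact model ID below is a guess; check the example source) and an OpenAI-style response:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The model ID is illustrative; use an embedding model from WebLLM's prebuilt list.
const engine = await CreateMLCEngine("snowflake-arctic-embed-m-q0f32-MLC-b4");

const result = await engine.embeddings.create({
  input: ["What is the capital of Canada?", "Ottawa is the capital of Canada."],
});
// One embedding vector per input string.
console.log(result.data.map((d) => d.embedding.length));
```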
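
For multi-models, the sketch below assumes the engine accepts a list of model IDs at creation time and an OpenAI-style per-request model field for routing; check the example source for the exact API:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Assumption: an array of model IDs loads both models into one engine.
const engine = await CreateMLCEngine([
  "Llama-3.1-8B-Instruct-q4f32_1-MLC",
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
]);

// Assumption: the OpenAI-style `model` field selects which loaded model answers.
const reply = await engine.chat.completions.create({
  model: "Phi-3.5-mini-instruct-q4f16_1-MLC",
  messages: [{ role: "user", content: "Summarize WebLLM in one line." }],
});
console.log(reply.choices[0].message.content);
```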

Advanced OpenAI API Capabilities

These examples demonstrate various capabilities via WebLLM's OpenAI-like API.

  • streaming: returns output in real time, chunk by chunk, as an AsyncGenerator (a sketch follows this list)
  • json-mode: efficiently ensures the output is in JSON format; see the OpenAI Reference for more (covered in the JSON sketch after this list)
  • json-schema: besides guaranteeing JSON output, ensures the output adheres to a specific JSON schema specified by the user (covered in the JSON sketch after this list)
  • seed-to-reproduce: uses the seed field to make output reproducible (covered in the JSON sketch after this list)
  • function-calling (WIP): function calling with the tools and tool_choice fields (preliminary support; a sketch follows this list)
  • vision-model: processes requests with image input using a vision language model (e.g. Phi-3.5-vision) (a sketch follows this list)
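
A streaming sketch: with stream: true the call returns an AsyncGenerator of chunks to consume with for await (the model ID is illustrative):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

const chunks = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Write a haiku about GPUs." }],
  stream: true, // returns an AsyncGenerator of chunks instead of one response
});

let text = "";
for await (const chunk of chunks) {
  text += chunk.choices[0]?.delta?.content ?? "";
  console.log(text); // render partial output as it arrives
}
```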
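
A combined sketch for json-mode, json-schema, and seed-to-reproduce. The json_object response_format and the seed field follow the OpenAI reference; passing the user schema as a stringified JSON schema inside response_format is an assumption based on the json-schema example's description, so check its source for the exact shape:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

// json-mode: constrain the output to valid JSON.
const jsonReply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Give me a JSON object describing a cat." }],
  response_format: { type: "json_object" },
  seed: 42, // seed-to-reproduce: identical requests with the same seed repeat the same output
});
console.log(JSON.parse(jsonReply.choices[0].message.content ?? "{}"));

// json-schema (assumed shape): additionally constrain the output to a user-provided schema.
const schema = JSON.stringify({
  type: "object",
  properties: { name: { type: "string" }, age: { type: "number" } },
  required: ["name", "age"],
});
const schemaReply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Invent a person as JSON." }],
  response_format: { type: "json_object", schema },
});
console.log(schemaReply.choices[0].message.content);
```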
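
A function-calling sketch with the OpenAI-style tools and tool_choice fields. Support is preliminary (WIP), so treat this as an illustration of the request shape rather than a stable API; the model ID is illustrative and should be a tool-calling-capable model from the prebuilt list:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "What is the weather in Tokyo today?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

// When the model decides to call a tool, the call shows up in tool_calls.
console.log(reply.choices[0].message.tool_calls);
```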
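
A vision-model sketch assuming OpenAI-style multimodal message content (text plus image_url parts); the model ID and image URL are illustrative:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Model ID is illustrative; pick a vision model from the prebuilt list.
const engine = await CreateMLCEngine("Phi-3.5-vision-instruct-q4f16_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What is in this picture?" },
        // An http(s) URL or a base64 data URL (assumed, following the OpenAI format).
        { type: "image_url", image_url: { url: "https://example.com/cat.png" } },
      ],
    },
  ],
});
console.log(reply.choices[0].message.content);
```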

Chrome Extension

Others

  • logit-processor: while logit_bias is supported, we additionally support stateful logit processing where users can specify their own rules; we also expose the low-level API forwardTokensAndSample() (a sketch of logit_bias follows this list)
  • cache-usage: demonstrates how WebLLM supports both the Cache API and IndexedDB as caches, selectable via appConfig.useIndexedDBCache. Also demonstrates various cache utilities, such as checking whether a model is cached, deleting a model's weights from the cache, deleting a model's library wasm from the cache, etc. (a sketch follows this list)
  • simple-chat-upload: demonstrates how to upload local models to WebLLM instead of downloading them from a URL
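
A sketch of the logit_bias part of logit-processor, following the OpenAI field: a map from token ID (as a string) to a bias in [-100, 100]. The token ID below is a placeholder, and the stateful LogitProcessor plus forwardTokensAndSample() pieces are tokenizer specific, so see the example source for those:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC");

const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Name a fruit." }],
  // Token IDs are tokenizer specific; "1234" is a placeholder.
  // -100 effectively bans the token, +100 effectively forces it when reachable.
  logit_bias: { "1234": -100 },
});
console.log(reply.choices[0].message.content);
```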
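
A cache-usage sketch: switch from the Cache API to IndexedDB via appConfig.useIndexedDBCache. The helper names hasModelInCache and deleteModelAllInfoInCache are assumptions based on the example's description; check its source for the exact utilities:

```ts
import * as webllm from "@mlc-ai/web-llm";

const modelId = "Llama-3.1-8B-Instruct-q4f32_1-MLC"; // illustrative

// Start from the prebuilt model list and switch the cache backend to IndexedDB.
const appConfig: webllm.AppConfig = {
  ...webllm.prebuiltAppConfig,
  useIndexedDBCache: true,
};

const engine = await webllm.CreateMLCEngine(modelId, { appConfig });

// Cache utilities (assumed names; see the example for the exact API):
// check whether the model is cached, then delete everything related to it.
console.log(await webllm.hasModelInCache(modelId, appConfig));
await webllm.deleteModelAllInfoInCache(modelId, appConfig);
```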

Demo Spaces

  • web-llm-embed: document chat prototype using react-llm with transformers.js embeddings
  • DeVinci: an AI chat app based on WebLLM and hosted on a decentralized cloud platform