
LLM Evaluation Platform

Main features:

  • Set up experiments with a unique system prompt and multiple LLM models
  • Evaluate the responses from the LLM models with metrics such as exact match, LLM-as-a-judge, and cosine similarity (more to come!)
  • Stream responses from the LLM models to the frontend
  • Upload a JSON file with test cases to evaluate the overall performance of the LLM models and see which one performs best (a possible test-case shape is sketched after this list)
  • Compare response times, time to first token, and tokens per second for each model using my own NPM library, llm-chain!
  • Visualize the results with graphs
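
The test-case schema and metric implementations are not documented in this README, so the following is a rough sketch only: the TestCase shape, the function names, and the source of embeddings are all assumptions made for illustration, not the repository's actual API.

```typescript
// Hypothetical shape for entries in the uploaded JSON test-case file --
// the platform's real schema may differ.
interface TestCase {
  input: string;           // prompt sent to each model
  expectedOutput: string;  // reference answer to score against
}

// Exact match: 1 if the response equals the expected output after
// normalizing whitespace and case, else 0.
function exactMatch(response: string, expected: string): number {
  return response.trim().toLowerCase() === expected.trim().toLowerCase()
    ? 1
    : 0;
}

// Cosine similarity between two equal-length embedding vectors.
// How the platform produces embeddings is not specified here; this
// only shows the scoring math.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```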

Screenshots

(Four screenshots of the platform UI, captured 2025-01-09, were embedded here.)

Running Locally

  1. First, clone the repository and install the dependencies:

```bash
git clone https://github.com/faizancodes/llm-eval-platform.git
cd llm-eval-platform
npm install
```
  2. Set up the environment variables in the .env file.
  • Groq and Google are both free to use. OpenAI requires a paid account, so if you would like to use this app without it, you can remove OPENAI_API_KEY from the env.ts file (see the sketch after this step) and make other modifications to the codebase as necessary.
  • The database URL is provided by Neon. You can sign up for a free account at neon.tech.

```bash
OPENAI_API_KEY=""
GROQ_API_KEY=""
GOOGLE_API_KEY=""
DATABASE_URL=""
NODE_ENV="development"
NEXT_PUBLIC_APP_URL="http://localhost:3000"
```
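
The README refers to an env.ts file but does not show it. As a rough sketch only, here is what a zod-based validator for these variables might look like; both the use of zod and the structure are assumptions (a common Next.js pattern), not the repository's actual code.

```typescript
// Hypothetical env.ts-style validator, assuming zod -- the real
// env.ts in this repository may be structured differently.
import { z } from "zod";

const envSchema = z.object({
  OPENAI_API_KEY: z.string().min(1), // drop this line to run without OpenAI
  GROQ_API_KEY: z.string().min(1),
  GOOGLE_API_KEY: z.string().min(1),
  DATABASE_URL: z.string().url(),
  NODE_ENV: z.enum(["development", "test", "production"]),
  NEXT_PUBLIC_APP_URL: z.string().url(),
});

// Fails fast at startup if any variable is missing or malformed.
export const env = envSchema.parse(process.env);
```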
  3. Then, run the development server:

```bash
npm run dev
```
  4. Open http://localhost:3000 in your browser to see the result.

About

A platform to easily find the best LLM model for your use case
