- Set up experiments with a unique system prompt and multiple LLM models
- Evaluate the responses from the LLM models with metrics such as exact match, LLM-as-a-judge, cosine similarity, etc. (more to come!)
- Stream responses from the LLM models to the frontend
- Upload a JSON file with test cases to evaluate the overall performance of the LLM models and see which one performs best
- Compare the response times, time to first token, and tokens per second for each model using my own NPM library, llm-chain!
- Visualize the results with graphs
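Of the metrics above, cosine similarity scores how close a model's response is to a reference answer in embedding space. As a minimal sketch of the math (illustrative only; the function name and the embedding step are assumptions, not this repo's actual code):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// In practice both texts would first be embedded (e.g. via an embeddings API);
// a score near 1 means the response closely matches the reference answer.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Parallel vectors score ~1, orthogonal vectors score 0.
console.log(cosineSimilarity([1, 2], [2, 4])); // ~1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```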
- First, clone the repository and install the dependencies:
```bash
git clone https://github.com/faizancodes/llm-eval-platform.git
cd llm-eval-platform
npm install
```

- Set up the environment variables in the `.env` file.
- Groq and Google are both free to use. OpenAI requires a paid account, so if you would like to use this app without it, you can remove `OPENAI_API_KEY` from the `env.ts` file and make other modifications to the codebase as necessary.
- The database URL is provided by Neon. You can sign up for a free account here.
```bash
OPENAI_API_KEY=""
GROQ_API_KEY=""
GOOGLE_API_KEY=""
DATABASE_URL=""
NODE_ENV="development"
NEXT_PUBLIC_APP_URL="http://localhost:3000"
```

- Then, run the development server:
```bash
npm run dev
```

- Open http://localhost:3000 with your browser to see the result.
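The timing comparisons mentioned above (time to first token, tokens per second) follow standard definitions. The sketch below shows those formulas; it is illustrative only and does not reflect llm-chain's actual API:

```typescript
// Standard definitions of the streaming metrics the app compares per model.
// These names and this interface are hypothetical, not llm-chain's real types.
interface StreamTimings {
  requestStart: number; // ms timestamp when the request was sent
  firstTokenAt: number; // ms timestamp when the first token arrived
  streamEnd: number;    // ms timestamp when the stream finished
  tokenCount: number;   // total tokens received
}

function computeMetrics(t: StreamTimings) {
  const timeToFirstTokenMs = t.firstTokenAt - t.requestStart;
  const totalTimeMs = t.streamEnd - t.requestStart;
  // Tokens/sec is measured over the generation phase, i.e. after the first token.
  const generationSeconds = (t.streamEnd - t.firstTokenAt) / 1000;
  const tokensPerSecond =
    generationSeconds > 0 ? t.tokenCount / generationSeconds : 0;
  return { timeToFirstTokenMs, totalTimeMs, tokensPerSecond };
}

// Example: first token after 200 ms, then 100 tokens over the next 2 s.
const m = computeMetrics({
  requestStart: 0,
  firstTokenAt: 200,
  streamEnd: 2200,
  tokenCount: 100,
});
console.log(m); // { timeToFirstTokenMs: 200, totalTimeMs: 2200, tokensPerSecond: 50 }
```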