This release introduces a major update to our Streamlit app: evaluations can now be run with multiple LLMs simultaneously. Users can select several models and evaluate them in one go, expanding the flexibility and depth of evaluations.
What's New:
- Multiple LLM Evaluations Simultaneously: Select and evaluate several models in a single run for a more comprehensive analysis.
- Color Coding: Selected LLM models are now highlighted with background colors corresponding to their providers, so that models from different LLM providers are easy to tell apart during multiple LLM evaluations.
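As an illustration of the color coding idea, here is a minimal sketch of how a provider-based background color could be attached to a selected model. It is not the app's actual implementation: the hex colors, `PROVIDER_COLORS`, `provider_color`, and `badge_html` are all hypothetical names and values chosen for this example.

```python
import html

# Hypothetical provider-to-color mapping; the colors the app really uses
# are not stated in these release notes.
PROVIDER_COLORS = {
    "Gemini": "#d2e3fc",
    "OpenAI": "#d4edda",
    "Azure OpenAI": "#cfe2ff",
    "Anthropic": "#fde2cf",
    "Groq": "#f8d7da",
    "Hugging Face": "#fff3cd",
}
DEFAULT_COLOR = "#e9ecef"  # fallback for a provider missing from the mapping


def provider_color(provider: str) -> str:
    """Return the background color assigned to a provider."""
    return PROVIDER_COLORS.get(provider, DEFAULT_COLOR)


def badge_html(model: str, provider: str) -> str:
    """Build an HTML span showing the model name on its provider's color."""
    color = provider_color(provider)
    return (
        f'<span style="background-color:{color}; padding:2px 6px; '
        f'border-radius:4px;">{html.escape(model)}</span>'
    )


# The same model name can exist under two providers (for example gpt-4),
# so the color is looked up by provider, not by model name.
openai_badge = badge_html("gpt-4", "OpenAI")
azure_badge = badge_html("gpt-4", "Azure OpenAI")
```

In a Streamlit page, a string like `openai_badge` could be rendered with `st.markdown(openai_badge, unsafe_allow_html=True)`. Keying the color on the provider keeps models such as `gpt-4`, which is listed under both OpenAI and Azure OpenAI, visually distinct.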
Supported Sources:
- URL
- YouTube
- DOCX
Supported LLMs:
- Gemini
  - gemini-1.0-pro
  - gemini-pro
  - gemini-1.5-pro-latest
- OpenAI
  - gpt-3.5-turbo
  - gpt-4
  - gpt-4-turbo
  - gpt-3.5-turbo-16k
- Azure OpenAI
  - gpt-35-turbo
  - gpt-4
  - gpt-35-turbo-16k
- Anthropic
  - claude-3-5-sonnet-20240620
  - claude-3-haiku-20240307
  - claude-3-sonnet-20240229
  - claude-3-opus-20240229
- Groq
  - mixtral-8x7b-32768
  - gemma2-9b-it
  - llama-3.1-8b-instant
  - llama3-70b-8192
  - llama3-8b-8192
  - llama3-groq-70b-8192-tool-use-preview
  - llama3-groq-8b-8192-tool-use-preview
- Hugging Face
  - huggingfaceh4/zephyr-7b-alpha
  - huggingfaceh4/zephyr-7b-beta
Evaluation Metrics:
- Context Relevancy
- Answer Relevancy
- Groundedness
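To make the idea of evaluating several models at once concrete, here is a rough sketch of fanning one evaluation out across the selected models and collecting the three metrics for each. It is an assumption-laden illustration, not the app's code: `evaluate_models`, `placeholder_evaluate`, the metric keys, and the use of a thread pool are all hypothetical, and the placeholder returns dummy scores instead of calling any LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Metric names taken from the list above; the key spelling is an assumption.
METRICS = ("context_relevancy", "answer_relevancy", "groundedness")

Scores = Dict[str, float]


def evaluate_models(
    models: List[str],
    evaluate_one: Callable[[str], Scores],
    max_workers: int = 4,
) -> Dict[str, Scores]:
    """Run evaluate_one for every model concurrently and key results by model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with models.
        results = list(pool.map(evaluate_one, models))
    return dict(zip(models, results))


def placeholder_evaluate(model: str) -> Scores:
    """Hypothetical stand-in for a real evaluation; returns dummy scores."""
    return {metric: 0.0 for metric in METRICS}


report = evaluate_models(
    ["gemini-pro", "gpt-4", "claude-3-haiku-20240307"],
    placeholder_evaluate,
)
```

A thread pool is used here because calls to hosted LLM APIs are typically I/O-bound; a real evaluator would replace `placeholder_evaluate` with a function that queries the model and computes each metric.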
With simultaneous evaluation across multiple LLMs, this release makes the app's analysis more comprehensive. Future updates will continue to focus on improving the user experience and adding new features.
Full Changelog: v1.0.0...v1.1.0