Releases: ritwickbhargav80/quick-llm-model-evaluations
Simultaneous Evaluation with Multiple LLMs Support
This release introduces a major update to our Streamlit app, now featuring the capability to perform evaluations with multiple LLMs simultaneously. Users can now select and evaluate multiple models in one go, expanding the flexibility and depth of evaluations.
What's New:
- Multiple LLM Evaluations Simultaneously: Evaluate multiple models in one go, providing a more comprehensive analysis.
- Color Coding: Selected LLM models are now highlighted with background colors corresponding to their providers, making it easy to distinguish models from different providers during multiple-LLM evaluations.
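Conceptually, evaluating several models "in one go" amounts to fanning the same question out to each selected model and collecting the per-model metrics. The sketch below shows one way to do that with a thread pool; `evaluate_model` is a hypothetical stand-in for the app's per-model evaluation call (the app itself delegates scoring to beyondllm):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_model(model_name: str, question: str) -> dict:
    """Hypothetical stand-in for the app's per-model evaluation.
    A real implementation would build a RAG pipeline for model_name
    and compute context relevancy, answer relevancy, and groundedness."""
    return {
        "model": model_name,
        "question": question,
        "context_relevancy": 0.0,
        "answer_relevancy": 0.0,
        "groundedness": 0.0,
    }

def evaluate_all(models: list[str], question: str) -> list[dict]:
    # LLM calls are I/O-bound, so running one evaluation per selected
    # model in a thread pool is much faster than a serial loop.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(evaluate_model, m, question) for m in models]
        return [f.result() for f in futures]

results = evaluate_all(["gpt-4", "claude-3-opus-20240229"], "What is RAG?")
```

Results come back in submission order, so each row can be rendered directly into the app's comparison view.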
Supported Sources:
- URL
- YouTube
- DOCX
Supported LLMs:
- Gemini
  - gemini-1.0-pro
  - gemini-pro
  - gemini-1.5-pro-latest
- OpenAI
  - gpt-3.5-turbo
  - gpt-4
  - gpt-4-turbo
  - gpt-3.5-turbo-16k
- Azure OpenAI
  - gpt-35-turbo
  - gpt-4
  - gpt-35-turbo-16k
- Anthropic
  - claude-3-5-sonnet-20240620
  - claude-3-haiku-20240307
  - claude-3-sonnet-20240229
  - claude-3-opus-20240229
- Groq
  - mixtral-8x7b-32768
  - gemma2-9b-it
  - llama-3.1-8b-instant
  - llama3-70b-8192
  - llama3-8b-8192
  - llama3-groq-70b-8192-tool-use-preview
  - llama3-groq-8b-8192-tool-use-preview
- Hugging Face
  - huggingfaceh4/zephyr-7b-alpha
  - huggingfaceh4/zephyr-7b-beta
Evaluation Metrics:
- Context Relevancy
- Answer Relevancy
- Groundedness
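As a rough illustration of what these three metrics measure (this is not the app's actual scoring, which comes from beyondllm's evaluators): context relevancy compares retrieved context against the query, answer relevancy compares the answer against the query, and groundedness checks how well the answer is supported by the retrieved context. A toy token-overlap score conveys the idea:

```python
def overlap_score(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference.
    Illustrative only; real metrics use LLM- or embedding-based judging."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return len(cand & ref) / len(cand)

context = "RAG retrieves documents and feeds them to the model"
answer = "RAG retrieves documents for the model"
# 5 of the 6 answer tokens appear in the context
groundedness = overlap_score(answer, context)
```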
This release enables side-by-side evaluations across multiple LLMs for a more comprehensive analysis. Future updates will continue to focus on improving the user experience and adding new features.
Full Changelog: v1.0.0...v1.1.0
v1.0.0 - Initial Release: Single LLM Model Evaluation with BeyondLLM
This release introduces the initial version of our Streamlit app for single-model evaluations using beyondllm. Users provide a source (URL, YouTube video, or DOCX file) and credentials for the large language model (LLM) to evaluate. The app supports Retrieval-Augmented Generation (RAG) and reports comprehensive evaluation metrics.
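The RAG flow behind a single-model evaluation can be sketched roughly as follows. The word-overlap retriever here is a toy stand-in for illustration only; the app delegates ingestion, retrieval, and generation to beyondllm:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Augment the question with retrieved context before calling the LLM.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Streamlit builds data apps in pure Python.",
    "Retrieval-Augmented Generation grounds answers in retrieved documents.",
    "DOCX is a word-processing file format.",
]
prompt = build_prompt("What is Retrieval-Augmented Generation?", docs)
```

The resulting prompt is what gets sent to the selected model; the retrieved context is also what the groundedness and context-relevancy metrics are scored against.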
Supported Sources:
- URL
- YouTube
- DOCX
Supported LLMs:
- Gemini
  - gemini-1.0-pro
  - gemini-pro
  - gemini-1.5-pro-latest
- OpenAI
  - gpt-3.5-turbo
  - gpt-4
  - gpt-4-turbo
- Azure OpenAI
  - gpt-35-turbo
  - gpt-35-turbo-16k
  - gpt-4
- Anthropic
  - claude-3-sonnet-20240229
  - claude-3-haiku-20240307
  - claude-3-opus-20240229
  - claude-3-5-sonnet-20240620
Evaluation Metrics:
- Context Relevancy
- Answer Relevancy
- Groundedness
This release marks the first stable version of the app, tested for a smooth user experience across different models and sources. Future updates will include extended features and further optimizations.
Full Changelog: https://github.com/ritwickbhargav80/quick-llm-model-evaluations/commits/v1.0.0