
Releases: ritwickbhargav80/quick-llm-model-evaluations

Simultaneous Evaluation with Multiple LLMs Support

29 Aug 16:10

This release introduces a major update to our Streamlit app: evaluations can now be performed with multiple LLMs simultaneously. Users can select several models and evaluate them all in one go, expanding the flexibility and depth of evaluations.

What's New:
- Simultaneous Multi-LLM Evaluation: Evaluate multiple models in one go for a more comprehensive, side-by-side analysis.
- Color Coding: Selected models are now highlighted with background colors corresponding to their providers, making it easy to tell models from different LLM providers apart during multi-LLM evaluations.
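
As a rough sketch of how the two features above fit together (the function and mapping below are illustrative stand-ins, not the app's actual code), simultaneous evaluation amounts to looping over the selected models, with each provider mapped to a highlight color:

```python
# Hypothetical provider -> highlight color mapping for the model picker.
PROVIDER_COLORS = {
    "Gemini": "#4285F4",
    "OpenAI": "#10A37F",
    "Azure OpenAI": "#0078D4",
    "Anthropic": "#D4A27F",
    "Groq": "#F55036",
    "Hugging Face": "#FFD21E",
}

def evaluate_model(provider: str, model: str, question: str) -> dict:
    """Placeholder: the real app runs a BeyondLLM RAG pipeline here."""
    return {"context_relevancy": 0.0, "answer_relevancy": 0.0, "groundedness": 0.0}

def evaluate_all(selected: list, question: str) -> dict:
    """Evaluate every selected (provider, model) pair in one go."""
    results = {}
    for provider, model in selected:
        results[model] = {
            "provider": provider,
            # Fall back to a neutral color for unknown providers.
            "color": PROVIDER_COLORS.get(provider, "#CCCCCC"),
            "scores": evaluate_model(provider, model, question),
        }
    return results
```

Collecting per-model results keyed by model name makes it straightforward to render one color-coded results block per selection in the UI.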

Supported Sources:

  • URL
  • YouTube
  • PDF
  • DOCX

Supported LLMs:

  • Gemini
    • gemini-1.0-pro
    • gemini-pro
    • gemini-1.5-pro-latest
  • OpenAI
    • gpt-3.5-turbo
    • gpt-4
    • gpt-4-turbo
    • gpt-3.5-turbo-16k
  • Azure OpenAI
    • gpt-35-turbo
    • gpt-4
    • gpt-35-turbo-16k
  • Anthropic
    • claude-3-5-sonnet-20240620
    • claude-3-haiku-20240307
    • claude-3-sonnet-20240229
    • claude-3-opus-20240229
  • Groq
    • mixtral-8x7b-32768
    • gemma2-9b-it
    • llama-3.1-8b-instant
    • llama3-70b-8192
    • llama3-8b-8192
    • llama3-groq-70b-8192-tool-use-preview
    • llama3-groq-8b-8192-tool-use-preview
  • Hugging Face
    • huggingfaceh4/zephyr-7b-alpha
    • huggingfaceh4/zephyr-7b-beta

Evaluation Metrics:

  • Context Relevancy
  • Answer Relevancy
  • Groundedness
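
These three metrics form the standard RAG triad: does the retrieved context relate to the question, does the answer address the question, and is the answer supported by the context. The toy implementation below illustrates only the shape of the computation; the word-overlap scoring is a stand-in for the LLM-graded scoring the app actually uses.

```python
def overlap_score(a: str, b: str) -> float:
    """Toy word-overlap score in [0, 1]; a stand-in for LLM-graded relevance."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def rag_triad(question: str, context: str, answer: str) -> dict:
    return {
        # Is the retrieved context relevant to the question?
        "context_relevancy": overlap_score(question, context),
        # Does the answer address the question?
        "answer_relevancy": overlap_score(question, answer),
        # Is the answer grounded in the retrieved context?
        "groundedness": overlap_score(answer, context),
    }
```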

This release enhances the app's functionality by allowing simultaneous evaluations across multiple LLMs, providing a more comprehensive analysis. Future updates will continue to focus on improving the user experience and adding new features.

Full Changelog: v1.0.0...v1.1.0

v1.0.0 - Initial Release: Single LLM Model Evaluation with BeyondLLM

28 Aug 14:33
05dad93

This release introduces the initial version of our Streamlit app for single-model evaluations using beyondllm. Users provide source details (URL, YouTube link, PDF, or DOCX) and credentials for their chosen large language model (LLM). The app also supports Retrieval-Augmented Generation (RAG) and reports comprehensive evaluation metrics.
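
The app's flow can be sketched in three stages: ingest and chunk the source, retrieve the chunks most relevant to the question, and generate an answer from the augmented prompt. The functions below are simplified structural stand-ins (the app itself delegates these stages to beyondllm rather than implementing them directly):

```python
def fit_source(text: str, chunk_size: int = 40) -> list:
    """Split ingested source text (from a URL, YouTube, PDF, or DOCX) into chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def retrieve(chunks: list, question: str, top_k: int = 2) -> list:
    """Rank chunks by naive word overlap with the question (a toy retriever)."""
    q = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))
    return ranked[:top_k]

def generate(question: str, context_chunks: list, llm_call) -> str:
    """Build a RAG prompt from the retrieved context and delegate to the LLM."""
    prompt = "Context:\n" + "\n".join(context_chunks) + f"\n\nQuestion: {question}"
    return llm_call(prompt)
```

Here `llm_call` is whichever provider client the user configured credentials for; the evaluation metrics are then computed over the question, retrieved context, and generated answer.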

Supported Sources:

  • URL
  • YouTube
  • PDF
  • DOCX

Supported LLMs:

  • Gemini
    • gemini-1.0-pro
    • gemini-pro
    • gemini-1.5-pro-latest
  • OpenAI
    • gpt-3.5-turbo
    • gpt-4
    • gpt-4-turbo
  • Azure OpenAI
    • gpt-35-turbo
    • gpt-35-turbo-16k
    • gpt-4
  • Anthropic
    • claude-3-sonnet-20240229
    • claude-3-haiku-20240307
    • claude-3-opus-20240229
    • claude-3-5-sonnet-20240620

Evaluation Metrics:

  • Context Relevancy
  • Answer Relevancy
  • Groundedness

This release marks the first stable version of the app, tested for a smooth user experience across different models and sources. Future updates will include extended features and further optimizations.

Full Changelog: https://github.com/ritwickbhargav80/quick-llm-model-evaluations/commits/v1.0.0