DrSquare/LLM_Evaluation

LLM Evaluation / Preference Measurement and Alignment / Interpretability

Overview

This repository collects presentation slides I created on evaluating Large Language Models (LLMs), together with recent articles on evaluating LLM applications. The slides present a framework for measuring model performance and cover evaluation metrics and preference measurement methods.

Features

  • Intro to LLM
  • LLM model summary
  • Slide decks covering various aspects of LLM evaluation
  • Detailed explanations of evaluation methodologies
  • Links to LLM evaluation frameworks
  • Case studies and examples of model comparisons

Links to Relevant Posts

  1. Evaluating Large Language Models https://www.linkedin.com/posts/minha-hwang-7440771_ai-llm-machinelearning-activity-7297997796480032769-zAva?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  2. Machine Learning Meets Cognitive Science https://www.linkedin.com/posts/minha-hwang-7440771_tom-griffiths-on-using-machine-learning-and-activity-7296558609356701697-j2P9?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  3. NOLIMA: Long-Context Evaluation Beyond Literal Matching https://www.linkedin.com/posts/minha-hwang-7440771_nolima-long-context-evaluation-beyond-activity-7295798550905425920-4Joh?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  4. LLM Evaluation - Measurement Scale Options https://www.linkedin.com/posts/minha-hwang-7440771_llm-evaluation-measurement-scale-options-activity-7295417284582330368-PXS-?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  5. Large Language Model Summary - Curated list of Large Language Model overviews https://www.linkedin.com/posts/minha-hwang-7440771_large-language-model-summary-activity-7294046954915774464-H6Mq?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  6. The Resurgence of Survey Research in the AI Era https://www.linkedin.com/posts/minha-hwang-7440771_ai-humandata-alignment-activity-7292901672811405312-cjZf?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  7. Perplexity LLM Evaluation https://www.linkedin.com/feed/update/urn:li:activity:7292455186516623361?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  8. LLM as a judge: Search Engine Ranking Relevance https://www.linkedin.com/posts/minha-hwang-7440771_large-language-models-can-accurately-predict-activity-7287267842343813121-57is?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  9. Comparing Traditional and LLM-based Search for Consumer Choice https://www.linkedin.com/posts/minha-hwang-7440771_comparing-traditional-and-llm-based-search-activity-7287260733388529665-HmrK?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  10. Interpretable User Satisfaction Estimation for Conversational Systems with LLMs https://www.linkedin.com/posts/minha-hwang-7440771_240312388-activity-7281121918601121792-7rRh?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
  11. How to Evaluate LLMs in Production: A Complete Metric Framework https://www.linkedin.com/posts/minha-hwang-7440771_how-to-evaluate-llms-a-complete-metric-framework-activity-7278732750956675072-4yts?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo

Usage

  1. Download the slides from the repository.
  2. Open them with a PDF reader, PowerPoint, Google Slides, or any compatible viewer.
  3. Use the slides for presentations, research, or internal analysis.
  4. Modify or extend the slides as needed for your specific use case.

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a new branch (feature-name or bugfix-name).
  3. Add or update slide decks.
  4. Commit your changes and push the branch.
  5. Open a Pull Request with a detailed description.
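
The steps above map onto a handful of standard git commands. The following is a minimal sketch, not a script shipped with this repository: the function name `contribution_commands`, the fork URL, and the commit message are hypothetical placeholders, and it assumes the fork has already been created on GitHub. It only builds and prints the commands so they can be reviewed before running them by hand.

```python
import shlex


def contribution_commands(fork_url, branch, message, paths=(".",)):
    """Build the git commands for the clone -> branch -> commit -> push flow.

    All arguments are illustrative placeholders supplied by the caller.
    Every command after `git clone` is meant to be run from inside the
    cloned directory.
    """
    return [
        ["git", "clone", fork_url],                # get a local copy of your fork
        ["git", "checkout", "-b", branch],         # step 2: create a new branch
        ["git", "add", *paths],                    # step 3: stage added or updated decks
        ["git", "commit", "-m", message],          # step 4: commit the changes
        ["git", "push", "-u", "origin", branch],   # step 4: push the branch to the fork
    ]


if __name__ == "__main__":
    # Hypothetical values; replace them with your own fork URL and branch name.
    commands = contribution_commands(
        "https://github.com/your-username/LLM_Evaluation.git",
        "feature-name",
        "Add new slide deck",
    )
    for command in commands:
        print(shlex.join(command))
```

Once the branch is pushed, the Pull Request itself is opened from the GitHub web interface.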

License

This project is licensed under the MIT License. See LICENSE for more details.

Contact

For inquiries or support, open an issue or reach out to minha.hwang@gmail.com.
