This repository contains a collection of presentation slides I created for evaluating Large Language Models (LLMs), along with recent articles on LLM application evaluation. The slides provide a framework for measuring model performance, a survey of evaluation metrics, and methods for measuring preferences.
The repository includes:
- Intro to LLMs
- LLM model summary
- Slide decks covering various aspects of LLM evaluation
- Detailed explanations of evaluation methodologies
- Links to LLM evaluation frameworks
- Case studies and examples of model comparisons
Recent articles on LLM application evaluation:
- Evaluating Large Language Models: https://www.linkedin.com/posts/minha-hwang-7440771_ai-llm-machinelearning-activity-7297997796480032769-zAva?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Machine Learning Meets Cognitive Science https://www.linkedin.com/posts/minha-hwang-7440771_tom-griffiths-on-using-machine-learning-and-activity-7296558609356701697-j2P9?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- NOLIMA: Long-Context Evaluation Beyond Literal Matching https://www.linkedin.com/posts/minha-hwang-7440771_nolima-long-context-evaluation-beyond-activity-7295798550905425920-4Joh?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- LLM Evaluation - Measurement Scale Options https://www.linkedin.com/posts/minha-hwang-7440771_llm-evaluation-measurement-scale-options-activity-7295417284582330368-PXS-?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Large Language Model Summary - Curated list of Large Language Model overviews https://www.linkedin.com/posts/minha-hwang-7440771_large-language-model-summary-activity-7294046954915774464-H6Mq?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- The Resurgence of Survey Research in the AI Era https://www.linkedin.com/posts/minha-hwang-7440771_ai-humandata-alignment-activity-7292901672811405312-cjZf?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Perplexity LLM Evaluation https://www.linkedin.com/feed/update/urn:li:activity:7292455186516623361?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- LLM as a judge: Search Engine Ranking Relevance https://www.linkedin.com/posts/minha-hwang-7440771_large-language-models-can-accurately-predict-activity-7287267842343813121-57is?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Comparing Traditional and LLM-based Search for Consumer Choice https://www.linkedin.com/posts/minha-hwang-7440771_comparing-traditional-and-llm-based-search-activity-7287260733388529665-HmrK?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Interpretable User Satisfaction Estimation for Conversational Systems with LLMs https://www.linkedin.com/posts/minha-hwang-7440771_240312388-activity-7281121918601121792-7rRh?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- How to Evaluate LLMs in Production: A Complete Metric Framework https://www.linkedin.com/posts/minha-hwang-7440771_how-to-evaluate-llms-a-complete-metric-framework-activity-7278732750956675072-4yts?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
To use the slides:
- Download the slides from the repository (or clone it; see the sketch after this list).
- Open them with Adobe Acrobat, PowerPoint, Google Slides, or any compatible viewer.
- Use the slides for presentations, research, or internal analysis.
- Modify or extend the slides as needed for your specific use case.
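If you prefer the command line, a minimal clone sketch; the URL below is a placeholder, so substitute this repository's actual HTTPS clone URL:

```bash
# Placeholder URL - replace with this repository's actual HTTPS clone URL
git clone https://github.com/<user>/<repo>.git
cd <repo>
# The decks are ordinary files; open them with your viewer of choice
```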
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (`feature-name` or `bugfix-name`).
- Add or update slide decks.
- Commit your changes and push the branch (as sketched below).
- Open a Pull Request with a detailed description.
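For reference, here are the branch/commit/push steps above as git commands. A minimal sketch: `feature-name` is a placeholder branch name, the fork URL is a placeholder, and the remote is assumed to be the default `origin`:

```bash
# Fork on GitHub first, then clone your fork (placeholder URL)
git clone https://github.com/<your-username>/<repo>.git
cd <repo>

# Create a working branch (feature-name or bugfix-name)
git checkout -b feature-name

# Add or update slide decks, then commit and push
git add .
git commit -m "Add slide deck on <topic>"
git push origin feature-name

# Finally, open a Pull Request on GitHub with a detailed description
```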
This project is licensed under the MIT License. See the LICENSE file for more details.
For inquiries or support, open an issue or reach out to minha.hwang@gmail.com.