This repository contains a collection of presentation slides I created for evaluating Large Language Models (LLMs), along with recent articles on LLM application evaluation. The slides provide a framework for measuring model performance, a survey of evaluation metrics, and methods for measuring preferences.
The repository includes:
- Intro to LLMs
- LLM model summary
- Slide decks covering various aspects of LLM evaluation
- Detailed explanations of evaluation methodologies
- Links to LLM evaluation frameworks
- Case studies and examples of model comparisons
Recent articles on LLM application evaluation:
- Evaluating Large Language Models: https://www.linkedin.com/posts/minha-hwang-7440771_ai-llm-machinelearning-activity-7297997796480032769-zAva?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Machine Learning Meets Cognitive Science https://www.linkedin.com/posts/minha-hwang-7440771_tom-griffiths-on-using-machine-learning-and-activity-7296558609356701697-j2P9?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- NOLIMA: Long-Context Evaluation Beyond Literal Matching https://www.linkedin.com/posts/minha-hwang-7440771_nolima-long-context-evaluation-beyond-activity-7295798550905425920-4Joh?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- LLM Evaluation - Measurement Scale Options https://www.linkedin.com/posts/minha-hwang-7440771_llm-evaluation-measurement-scale-options-activity-7295417284582330368-PXS-?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Large Language Model Summary - Curated list of Large Language Model overviews https://www.linkedin.com/posts/minha-hwang-7440771_large-language-model-summary-activity-7294046954915774464-H6Mq?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- The Resurgence of Survey Research in the AI Era https://www.linkedin.com/posts/minha-hwang-7440771_ai-humandata-alignment-activity-7292901672811405312-cjZf?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Perplexity LLM Evaluation https://www.linkedin.com/feed/update/urn:li:activity:7292455186516623361?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- LLM as a judge: Search Engine Ranking Relevance https://www.linkedin.com/posts/minha-hwang-7440771_large-language-models-can-accurately-predict-activity-7287267842343813121-57is?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Comparing Traditional and LLM-based Search for Consumer Choice https://www.linkedin.com/posts/minha-hwang-7440771_comparing-traditional-and-llm-based-search-activity-7287260733388529665-HmrK?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- Interpretable User Satisfaction Estimation for Conversational Systems with LLMs https://www.linkedin.com/posts/minha-hwang-7440771_240312388-activity-7281121918601121792-7rRh?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
- How to Evaluate LLMs in Production: A Complete Metric Framework https://www.linkedin.com/posts/minha-hwang-7440771_how-to-evaluate-llms-a-complete-metric-framework-activity-7278732750956675072-4yts?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAv-mQBjDr9Jz8JMW5kF3_ogzmqhJHjGXo
To use the slides:
- Download the slides from the repository (or clone it; see the sketch after this list).
- Open them with Adobe Acrobat, PowerPoint, Google Slides, or any compatible viewer.
- Use the slides for presentations, research, or internal analysis.
- Modify or extend the slides as needed for your specific use case.
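If you prefer the command line, a minimal clone sketch; the URL below is a placeholder, so substitute this repository's actual HTTPS clone URL:

```bash
# Placeholder URL - replace with this repository's actual HTTPS clone URL
git clone https://github.com/<user>/<repo>.git
cd <repo>
# The decks are ordinary files; open them with your viewer of choice
```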
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (`feature-name` or `bugfix-name`).
- Add or update slide decks.
- Commit your changes and push the branch (as sketched below).
- Open a Pull Request with a detailed description.
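For reference, here are the branch/commit/push steps above as git commands. A minimal sketch: `feature-name` is a placeholder branch name, the fork URL is a placeholder, and the remote is assumed to be the default `origin`:

```bash
# Fork on GitHub first, then clone your fork (placeholder URL)
git clone https://github.com/<your-username>/<repo>.git
cd <repo>

# Create a working branch (feature-name or bugfix-name)
git checkout -b feature-name

# Add or update slide decks, then commit and push
git add .
git commit -m "Add slide deck on <topic>"
git push origin feature-name

# Finally, open a Pull Request on GitHub with a detailed description
```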
This project is licensed under the MIT License. See the LICENSE file for more details.
For inquiries or support, open an issue or reach out to minha.hwang@gmail.com.