# LLM Readability Benchmark

Human readability judgments as a benchmark for LLMs

Ever wonder which of the many large language models (LLMs) has the best understanding of what makes text 'readable'? Me too! Here, we benchmark (prompt) a wide range of LLMs on their ability to reproduce human judgments about the readability of short fictional and nonfictional passages. We test each model on several different prompts, finding the models that tend to do best.

The attached notebook lays out a series of informal experiments comparing many open and closed LLMs.
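
The full pipeline lives in the notebook; as a rough illustration of the protocol, here is a minimal sketch that prompts a locally hosted model for a 1–7 readability rating per passage and checks rank agreement with human judgments. The Ollama endpoint, the model tag, the `passages.csv` file and its column names, and the prompt wording are all assumptions for illustration, not the notebook's actual setup.

```python
# Minimal sketch of the benchmark loop (assumed setup, not the notebook's code):
# a local Ollama server at localhost:11434 serving gemma2:2b, and a CSV
# "passages.csv" with hypothetical columns "text" and "human_rating".
import pandas as pd
import requests
from scipy.stats import spearmanr

PROMPT = (
    "Rate the readability of the following passage on a scale from 1 "
    "(very hard to read) to 7 (very easy to read). "
    "Reply with a single number.\n\n{passage}"
)

def score_passage(passage: str, model: str = "gemma2:2b") -> float:
    """Ask the model for a 1-7 readability rating and parse the reply."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT.format(passage=passage), "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    reply = resp.json()["response"].strip()
    # Keep only the leading number in case the model adds extra text.
    return float(reply.split()[0].rstrip("."))

df = pd.read_csv("passages.csv")  # hypothetical file: text, human_rating
df["model_rating"] = df["text"].apply(score_passage)

# Rank correlation between model and human ratings serves as the agreement
# metric in this sketch; higher rho means closer agreement with humans.
rho, p = spearmanr(df["human_rating"], df["model_rating"])
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```

Swapping in a different model or prompt template only changes the `model` argument and `PROMPT` string, which is what makes it cheap to sweep many model/prompt combinations.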

The result: small open-source models that you can run locally are competitive with GPT-4 Turbo and GPT-4o mini. Google's newly released Gemma 2 did particularly well (the 2B model performed about as well as the 27B model across prompts), and GPT-4 Turbo did about as well as GPT-4o mini.
