diff --git a/README.md b/README.md
index 59badb0..cafb688 100644
--- a/README.md
+++ b/README.md
@@ -2,18 +2,19 @@
-# The new Dale-Chall readability formula
+# Calculate the grade level of a text passage
-I wrote this by ordering a copy of _Readability Revisited: The new Dale-Chall readability formula_. I used the book to code the library from scratch.
+**Easily and accurately calculate a text's readability.**
-**Installation:**
+
+## Installation:
```bash
$ pip install new-dale-chall-readability
```
-**Let's try it out:**
+## Let's try it out:
```bash
$ ipython
@@ -30,23 +31,42 @@ In [2]: text = (
In [3]: reading_level(text)
Out[3]: '7-8'
+```
+_So it's a grade 7–8 reading level._
+
+```python
In [4]: cloze_score(text)
Out[4]: 36.91
```
-## What's a "cloze score" and "reading level"?
+_And yep, the 36.91 cloze score says it's moderately difficult._
+
+So how is this useful? Well, here's one way:
+
+
+
+My legal dictionary orders entries like [amicus curiae](https://www.public.law/dictionary/entries/amicus-curiae) from simplest to most complex. I think it helps with comprehension and learning. I coded the numeric cloze score as the sort key.
-**Cloze** is a deletion test invented by Taylor (1953). The **36.91** score, above, means that roughly that 37% of the words could be deleted and the passage could still be understood. So, a
-higher cloze score is more readable. They "range from 58 and above for the easiest passages to 10-15 and below for the most difficult" (Chall & Dale, p. 75).
+
+
+## What's "reading level" and "cloze score"?
**Reading level** is the grade level of the material, in years of education. The scale is from
**1** to **16+**.
+**Cloze** is a deletion test invented by Taylor (1953). The `36.91` score, above, means that roughly 37% of the words could be deleted and the passage could still be understood. So, **a
+higher cloze score is more readable**. Scores "range from 58 and above for the easiest passages to 10-15 and below for the most difficult" (Chall & Dale, p. 75).
+
See [the integration test file](https://github.com/public-law/new-dale-chall-readability/blob/master/tests/integration_test.py) for text samples from the book, along with their scores.
+## Why yet another readability library?
+
+Before creating this, I tried really hard to find a readability library that gave correct results and used a sound algorithm. I realized I really like Dale-Chall, but I found show-stopping bugs in the existing libraries that cause them to give wrong answers.
+
+There are a ton of low-effort blog posts about Dale-Chall, and they all seem to have different ideas about how it works. So I first ordered a copy of _Readability Revisited: The new Dale-Chall readability formula_, then used the book to code the library from scratch. My goal was to create the best library I could for analyzing text. First, it needs to give correct results, so I did my best to rigorously design and test the code. Second, it needs to be modern Python code that's super easy to use: no objects to instantiate and no odd module naming, just a couple of functions to call.
+
-## Why yet another Dale-Chall readability library?
It's 2022 and there are probably a half-dozen implementations on PyPI.
So why create another one?
diff --git a/new_dale_chall_readability/utils.py b/new_dale_chall_readability/utils.py
index 27f49da..45856d8 100644
--- a/new_dale_chall_readability/utils.py
+++ b/new_dale_chall_readability/utils.py
@@ -1,7 +1,13 @@
import re
+import warnings
+
from bs4 import BeautifulSoup
from .easy_words import EASY_WORDS as _EASY_WORDS
+# Ignore MarkupResemblesLocatorWarning and other user warnings
+# because this is library code.
+warnings.filterwarnings("ignore", category=UserWarning, module="bs4")
+
def pct_unfamiliar_words(text: str) -> float:
words = _words(text)
@@ -26,8 +32,6 @@ def _words(in_text: str) -> tuple[str, ...]:
def _is_unfamiliar(word: str) -> bool:
- match word:
- case number if re.match(r"\d+$", number):
- return False
- case _:
- return word not in _EASY_WORDS
+ if word.isdigit(): # Faster and simpler check for pure numbers
+ return False
+ return word not in _EASY_WORDS
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index 3aad1d1..c80d03b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,22 +1,23 @@
[tool.poetry]
name = "new-dale-chall-readability"
-version = "1.0.12"
+version = "1.0.13"
description = "Implements the New Dale-Chall readability formula. Its output is tested against samples from the original publication."
authors = ["Robb Shecter "]
license = "MIT"
-homepage = "https://github.com/public-law/new-dale-chall-readability"
-repository = "https://github.com/public-law/new-dale-chall-readability"
+homepage = "https://github.com/public-law/readability"
+repository = "https://github.com/public-law/readability"
keywords = ["nlp", "readability", "dale-chall"]
classifiers = [
- "Development Status :: 5 - Production/Stable",
- "Intended Audience :: Developers",
- "License :: OSI Approved :: MIT License",
- "Natural Language :: English",
- "Operating System :: OS Independent",
- "Programming Language :: Python :: 3",
- "Programming Language :: Python :: 3.10",
- "Topic :: Text Processing :: Linguistic",
- "Typing :: Typed"]
+ "Development Status :: 5 - Production/Stable",
+ "Intended Audience :: Developers",
+ "License :: OSI Approved :: MIT License",
+ "Natural Language :: English",
+ "Operating System :: OS Independent",
+ "Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3.10",
+ "Topic :: Text Processing :: Linguistic",
+ "Typing :: Typed",
+]
readme = "README.md"
@@ -56,7 +57,7 @@ reportUnusedImport = "warning"
[tool.pytest.ini_options]
minversion = "7.1"
pythonpath = "."
-python_files = ["*_test.py",]
+python_files = ["*_test.py"]
python_classes = ["Test", "Describe"]
python_functions = ["test_", "it_", "and_", "but_", "they_"]
addopts = "-q --no-header --doctest-modules"