The goal is to create a tool that not only assists in overcoming the challenges posed by learning and attention issues but also enhances the overall learning experience for all users.
We collected open source articles from 'News in Levels' and 'Wikipedia'/ 'Simple Wikipedia,' as well as text from 'OneStopEnglish' research dataset.
These sources have the same text in multiple reading levels, which we define with the Common European Framework of Reference for Languages (CEFR).
There are 6 CEFR levels, but we mapped our data to 3 major levels: CEFR C-B-A corresponding to Advanced-Intermediate-Beginner.
Our model first classifies texts into the predefined CEFR levels and then simplifies the content to match the desired reading level. We also flag if a text has Unsafe Text, including profane language and hate speech.
For the robust evaluation of the tool’s performance, we’ve incorporated several methods:
- CEFR (Common European Framework of Reference for Languages): Using our classifier, we generate labels for the produced text and juxtapose it against the ground truth from our evaluation set.
- Aggregate of Gunning Fog Index, Flesch Kincaid Reading Ease score, and Dale Chall Readability Score from python's
textstat
library: An aggregate of the following metrics is used to measure the complexity of the produced text.- Gunning Fog Index: Evaluates readability based on complex words (>3 syllables) density and average sentence length
- Flesch Kincaid Reading Ease score: Evaluates readability based on average syllables per word and average sentence length
- Dale Chall Readability Score: Evaluates readability based on average sentence length and frequency of difficult words (words that are not present in a list of 3000 easy words)
- GPT-4 Score: GPT-4 is asked to rate the complexity of the output text on a scale of 1-100.
Overall, we see that the fine-tuned Llama-2 7b chat model performs best for the simplification task. Comparison with the results of the out-of-box model shows that our fine-tuning greatly improved the quality of the generated text for the task.
UI created and deployed with Streamlit. User's input text is classified as a reading level, seen above in "Input text is at [Advanced] Level." User can then choose to simplify the text to a Beginner or Intermediate level in the "Simplify to:" option.
Inclusivity features included are bionic reading, text-to-speech, font display adjustment, and PDF download.
- Access the tool via our web portal.
- Paste or type in the content you wish to simplify.
- Select the desired readability level.
- Adjust display and format.
- View the simplified content.
- (Optional) Provide feedback for continuous model improvement.
Getting Started for Developers:
- Clone the GitHub repository.
- Ensure all dependencies are installed.
- For local testing, run the Streamlit app.
- For deploying on your server, modify the necessary configuration settings.
Team:
- Ankita Nambiar
- Egehan Yorulmaz
- Lavanya Srivastava
- Prayut Jain
Conversational AI with Nick Kadochnikov @ University of Chicago M.S. in Applied Data Science
Contributions, feedback, and improvements are always welcome. Feel free to submit pull requests or raise issues.
This project is licensed under the MIT License. Refer to the LICENSE
file for more details.
Keep It Simple. Making Information Accessible with AI.