Accessible information matters, yet complex texts remain a significant barrier for readers with varying levels of language proficiency. Text simplification addresses this barrier by producing versions of a text that are easier to comprehend while retaining the essential information. This project aims to evaluate the efficacy of simplified text translations.
Have a look at the training dataset and the text metrics in the EDA notebook.
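As a rough illustration of what such text metrics can look like, the following sketch computes a few surface-level statistics for a text. The function name `text_metrics` and the choice of metrics are assumptions made for this example; the EDA notebook may compute different or additional metrics.

```python
import re


def text_metrics(text: str) -> dict:
    """Compute a few simple surface-level metrics for a text (illustrative only)."""
    # Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # \w+ matches runs of letters and digits (umlauts included);
    # hyphenated compounds such as "COVID-19-Pandemie" are split into parts.
    words = re.findall(r"\w+", text)
    n_sentences = len(sentences)
    n_words = len(words)
    return {
        "sentences": n_sentences,
        "words": n_words,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


# Excerpt from the simplified example text used further below in this README.
simplified = (
    "Während der aktuellen COVID-19-Pandemie ist es wichtig, dass wir schnell "
    "an fundierte Informationen gelangen können. Außerdem helfen sie dabei, "
    "die Verhaltensregeln in der Öffentlichkeit anzupassen."
)
print(text_metrics(simplified))
```

Shorter sentences and shorter words are common, if imperfect, indicators of an easier text, which is why metrics of this kind are a plausible starting point for comparing an original with its simplification.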
The model was trained on the parallel corpus from:
The model training notebook compares different machine learning algorithms (Logistic Regression, Random Forest, GaussianNB).
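A comparison of these three algorithms could be sketched as follows with scikit-learn. This is a minimal example under stated assumptions, not the notebook's actual code: the data is synthetic (generated with `make_classification`), and the hyperparameters, the 5-fold cross-validation and the accuracy metric are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real feature matrix (assumption: one row of
# numeric features per text, with a binary label).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GaussianNB": GaussianNB(),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Evaluating all candidates with the same folds and the same metric keeps the comparison fair; which model wins on the real corpus is documented in the notebook, not here.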
The training script and the final model can be found in the model folder.
Create a Python virtual environment:
$ python -m venv .env
Activate the Python virtual environment:
$ source .env/bin/activate
Install the required dependencies:
(.env) $ pip install -r requirements.txt
Run the Flask app in your local environment with:
(.env) $ python flask/app.py
Post the example texts to the /predict endpoint to get an evaluation of how successful the simplification is:
curl -X POST -H "Content-Type: application/json" -d '{
"examples": [
"Während der aktuellen COVID-19-Pandemie ist die schnelle Verfügbarkeit fundierter Informationen von entscheidender Bedeutung, um Informationen über Diagnose, Krankheitsverlauf, Behandlung abzuleiten oder die Verhaltensregeln in der Öffentlichkeit anzupassen.",
"Während der aktuellen COVID-19-Pandemie ist es wichtig, dass wir schnell an fundierte Informationen gelangen können. Diese Informationen sind entscheidend, um mehr über die Diagnose, den Krankheitsverlauf und die Behandlung zu erfahren. Außerdem helfen sie dabei, die Verhaltensregeln in der Öffentlichkeit anzupassen."
]
}' http://127.0.0.1:5000/predict
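The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (URL, JSON body with an "examples" list, Content-Type header). The helper names `build_request` and `predict` are made up for this example, and it assumes that the endpoint answers with JSON, which is not stated here.

```python
import json
import urllib.request

# URL taken from the curl example; assumes the Flask app is running locally.
PREDICT_URL = "http://127.0.0.1:5000/predict"


def build_request(examples, url=PREDICT_URL):
    """Build a POST request carrying the texts as {"examples": [...]} JSON."""
    body = json.dumps({"examples": list(examples)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(examples, url=PREDICT_URL, timeout=10):
    """Send the texts to the service and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(examples, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the Flask app to be running):
# print(predict(["<original text>", "<simplified text>"]))
```

Separating request construction from sending makes it possible to inspect the payload without a running server.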
This example text is cited from:
Langnickel, Lisa and Baum, Roman and Darms, Johannes and Madan, Sumit and Fluck, Juliane. COVID-19 preVIEW: Semantic Search to Explore COVID-19 Research Preprints. 2021. DOI: 10.3233/SHTI210124
Run the Streamlit app in your local environment with:
(.env) $ python -m streamlit run streamlit/main.py
To build the Docker image for the Flask app:
(.env) $ docker build -t flask-app -f Dockerfile_Flask .
To run the Flask Docker image:
(.env) $ docker run -p 5000:5000 flask-app
To build the Docker image for the Streamlit app:
(.env) $ docker build -t streamlit-app -f Dockerfile_Streamlit .
To run the Streamlit Docker image:
(.env) $ docker run -p 8501:8501 streamlit-app
https://text-simplification-evaluation.streamlit.app/
The example text, which was translated into easy language with ChatGPT, is originally cited from:
Langnickel, Lisa and Baum, Roman and Darms, Johannes and Madan, Sumit and Fluck, Juliane. COVID-19 preVIEW: Semantic Search to Explore COVID-19 Research Preprints. 2021. DOI: 10.3233/SHTI210124