A web application that allows you to see a full Wiktionary description of every word in a provided pdf instantly!

jakubgrad/LangApp23

PDF to JSON-ified Text Converter with Efficient Wiktionary Search

Introduction

This repository contains a web application that lets users upload their own PDF files and converts them into JSON-ified text. The PDF is sent to a Python Flask server hosted on Render.com. The server strips the PDF down to individual words and then finds the basic form of each word by making requests to Wiktionary. The resulting JSON-ified text of the PDF, along with a dictionary mapping each word to its basic form, is then stored in a MongoDB database.
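The word-splitting and dictionary-building step described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: `extract_words`, `build_base_form_dictionary`, and `fake_lookup` are hypothetical names, and `fake_lookup` stands in for the real Wiktionary request so the example runs offline.

```python
import json
import re
from typing import Callable, Dict, List

# Matches runs of letters (accented ones included); punctuation and digits are dropped.
WORD_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def extract_words(page_text: str) -> List[str]:
    """Split raw page text into lowercase words, in reading order."""
    return [match.group(0).lower() for match in WORD_RE.finditer(page_text)]


def build_base_form_dictionary(
    words: List[str], lookup_base_form: Callable[[str], str]
) -> Dict[str, str]:
    """Map each distinct word to its basic form, looking each word up only once."""
    base_forms: Dict[str, str] = {}
    for word in words:
        if word not in base_forms:
            base_forms[word] = lookup_base_form(word)
    return base_forms


def fake_lookup(word: str) -> str:
    """Assumption: a hard-coded stand-in for the Wiktionary request."""
    table = {"cats": "cat", "ran": "run"}
    return table.get(word, word)


words = extract_words("Cats ran. The cats ran!")
document = {
    "words": words,
    "base_forms": build_base_form_dictionary(words, fake_lookup),
}
print(json.dumps(document, indent=2))
```

Looking up each distinct word only once matters here because a repeated word would otherwise trigger a repeated request.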

Key Features

  • PDF Upload and Conversion: Users can easily upload their PDF files through the web interface, and the Flask server handles the conversion process.

  • Efficient Word Processing: The Flask server smartly processes the PDF, breaking it down into individual words and then searching for the basic form of each word on Wiktionary. This approach minimizes unnecessary requests and optimizes the overall performance.

  • Data Storage: The JSON-ified text of the PDF and the associated dictionary (word and its basic form) are securely saved to a MongoDB database, providing a scalable and flexible solution for data management.

  • User-Friendly Interface: The website allows users to view all uploaded files in their JSON-ified form, providing a clear and organized representation of the converted content.

  • On-Demand Word Description: Each uploaded file comes with an associated dictionary generated by the Flask server. This enables the backend to make accurate and real-time searches in the Wiktionary dump for full descriptions of any word the user clicks on.
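The on-demand description feature can be pictured with the sketch below. It assumes a stored record shaped like `record` and a `descriptions` mapping standing in for the Wiktionary dump. Both shapes, and the function name `describe_clicked_word`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def describe_clicked_word(
    clicked: str, base_forms: Dict[str, str], descriptions: Dict[str, str]
) -> Optional[str]:
    """Resolve a clicked word to its basic form, then fetch that form's description."""
    base = base_forms.get(clicked.lower())
    if base is None:
        return None  # the word was not in the uploaded file's dictionary
    return descriptions.get(base)


# Assumption: a stored file record with its per-file dictionary.
record = {
    "filename": "story.pdf",
    "words": ["cats", "ran"],
    "base_forms": {"cats": "cat", "ran": "run"},
}

# Assumption: a tiny stand-in for the Wiktionary dump, keyed by basic form.
descriptions = {
    "cat": "A small domesticated feline.",
    "run": "To move quickly on foot.",
}

print(describe_clicked_word("Cats", record["base_forms"], descriptions))
```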

Repository Link

Check out the live application at http://jakubgrad.ddns.net:2227/frontend/about and the source code on GitHub. Feel free to explore the codebase and contribute to the project. The live application currently runs on my private server.

Optimizing Wiktionary Search

The current application takes a hybrid approach: it requests the basic form of each word from Wiktionary, then searches the Wiktionary dump for full descriptions. This has serious disadvantages, especially in memory efficiency and in speed, since a request is made to Wiktionary for each word in a PDF. To address this problem, I'm in the process of implementing a better approach:

  • Using a Flask Python server with direct access to the Wiktionary dump.

This new approach will allow a more efficient search for the basic form of a word in the Wiktionary dump, without the need for an additional search for the full description. Accessing the relevant information directly will save processing time and improve the overall performance of the application.
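As a rough sketch of the idea, the example below indexes a dump once and then answers both questions (basic form and full description) from that index. It assumes a hypothetical, pre-processed JSON Lines extract with `word`, `base_form`, and `description` fields. The real Wiktionary dump is formatted differently, and the planned implementation may look nothing like this.

```python
import io
import json
from typing import Dict, Iterable, Optional

# Assumption: a tiny JSON Lines extract; field names are hypothetical.
SAMPLE_DUMP = io.StringIO(
    '{"word": "run", "description": "To move quickly on foot."}\n'
    '{"word": "ran", "base_form": "run"}\n'
)


def build_dump_index(lines: Iterable[str]) -> Dict[str, dict]:
    """Read the dump once and key every entry by its word."""
    index: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        index[entry["word"]] = entry
    return index


def lookup(index: Dict[str, dict], word: str) -> Optional[dict]:
    """Return the basic form and its description without any network request."""
    entry = index.get(word.lower())
    if entry is None:
        return None
    base = entry.get("base_form", entry["word"])
    base_entry = index.get(base, entry)
    return {
        "word": entry["word"],
        "base_form": base,
        "description": base_entry.get("description", ""),
    }


index = build_dump_index(SAMPLE_DUMP)
print(lookup(index, "ran"))
```

Holding the whole index in memory keeps the sketch short. For a full dump, an on-disk index would probably be a better fit given the memory concerns mentioned above.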

Feedback

If you have any suggestions, ideas, or would like to contribute, feel free to open an issue or submit a pull request!

Thank you for your interest in the project!

  • Is this a serious attempt at a language reading app? Semi-serious. The UI definitely needs improvement, and so does the method for finding words. At about a minute per PDF page, the current speed is prohibitively slow, and there is no personalization for users of the website yet, like saving words or marking progress in a PDF, though this might come soon. There is also no UI option for flipping pages o_o
