PDF to JSON-ified Text Converter with Efficient Wiktionary Search

Introduction

This repository contains a web application that lets users upload their own PDF files and converts them into JSON-ified text. Each PDF is sent to a Python Flask server hosted on Render.com. The server strips the PDF down to individual words and then finds the basic form of each word by making requests to Wiktionary. The resulting JSON-ified text of the PDF, along with a dictionary that maps each word to its basic form, is then stored in a MongoDB database.
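As a rough illustration of the flow described above, the following Python sketch splits page text into words and builds the word-to-basic-form dictionary. It is a minimal sketch under stated assumptions, not the code used in this repository: the names `extract_words`, `build_document`, and `lookup_base_form`, as well as the `pages`/`dictionary` layout of the result, are hypothetical. The Wiktionary request is passed in as a plain function, so each unique word is looked up only once.

```python
import re
from typing import Callable, Dict, List

# Runs of letters, optionally joined by an apostrophe or hyphen (e.g. "re-used").
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def extract_words(page_text: str) -> List[str]:
    """Return the lowercased words of one page, in reading order."""
    return [match.group(0).lower() for match in WORD_RE.finditer(page_text)]


def build_document(
    pages: List[str],
    lookup_base_form: Callable[[str], str],
) -> Dict[str, object]:
    """Build a JSON-serializable record for one PDF.

    `lookup_base_form` is a hypothetical stand-in for the Wiktionary request;
    it is called once per unique word, not once per occurrence.
    """
    dictionary: Dict[str, str] = {}
    json_pages: List[List[str]] = []
    for page_text in pages:
        words = extract_words(page_text)
        json_pages.append(words)
        for word in words:
            if word not in dictionary:
                dictionary[word] = lookup_base_form(word)
    return {"pages": json_pages, "dictionary": dictionary}
```

The returned dictionary can be serialized with `json.dumps` or stored as a single database document; the real schema may differ.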

Key Features

PDF Upload and Conversion: Users can easily upload their PDF files through the web interface, and the Flask server handles the conversion process.
Efficient Word Processing: The Flask server smartly processes the PDF, breaking it down into individual words and then searching for the basic form of each word on Wiktionary. This approach minimizes unnecessary requests and optimizes the overall performance.
Data Storage: The JSON-ified text of the PDF and the associated dictionary (word and its basic form) are securely saved to a MongoDB database, providing a scalable and flexible solution for data management.
User-Friendly Interface: The website allows users to view all uploaded files in their JSON-ified form, providing a clear and organized representation of the converted content.
On-Demand Word Description: Each uploaded file comes with an associated dictionary generated by the Flask server. This enables the backend to make accurate and real-time searches in the Wiktionary dump for full descriptions of any word the user clicks on.
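The on-demand lookup in the last feature can be pictured as a two-step mapping: clicked word to basic form, then basic form to description. The Python sketch below is only an illustration; `describe_word` and both dictionaries are hypothetical, and `descriptions` stands in for whatever the backend reads from the Wiktionary dump.

```python
from typing import Dict, Optional


def describe_word(
    clicked: str,
    dictionary: Dict[str, str],
    descriptions: Dict[str, str],
) -> Optional[str]:
    """Return the description for a clicked word, or None if none is known.

    `dictionary` maps each word of the uploaded file to its basic form.
    `descriptions` maps basic forms to full descriptions (an assumed,
    simplified view of the Wiktionary dump).
    """
    key = clicked.strip().lower()
    # Words missing from the per-file dictionary are treated as their own basic form.
    base_form = dictionary.get(key, key)
    return descriptions.get(base_form)
```

For example, with `dictionary = {"mice": "mouse"}` and a `descriptions` entry for `"mouse"`, clicking "mice" resolves to the entry for "mouse".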

Repository Link

Check out the live application at http://jakubgrad.ddns.net:2227/frontend/about and the source code on GitHub (jakubgrad/LangApp23). Feel free to explore the codebase and contribute to the project. The live application currently runs on my private server.

Optimizing Wiktionary Search

The hybrid approach of the current application has serious disadvantages, especially in terms of memory efficiency and speed when making requests to Wiktionary for each word in a PDF. To address this problem, I'm in the process of implementing a better approach:

Using a Flask Python server with direct access to the Wiktionary dump.

This new approach will allow a more efficient search for the basic form of a word in the Wiktionary dump, without the need for additional searches for the full description. By directly accessing the relevant information, valuable processing time will be saved and the overall performance of the application will improve.
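One way such direct access could look is an in-memory index built once from the dump, so that a single lookup returns both the basic form and the description. The Python sketch below assumes the dump has already been preprocessed into tab-separated lines of the form `word<TAB>basic form<TAB>description`; that file format, and the names `Entry`, `build_index`, and `lookup`, are assumptions made for illustration and are not part of Wiktionary or of this repository.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Entry(NamedTuple):
    base_form: str
    description: str


def build_index(lines: Iterable[str]) -> Dict[str, Entry]:
    """Index preprocessed dump lines by lowercased word.

    Each line is assumed to be: word, basic form, description, separated by
    tabs. Malformed lines are skipped; the first entry for a word wins.
    """
    index: Dict[str, Entry] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        word, base_form, description = parts
        index.setdefault(word.lower(), Entry(base_form, description))
    return index


def lookup(index: Dict[str, Entry], word: str) -> Optional[Entry]:
    """Return basic form and description in one step, or None if unknown."""
    return index.get(word.strip().lower())
```

Because `build_index` accepts any iterable of lines, it can be fed an open file handle at server start-up. A full dump may be too large to hold in memory this way, in which case an on-disk store would be the more realistic choice.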

Feedback

If you have any suggestions, ideas, or would like to contribute, feel free to open an issue or submit a pull request!

Thank you for your interest in the project!

Is this a serious attempt at creating a language reading app? Semi-serious. The UI definitely needs improvement, and so does the method for finding words. The current speed of about a minute per page of a PDF is prohibitively slow, and there is no personalization for users of the website, such as saving words or marking progress in a PDF. There is also no UI option for flipping pages yet o_o, though this might come soon.
