extract-text-app

This is a web app that lets users upload PDF or image files and extract text such as the title, revision, or other metadata. It locates that text by its position relative to common anchor text present in the documents.
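As an illustration of the idea, here is a minimal sketch of anchor-relative lookup. It is not taken from this repo: the `OcrWord` and `ExtractionRule` shapes, the `extractRelative` function, and the choice to express offsets as fractions of the page size are all assumptions made for the example.

```typescript
// Hypothetical shape: a word recognized by OCR and the center of its
// bounding box, in the same units as the page dimensions.
interface OcrWord {
  text: string;
  x: number;
  y: number;
}

// Hypothetical rule: the target is expected at (dx, dy) from the anchor,
// expressed as fractions of page width and height.
interface ExtractionRule {
  anchorText: string;
  dx: number;
  dy: number;
  tolerance: number; // maximum normalized distance from the expected spot
}

function extractRelative(
  words: OcrWord[],
  pageWidth: number,
  pageHeight: number,
  rule: ExtractionRule
): string | null {
  const anchor = words.find((w) => w.text === rule.anchorText);
  if (!anchor) return null;

  // Expected target position in normalized (0..1) page coordinates.
  const targetX = anchor.x / pageWidth + rule.dx;
  const targetY = anchor.y / pageHeight + rule.dy;

  // Pick the word closest to the expected position, within tolerance.
  let best: OcrWord | null = null;
  let bestDist = rule.tolerance;
  for (const w of words) {
    if (w === anchor) continue;
    const dist = Math.hypot(w.x / pageWidth - targetX, w.y / pageHeight - targetY);
    if (dist <= bestDist) {
      best = w;
      bestDist = dist;
    }
  }
  return best ? best.text : null;
}

// Usage: the same rule applied to a 1000x1000 page.
const revisionRule: ExtractionRule = { anchorText: 'REV', dx: 0.05, dy: 0, tolerance: 0.02 };
const sampleWords: OcrWord[] = [
  { text: 'TITLE', x: 600, y: 950 },
  { text: 'REV', x: 800, y: 950 },
  { text: 'B', x: 850, y: 950 },
];
console.log(extractRelative(sampleWords, 1000, 1000, revisionRule));
```

Because positions are divided by the page dimensions before comparison, the same rule would apply to a scan of the same layout at a different size.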

This is particularly useful for extracting data from engineering drawings, whether printed or handwritten. Because it extracts text by relative position, it handles documents of different sizes more reliably than, for example, Bluebeam's PDF extract tools. After extracting, you can export the extracted text to a CSV.
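The CSV export step could look roughly like the sketch below. The `ExtractedRow` type, the `toCsv` and `escapeCsvField` functions, and the column names are hypothetical and only illustrate how extracted fields might be serialized with RFC 4180 style quoting; the app's actual export code may differ.

```typescript
// Hypothetical row shape: one record of extracted fields per document.
type ExtractedRow = Record<string, string>;

// Quote a field when it contains a comma, double quote, or line break,
// doubling any embedded double quotes.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize rows to CSV text with a header line, using the given column order.
// Fields missing from a row are written as empty strings.
function toCsv(rows: ExtractedRow[], columns: string[]): string {
  const header = columns.map(escapeCsvField).join(',');
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvField(row[column] ?? '')).join(',')
  );
  return [header, ...lines].join('\r\n');
}

// Usage with made-up data.
const csvText = toCsv(
  [{ file: 'drawing-1.pdf', title: 'Pump, main', revision: 'B' }],
  ['file', 'title', 'revision']
);
console.log(csvText);
```

Quoting matters here because drawing titles often contain commas, which would otherwise shift the columns.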

Tech stack

Frontend: Svelte
Web Server: SvelteKit
API: Hasura
Database: PostgreSQL
Text extraction API: Azure AI OCR

Developing

Clone the repo
Download Docker Desktop
Open the repo in VS Code, install the Dev Containers extension, and reopen the repo in a dev container.
Copy sample.env to .env and fill in the credentials for your Azure AI OCR endpoint.
Run npm install
Apply migrations and metadata: cd hasura && npx hasura migrate apply && npx hasura metadata apply; cd ..
Run npm run dev

Building

To create a production version of your app:

Set the VITE_GRAPHQL_ENDPOINT_HOST environment variable before building. Its value is read at build time and fixed in the build output, so changing it afterwards has no effect until you rebuild.
Build:

npm run build
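The variable has to be set before the build because Vite statically replaces `import.meta.env.VITE_*` expressions with their values at build time. The sketch below shows one way client code might consume it. The `graphqlUrl` helper and the `http` default scheme are assumptions for illustration; `/v1/graphql` is Hasura's standard GraphQL path, but how this app actually composes its endpoint URL is not stated here.

```typescript
// Hypothetical helper: builds the Hasura GraphQL URL from the host value
// that was inlined at build time. Throws if the variable was not set.
function graphqlUrl(host: string | undefined, scheme: string = 'http'): string {
  if (!host) {
    throw new Error('VITE_GRAPHQL_ENDPOINT_HOST was not set when the app was built');
  }
  return `${scheme}://${host}/v1/graphql`;
}

// In app code, Vite replaces the expression below with a literal at build time:
// const endpoint = graphqlUrl(import.meta.env.VITE_GRAPHQL_ENDPOINT_HOST);
```

Failing loudly on a missing value makes a misconfigured build visible at startup rather than as a failed request later.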
