extract-text-app

This is a web app that lets users upload PDF or image files and extract text such as the title, revision, or other metadata by its position relative to common anchor text present in the documents.
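
To make the idea of anchor-relative extraction concrete, here is a minimal sketch in TypeScript. It is not the app's actual implementation: the `Page`, `Word`, and `FieldTemplate` shapes and the `extractField` function are hypothetical, and it assumes the OCR step has already produced words with bounding boxes. Offsets and sizes are stored as fractions of the page dimensions so the same template can apply to pages of different sizes.

```typescript
// Hypothetical data shapes; the app's real data model may differ.
interface Box { x: number; y: number; width: number; height: number }
interface Word { text: string; box: Box }
interface Page { width: number; height: number; words: Word[] }

// A field is located by an anchor string plus a region measured from the
// anchor's top-left corner. dx, dy, w, and h are fractions of the page
// width/height (an assumption made here for size independence).
interface FieldTemplate {
  name: string;
  anchorText: string;
  dx: number;
  dy: number;
  w: number;
  h: number;
}

function extractField(page: Page, field: FieldTemplate): string | null {
  // Find the first word whose text matches the anchor, ignoring case.
  const anchor = page.words.find(
    (wd) => wd.text.trim().toLowerCase() === field.anchorText.toLowerCase()
  );
  if (!anchor) return null;

  // Convert the fractional region into absolute page coordinates.
  const left = anchor.box.x + field.dx * page.width;
  const top = anchor.box.y + field.dy * page.height;
  const right = left + field.w * page.width;
  const bottom = top + field.h * page.height;

  // Keep every other word whose center falls inside the region.
  const hits = page.words.filter((wd) => {
    if (wd === anchor) return false;
    const cx = wd.box.x + wd.box.width / 2;
    const cy = wd.box.y + wd.box.height / 2;
    return cx >= left && cx <= right && cy >= top && cy <= bottom;
  });

  // Approximate reading order: top to bottom, then left to right.
  hits.sort((a, b) => a.box.y - b.box.y || a.box.x - b.box.x);
  return hits.length > 0 ? hits.map((wd) => wd.text).join(" ") : null;
}
```

With a template such as `{ name: "title", anchorText: "title", dx: 0, dy: 0.02, w: 0.15, h: 0.03 }`, the same call would pick up the words just below a "TITLE" label whether the drawing was scanned at 1000 x 500 or 2000 x 1000 units, as long as the layout scales with the page.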

This is particularly useful for extracting data from engineering drawings, whether the text is printed or handwritten. Because it locates text by relative position, it can extract text from documents of different sizes more reliably than, for example, Bluebeam's PDF extraction tools. After extraction, you can export the extracted text to a CSV file.
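
As an illustration of the CSV export step, the sketch below turns extracted fields into CSV text. The `toCsv` and `csvEscape` names and the row shape are assumptions made for this example, not the app's actual code. Quoting follows the usual CSV convention: values containing commas, quotes, or line breaks are wrapped in double quotes, and embedded quotes are doubled.

```typescript
// Hypothetical helpers; the app's real export code may differ.

// Quote a value only when it contains a comma, double quote, or line break.
function csvEscape(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build CSV text with a header row; missing or null fields become empty cells.
function toCsv(columns: string[], rows: Record<string, string | null>[]): string {
  const lines = [columns.map(csvEscape).join(",")];
  for (const row of rows) {
    lines.push(columns.map((col) => csvEscape(row[col] ?? "")).join(","));
  }
  return lines.join("\r\n");
}
```

Escaping matters for drawing metadata in particular, since titles often contain commas or inch marks (for example `PUMP, 2" INLET`).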

Tech stack

Frontend: Svelte
Web Server: SvelteKit
API: Hasura
Database: PostgreSQL
Text extraction API: Azure AI OCR

Developing

Clone the repo
Download Docker Desktop
Open the repo in VS Code, install the Dev Containers extension, and reopen the repo in a dev container.
Copy sample.env to .env and fill in the credentials for your Azure AI OCR endpoint.
Run npm install
Apply migrations and metadata: cd hasura && npx hasura migrate apply && npx hasura metadata apply; cd ..
Run npm run dev

Building

To create a production version of your app:

Set VITE_GRAPHQL_ENDPOINT_HOST before building. Its value is read at build time and baked into the build, so changing it afterwards has no effect.
Build

npm run build
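
The reason the variable must be set before the build is that Vite statically replaces `import.meta.env.VITE_*` references when it bundles the app. The sketch below shows how such a value might be turned into a GraphQL URL. The `graphqlUrl` helper is hypothetical, and it assumes the variable holds a bare host (no scheme) and that the API is served at Hasura's default `/v1/graphql` path; the app's real code may differ.

```typescript
// Hypothetical helper; assumes the host has no scheme and that the API
// uses Hasura's default GraphQL path.
function graphqlUrl(host: string | undefined, secure: boolean = true): string {
  if (!host) {
    // An unset variable at build time would otherwise produce a broken URL.
    throw new Error("VITE_GRAPHQL_ENDPOINT_HOST was not set at build time");
  }
  const scheme = secure ? "https" : "http";
  return `${scheme}://${host}/v1/graphql`;
}

// In a Vite/SvelteKit app the call site might look like this (not run here):
//   const url = graphqlUrl(import.meta.env.VITE_GRAPHQL_ENDPOINT_HOST);
```

Failing fast on a missing value makes a forgotten variable visible immediately rather than as failed requests in production.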


lectrician1/extract-text-app
