pixelating-ocr

Repository Contents

docs_ocr.gs: Google Apps Script for extracting text from a batch of JPEG files
sample_items: contains JPEG images from Mother Tongue Pipal Pustak and Nepali Aawaz, made available by the Digital Himalaya Project under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license.

Google Drive Batch OCR Script (docs_ocr.gs)

This script was created to generate transcripts from images featuring Nepali and Tibetan text. It finds all JPEG files within the specified Google Drive folder, opens them as Google Docs, and exports their filenames and text contents to the specified Google Sheet. (Uploaded JPEGs are deleted from Drive in the process; the corresponding Google Docs remain.)
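
As a rough illustration of the flow described above, here is a minimal sketch in Apps Script. It is not the contents of docs_ocr.gs. It assumes the Advanced Drive Service (Drive API v2) is enabled in the script project, that transcripts go to the first tab of the sheet, and that each row holds a filename followed by its text. The values given for 'folderName' and 'sheetId' are placeholders.

```javascript
// Hypothetical sketch of the batch OCR flow; NOT the actual docs_ocr.gs.
// Assumption: the Advanced Drive Service (Drive API v2) is enabled.
const folderName = 'ocr_images';   // placeholder: name of your image folder
const sheetId = 'YOUR_SHEET_ID';   // placeholder: id copied from the sheet's URL

function extractTextOnOpen() {
  const folders = DriveApp.getFoldersByName(folderName);
  if (!folders.hasNext()) {
    return 0; // no folder with that name, so there is nothing to process
  }
  const folder = folders.next();
  // Assumption: transcripts are written to the first tab of the sheet.
  const sheet = SpreadsheetApp.openById(sheetId).getSheets()[0];
  const images = folder.getFilesByType(MimeType.JPEG);
  let processed = 0;

  while (images.hasNext()) {
    const image = images.next();
    // Ask Drive to convert the JPEG into a Google Doc with OCR applied.
    const doc = Drive.Files.insert(
      { title: image.getName(), parents: [{ id: folder.getId() }] },
      image.getBlob(),
      { ocr: true }
    );
    // Read the recognized text back out of the new Google Doc.
    const text = DocumentApp.openById(doc.id).getBody().getText();
    // One row per image: filename, then extracted text.
    sheet.appendRow([image.getName(), text]);
    // Move the original JPEG to the trash; the Google Doc stays in the folder.
    image.setTrashed(true);
    processed++;
  }
  return processed;
}
```

The sketch returns the number of images processed, which can be handy when checking the execution log. Drive's OCR also accepts a language hint (the 'ocrLanguage' option in Drive API v2), which may be worth trying if recognition quality is poor.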

Setup

1. Create a new folder for your JPEG files. Keep track of the folder's name for step 6.
2. Create a new Google Sheet in the same folder. This will store your transcript text.
3. Copy the ID found in the sheet's URL (the long string of letters and numbers between 'd/' and '/edit'). Hold onto it for step 7.
4. In the sheet, open the 'Tools' menu and select 'Script Editor' (in newer versions of Google Sheets, 'Extensions' > 'Apps Script').
5. Paste the contents of 'docs_ocr.gs' into the script editor.
6. Update 'folderName' with the name of your image folder (see step 1).
7. Update 'sheetId' with the ID associated with your transcript sheet (see step 3).
8. Click the clock icon to add a trigger. Select the options "extractTextOnOpen", "From Spreadsheet", and "on open". This tells the script to run whenever someone opens the spreadsheet.
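
Two of the setup steps can also be done in code. The sketch below is only an illustration and is not part of docs_ocr.gs: 'extractSheetId' and 'installOpenTrigger' are hypothetical helper names, and the URL in the usage note is made up. The first helper pulls the ID out of a sheet URL; the second creates the same "on open" trigger as the clock-icon dialog, using the ScriptApp trigger builder.

```javascript
// Hypothetical helpers; these names do not appear in docs_ocr.gs.

// Returns the spreadsheet ID from a Google Sheets URL (the part after 'd/'),
// or null if the URL does not contain one.
function extractSheetId(url) {
  const match = /\/d\/([a-zA-Z0-9_-]+)/.exec(url);
  return match ? match[1] : null;
}

// Installs an "on open" trigger for the given spreadsheet so that
// extractTextOnOpen runs whenever someone opens it.
// Run this once from the script editor; running it again adds a duplicate trigger.
function installOpenTrigger(sheetId) {
  return ScriptApp.newTrigger('extractTextOnOpen')
    .forSpreadsheet(sheetId)
    .onOpen()
    .create();
}
```

For example, calling extractSheetId('https://docs.google.com/spreadsheets/d/1AbC_dEf-123/edit#gid=0') yields '1AbC_dEf-123', which could then be passed to installOpenTrigger. The pattern also allows hyphens and underscores, since sheet IDs can contain them.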

Usage

1. Upload JPEGs to the folder you set up.
2. Open the spreadsheet.
3. Make a cup of coffee or tea and relax while Google converts the JPEGs, extracts the text, and populates the spreadsheet.

Troubleshooting

'Google Drive: Page not found': Make sure you're logged into only one Google account (see Stack Overflow).

Credits

Research: Laura Ferris, Digital Initiatives Assistant, UBC Library

Code: Rebecca Dickson, Digital Projects Student Librarian, UBC Library

Inspiration: http://blogs.bl.uk/digital-scholarship/2017/07/a-workshop-on-optical-character-recognition-for-bangla.html
