rob0tca/pixelating-ocr

pixelating-ocr

Related Presentations

Repository Contents

Google Drive Batch OCR Script (docs_ocr.gs)

This script was created to generate transcripts from images of Nepali and Tibetan text. It finds all JPEG files within the specified Google Drive folder, opens them as Google Docs, and exports their filenames and text contents to the specified Google Sheet. (Uploaded JPEGs are deleted from Drive in the process; the corresponding Google Docs remain.)

Setup

  1. Create a new folder for your JPEG files. Keep track of the folder's name for step 6.
  2. Create a new Google Sheet in the same folder. This will store your transcript text.
  3. Copy the ID found in the sheet's URL (look for the long string of letters and numbers between 'd/' and '/edit'). Hold onto it for step 7.
  4. In the sheet, under the 'Tools' menu, select 'Script Editor' (in newer versions of Google Sheets, this is 'Extensions' > 'Apps Script').
  5. Paste the contents of 'docs_ocr.gs' into the script editor.
  6. Update 'folderName' with the name of your image folder (see step 1).
  7. Update 'sheetId' with the id associated with your transcript sheet (see step 3).
  8. Click the clock icon to add a trigger. Select options "extractTextOnOpen", "From Spreadsheet", and "on open". This will tell the script to run whenever someone opens the spreadsheet.
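
Steps 3 and 8 can also be done from code. The sketch below is not part of 'docs_ocr.gs'; the function names `extractSheetId` and `installOpenTrigger` are hypothetical, and it assumes the script's entry point is called `extractTextOnOpen`, as the trigger options in step 8 suggest.

```javascript
// Hypothetical helper (not in docs_ocr.gs): pull the spreadsheet ID
// out of a Google Sheets URL, i.e. the segment that follows "/d/".
function extractSheetId(url) {
  var match = /\/d\/([A-Za-z0-9_-]+)/.exec(url);
  return match ? match[1] : null; // null when the URL has no "/d/<id>" part
}

// Hypothetical alternative to step 8: install the "on open" trigger
// programmatically. Run this once from the script editor.
function installOpenTrigger(sheetId) {
  ScriptApp.newTrigger('extractTextOnOpen') // assumed entry-point name
    .forSpreadsheet(sheetId)
    .onOpen()
    .create();
}
```

For example, `extractSheetId('https://docs.google.com/spreadsheets/d/abc123/edit')` returns `'abc123'`, which is the value to paste into 'sheetId' in step 7.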

Usage

  1. Upload JPEGs to the folder you set up.
  2. Open the spreadsheet.
  3. Make a cup of coffee/tea and relax while Google converts the JPEGs, extracts text, and populates the spreadsheet.
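
What happens when the spreadsheet opens can be sketched roughly as follows. This is an illustration of the workflow described above, not the contents of 'docs_ocr.gs'. It assumes the Advanced Drive Service (v2) is enabled in the script project so that `Drive.Files.insert` with `ocr: true` is available, and the values of `folderName` and `sheetId` are placeholders.

```javascript
// Placeholder values; the real ones come from Setup steps 6 and 7.
var folderName = 'ocr-images';
var sheetId = 'YOUR_SHEET_ID';

// Sketch of the OCR pass. Returns the number of images processed.
function extractTextOnOpen() {
  var folders = DriveApp.getFoldersByName(folderName);
  if (!folders.hasNext()) {
    return 0; // no folder with that name, nothing to do
  }
  var files = folders.next().getFilesByType(MimeType.JPEG);
  var sheet = SpreadsheetApp.openById(sheetId).getSheets()[0];
  var count = 0;

  while (files.hasNext()) {
    var image = files.next();
    // Assumption: Advanced Drive Service v2. Uploading the image with
    // ocr: true asks Drive to create a Google Doc containing the text.
    var doc = Drive.Files.insert(
      { title: image.getName() },
      image.getBlob(),
      { ocr: true }
    );
    var text = DocumentApp.openById(doc.id).getBody().getText();

    // One row per image: filename, then the recognized text.
    sheet.appendRow([image.getName(), text]);

    // Move the JPEG to the trash; the Google Doc stays in Drive.
    image.setTrashed(true);
    count++;
  }
  return count;
}
```

Because each processed JPEG is trashed, opening the spreadsheet again should only pick up images uploaded since the last run.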

Troubleshooting

  • 'Google Drive: Page not found' : Make sure you're only logged into one Google account (see Stack Overflow)

Credits

Research: Laura Ferris, Digital Initiatives Assistant, UBC Library

Code: Rebecca Dickson, Digital Projects Student Librarian, UBC Library

Inspiration: http://blogs.bl.uk/digital-scholarship/2017/07/a-workshop-on-optical-character-recognition-for-bangla.html

About

Materials for use in UBC Library Pixelating Workshop, 11/02/2017
