Adding a document language

Adam Hooper edited this page Jul 31, 2017 · 3 revisions

Here are the steps we took to add Norwegian document support:

  1. Make sure the Tesseract language package installs on new AWS deploys: in aws-overview-tools/cloud-init/conglomerate.txt, add the tesseract-ocr-nor package.
  2. Actually install the package on a production server: sudo apt-get install tesseract-ocr-nor.
  3. Make sure new Docker images include the package: add tesseract-ocr-nor to overview-docker/worker/Dockerfile.
  4. Now edit code in Overview:
    • Add NO and nor to common/src/main/scala/com/overviewdocs/util/SupportedLanguages.scala
    • Add worker/src/main/resources/stopwords-no.csv so the Tree analysis knows which words to ignore
  5. Commit, deploy, and release the Docker images.
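
After installing the package (steps 1 to 3), it can help to confirm that Tesseract actually sees the new language before deploying. The sketch below is one way to do that and is not part of Overview: has_ocr_lang is a hypothetical helper name, and it assumes that tesseract --list-langs prints one language code per line after a header line, which may vary between Tesseract versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Overview): succeeds if the language code
# given as $1 appears on a line by itself in the listing read from stdin.
has_ocr_lang() {
  grep -qx -- "$1"
}

# Example, on a server with Tesseract installed (some versions print the
# listing to stderr, hence the 2>&1):
#   tesseract --list-langs 2>&1 | has_ocr_lang nor && echo "nor is installed"
```

The helper reads the listing from stdin rather than calling tesseract itself, so the same check works against a production server, a Docker container, or saved output.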