Adding a document language
Adam Hooper edited this page Jul 31, 2017 · 3 revisions
Here are the steps we took to add Norwegian document support:
- Make sure it installs on new AWS deploys: in `aws-overview-tools/cloud-init/conglomerate.txt`, add the `tesseract-ocr-nor` package.
- Actually install the package on a production server: `sudo apt-get install tesseract-ocr-nor`.
- Make sure new Docker images include the package: add `tesseract-ocr-nor` to `overview-docker/worker/Dockerfile`.
- Now edit code in Overview:
  - Add `NO` and `nor` to `common/src/main/scala/com/overviewdocs/util/SupportedLanguages.scala`
  - Add `worker/src/main/resources/stopwords-no.csv` so the Tree analysis knows which words to ignore
  - Add
- Commit, deploy, and release the Docker images
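
A quick sanity check after the package steps can catch a missed install before any documents fail OCR. The sketch below is only an illustration and is not part of Overview: the helper names `has_tesseract_lang` and `file_mentions_package` are hypothetical. It assumes `tesseract --list-langs` prints one language code per line, which may go to stderr on some Tesseract versions.

```shell
#!/bin/sh
# Hypothetical helpers for verifying the steps above; not part of Overview.

# Succeeds if the language code $1 appears as a whole line on stdin.
# Intended to read the output of `tesseract --list-langs`.
has_tesseract_lang() {
  grep -qx -- "$1"
}

# Succeeds if file $2 mentions package name $1 (fixed-string match).
file_mentions_package() {
  grep -qF -- "$1" "$2"
}

# Example usage (2>&1 because some versions print the list to stderr):
#   tesseract --list-langs 2>&1 | has_tesseract_lang nor && echo "nor is installed"
#   file_mentions_package tesseract-ocr-nor overview-docker/worker/Dockerfile \
#     || echo "Dockerfile is missing tesseract-ocr-nor"
```

Both helpers only read their input, so they should be safe to run on a production server or in a checkout of the repositories named in the list.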