Releases: dataiku/dss-plugin-tesseract-ocr

Version 2.3.3

18 Apr 09:16
8d5dd3c

Add support for .tif extension for OCR

Version 2.3.2

08 Mar 13:20
2a0b13e

Fix reading temporary file for pypandoc conversion

Version 2.3.1

22 Jan 10:59
2562173

Fix text extraction from HTML files with line wraps when chunking

Version 2.3.0

20 Dec 10:41
68f45e3
Improve Markdown text extraction when chunking (#76)

* Improve Markdown text extraction when chunking

* Only use pandoc for text block conversion of non-Markdown files

* Fix typo

Version 2.2.0

06 Dec 10:59
b7c239f
Merge pull request #73 from dataiku/feature/extract-text-chunks

Extract text chunks

Version 2.1.1

18 Sep 08:37
18aad70
Merge pull request #72 from dataiku/chore/rename-recipes

Improve recipe title and description wording

Version 2.1.0

25 Jul 16:17
606ea31
Merge pull request #71 from dataiku/feature/text-extraction-pandoc

Text extraction with pandoc

Version 2.0.0

29 Jun 12:22
9df4c1e
Merge pull request #70 from dataiku/feature/add-easyocr

Add EasyOCR and accept PDF

Release v1.0.3

21 Apr 14:41
f046670

Update code env description to support Python versions 3.8, 3.9, 3.10 and 3.11

Release v1.0.2

29 Nov 13:27
e6064dd
Update CHANGELOG.md