Merge pull request #68 from dataiku/fix/pandas-error-python-37

Fix/pandas error python 37
dataiku · Nov 29, 2021 · 4a490d1
2 parents bdbe2cb + c07e6f2
Showing 4 changed files with 11 additions and 8 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,9 @@
+# Changelog
+
+## [Version 1.0.2](https://github.com/dataiku/dss-plugin-tesseract-ocr/releases/tag/v1.0.2) - Initial release - 2021-03
+
+- Fix an error in python 37 in the text extraction recipe
+
+## [Version 1.0.1](https://github.com/dataiku/dss-plugin-tesseract-ocr/releases/tag/v1.0.1) - Initial release - 2021-03
+
+- Custom code of the image processing recipe is successfully saved when exiting and coming back to the recipe
diff --git a/README.md b/README.md
@@ -7,12 +7,6 @@ The plugin has four components (three recipes and a notebook template):
 - Image Processing notebook: notebook to explore different types of image processing to improve (or not) text extraction from tesseract. Then, the functions that were tested in the notebook can be used in the Image Processing recipe.
 - Image Processing recipe: recipe to process images using functions defined by the user in the python editor area of the recipe parameter's form.
 
-## Release notes
-
-**Version 1.0.1 (2021-03)** - Fix release
-
-- Custom code of the image processing recipe is successfully saved when exiting and coming back to the recipe
-
 ## Instructions to use the notebook template
 
 Go to notebook (G+N) and create a new python notebook. Select the template `Image processing for text extraction` and then check that the plugin code env is selected (you can set it in the tab Kernel > Change kernel).

diff --git a/custom-recipes/ocr-text-extraction-dataset/recipe.py b/custom-recipes/ocr-text-extraction-dataset/recipe.py
@@ -14,7 +14,7 @@
 input_filenames = input_folder.list_paths_in_partition()
 total_images = len(input_filenames)
 
-df = pd.DataFrame(columns=['file', 'text'])
+df = pd.DataFrame()
 
 for i, sample_file in enumerate(input_filenames):
     if sample_file.split('.')[-1] != "jpg":
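
Review note: the hunk above only shows the initialization of `df`; how rows are added later in the recipe, and the exact error raised under Python 3.7, are not visible in this diff. As a hedged illustration only (not the plugin's code), one pattern that does not depend on how the empty frame is initialized is to collect rows in a plain list and build the DataFrame once at the end. The helper name `build_text_frame` and the sample inputs are hypothetical.

```python
import pandas as pd


def build_text_frame(extracted):
    """Build a ('file', 'text') DataFrame from (file, text) pairs.

    Hypothetical helper for illustration; it is not part of the plugin.
    """
    # Accumulate plain dicts first, so no empty DataFrame has to be grown row by row.
    rows = [{"file": name, "text": text} for name, text in extracted]
    # Passing `columns` keeps the schema stable even when `rows` is empty.
    return pd.DataFrame(rows, columns=["file", "text"])


# Usage with made-up sample data (assumed values, not real OCR output).
df = build_text_frame([("a.jpg", "hello"), ("b.jpg", "world")])
print(df)
```

Building the frame in one step also keeps the `file` and `text` columns present when the input folder contains no images, which an unparameterized `pd.DataFrame()` would not guarantee on its own.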

diff --git a/plugin.json b/plugin.json
@@ -1,6 +1,6 @@
 {
     "id": "tesseract-ocr",
-    "version": "1.0.1",
+    "version": "1.0.2",
     "meta": {
         "label": "Tesseract OCR",
         "description": "Extract text from images using the Tesseract Optical Character Recognition (OCR) engine",