Commit

Merge pull request #68 from dataiku/fix/pandas-error-python-37
Fix/pandas error python 37
StanislasGuinel authored Nov 29, 2021
2 parents bdbe2cb + c07e6f2 commit 4a490d1
Showing 4 changed files with 11 additions and 8 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,9 @@
# Changelog

## [Version 1.0.2](https://github.com/dataiku/dss-plugin-tesseract-ocr/releases/tag/v1.0.2) - Initial release - 2021-03

- Fix an error in python 37 in the text extraction recipe

## [Version 1.0.1](https://github.com/dataiku/dss-plugin-tesseract-ocr/releases/tag/v1.0.1) - Initial release - 2021-03

- Custom code of the image processing recipe is successfully saved when exiting and coming back to the recipe
6 changes: 0 additions & 6 deletions README.md
@@ -7,12 +7,6 @@ The plugin has four components (three recipes and a notebook template):
- Image Processing notebook: notebook to explore different types of image processing to improve (or not) text extraction from tesseract. Then, the functions that were tested in the notebook can be used in the Image Processing recipe.
- Image Processing recipe: recipe to process images using functions defined by the user in the python editor area of the recipe parameter's form.

-## Release notes
-
-**Version 1.0.1 (2021-03)** - Fix release
-
-- Custom code of the image processing recipe is successfully saved when exiting and coming back to the recipe
-
## Instructions to use the notebook template

Go to notebook (G+N) and create a new python notebook. Select the template `Image processing for text extraction` and then check that the plugin code env is selected (you can set it in the tab Kernel > Change kernel).
2 changes: 1 addition & 1 deletion custom-recipes/ocr-text-extraction-dataset/recipe.py
@@ -14,7 +14,7 @@
input_filenames = input_folder.list_paths_in_partition()
total_images = len(input_filenames)

-df = pd.DataFrame(columns=['file', 'text'])
+df = pd.DataFrame()

for i, sample_file in enumerate(input_filenames):
    if sample_file.split('.')[-1] != "jpg":
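Review note: the hunk above replaces a DataFrame pre-declared with `file` and `text` columns by an empty one, but the rest of the loop is not shown in this diff, so how rows are added to `df` is unknown here. As a minimal sketch of one alternative, assuming each image yields one `file`/`text` pair, the rows could be collected in a plain list and the DataFrame built once at the end. `extract_text` and `build_text_dataframe` are hypothetical names used only for illustration, not functions from the plugin.

```python
import pandas as pd


def extract_text(path):
    # Hypothetical stand-in for the OCR call; the plugin's real
    # extraction code is not visible in this diff.
    return "text from " + path


def build_text_dataframe(paths):
    # Collect one dict per image, then build the DataFrame in one step.
    rows = []
    for path in paths:
        rows.append({"file": path, "text": extract_text(path)})
    # Passing columns keeps the 'file'/'text' schema even when rows is empty.
    return pd.DataFrame(rows, columns=["file", "text"])


df = build_text_dataframe(["/a.jpg", "/c.jpg"])
```

Building the frame once from a list sidesteps any dependence on how an empty DataFrame behaves when rows are added to it, which may or may not be the cause of the error fixed in this commit.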
2 changes: 1 addition & 1 deletion plugin.json
@@ -1,6 +1,6 @@
{
"id": "tesseract-ocr",
-"version": "1.0.1",
+"version": "1.0.2",
"meta": {
"label": "Tesseract OCR",
"description": "Extract text from images using the Tesseract Optical Character Recognition (OCR) engine",