
Run OCR on image after saving (synchronously!) (#20) #26

Open
wants to merge 4 commits into
base: main
from


danieloeh
Contributor

This is a first implementation that runs OCR whenever an image is newly created or changed. One major downside is that saving an Item whose image has changed now takes much longer, because the OCR runs on Django's main thread.
A better approach would probably be to use something like Celery to run OCR tasks in the background, which would also allow us to run periodic tasks. However, this would add more dependencies and background services to the project.
Just let me know whether you would prefer the solution submitted in this PR or the one I have just described :)

@cod3monk
Contributor

Have you benchmarked how long it takes to upload and process 3 photos? If it is under 1 s, I wouldn't mind. Uploads from a smartphone already take longer than that on most phones (with high-resolution images).
I'm not opposed to having Celery in the stack, as long as the development version can still run standalone (with synchronous or no OCR). I guess it would be the sounder implementation.
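
To make the "synchronous or no OCR" fallback concrete, here is a minimal sketch of how the dispatch could look. Every name in it (`extract_text`, `schedule_ocr`, the `mode` values) is hypothetical and not taken from this PR; the only real API it relies on is that Celery task objects expose a `.delay()` method.

```python
def extract_text(image_path):
    """Hypothetical stand-in for the real OCR call."""
    return f"text from {image_path}"


def schedule_ocr(image_path, mode="sync", task=None):
    """Run OCR according to `mode`: "off", "sync" or "background".

    `task` is assumed to be a Celery-style task object with a .delay()
    method; it is only needed for "background" mode.
    """
    if mode == "off":
        # Standalone development setup without any OCR.
        return None
    if mode == "background":
        if task is None:
            raise ValueError("background mode needs a task object")
        # Hand the job to a worker; the result arrives later.
        task.delay(image_path)
        return None
    if mode == "sync":
        # Blocks the caller until OCR has finished.
        return extract_text(image_path)
    raise ValueError(f"unknown OCR mode: {mode!r}")
```

With something like this, the development setup would only ever use `"off"` or `"sync"`, so it would need neither a broker nor a worker.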

@danieloeh
Contributor Author

I just did a quick test, and it took a little less than 6 seconds to upload 3 smartphone photos. So, in the long run, I think that moving such tasks into the background would make sense.
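
For anyone who wants to repeat such a measurement, a rough timing harness could look like the sketch below. `process_upload` is a hypothetical placeholder (it just sleeps); it would need to be replaced with the real save-plus-OCR call, and the numbers it produces say nothing about the figure quoted above.

```python
import time


def process_upload(photo):
    """Hypothetical stand-in for saving one photo and running OCR on it."""
    time.sleep(0.01)  # pretend to do some work
    return len(photo)


def time_batch(photos):
    """Return the wall-clock seconds needed to process all photos in order."""
    start = time.perf_counter()
    for photo in photos:
        process_upload(photo)
    return time.perf_counter() - start


elapsed = time_batch([b"photo-1", b"photo-2", b"photo-3"])
print(f"3 photos took {elapsed:.2f} s")
```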

@cod3monk
Contributor

I agree, background seems like the way to go.

@danieloeh
Contributor Author

I have changed the code so that Celery is used if available. Celery requires a Redis instance and a separate worker process. If the config variable CELERY_TASK_ALWAYS_EAGER is set to True in local_settings.py, tasks are executed in the foreground instead. If you want to run Celery in the dev environment, use the command ./manage.py celery_worker run to start the worker and e.g. Docker to run a Redis instance.
I have added Celery and Redis to the prod environment as well, but wasn't able to test this fully because I had some issues with the web interface there. However, I have checked that the Redis container and the Celery worker are running in the Docker Compose cluster.
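
As an illustration of the eager-mode switch described above, a development local_settings.py might contain something like the following. Only CELERY_TASK_ALWAYS_EAGER comes from this PR; the broker setting and its URL are assumptions (they presume Celery reads Django settings with the `CELERY_` prefix and that Redis listens on its default local port).

```python
# local_settings.py (development): a sketch, not the file shipped in this PR.

# Execute Celery tasks inline in the calling process, so that neither a
# worker nor a Redis instance is required.
CELERY_TASK_ALWAYS_EAGER = True

# Assumption: broker location for the case where a local Redis instance
# (e.g. started via Docker) and a real worker are used instead.
CELERY_BROKER_URL = "redis://localhost:6379/0"
```

Setting CELERY_TASK_ALWAYS_EAGER back to False would hand the tasks to the worker again.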

@cod3monk
Contributor

Cool! Will try this out (hopefully) soon ;)

BTW: on Monday there is an "Open Source Developers Meetup" at ZAM, where @unaimed, who is interested in this project, will attend. See here for details.

@danieloeh
Contributor Author

Thank you for the info! I am planning to be there as well this time, hopefully around 18:30.

@cod3monk
Contributor

Unfortunately, I won't be able to attend. 😢

2 participants