Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make a first EXIF only Post processor Plugin #6

Open
DiegoPino opened this issue Mar 17, 2020 · 7 comments
Open

Make a first EXIF only Post processor Plugin #6

DiegoPino opened this issue Mar 17, 2020 · 7 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@DiegoPino
Copy link
Member

What is this?

This is all related to #4 and a call we had with @giancarlobi today (MARCH 17th 2020).

The tasks:

Resuming: This task by itself is just making a particular, slim, and limited version of the generic Binary Processor. Nothing more

How does this fit in the global Chain?

I will explain how this needs to be done at the end, when we have all pieces 🍓

  • Any CRUD action on a ADO will also trigger an JSON Event (as we do with the pre-save ones)
  • This module contains one specific event Type extending JSON Event, that when triggered that has/should have the logic to:
    • Fetch all Post processor Plugins (EXIF, HOCR, etc) that match the input (some metadata or files). EXIF will be one of those
    • Generate all local/temp files needed for the Plugin instances
    • Push into a Queue each atomic task. Queue will contain the value (JSON?) or just the ADO UUID, the Plugin ID that needs to trigger and (maybe?) the needed file UUIDs
    • The event will also, trigger, once all is pushed, the mainLoop that will open the background processing of this or, if already there, end its task
@DiegoPino DiegoPino added the enhancement New feature or request label Mar 17, 2020
@DiegoPino DiegoPino added this to the 1.0.0-beta3 milestone Mar 17, 2020
@DiegoPino
Copy link
Member Author

@giancarlobi i will start with this one today (sorry for the delay, got distracted with IIIF and Ajax). Will ask for your guidelines and help once i'm in a more advanced state, but i feel it should go faster than expected. Probably the most complex part will be the definition/creation of the Queues and the admin Forms around that. Hope you are doing well. hugs!

@DiegoPino DiegoPino self-assigned this Apr 7, 2020
@giancarlobi
Copy link
Contributor

@DiegoPino Great! No problem for delay, I also involved in other issues so I'm good now for support your great fantastic work! All "well" here, thanks, I hope same for you.

@DiegoPino
Copy link
Member Author

@giancarlobi hi!. I have a question. I'm trying to figure out what is the best moment to trigger the event that will be catched by the eventsubscriber that will enqueue all the actions. I feel it should be at the end of everything, after the Node has been persisted and saved. But then i have second thoughts since i probably want to add info to the SBF JSON about what runners/flavours where send. Remember the tracking info? So, maybe the best is to trigger the runner processing event as the last event of the PRESAVE type of event? What do you think?

@giancarlobi
Copy link
Contributor

@DiegoPino sorry for late answer, some issues about upgrade OS of mail server... Well, I think yes, it should be at the end of everything. Flavours are addition to SBF Digital object, right? So they could be run also (extremely) in a second step.
In addition I think runners processing depends on ingesting type: webform or manual/script/code driven. For webform I agree with you that we want flavours processing start at presave/save while for code driven ingesting probably flavours processing could be started independently after ingesting.
Not sure I'm clear, probably because my brain is not so fresh now ...

@DiegoPino
Copy link
Member Author

@giancarlobi Ok. I went for presave, but at the last, last step with priority -2000.

I had to make a fix on the main strawberryfield module, but that works now. See https://github.com/esmero/strawberryfield/pull/81/files. (The priority was not being respected, now it does and i learned some more PHP!)

Reason to go for the presave event is (this applues only for pushing items into the queue, not to running the queue..) we want sometimes (maybe?) to annotate the SBF JSON before saving the Node with info needed for the flavors to run/flag if they they still need to be processed or not. Seems to be running OK, i mean, does nothing yet, but it triggers. see #5

I still don't know why my module_hook_update() was not triggering in this module, but can check that later one, i left a note in a commit in pull #5 for you on how to force it to run in the meanwhile.

Now i have a bigger question for you! (sorry).

So, in the queue i want to push which Post processor plugin should run. That is similar to what you already have here https://github.com/esmero/strawberry_runners/blob/master/test/submit.php#L40-L42, instead flavour id we pass the configuration entity id that defines how a given plugin will run, we BUT! what happens if someone removes the plugin? Or changes the settings before that queue item runs?

So Questions is:

  • Should we pass the plugin config/full configuration too? Why?
    If we pass the full plugin config we could have always something that runs stable. But! If, e.g the config, lets say for EXIF, is wrong, no matter what we do, and if the item fails, it will keep failing forever. So in that case maybe not pass the config?
  • We don't pass the plugin config. Means the Queue item is really not self sustainable. If the plugin id changes (we can have a last modified timestamp) or is removed, the Queue worker itself can decide if it should try to run or not? E.g, if the EXIF plugin is removed and the item says EXIF it could simply send a watchdog error and say, 'Hey, i don't know how to run', or, it could still try to run?

I do prefer the second option, in terms of, lets say we need to correct an item that failed to run. Only way is to either remove it and read it to the queue (also good), or change the settings.
But on the other side, if the plugin config and the plugin itself are also added to the Queue Item, then in theory one of this could run even if everything fails.. not sure if we need that, just saying.

I know you are busy so i will go for: read config in realtime, if config changes execution will also. But we can discuss this later on.

Hugs!

@giancarlobi
Copy link
Contributor

So Questions is:

* Should we pass the plugin config/full configuration too? Why?
  If we pass the full plugin config we could have always something that runs stable. But! If, e.g the config, lets say for EXIF, is wrong, no matter what we do, and if the item fails, it will keep failing forever. So in that case maybe not pass the config?

* We don't pass the plugin config. Means the Queue item is really not self sustainable. If the plugin id changes (we can have a last modified timestamp) or is removed, the Queue worker itself can decide if it should try to run or not? E.g, if the EXIF plugin is removed and the item says EXIF it could simply send a watchdog error and say, 'Hey, i don't know how to run', or, it could still try to run?

I do prefer the second option, in terms of, lets say we need to correct an item that failed to run. Only way is to either remove it and read it to the queue (also good), or change the settings.
But on the other side, if the plugin config and the plugin itself are also added to the Queue Item, then in theory one of this could run even if everything fails.. not sure if we need that, just saying.

@DiegoPino Sorry for my late answer to this. I think that depends on the strategy we adopt for runner fail/error. Maybe the right choice is a mix of first and second option, that is we pass the full config of plugin to make it reflect (as a snapshot) current config. Then if it fails it returns "Hey, I failed, please check plugin config then run again".
This choice allows user to make change to plugin config if queue is running, doesn't matter, the new config will be used the next time runner is queued.
What do you think?

@DiegoPino
Copy link
Member Author

@giancarlobi no worries, any time is perfect. I do agree with you. We can have both worlds and even allow this as an option (checkbox somewhere). Working hard on this, hope can share something with you this week. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants