Progress monitoring #23

Open
brucehoff opened this issue Feb 13, 2018 · 13 comments

@brucehoff

brucehoff commented Feb 13, 2018

There are use cases for retrieving intermediate information from a workflow, e.g.:

  • What step (of a multistep workflow) is in progress and what is its percent complete? (A user may wish to cancel a workflow if it's too slow.)
  • How far or how well has my machine learning model converged? (Again, a poorly progressing model might be canceled.)

Under the existing API, such in-progress information would come from retrieving and then reading or parsing the workflow's log files. If the information could be returned in a more structured form (e.g., as a set of key-value pairs), then, when used with #21, a client could create a dashboard of running workflows and/or answer questions like "which of my jobs is closest to complete?"
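A minimal sketch of how a client might answer "which of my jobs is closest to complete?" if such key-value pairs were available. The `RunProgress` record, the `progress` field, and the `percent_complete` key are all assumptions made for illustration; none of them is part of the WES schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunProgress:
    """Hypothetical per-run progress record; not part of the WES schema."""
    run_id: str
    state: str
    # Free-form key-value pairs reported for the run (key names are assumptions).
    progress: dict[str, str] = field(default_factory=dict)


def percent_complete(run: RunProgress) -> float:
    """Read the assumed 'percent_complete' key; missing or bad values count as 0."""
    try:
        return float(run.progress.get("percent_complete", "0"))
    except ValueError:
        return 0.0


def closest_to_complete(runs: list[RunProgress]) -> RunProgress | None:
    """Return the running job with the highest reported percent complete."""
    running = [r for r in runs if r.state == "RUNNING"]
    return max(running, key=percent_complete, default=None)


runs = [
    RunProgress("run-a", "RUNNING", {"step": "align", "percent_complete": "40"}),
    RunProgress("run-b", "RUNNING", {"step": "call-variants", "percent_complete": "85"}),
    RunProgress("run-c", "COMPLETE", {"percent_complete": "100"}),
]
best = closest_to_complete(runs)
print(best.run_id if best else "no running jobs")
```

The same list of records could back a tabular dashboard with one row per run, sorted by any of the reported keys.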

@psafont

psafont commented Feb 14, 2018

Since you're looking to build a dashboard, would it be useful to retrieve not only the status of the tasks, but also the graph representation of the workflows?

Like this: https://view.commonwl.org/workflows/github.com/ICGC-TCGA-PanCancer/OxoG-Dockstore-Tools/blob/develop/oxog_varbam_annotate_wf.cwl

@brucehoff
Author

@psafont that's interesting, but it's not immediately obvious how one would show the status of, say, 50 workflows in a single view by using a graph representation. I was thinking in terms of a tabular view with one row per workflow/job, e.g.,
https://www.synapse.org/#!Synapse:syn4224222/wiki/434546
I'd be interested to hear about other ideas for showing the status of many jobs.

@geoffjentry
Contributor

We've played around w/ the idea in Cromwell but one thing I have always found is that getting people to agree on what that graph should look like is so difficult that it's easier just to give something lo-fi like @brucehoff suggests.

Keep in mind that this isn't supposed to be some uber API, but rather something that sits beneath whatever layer the user is ultimately interacting with. That layer can easily generate a visual graph based on information provided in WES.

@psafont

psafont commented Feb 14, 2018

> one thing I have always found is that getting people to agree on what that graph should look like is so difficult that it's easier just to give something lo-fi like @brucehoff suggests.

That was my suspicion too, good to see this kind of granularity can be thrown out early on.

Useful information for users about tasks can be elapsed time, status of the task and even maybe why it's pending (waiting on resources / an input). Some of the information could be coarser, e.g. showing statistics about pending tasks instead of showing all the tasks individually, but I don't know that much about the use cases.
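A minimal sketch of the coarser view mentioned above: counting pending tasks per reason rather than listing every task. The task records and their field names (`status`, `pending_reason`) are hypothetical and only serve the illustration.

```python
from collections import Counter

# Hypothetical task records; the field names are illustrative only.
tasks = [
    {"name": "align-1", "status": "RUNNING", "pending_reason": None},
    {"name": "align-2", "status": "PENDING", "pending_reason": "waiting on resources"},
    {"name": "merge", "status": "PENDING", "pending_reason": "waiting on input"},
    {"name": "report", "status": "PENDING", "pending_reason": "waiting on input"},
]


def summarize_pending(task_list):
    """Count pending tasks per reason instead of listing each task."""
    return Counter(
        t["pending_reason"] or "unknown"
        for t in task_list
        if t["status"] == "PENDING"
    )


summary = summarize_pending(tasks)
print(dict(summary))
```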

@david4096
Member

david4096 commented Feb 14, 2018

Right now, the stdout is snapshotted and can be retrieved by GetWorkflowLog. With some luck a progress message might be reported by the workflow itself and observable from this endpoint. We don't expect clients to write a log parser to find this message in status reports for each workflow, or for workflow authors to add specific progress report log messages.

Any suggestions on how best to close this? Should WES offer a specific endpoint for reporting status using schematized messages? Should we specify a very clearly formatted log message that WES services will know to parse into progress?
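To make the second option concrete, here is a minimal sketch of parsing a "very clearly formatted log message" into progress. The `WES-PROGRESS key=value ...` line format is invented for this example; no such format is defined by WES.

```python
from __future__ import annotations

import re

# Hypothetical line format, e.g. "WES-PROGRESS step=align percent=40".
PROGRESS_LINE = re.compile(r"^WES-PROGRESS\s+(?P<pairs>.+)$")


def parse_progress(log_text: str) -> dict[str, str]:
    """Return the key-value pairs from the last progress line in the log."""
    latest: dict[str, str] = {}
    for line in log_text.splitlines():
        match = PROGRESS_LINE.match(line.strip())
        if not match:
            continue
        # Split each "key=value" token; tokens without "=" are ignored.
        pairs = (item.split("=", 1) for item in match.group("pairs").split())
        latest = dict(p for p in pairs if len(p) == 2)
    return latest


log = """Loading reference genome
WES-PROGRESS step=align percent=10
some tool output
WES-PROGRESS step=align percent=40
"""
print(parse_progress(log))
```

This approach keeps the API unchanged but relies on workflow authors emitting the agreed line, which is the cost noted above.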

@brucehoff
Author

> Any suggestions on how best to close this?

I would suggest (1) extending the ga4gh_wes_workflow_status object (returned by /workflows and /workflows/{id}/status) to include a list of key-value pairs and (2) adding an endpoint that a workflow can call to push status information to WES.
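A minimal in-memory sketch of the two parts of this suggestion, assuming a `progress` field on the status object and a push operation that the running workflow would call. The class, method, and field names are hypothetical; a real service would expose them over HTTP and would need authentication for the push side.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkflowStatus:
    """Status record extended with free-form key-value pairs (hypothetical field)."""
    workflow_id: str
    state: str
    progress: dict[str, str] = field(default_factory=dict)


class StatusStore:
    """In-memory stand-in for the service side of both proposed changes."""

    def __init__(self) -> None:
        self._statuses: dict[str, WorkflowStatus] = {}

    def register(self, workflow_id: str) -> None:
        self._statuses[workflow_id] = WorkflowStatus(workflow_id, "RUNNING")

    def push_progress(self, workflow_id: str, pairs: dict[str, str]) -> None:
        """What a hypothetical push endpoint would do when the workflow calls it."""
        self._statuses[workflow_id].progress.update(pairs)

    def get_status(self, workflow_id: str) -> WorkflowStatus:
        """What the status endpoint would return to the client."""
        return self._statuses[workflow_id]


store = StatusStore()
store.register("wf-1")
store.push_progress("wf-1", {"step": "train", "epoch": "3"})
store.push_progress("wf-1", {"epoch": "4", "loss": "0.21"})
print(store.get_status("wf-1").progress)
```

Later pushes overwrite earlier values for the same key, so the client always sees the most recent report per key.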

@ruchim
Collaborator

ruchim commented Nov 19, 2020

@brucehoff -- I might be misunderstanding here -- but is it helpful to just get a guarantee of key/value pairs?

Any WES implementation with active usage certainly has this need, no doubt about it at all. I just think this is the type of thing that's hard to standardize -- and I'm not sure guaranteeing some structured information enforces what you'd like. Instead -- this is where implementors should have flexibility and just listen to their users and come up with creative solutions for showing progress of a workflow -- thoughts?

@patmagee
Contributor

@brucehoff curious to know whether this is still an issue given the current implementation of WES

@brucehoff
Author

@patmagee Are you saying that there is an update to WES that obviates the need for this requested feature? If so, please let me know what that is and I will be happy to comment on whether it meets the perceived need.

@patmagee
Contributor

@brucehoff You can get the current running task from the list of tasks in the /ga4gh/wes/v1/runs/{runId}, so to that extent I think it addresses the first part of your request, and maybe that is enough.
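A minimal sketch of picking the running task out of such a response. The sample body is made up, and treating "has a `start_time` but no `end_time`" as "currently running" is an assumption rather than something the specification guarantees; implementations may populate `task_logs` differently.

```python
import json

# Sample response body shaped like a WES run log; the values are made up.
response_body = """
{
  "run_id": "run-123",
  "state": "RUNNING",
  "task_logs": [
    {"name": "fetch", "start_time": "2018-02-13T10:00:00Z",
     "end_time": "2018-02-13T10:05:00Z", "exit_code": 0},
    {"name": "align", "start_time": "2018-02-13T10:05:00Z",
     "end_time": null, "exit_code": null},
    {"name": "report"}
  ]
}
"""


def running_tasks(run_log: dict) -> list:
    """Names of tasks that have started but not finished (assumed heuristic)."""
    return [
        task["name"]
        for task in run_log.get("task_logs", [])
        if task.get("start_time") and not task.get("end_time")
    ]


print(running_tasks(json.loads(response_body)))
```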

From WES's perspective, I do not think it's reasonable to know the percentage of a task's completion, since there is really no way to know this in a generalized way. Many bioinformatics tools can run for minutes, hours, or longer without reporting progress, so I am not sure how WES could know what percentage of each individual task was done (unless you meant the overall progress through the workflow... which is still a hard problem).

In the same vein, I am not sure how WES would be able to report things like model convergence, since that is a very specific problem and requires WES to understand WHAT it is running. At the moment WES does not need to understand the context of a workflow; it is simply the distribution and reporting API. It would be great to work more machine learning concepts into WES, but maybe as a separate extension or even a custom implementation (there is nothing preventing an implementor from adding that).

So if the tasks are sufficient, then I think we can close this.

@jaeddy
Member

jaeddy commented Sep 23, 2022

I was reminded today of the policy standards, if not more technical ones, around "return of results" with patient data. I feel like the issue here almost merits a separate Cloud WS standard for "return of status": at least a common data model, if not an API. WES and TES, maybe even TRS, could interpret and parse data communicated in this model to display for monitoring or other purposes. I agree that, similar to workflow languages themselves, specifying a common model for run status is beyond the scope of WES. However, we could include the idea in a wish list / backlog for future Cloud standards.

@brucehoff
Author

@patmagee

> I do not think it's reasonable to know the percentage of a task's completion, since there is really no way to know this in a generalized way.

Sure there is: The running workflow can communicate this information to the WES-compliant workflow execution engine which can, in turn, return this to the client which initiated the workflow.

> I am not sure how WES would be able to report things like model convergence

Again, the running workflow can communicate this information to the WES-compliant workflow execution engine which can, in turn, return this to the client which initiated the workflow.

The overarching idea is to expand the scope of the standard from an API that only the client interacts with to an API that the running workflow can also interact with.

@uniqueg
Contributor

uniqueg commented Sep 27, 2022

I agree that it would be awesome to have WES interact with workflow engines to report these stats (and possibly others in the future). It goes a bit in the direction of what @denis-yuen proposed in ga4gh/tool-registry-service-schemas#223 (and also see ga4gh/tool-registry-service-schemas#224 and ga4gh/tool-registry-service-schemas#225) for TRS (and, btw, runtime stats reported back from users to TRS might be another way of addressing this issue to some extent).

However, I'm not quite sure how to start with this. Coming up with a model in WES for workflow (or worse, TES for tool) developers in a sort of vacuum doesn't seem very promising to me. But following @jaeddy: if we have at least one strong use case to drive this, and can bring at least two or so workflow types to the table that commit to developing such a data model together with the WES team, I think that would be really nice.

Would you be willing to drive this @brucehoff?

Also tagging @mr-c, @pditommaso, @johanneskoester, @bgruening


9 participants