Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure algorithm runtime, don't schedule data object nodes as algortihms #21

Merged
merged 8 commits into from
Jul 10, 2024

Conversation

joott
Copy link
Collaborator

@joott joott commented Jul 9, 2024

BEGINRELEASENOTES

  • Algorithms now handled as a callable struct containing graph information
  • Data objects no longer scheduled as tasks

Struct allows algorithm to access information about its vertex, including average runtime and size.
Ideas taken from @m-fila in #19.

ENDRELEASENOTES

joott added 5 commits July 6, 2024 08:46
based on Mateusz's ideas without the datadeps updates
* added size_kb field to store size as a double
* adjusted scheduling.jl treatment of size to align with test graphs
Changed size_kb to size_average_B
@m-fila m-fila self-requested a review July 9, 2024 14:48
@m-fila
Copy link
Collaborator

m-fila commented Jul 9, 2024

A future of an algorithm execution is stored in the successor DataObject vertices. I like this conceptually but would it work also for graphs in which all the paths are terminated by a algorithm (consumer algorithm that only takes input but doesn't produce any output)?
I'd expect it can be a problem since we'd "lose" the future and don't have anything to wait for (or attach a notify graph finalization task to)

@m-fila
Copy link
Collaborator

m-fila commented Jul 10, 2024

I tried with data/sequential/df.graphml (terminating with consumer algorithm instead of data object) from master and got an error

How about the schedule_graph return a vector of tasks for all the terminating algorithms?
By terminating I mean algorithms that don't have a successor algorithm -> so either have no direct successors at all (no data object produced) or their direct successor have no successors (data objects produces but not used anywhere else).
This vector can be later given to notify_graph_finalization if scheduling multiple graphs or just awaited directly if scheduling a single graph

schedule_graph could check each algorithm vertex with something like

function is_terminating_alg(graph::AbstractGraph, vertex_id::Int)
    successor_dataobjects = outneighbors(graph, vertex_id)
    is_terminating(vertex) = isempty(outneighbors(graph, vertex))
    all(is_terminating, successor_dataobjects)
end

and then append Dagger.@spawned task if needed

What do you think?

@joott
Copy link
Collaborator Author

joott commented Jul 10, 2024

We could store futures in the algorithm as well as its children, which should resolve it. I do like the idea of schedule_graph returning a list of all the terminating algorithms though - that way we don't have to iterate through the whole graph again afterwards. Maybe we do both?

@m-fila
Copy link
Collaborator

m-fila commented Jul 10, 2024

We could store futures in the algorithm as well as its children, which should resolve it. I do like the idea of schedule_graph returning a list of all the terminating algorithms though - that way we don't have to iterate through the whole graph again afterwards. Maybe we do both?

Both is good

@m-fila m-fila changed the title Wrap MockupAlgorithm in a callable struct Configure algorithm runtime, don't schedule data object nodes as algortihms Jul 10, 2024
@m-fila m-fila merged commit 41a471b into key4hep:main Jul 10, 2024
2 checks passed
@joott joott deleted the algorithm branch July 12, 2024 12:44
@m-fila m-fila mentioned this pull request Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants