-
Notifications
You must be signed in to change notification settings - Fork 54
perf[next-dace]: RemoveScalarCopies and FuseHorizontalConditionBlocks transformations #2469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: set_gpu_maxnreg
Are you sure you want to change the base?
Conversation
philip-paul-mueller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a lot but most things are actually small.
src/gt4py/next/program_processors/runners/dace/transformations/remove_aliasing_scalars.py
Outdated
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_aliasing_scalars.py
Outdated
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_scalar_copies.py
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_scalar_copies.py
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_aliasing_scalars.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
…_name to gtx_transformations utils
Co-authored-by: Philip Müller <philip.mueller@cscs.ch>
2e18f65 to
3319c00
Compare
…arCopies Co-authored-by: Philip Müller <philip.mueller@cscs.ch>
philip-paul-mueller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some things that need some modifications.
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_scalar_copies.py
Outdated
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_scalar_copies.py
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/remove_scalar_copies.py
Show resolved
Hide resolved
src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py
Show resolved
Hide resolved
| first_conditional_block_state_names = [ | ||
| state.name for state in first_conditional_block.all_states() | ||
| ] | ||
| second_conditional_block_state_names = [ | ||
| state.name for state in second_conditional_block.all_states() | ||
| ] | ||
| if not ( | ||
| any("true_branch" in name for name in first_conditional_block_state_names) | ||
| and any("false_branch" in name for name in first_conditional_block_state_names) | ||
| and any("true_branch" in name for name in second_conditional_block_state_names) | ||
| and any("false_branch" in name for name in second_conditional_block_state_names) | ||
| ): | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is kind of okay, but I would look at the conditions this would make it possible that there are multiple states inside the region.
However, I fine with a TODO in that regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think atm there's any possibility to get multiple states that are not true/false_branch so this check is mostly for sanity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then a TODO should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand, would you like to remove this check or would you like to check if there can be more that true/false branches inside?
The latter is checked above:
if not (
isinstance(first_conditional_block, dace.sdfg.state.ConditionalBlock)
and len(first_conditional_block.sub_regions()) == 2
and isinstance(second_conditional_block, dace.sdfg.state.ConditionalBlock)
and len(second_conditional_block.sub_regions()) == 2
):
| ): | ||
| return False | ||
|
|
||
| # TODO(iomaganaris): Currently we do not handle NestedSDFGs inside the conditional blocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure, but very positive that you can remove this restriction, because of the check about the compatibility of the symbol mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to add a test though then and I'm not sure it's worth the time 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine without a test, because a Tasklet is, on the DF level a self contained object, as such it it not different from a Tasklet.
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
| second_to_first_connections: dict[str, str] = {} | ||
| for node in nodes_renamed_map: | ||
| if isinstance(node, dace_nodes.AccessNode): | ||
| second_to_first_connections[node.data] = nodes_renamed_map[node].data | ||
| for first_inner_state in first_conditional_block.all_states(): | ||
| corresponding_state_in_second = corresponding_states_first_to_second[first_inner_state] | ||
| for edge in second_state_edges_to_add[corresponding_state_in_second]: | ||
| new_memlet = copy.deepcopy(edge.data) | ||
| if edge.data.data in second_to_first_connections: | ||
| new_memlet.data = second_to_first_connections[edge.data.data] | ||
| first_inner_state.add_edge( | ||
| nodes_renamed_map[edge.src], | ||
| nodes_renamed_map[edge.src].data | ||
| if isinstance(edge.src, dace_nodes.AccessNode) and edge.src_conn | ||
| else edge.src_conn, | ||
| nodes_renamed_map[edge.dst], | ||
| second_to_first_connections[edge.dst.data] | ||
| if isinstance(edge.dst, dace_nodes.AccessNode) and edge.dst_conn | ||
| else edge.dst_conn, | ||
| new_memlet, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| second_to_first_connections: dict[str, str] = {} | |
| for node in nodes_renamed_map: | |
| if isinstance(node, dace_nodes.AccessNode): | |
| second_to_first_connections[node.data] = nodes_renamed_map[node].data | |
| for first_inner_state in first_conditional_block.all_states(): | |
| corresponding_state_in_second = corresponding_states_first_to_second[first_inner_state] | |
| for edge in second_state_edges_to_add[corresponding_state_in_second]: | |
| new_memlet = copy.deepcopy(edge.data) | |
| if edge.data.data in second_to_first_connections: | |
| new_memlet.data = second_to_first_connections[edge.data.data] | |
| first_inner_state.add_edge( | |
| nodes_renamed_map[edge.src], | |
| nodes_renamed_map[edge.src].data | |
| if isinstance(edge.src, dace_nodes.AccessNode) and edge.src_conn | |
| else edge.src_conn, | |
| nodes_renamed_map[edge.dst], | |
| second_to_first_connections[edge.dst.data] | |
| if isinstance(edge.dst, dace_nodes.AccessNode) and edge.dst_conn | |
| else edge.dst_conn, | |
| new_memlet, | |
| ) | |
| for edge_to_copy in edges_to_copy: | |
| new_edge = corresponding_state_in_second.add_edge( | |
| nodes_renamed_map[edge_to_copy.src], | |
| edge_to_copy.src_conn, | |
| nodes_renamed_map[edge_to_copy.dst], | |
| edge_to_copy.dst_conn, | |
| edge_to_copy.data, | |
| ) | |
| if (not new_edge.data.is_empty()) and new_edge.data.data in second_arrays_rename_map: | |
| new_edge.data.data = second_arrays_rename_map[new_edge.data.data] |
However, this does not handle symbol renaming in the subsets.
See also comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subsets can only have symbols or something else as well?
| second_arrays_rename_map[data_name] = new_data_name | ||
| data_desc_renamed = copy.deepcopy(data_desc) | ||
| first_cb.sdfg.add_datadesc(new_data_name, data_desc_renamed) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is something missing here.
While in can_be_applied() you check if there are no symbol conflicts you never copy the symbols that are exclusive to the second conditional block.
Put it differently you detect the case where SDFG 1 has n := a + 1 and SDFG 2 has n := a - 1, but you do not handle the case where SDFG 1 has m := a + 1 and SDFG 2 has o := a + 1.
So you must copy them as well, for this you have to do something like:
missing_symbols = {sym2: val2 for sym2, val2 in second_cb.symbol_mapping.items() if sym2 not in first_cb.symbol_mapping}
for missing_symb, symb_def in missing_symbols.items():
first_cb.symbol_mapping[missing_symb] = symb_def
first_cb.add_symbol(missing_symb, second_cb.sdfg.symbols[missing_symb], find_new_name=False)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to figure out how to add this to a test 👍
...t4py/next/program_processors/runners/dace/transformations/fuse_horizontal_conditionblocks.py
Outdated
Show resolved
Hide resolved
| for node in nodes_renamed_map: | ||
| if isinstance(node, dace_nodes.AccessNode): | ||
| second_to_first_connections[node.data] = nodes_renamed_map[node].data | ||
| for first_inner_state in first_conditional_block.all_states(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for node in nodes_renamed_map: | |
| if isinstance(node, dace_nodes.AccessNode): | |
| second_to_first_connections[node.data] = nodes_renamed_map[node].data | |
| for first_inner_state in first_conditional_block.all_states(): |
I am pretty sure this information is already contained in second_arrays_rename_map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to keep the reference to the node object though so I can find the corresponding new node in
new_edge = first_inner_state.add_edge(
nodes_renamed_map[edge_to_copy.src],
edge_to_copy.src_conn,
nodes_renamed_map[edge_to_copy.dst],
edge_to_copy.dst_conn,
edge_to_copy.data,
)
Co-authored-by: Philip Müller <philip.mueller@cscs.ch>
Added transformations related to Graupel granule optimization
Main point of this PR is the
FuseHorizontalConditionBlockstransformation that groups togetherConditionalBlocksthat share the same condition (__cond)AccessNodes so that we can find whichConditionalBlockscorrespond to the same condition whose value is saved to anAccessNodeTODOs
greenline