Unify FM over all grids #6981
base: master
Also
- remove derivation type
- simplify group logic to ensure subfields over grids have the same index
A few comments highlight the important pieces, since this is such a large PR. Many of the changes arise because every use of the FM has to change: callers now pass grid names, instead of the FM storing only one grid.
// we register a corresponding version of the field on the "target" FM.

// Helper lambda to reduce code duplication
auto process_imported_groups = [&](const std::set<GroupRequest>& group_requests) {
I moved this logic to FieldManager::pre_process_groups().
+Field::clone(const std::string& name) const {
+  return clone(name, get_header().get_identifier().get_grid_name());
+}

 Field
-Field::clone(const std::string& name) const {
+Field::clone(const std::string& name, const std::string& grid_name) const {
   // Create new field
   const auto& my_fid = get_header().get_identifier();
   FieldIdentifier fid(name,my_fid.get_layout(),my_fid.get_units(),
-                      my_fid.get_grid_name(),my_fid.data_type());
+                      grid_name,my_fid.data_type());
I needed to add a grid argument to Field::clone() because, in many cases, we are cloning a grid (giving it a new name) and constructing a FM(new_grid), but using a field whose ID gives a grid name from the old grid, and the new FM was complaining that the grid was not in the internal grids manager.
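A minimal sketch of the two-overload pattern described above. The `Field` and `FieldIdentifier` types here are hypothetical, stripped-down stand-ins (not the EAMxx classes); only the grid-name override is modelled.

```cpp
#include <string>

// Hypothetical stand-in for the field identifier: just a name and a grid name.
struct FieldIdentifier {
  std::string name;
  std::string grid_name;
};

// Hypothetical stand-in for Field, holding only its identifier.
struct Field {
  FieldIdentifier id;

  // Clone onto an explicitly named grid (e.g. the renamed copy of a grid).
  Field clone(const std::string& new_name, const std::string& grid_name) const {
    return Field{FieldIdentifier{new_name, grid_name}};
  }

  // Convenience overload: keep the grid recorded in this field's identifier.
  Field clone(const std::string& new_name) const {
    return clone(new_name, id.grid_name);
  }
};
```

With this shape, existing callers of the one-argument overload keep their behavior, while code that builds a FM on a renamed grid can pass the new grid name explicitly.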
@@ -185,7 +190,7 @@ Field Field::subfield(const std::string& sf_name,
       "Error! Input field must be allocated in order to subview it.\n");

   auto sf_layout = lt.clone();
-  sf_layout.reset_dim(idim, index_end - index_beg);
+  sf_layout.reset_dim(idim, index_end+1 - index_beg);
Kokkos::subview(..., beg, end, ...) has inclusive bounds, but we were treating the subview bound as [beg, end).
Wait, subview has inclusive bounds? That seems odd, given that the end keyword almost universally refers to "the first after the last". Also, I think we use this method by passing index_end = "the first after the last". So when computing the layout dimension, we should not add 1...
Kokkos doc says exclusive, which makes the most sense. Somewhere we are going wrong in EAMxx/EKAT, because doing exclusive bounds threw an error in ekat subview about dim being exceeded. I'll investigate.
I believe EKAT is falsely asserting that the end subview index is less than the dimension, when they can be equal (since end can be up to 1 past the last possible index) https://github.com/E3SM-Project/EKAT/blob/0eb00e5d017598aa29b27adc8fb13f51d15c4c1c/src/ekat/kokkos/ekat_subview_utils.hpp#L367C13-L367C44
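To make the convention under discussion concrete, here is a small sketch of the half-open `[index_beg, index_end)` rule. The helper `slice_extent` is hypothetical (it is neither EKAT nor EAMxx code); it only illustrates that the end index may equal the dimension extent and that the slice length is `index_end - index_beg`, with no `+1`.

```cpp
#include <stdexcept>

// Extent of the half-open slice [index_beg, index_end) along a dimension
// of size dim. index_end may equal dim, since it is one past the last
// valid index.
inline int slice_extent(int index_beg, int index_end, int dim) {
  if (index_beg < 0 || index_beg >= index_end) {
    throw std::invalid_argument("slice indices are invalid (negative or non-increasing)");
  }
  if (index_end > dim) {  // note: '>' rather than '>=', so index_end == dim is accepted
    throw std::out_of_range("slice end exceeds the dimension extent");
  }
  return index_end - index_beg;
}
```

Under this rule, slicing a dimension of extent 4 with `(0, 4)` covers every entry, while `(0, 5)` is rejected.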
@@ -88,7 +88,7 @@ FieldAllocProp FieldAllocProp::subview(const int idim,
   EKAT_REQUIRE_MSG(index_beg < index_end,
       "Error! Slice indices are invalid (non-increasing).\n");
   EKAT_REQUIRE_MSG(
-      index_beg >= 0 && index_end < m_layout.dim(idim),
+      index_beg >= 0 && index_end <= m_layout.dim(idim),
Same as comment above about Kokkos bounds
In EAMxx we use "std-like" convention of "end=past the last element".
I think this is correct since index_end can go up to one past the last possible index, which is equal to the dim.
        info.m_subview_idx [*it] = std::distance(cluster_ordered_fields.begin(),it);
      }
      info.m_bundled = true;
    }
  }
}
Now indices info is independent of grid, guaranteeing each grid agrees.
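As an illustration of why this guarantees agreement, the sketch below assigns each field its position in a single ordered list, without consulting any grid. `GroupInfo` and `make_group_info` are hypothetical reductions of the real group info; only the name-to-index map and the bundled flag are modelled.

```cpp
#include <iterator>
#include <list>
#include <map>
#include <string>

// Hypothetical reduced group info: the field name -> subview slot map.
struct GroupInfo {
  std::map<std::string, int> subview_idx;
  bool bundled = false;
};

// Assign each field its position in the ordered list. No grid is consulted,
// so every grid that allocates this group sees the same indices.
inline GroupInfo make_group_info(const std::list<std::string>& ordered_fields) {
  GroupInfo info;
  for (auto it = ordered_fields.begin(); it != ordered_fields.end(); ++it) {
    info.subview_idx[*it] =
        static_cast<int>(std::distance(ordered_fields.begin(), it));
  }
  info.bundled = true;
  return info;
}
```

Because the map is computed once from the field ordering, a bundled field allocated on two different grids can be subviewed with the same index on both.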
-  return it==m_fields.end() ? nullptr : it->second;
+FieldManager::get_field_ptr (const std::string& name, const std::string& grid_name) const {
+  auto it = m_fields.at(grid_name).find(name);
+  return it==m_fields.at(grid_name).end() ? nullptr : it->second;
 }

 void FieldManager::pre_process_group_requests () {
Important function for gathering which grids should register which group fields
// the parent field. So at registration time, simply keep track of the subfields,
// and create them at registration_ends() time, after all other fields.
std::map<std::string,FieldRequest> m_subfield_requests;

// The map group_name -> FieldGroupInfo
group_info_map m_field_groups;
Doesn't change, group info is the same for all grids.
@@ -115,6 +115,53 @@ AbstractGrid::get_3d_tensor_layout (const bool midpoints, const std::vector<int>
   return get_3d_tensor_layout(midpoints,cmp_dims,names);
 }

+FieldLayout
+AbstractGrid::equivalent_layout (const FieldLayout& template_layout) const
This is how we get the layout for tracer on a new grid.
std::set<std::string> GridsManager::
get_grid_names () const {
  std::set<std::string> names;
  if (m_grids.size()==0) {
    return names;
  }
  for (const auto& g : m_grids) {
    names.emplace(g.second->name());
  }
  return names;
}
I ended up using this function a lot, but I don't know that I love it. It doesn't take into account aliases. I could instead always loop over the grid manager's grid map and query the names from there.
The method is reasonable. Whether to include aliases depends on how you later intend to use this method. But I think the GM's get_grid method also scans if the input name is a valid grid alias, so maybe it would make sense to return aliases as well...
I use this as a way to look through the FM grids, so I think adding aliases might result in repeated work.
Quite a few failures in the CI; odd that some didn't trigger on my workstation (rrtmgp standalone, for instance)... Leaving as Draft for now.
I have some preliminary thoughts.
// Get information about the state of the repo
int size () const { return m_fields.size(); }
RepoState repository_state () const { return m_repo_state; }
// Get number of registered fields on particular grid
Do we even use this method? If not, we can prune it...
-compute_fields(fm1,t0,comm,0);
-compute_fields(fm2,t0,comm,nlevs_filled);
-compute_fields(fm3,t0,comm,0);
+compute_fields(fm1,gn,t0,comm,0);
See my comment on FM header: if we allow querying for a field without passing the grid name (if there's only 1 grid), we can revert these changes (here, as in many other unit tests)...
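One possible shape for that suggestion is sketched below. The `FieldManager` here is a hypothetical toy (nested maps keyed by grid name, then field name), and the one-argument overload is an assumption about how the convenience could work, not the PR's implementation. Unlike the `m_fields.at(grid_name)` lookup in the diff, this toy returns `nullptr` for an unknown grid instead of throwing.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct Field { std::string name; };  // hypothetical placeholder type

class FieldManager {
public:
  void add_field(const std::string& grid_name, const std::string& name) {
    m_fields[grid_name][name] = std::make_shared<Field>(Field{name});
  }

  // Explicit lookup; returns nullptr if the grid or the field is unknown.
  std::shared_ptr<Field> get_field_ptr(const std::string& name,
                                       const std::string& grid_name) const {
    auto g = m_fields.find(grid_name);
    if (g == m_fields.end()) return nullptr;
    auto it = g->second.find(name);
    return it == g->second.end() ? nullptr : it->second;
  }

  // Convenience overload, valid only when exactly one grid is stored.
  std::shared_ptr<Field> get_field_ptr(const std::string& name) const {
    if (m_fields.size() != 1) {
      throw std::logic_error("grid name required: FM does not hold exactly one grid");
    }
    return get_field_ptr(name, m_fields.begin()->first);
  }

private:
  // grid name -> (field name -> field)
  std::map<std::string, std::map<std::string, std::shared_ptr<Field>>> m_fields;
};
```

Single-grid unit tests could then keep calling the one-argument form, while multi-grid code gets a clear error if it forgets the grid name.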
    }
  }
}
::scream::sort(groups_to_bundle);
::scream::sort(copied_groups);
TODO: The in-house scream::sort method was only there to work around a gcc8+cuda10 issue (as explained in scream_utils.hpp). It may well be that we no longer need it. We should investigate and possibly remove this function if the issue is gone (both gcc and cuda are much more recent than those versions).

 // Constructor(s)
-explicit FieldManager (const grid_ptr_type& grid);
+explicit FieldManager (const std::shared_ptr<const AbstractGrid>& grid);
+explicit FieldManager (const std::shared_ptr<const GridsManager>& grid);
Minor: either always use grids_mgr_type, or remove the typedef and just use the spelled out name (like you did for grid). I favor the latter, btw.
// When registering subfields, we might end up registering the subfield before
// the parent field. So at registration time, simply keep track of the subfields,
// and create them at registration_ends() time, after all other fields.
std::map<std::string,FieldRequest> m_subfield_requests;
Indeed, we were not correctly handling subfield requests. In particular, we were never setting up subfields at the end of registration_ends.
Since nobody has complained, I gather the incomplete "feature" was never used. Hence, I would also purge the reference to this possibility in FieldRequest (the subview-related members and the subview-related constructor).
model_initial_np4
EKAT has an incorrect assert on the extents of subview slices: it requires the end index to be strictly less than the dim extent, when it should only be required to be less than or equal to the dim extent (since Kokkos uses exclusive end bounds). This will require an EKAT PR and update.
952e6bf to cd557bc
We need the changes to assert on subview bounds
c859596 to 1e69db5
Store a single FieldManager for all fields over all grids. This solves the issue in #6789 where having a group as a subset of another, and allocated on different grids, was causing subview indices mismatch between grids.
Also