
DAS-2368: Configure & support dimension variables for SMAP L3 at the root level#36

Merged
joeyschultz merged 9 commits into main from DAS-2368
Nov 17, 2025

Conversation

@joeyschultz (Contributor) commented Nov 11, 2025

Description

This PR adds the ability to create up-level (shared) dimensions.
The varinfo configuration was modified to create shared dimensions at the root level for:

  • SPL3FTA
  • SPL3SMA
  • SPL3SMP
  • SPL3SMP_E
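
Conceptually, creating dimensions "up-level" means hoisting dimension variables that every group duplicates into a common ancestor, so they are stored once and shared. A minimal stdlib-only sketch of the idea (the dict model and hoist_shared_dims helper are hypothetical stand-ins for real netCDF groups):

```python
# Hypothetical sketch: hoist dimension variables that appear identically
# in every group up to the root, so they are shared rather than duplicated
# per group. Groups are modelled as plain dicts standing in for netCDF groups.

def hoist_shared_dims(groups):
    """Return (root_dims, groups) with identical per-group dims moved to root."""
    root_dims = {}
    common = set.intersection(*(set(g) for g in groups.values()))
    for dim in sorted(common):
        values = [g[dim] for g in groups.values()]
        if all(v == values[0] for v in values):
            root_dims[dim] = values[0]
            for g in groups.values():
                del g[dim]  # the hoisted dimension no longer lives in the group
    return root_dims, groups

groups = {
    "Soil_Moisture_Retrieval_Data_AM": {"y": [0, 1], "x": [0, 1]},
    "Soil_Moisture_Retrieval_Data_PM": {"y": [0, 1], "x": [0, 1]},
}
root_dims, groups = hoist_shared_dims(groups)
print(root_dims)  # {'x': [0, 1], 'y': [0, 1]}
```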

Jira Issue ID

DAS-2368

Local Test Steps

Check out this branch, build the new image, and run the tests:

./bin/build-image && ./bin/build-test && ./bin/run-test

Ensure your HIAB configuration has:

LOCALLY_DEPLOYED_SERVICES=harmony-metadata-annotator,hoss,sds-maskfill

Restart harmony to pick up the new image:

./bin/bootstrap-harmony

Run the SPL3SMP_E bbox request in UAT to see the current behavior (dimensions created in each group):
https://harmony.uat.earthdata.nasa.gov/C1268399578-EEDTEST/ogc-api-coverages/1.0.0/collections/parameter_vars/coverage/rangeset?forceAsync=true&subset=lat(60%3A85)&subset=lon(-70%3A-10)&label=harmony-py&maxResults=1&variable=Soil_Moisture_Retrieval_Data_AM%2Fsurface_flag&variable=Soil_Moisture_Retrieval_Data_PM%2Fsurface_flag_pm&variable=Soil_Moisture_Retrieval_Data_Polar_AM%2Fsurface_flag&variable=Soil_Moisture_Retrieval_Data_Polar_PM%2Fsurface_flag_pm

Now run the same request on localhost to see the change (shared dimensions created up-level):
http://localhost:3000/C1268399578-EEDTEST/ogc-api-coverages/1.0.0/collections/parameter_vars/coverage/rangeset?forceAsync=true&subset=lat(60%3A85)&subset=lon(-70%3A-10)&label=harmony-py&maxResults=1&variable=Soil_Moisture_Retrieval_Data_AM%2Fsurface_flag&variable=Soil_Moisture_Retrieval_Data_PM%2Fsurface_flag_pm&variable=Soil_Moisture_Retrieval_Data_Polar_AM%2Fsurface_flag&variable=Soil_Moisture_Retrieval_Data_Polar_PM%2Fsurface_flag_pm

Accompanying regression test PRs:

  • das-harmony-regression-tests: PR#68
  • harmony-regression tests: TBD

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • Fix version [harmony-metadata-annotator-X.Y.Z] added to Jira ticket if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

}

return var_dim_map
# Identify dimensions that have been created up-level and update map
Contributor Author:

As the docstring notes, this code for updating the var_dim_map is only temporarily required until earthdata-varinfo supports up-level dimensions.

@flamingbear (Member) left a comment:

This is mostly questions for you to educate me. I have a couple of suggestions. But it's very close, I did run the test requests and looked at the outputs. Nice.

"Mission": "SMAP",
"ShortNamePath": "SPL3SMP$",
"VariablePattern": "/Soil_Moisture_Retrieval_Data_(AM|PM)/(?!lc_type$).+$"
"VariablePattern": "/Soil_Moisture_Retrieval_Data_(AM|PM)/.*"
Member:

and no lc_types? (are these gone now from exclusions?) looks like it's hoisted?

Contributor Author:

See my explanation in the above comment for why I removed the am_pm exclusion from the variable pattern. The same applies here, just for lc_type in this case.
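
For reference, the two VariablePattern values quoted above differ only in the (?!lc_type$) negative lookahead. A quick check with Python's re module, using illustrative variable paths:

```python
import re

# Patterns copied from the configuration diff above.
old = re.compile(r"/Soil_Moisture_Retrieval_Data_(AM|PM)/(?!lc_type$).+$")
new = re.compile(r"/Soil_Moisture_Retrieval_Data_(AM|PM)/.*")

surface = "/Soil_Moisture_Retrieval_Data_AM/surface_flag"
lc_type = "/Soil_Moisture_Retrieval_Data_AM/lc_type"

# The old pattern excludes lc_type; the new one matches everything in the group.
print(bool(old.match(surface)), bool(old.match(lc_type)))  # True False
print(bool(new.match(surface)), bool(new.match(lc_type)))  # True True
```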

for parent in node.parents:
dim_candidate_path = construct_dim_path(parent.path, dim)
if dim_candidate_path in variables_to_create:
parent.ds = parent.ds.assign(**{dim: dim_src})
Member:

do you need to unpack that dict? couldn't you just assign it as a dict?

Suggested change
parent.ds = parent.ds.assign(**{dim: dim_src})
parent.ds = parent.ds.assign({dim: dim_src})

Contributor Author:

Nice catch. Updated in ae7d2bf.
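
The suggestion works because xarray's Dataset.assign follows the common (mapping=None, **kwargs) signature, so a positional dict and keyword unpacking are interchangeable. A stdlib-only sketch of that signature pattern (the assign function below is a stand-in, not xarray's):

```python
# Stand-in mimicking the (mapping=None, **kwargs) signature that
# xarray's Dataset.assign uses; not the real xarray implementation.
def assign(base, variables=None, **variables_kwargs):
    merged = dict(base)
    merged.update(variables or {})
    merged.update(variables_kwargs)
    return merged

base = {"x": 1}
unpacked = assign(base, **{"y": 2})  # keyword-unpacked form from the original
mapping = assign(base, {"y": 2})     # plain-mapping form from the suggestion
print(unpacked == mapping)  # True
```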

for node in datatree.subtree
for dim in node.ds.dims
if (dim_full_path := construct_dim_path(node.path, dim)) in variables_to_create
}
Member:

I can't tell if this is clever or I'm just dense because I didn't know the walrus operator. I'm almost tempted to say it should just be nested loops for readability, but maybe I'm off.

Contributor Author:

Discussed with Matt and decided to use a nested for loop to simplify things. I had tried making the set comprehension work, but I don't think it's worth the complexity.

Updated in ae7d2bf.
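
For comparison, the two equivalent forms discussed in this thread might look like the following (the subtree pairs and construct_dim_path helper are hypothetical stand-ins for the real datatree):

```python
def construct_dim_path(group_path, dim):
    # Hypothetical helper mirroring the one discussed in this thread.
    return f"/{dim}" if group_path == "/" else f"{group_path}/{dim}"

# Stand-in for datatree.subtree: (node_path, node_dims) pairs.
subtree = [("/", ["y_root"]), ("/sub_group", ["y", "x"])]
variables_to_create = {"/y_root", "/sub_group/y"}

# Set-comprehension form, using the walrus operator.
comprehension = {
    dim_full_path
    for path, dims in subtree
    for dim in dims
    if (dim_full_path := construct_dim_path(path, dim)) in variables_to_create
}

# Equivalent nested-loop form, chosen in the end for readability.
loops = set()
for path, dims in subtree:
    for dim in dims:
        dim_full_path = construct_dim_path(path, dim)
        if dim_full_path in variables_to_create:
            loops.add(dim_full_path)

print(comprehension == loops)  # True
```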

}

return var_dim_map
# Identify dimensions that have been created up-level and update map
Member:

I'm looking at the original and I don't understand what it does or how it's used.
It looks like it just gets all of the dimension pairs, but then truncates all but one value from the list of variables.

L139-L142. I'm baffled why we would do that. (git blame says it was me, but the ticket link is to some epic)

Comment on lines +157 to +159
f'{parent.path}/{dimension_name}'
if parent.path != '/'
else f'/{dimension_name}'
Member:

Didn't you have a helper for this somewhere?

Contributor Author:

Discussed with Matt and landed on using an inner helper function for this case. I'm unable to import annotate::construct_dim_path due to circular imports, and I don't think it's worth creating a new module to get around that, because this invocation of construct_dim_path will go away once the varinfo update is made.

Updated in d17535f.
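
An inner helper of the kind described might look like this sketch (the enclosing function and argument names are hypothetical, not the actual implementation):

```python
# Hypothetical sketch of the inner-helper approach: the enclosing function
# defines its own dim_path helper instead of importing
# annotate.construct_dim_path, which would create a circular import.
def find_parents_missing_dim(parent_paths, dimension_name, variables_to_create):
    def dim_path(parent_path):
        return (
            f"{parent_path}/{dimension_name}"
            if parent_path != "/"
            else f"/{dimension_name}"
        )

    return [p for p in parent_paths if dim_path(p) in variables_to_create]

print(find_parents_missing_dim(["/", "/sub_group"], "y", {"/y"}))  # ['/']
```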

"Value": "y X"
}
],
"_Description": ""
Member:

Either add a description or delete this, I think.

Contributor Author:

Updated in da31df7.

In this commit, I also made some updates to the function that uses this test varinfo config entry (test_update_dimension_variables) to properly test that we are renaming the dimensions. I also moved the test fixture datatree and varinfo object directly into the test because this was the only test using them.

assert set(
datatree['/Freeze_Thaw_Retrieval_Data_Global/surface_flag'].dims
) == set(['am_pm', 'y', 'x'])

Member:

are these at the top level now? could they be checked there? doesn't look like it.

Contributor Author:

This assertion of the renaming was deleted because this PR removes the dimension renaming functionality from update_group_and_variable_attributes.

)
def test_has_dimension_override(sample_varinfo_test06, variable_path, expected):
"""Ensure variables correctly evaluated to contain non-null dimensions override."""
assert has_dimension_override(sample_varinfo_test06, variable_path) == expected
Member:

This is a lot of indirection to get to what this test does. I don't have a suggestion, mostly complaints.

Contributor Author:

To make the test easier to understand, I moved the test datatree and varinfo object directly into the test function. These test06 fixtures were not used by any other tests, so I think creating them directly in the test function is better.

Updated in da31df7

with xr.open_datatree(sample_netcdf4_file_test07) as test_datatree:
assert (
get_new_dimension_variables(test_datatree, variables_to_create) == expected
)
Member:

I don't really understand this test.

It might help if there was a test that asked for more variables to create than exist in the file, because that passes too.

I'm not even sure if that's useful information or if that would ever happen, e.g.:

        (
            {'/y_root',  '/sub_group/y', '/missing/dimension/z'},
            {'/y_root',  '/sub_group/y'},
        ),

Contributor Author:

I agree that the suggested test case is more useful than what I previously had.

Updated in da31df7
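
The added case checks that requested paths missing from the file are simply dropped; a stdlib-only sketch of that intersection behaviour (the dict-based tree and function body are hypothetical stand-ins for the real implementation):

```python
# Hypothetical stdlib-only stand-in for get_new_dimension_variables:
# tree maps group path -> dimension names in that group. Requested paths
# that do not exist in the tree are simply dropped.
def get_new_dimension_variables(tree, variables_to_create):
    found = set()
    for path, dims in tree.items():
        for dim in dims:
            full = f"/{dim}" if path == "/" else f"{path}/{dim}"
            if full in variables_to_create:
                found.add(full)
    return found

tree = {"/": ["y_root"], "/sub_group": ["y"]}
requested = {"/y_root", "/sub_group/y", "/missing/dimension/z"}
print(get_new_dimension_variables(tree, requested))  # only the paths present in the file
```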

{
"Name": "dimensions",
"Value": "y X"
"Value": "y x"
Member:

Noice

@flamingbear (Member) left a comment:

Thanks for making all those changes. This looks good. Hopefully we can yank some of it out soon.

@joeyschultz joeyschultz merged commit 1915616 into main Nov 17, 2025
4 checks passed
@joeyschultz joeyschultz deleted the DAS-2368 branch November 17, 2025 19:03