allow outputting the grid when creating regional maps #193
base: master
Conversation
Pull request overview
This PR adds functionality to output SCRIP format grid files containing only the regional subset when creating regional remapping files. This is useful for workflows that need the regional grid definition in addition to the remapping weights.
Changes:
- Extended Grid and Site classes to load and store SCRIP-specific data (grid areas, corner coordinates, masks)
- Added construct_regional_grid() function to generate SCRIP format grid files from filtered regional data
- Added --output-regional-grid command-line flag to enable regional grid output
- Replaced deprecated np.in1d with np.isin for better NumPy compatibility (a minimal illustration follows below)
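A minimal illustration of the np.in1d to np.isin change; the array names here are hypothetical and not taken from the PR.

import numpy as np

# Hypothetical global and regional cell id arrays, for illustration only.
all_gids = np.arange(10)
regional_gids = np.array([3, 7, 9])

# np.in1d is deprecated in recent NumPy releases and emits a DeprecationWarning:
#   mask = np.in1d(all_gids, regional_gids)
# np.isin is the drop-in replacement with the same semantics:
mask = np.isin(all_gids, regional_gids)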
isite.lat_corner_lats = np.vstack([isite.lat_corner_lats, src_grid.corner_lat[lids[0],:]])
isite.lat_corner_lons = np.vstack([isite.lat_corner_lons, src_grid.corner_lon[lids[0],:]])
Copilot AI, Jan 27, 2026
Performance issue: Using np.vstack in a loop repeatedly reallocates and copies arrays, which is inefficient for large datasets. Since you're already using chunked reading for the main grid, consider either: (1) using a list and converting to array at the end with np.vstack(list) once, or (2) pre-allocating arrays and using index-based assignment, similar to how lats and lat_gids are handled with np.append.
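A minimal sketch of the first option described above (collect per-chunk slices in a list and stack once); the chunk iterator and bounds variables below are hypothetical, only the accumulation pattern is the point.

import numpy as np

# Option (1): accumulate per-chunk corner slices in a Python list and call
# np.vstack a single time after the chunk loop, instead of growing the array
# on every iteration.
corner_lat_chunks = []
for chunk in chunks:  # hypothetical iterator over grid chunks
    lids = np.where((chunk.lat >= lat_min) & (chunk.lat <= lat_max))
    corner_lat_chunks.append(chunk.corner_lat[lids[0], :])
lat_corner_lats = (np.vstack(corner_lat_chunks) if corner_lat_chunks
                   else np.empty((0, 0)))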
isite.lon_imasks = np.append(isite.lon_imasks, src_grid.imask[lids[0]])
if len(src_grid.corner_lat) > 0:
    if len(isite.lon_corner_lats) == 0:
        isite.lon_corner_lats = src_grid.corner_lat[lids[0],:]
        isite.lon_corner_lons = src_grid.corner_lon[lids[0],:]
    else:
        isite.lon_corner_lats = np.vstack([isite.lon_corner_lats, src_grid.corner_lat[lids[0],:]])
        isite.lon_corner_lons = np.vstack([isite.lon_corner_lons, src_grid.corner_lon[lids[0],:]])

lids = np.where( (src_grid.lat >= isite.lat_bnds[0]) &
                 (src_grid.lat <= isite.lat_bnds[1]))
isite.lats = np.append(isite.lats, src_grid.lat[lids[0]])
isite.lat_gids = np.append(isite.lat_gids, lids[0]+chunk_idx)

# Store SCRIP data for lat matches
if load_scrip:
    isite.lat_areas = np.append(isite.lat_areas, src_grid.area[lids[0]])
    isite.lat_imasks = np.append(isite.lat_imasks, src_grid.imask[lids[0]])
Copilot AI, Jan 27, 2026
Type safety concern: lon_imasks and lat_imasks are accumulated using np.append without explicit dtype, but the final imasks is expected to be int32 (line 254). If the source grid_imask has a different integer type, this could lead to type inconsistencies. Consider explicitly casting when appending: isite.lon_imasks = np.append(isite.lon_imasks, src_grid.imask[lids[0]]).astype(np.int32) or initializing the arrays with the correct dtype in the Site class.
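A small sketch of the two remedies mentioned above; the Site initializer and source dtype shown here are hypothetical.

import numpy as np

class Site:
    def __init__(self):
        # Initializing with an explicit dtype documents the intent, but note that
        # np.append still promotes the result if the appended chunk has a wider
        # dtype, so the cast below is still needed when source dtypes can differ.
        self.lon_imasks = np.empty(0, dtype=np.int32)

site = Site()
chunk_imask = np.array([1, 0, 1], dtype=np.int64)  # hypothetical source dtype

# Cast explicitly at the point of accumulation to keep the accumulator int32.
site.lon_imasks = np.append(site.lon_imasks, chunk_imask).astype(np.int32)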
for s in sites:
    n = s.length
    lats[offset:offset+n] = s.lats
    lons[offset:offset+n] = s.lons
    areas[offset:offset+n] = s.areas
    imasks[offset:offset+n] = s.imasks
Copilot AI, Jan 27, 2026
Missing validation: The function doesn't validate that each site has the required SCRIP data (areas, imasks) before accessing them. If filter_scrip was False for a site (which shouldn't happen based on current code flow, but could if the API is used differently), accessing s.areas or s.imasks at lines 266-267 would use empty arrays, leading to silent data corruption. Consider adding a validation check or assertion at the start of the function to ensure all sites have populated SCRIP data.
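A minimal sketch of such a guard at the top of construct_regional_grid; the signature, attribute checks, and error message are assumptions based on the Site fields shown in this diff, not code from the PR.

def construct_regional_grid(sites, casename):
    """Create a SCRIP format grid file containing only the regional subset."""
    # Fail fast if any site was filtered without its SCRIP data populated.
    missing = [i for i, s in enumerate(sites)
               if len(s.areas) != s.length or len(s.imasks) != s.length]
    if missing:
        raise ValueError(
            f"Sites {missing} have no SCRIP data (areas/imasks); "
            "was the grid filtered with SCRIP loading disabled?")
    ...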
| """Create a SCRIP format grid file containing only the regional subset.""" | ||
| # Combine all sites into a single regional grid | ||
| total_size = sum(s.length for s in sites) | ||
|
|
Copilot AI, Jan 27, 2026
Missing edge case handling: If total_size is 0 (no grid cells found in any site), the function will create an empty grid file. While this might be technically valid, it could indicate a problem with the site definitions or grid data. Consider adding a check and warning/error message if total_size == 0.
Suggested change:

# If no grid cells are found, warn and do not create an empty grid file
if total_size == 0:
    print(f"No grid cells found for any site; regional grid file '{casename}_regional_grid.nc' will not be created.", file=sys.stderr)
    return
p.add_argument('--output-regional-grid', action='store_true',
               help='Output a SCRIP format grid file containing only the regional subset')
Copilot AI, Jan 27, 2026
Missing documentation: The file header documentation (lines 32-49) describes the outputs of this script but doesn't mention the new optional regional grid output file that can be generated with the --output-regional-grid flag. Consider adding a third output to the documentation describing the regional_grid.nc file and its format.
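One possible shape for that addition, assuming the header lists the script's outputs as numbered items; the wording is illustrative and the file name is borrowed from the warning message suggested above.

# Possible addition to the file header's list of outputs (wording illustrative):
#   3. <casename>_regional_grid.nc : optional SCRIP format grid file containing only
#      the regional subset of grid cells; written when --output-regional-grid is passed.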
# Load additional SCRIP variables if requested
if load_full_scrip and self.grid_type == "scrip":
    self.area = f["grid_area"][self.chunk_idx:chunk_end].data
Copilot AI, Jan 27, 2026
When load_scrip is True but grid_area variable doesn't exist in the netCDF file, this will raise a KeyError. While SCRIP format files should have this variable, consider adding a check similar to the optional variables (grid_corner_lat, grid_imask) to handle files that might be missing this field, or at minimum provide a more informative error message.
Suggested change:

if "grid_area" in f.variables:
    self.area = f["grid_area"][self.chunk_idx:chunk_end].data
else:
    raise KeyError(
        "Required SCRIP variable 'grid_area' not found in file '{}'. "
        "Set load_full_scrip=False or provide a valid SCRIP grid file "
        "containing 'grid_area'.".format(self.filename)
    )
if has_corners:
    corner_lats[offset:offset+n,:] = s.corner_lats
    corner_lons[offset:offset+n,:] = s.corner_lons
Copilot AI, Jan 27, 2026
Inconsistent handling of empty corner data: At line 255, has_corners checks if any site has corner data. However, at lines 159-161 and 269-270, the code assumes that if one site has corners, the array will have a valid shape. If a site matches no grid cells (empty after filtering), s.corner_lats could be an empty array without a second dimension, causing an error when trying to access .shape[1] or when trying to assign to corner_lats[offset:offset+n,:]. Consider adding checks to handle sites with no matched cells gracefully.
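A sketch of one way to skip sites that matched no cells when filling the combined corner arrays; the helper name and the exact checks are hypothetical.

def fill_site_corners(corner_lats, corner_lons, site, offset):
    """Copy one site's corner data into the combined arrays, skipping sites
    that matched no grid cells (whose corner arrays may be empty or 1-D)."""
    n = site.length
    if n == 0 or getattr(site, "corner_lats", None) is None or site.corner_lats.ndim != 2:
        return
    corner_lats[offset:offset + n, :] = site.corner_lats
    corner_lons[offset:offset + n, :] = site.corner_lons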
areas = np.zeros(total_size)
imasks = np.zeros(total_size, dtype=np.int32)
has_corners = any(len(s.corner_lats) > 0 for s in sites)
num_corners = sites[0].corner_lats.shape[1] if has_corners else 0
Copilot AI, Jan 27, 2026
Potential IndexError if no sites are provided. The code accesses sites[0].corner_lats.shape[1] without first checking if the sites list is empty. Consider adding a check: if not sites: raise ValueError("No sites provided") at the beginning of the function, or add a guard before accessing sites[0].
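A minimal form of the suggested guard; the error message is illustrative.

# At the top of construct_regional_grid(), before sites[0] is accessed:
if not sites:
    raise ValueError("construct_regional_grid requires at least one site")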
has_corners = any(len(s.corner_lats) > 0 for s in sites)
num_corners = sites[0].corner_lats.shape[1] if has_corners else 0
corner_lats = np.zeros((total_size, num_corners)) if has_corners else None
corner_lons = np.zeros((total_size, num_corners)) if has_corners else None

# Fill arrays from sites
offset = 0
for s in sites:
    n = s.length
    lats[offset:offset+n] = s.lats
    lons[offset:offset+n] = s.lons
    areas[offset:offset+n] = s.areas
    imasks[offset:offset+n] = s.imasks
    if has_corners:
        corner_lats[offset:offset+n,:] = s.corner_lats
        corner_lons[offset:offset+n,:] = s.corner_lons
Copilot AI, Jan 27, 2026
The code assumes all sites have the same number of corners when has_corners is True, but doesn't validate this assumption. If sites have different corner counts (e.g., one site has 4 corners and another has 3), this could lead to shape mismatch errors when filling the corner arrays at lines 269-270. Consider adding validation to ensure all sites with corners have the same shape, or handle sites with different corner counts appropriately.
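A sketch of that consistency check; raising a ValueError is one reasonable way to surface a mismatch, and the comprehension assumes the Site fields shown in this diff.

# Confirm every site that has corner data agrees on the number of corners per
# cell before allocating the combined arrays.
corner_counts = {s.corner_lats.shape[1] for s in sites if len(s.corner_lats) > 0}
if len(corner_counts) > 1:
    raise ValueError(f"Sites have inconsistent corner counts: {sorted(corner_counts)}")
num_corners = corner_counts.pop() if corner_counts else 0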
isite.lon_corner_lats = np.vstack([isite.lon_corner_lats, src_grid.corner_lat[lids[0],:]])
isite.lon_corner_lons = np.vstack([isite.lon_corner_lons, src_grid.corner_lon[lids[0],:]])
Copilot AI, Jan 27, 2026
Performance issue: Using np.vstack in a loop (lines 376, 377, 393, 394) repeatedly reallocates and copies arrays, which is inefficient for large datasets. Since you're already using chunked reading for the main grid, consider either: (1) using a list and converting to array at the end with np.vstack(list) once, or (2) pre-allocating arrays and using index-based assignment, similar to how lons and lon_gids are handled with np.append.
AaronDonahue left a comment
This is great new functionality! Thanks for putting it in the utility @mahf708. Quick question: have you tested the new function? It occurs to me that if this utility will be used more widely, we may want to eventually set up pytests for it.
Yeah, I have. We should set up CI here soon.
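Since the thread above raises pytest, a minimal sketch of what a first test might look like; it only exercises the bounds-filtering arithmetic this PR relies on, and the bounds values are borrowed from the CONUS example below rather than from real test data.

import numpy as np

def test_regional_bounds_filtering():
    # Hypothetical site bounds (CONUS-like) and a tiny fake grid.
    lat_bnds, lon_bnds = (24.0, 50.0), (235.0, 294.0)
    lats = np.array([10.0, 30.0, 45.0, 60.0])
    lons = np.array([200.0, 240.0, 290.0, 300.0])
    # Roughly mirrors the lat/lon window test applied when collecting regional cells.
    keep = ((lats >= lat_bnds[0]) & (lats <= lat_bnds[1]) &
            (lons >= lon_bnds[0]) & (lons <= lon_bnds[1]))
    assert keep.tolist() == [False, True, True, False]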
Implemented complete workflow to remap E3SM SCREAM regional outputs (~3.25 km)
to HRRR Lambert Conformal grid (~3 km) over CONUS using ESMF/NCO tools.
SCREAM Regional Domain (CONUS):
- Lat: 24.0°N - 50.0°N
- Lon: 235.0°E - 294.0°E (125°W to 66°W)
- Native grid: 786,454 nodes (ne1024 spectral element)
Target HRRR Grid:
- Lambert Conformal projection
- Output dimensions: 1059 × 1799 × 128 levels × 12 timesteps
Remapping Workflow:
1. Generate SCRIP files for ESMF:
- create_regional_remapper.py: SCREAM regional grid SCRIP (critical fix
from Naser for regional spectral element grids)
- make_HRRR_SCRIP.ncl: HRRR curvilinear grid SCRIP
2. Generate weight files with ESMF_RegridWeightGen:
- run_module_RegridWeightGen.sh: Creates neareststod and bilinear weights
- Note: Can run directly on login node with tgtconf="netcdf"
3. Remap and combine variables:
- remap_dbz_zmid.sh: Remaps reflectivity (neareststod) and geopotential
height/pressure (bilinear), combines reflectivity with z_mid
- Outputs: Two files per hourly input
* scream.z_mid_p_mid.*.nc (7.6 GB, z_mid + p_mid, bilinear)
* scream.diag_equiv_reflectivity.*.nc (6.1 GB, reflectivity + z_mid)
Parallel Processing with TaskFarmer:
- generate_taskfarmer_list.py: Creates task list for parallel remapping
- slurm_regrid_taskfarmer.sh: TaskFarmer batch script for NERSC Perlmutter
- Recommended: 12-14 tasks/node (THREADS=12-14) on CPU nodes (512 GB RAM)
- Processing time: ~26 minutes per hourly file (12×5-min timesteps)
Performance:
- Step 1 (remap reflectivity, neareststod): ~140s
- Step 2 (remap z_mid/p_mid, bilinear + compress): ~740s
- Step 3 (combine variables): ~200s
- Step 4 (rename dims, remove vars, compress): ~470s
- Total: ~1560s (26 min) per file
Compression: netCDF4 deflate_level=1
Chunking: time=1, lev=128 (full), y=256, x=256
- Optimized for column-wise operations (vertical max, echo-top height)
- Storage savings: ~80% (6-8 GB vs 31 GB uncompressed)
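A sketch of how the stated compression and chunking settings map onto the netCDF4-python API; the file, variable, and dimension names are illustrative.

from netCDF4 import Dataset

# Illustrative: deflate_level=1 and chunks of (time=1, lev=128, y=256, x=256)
# on an HRRR-sized output variable.
ds = Dataset("remapped_output.nc", "w")
ds.createDimension("time", None)
ds.createDimension("lev", 128)
ds.createDimension("y", 1059)
ds.createDimension("x", 1799)
refl = ds.createVariable("diag_equiv_reflectivity", "f4", ("time", "lev", "y", "x"),
                         zlib=True, complevel=1,
                         chunksizes=(1, 128, 256, 256))
ds.close()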
References:
- SCRIP fix: E3SM-Project/eamxx-scripts#193
- ESMF_RegridWeightGen: https://earthsystemmodeling.org/regrid/
I lack the knowledge about this stuff to give useful comments on it. Sorry.
This pull request adds support for generating a regional SCRIP-format grid file containing only the subset of grid cells relevant to the selected sites. It introduces new logic and data structures to track and filter SCRIP-specific variables, and provides a new command-line option to output the regional grid. The changes also improve the handling and filtering of grid data for regional remapping.
Regional SCRIP grid output:
- Added --output-regional-grid command-line option to output a SCRIP-format grid file containing only the regional subset of grid cells.
- Added construct_regional_grid function to generate a new netCDF SCRIP grid file with filtered grid cells and associated variables (area, imask, corners, etc.) for the regional subset.
- Updated the Grid and Site classes to track SCRIP-specific variables (area, corner_lat, corner_lon, imask, dims) and to collect and filter these variables for the regional subset. [1] [2] [3] [4]