@mahf708 mahf708 commented Jan 27, 2026

This pull request adds support for generating a regional SCRIP-format grid file containing only the subset of grid cells relevant to the selected sites. It introduces new logic and data structures to track and filter SCRIP-specific variables, and provides a new command-line option to output the regional grid. The changes also improve the handling and filtering of grid data for regional remapping.

Regional SCRIP grid output:

  • Added the --output-regional-grid command-line option to output a SCRIP-format grid file containing only the regional subset of grid cells.
  • Implemented the construct_regional_grid function to generate a new netCDF SCRIP grid file with filtered grid cells and associated variables (area, imask, corners, etc.) for the regional subset.
  • Enhanced the Grid and Site classes to track SCRIP-specific variables (area, corner_lat, corner_lon, imask, dims) and to collect and filter these variables for the regional subset.
  • Modified the main workflow to collect SCRIP variables during grid chunk loading and to filter them appropriately for each site, ensuring that the output regional grid preserves all necessary metadata.
  • Ensured that the new functionality is only applied when the input grid is in SCRIP format, with appropriate warnings otherwise.

Copilot AI left a comment


Pull request overview

This PR adds functionality to output SCRIP format grid files containing only the regional subset when creating regional remapping files. This is useful for workflows that need the regional grid definition in addition to the remapping weights.

Changes:

  • Extended Grid and Site classes to load and store SCRIP-specific data (grid areas, corner coordinates, masks)
  • Added construct_regional_grid() function to generate SCRIP format grid files from filtered regional data
  • Added --output-regional-grid command-line flag to enable regional grid output
  • Replaced deprecated np.in1d with np.isin for better NumPy compatibility

Comment on lines +393 to +394
isite.lat_corner_lats = np.vstack([isite.lat_corner_lats, src_grid.corner_lat[lids[0],:]])
isite.lat_corner_lons = np.vstack([isite.lat_corner_lons, src_grid.corner_lon[lids[0],:]])
Copilot AI Jan 27, 2026


Performance issue: Using np.vstack in a loop repeatedly reallocates and copies arrays, which is inefficient for large datasets. Since you're already using chunked reading for the main grid, consider either: (1) using a list and converting to array at the end with np.vstack(list) once, or (2) pre-allocating arrays and using index-based assignment, similar to how lats and lat_gids are handled with np.append.
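
A minimal sketch of the suggested list-then-concatenate pattern (the chunk arrays below are synthetic stand-ins for the per-chunk corner slices, not data from the PR):

```python
import numpy as np

# Synthetic stand-ins for per-chunk corner slices from the chunked reader
chunks = [np.ones((3, 4)), np.full((2, 4), 2.0), np.zeros((1, 4))]

# Collect each chunk's slice in a plain Python list (cheap amortized append) ...
corner_lat_parts = []
for chunk in chunks:
    corner_lat_parts.append(chunk)

# ... and pay the reallocation/copy cost exactly once at the end
corner_lats = np.vstack(corner_lat_parts)
```

Calling np.vstack inside the loop instead would copy the accumulated rows on every iteration, giving quadratic total copy cost over the chunks.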

Comment on lines +370 to +387
isite.lon_imasks = np.append(isite.lon_imasks, src_grid.imask[lids[0]])
if len(src_grid.corner_lat) > 0:
    if len(isite.lon_corner_lats) == 0:
        isite.lon_corner_lats = src_grid.corner_lat[lids[0],:]
        isite.lon_corner_lons = src_grid.corner_lon[lids[0],:]
    else:
        isite.lon_corner_lats = np.vstack([isite.lon_corner_lats, src_grid.corner_lat[lids[0],:]])
        isite.lon_corner_lons = np.vstack([isite.lon_corner_lons, src_grid.corner_lon[lids[0],:]])

lids = np.where( (src_grid.lat >= isite.lat_bnds[0]) &
                 (src_grid.lat <= isite.lat_bnds[1]))
isite.lats = np.append(isite.lats, src_grid.lat[lids[0]])
isite.lat_gids = np.append(isite.lat_gids, lids[0]+chunk_idx)

# Store SCRIP data for lat matches
if load_scrip:
    isite.lat_areas = np.append(isite.lat_areas, src_grid.area[lids[0]])
    isite.lat_imasks = np.append(isite.lat_imasks, src_grid.imask[lids[0]])
Copilot AI Jan 27, 2026


Type safety concern: lon_imasks and lat_imasks are accumulated using np.append without explicit dtype, but the final imasks is expected to be int32 (line 254). If the source grid_imask has a different integer type, this could lead to type inconsistencies. Consider explicitly casting when appending: isite.lon_imasks = np.append(isite.lon_imasks, src_grid.imask[lids[0]]).astype(np.int32) or initializing the arrays with the correct dtype in the Site class.
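
The promotion behavior can be reproduced in isolation (a hedged sketch; np.int32 is the target dtype named in the comment):

```python
import numpy as np

# An accumulator created with np.array([]) defaults to float64, so every
# np.append silently promotes the integer mask values to float.
imasks = np.array([])
imasks = np.append(imasks, np.array([1, 1, 0], dtype=np.int64))
# imasks.dtype is now float64, not an integer type

# Fix (a): cast explicitly before handing the array to the SCRIP writer
imasks_cast = imasks.astype(np.int32)

# Fix (b): initialize the accumulator with the target dtype up front, so
# appends of matching integer data never promote
imasks_typed = np.array([], dtype=np.int32)
imasks_typed = np.append(imasks_typed, np.array([1, 0], dtype=np.int32))
```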

Comment on lines +262 to +267
for s in sites:
    n = s.length
    lats[offset:offset+n] = s.lats
    lons[offset:offset+n] = s.lons
    areas[offset:offset+n] = s.areas
    imasks[offset:offset+n] = s.imasks
Copilot AI Jan 27, 2026


Missing validation: The function doesn't validate that each site has the required SCRIP data (areas, imasks) before accessing them. If filter_scrip was False for a site (which shouldn't happen based on current code flow, but could if the API is used differently), accessing s.areas or s.imasks at lines 266-267 would use empty arrays, leading to silent data corruption. Consider adding a validation check or assertion at the start of the function to ensure all sites have populated SCRIP data.
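
One way to make that precondition explicit, sketched with a minimal stand-in class (the Site here is hypothetical, not the class from the PR):

```python
import numpy as np

class Site:
    """Minimal stand-in for the PR's Site class (hypothetical fields)."""
    def __init__(self, lats, areas, imasks):
        self.lats = np.asarray(lats)
        self.areas = np.asarray(areas)
        self.imasks = np.asarray(imasks)
        self.length = len(self.lats)

def check_scrip_populated(sites):
    """Fail fast if any site lacks SCRIP data for its matched cells."""
    for i, s in enumerate(sites):
        if len(s.areas) != s.length or len(s.imasks) != s.length:
            raise ValueError(
                f"site {i}: SCRIP data incomplete (areas={len(s.areas)}, "
                f"imasks={len(s.imasks)}, expected {s.length})")

good = Site([40.1, 40.2], [0.01, 0.01], [1, 1])
bad = Site([40.1, 40.2], [], [])   # filtered without SCRIP data
check_scrip_populated([good])      # passes silently
```

A check like this turns silent data corruption into an immediate, descriptive error at the top of the function.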

"""Create a SCRIP format grid file containing only the regional subset."""
# Combine all sites into a single regional grid
total_size = sum(s.length for s in sites)

Copilot AI Jan 27, 2026


Missing edge case handling: If total_size is 0 (no grid cells found in any site), the function will create an empty grid file. While this might be technically valid, it could indicate a problem with the site definitions or grid data. Consider adding a check and warning/error message if total_size == 0.

Suggested change
# If no grid cells are found, warn and do not create an empty grid file
if total_size == 0:
    print(f"No grid cells found for any site; regional grid file '{casename}_regional_grid.nc' will not be created.", file=sys.stderr)
    return

Comment on lines +337 to +338
p.add_argument('--output-regional-grid', action='store_true',
               help='Output a SCRIP format grid file containing only the regional subset')
Copilot AI Jan 27, 2026


Missing documentation: The file header documentation (lines 32-49) describes the outputs of this script but doesn't mention the new optional regional grid output file that can be generated with the --output-regional-grid flag. Consider adding a third output to the documentation describing the regional_grid.nc file and its format.


# Load additional SCRIP variables if requested
if load_full_scrip and self.grid_type == "scrip":
    self.area = f["grid_area"][self.chunk_idx:chunk_end].data
Copilot AI Jan 27, 2026


When load_scrip is True but grid_area variable doesn't exist in the netCDF file, this will raise a KeyError. While SCRIP format files should have this variable, consider adding a check similar to the optional variables (grid_corner_lat, grid_imask) to handle files that might be missing this field, or at minimum provide a more informative error message.

Suggested change
if "grid_area" in f.variables:
    self.area = f["grid_area"][self.chunk_idx:chunk_end].data
else:
    raise KeyError(
        "Required SCRIP variable 'grid_area' not found in file '{}'. "
        "Set load_full_scrip=False or provide a valid SCRIP grid file "
        "containing 'grid_area'.".format(self.filename)
    )

Comment on lines +268 to +270
if has_corners:
    corner_lats[offset:offset+n,:] = s.corner_lats
    corner_lons[offset:offset+n,:] = s.corner_lons
Copilot AI Jan 27, 2026


Inconsistent handling of empty corner data: At line 255, has_corners checks if any site has corner data. However, at lines 159-161 and 269-270, the code assumes that if one site has corners, the array will have a valid shape. If a site matches no grid cells (empty after filtering), s.corner_lats could be an empty array without a second dimension, causing an error when trying to access .shape[1] or when trying to assign to corner_lats[offset:offset+n,:]. Consider adding checks to handle sites with no matched cells gracefully.
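
A hedged sketch of one such guard: skip zero-length sites when filling, and derive the corner count from the first non-empty corner array rather than from sites[0] (the arrays below are synthetic):

```python
import numpy as np

# Synthetic per-site corner arrays: an empty site comes back as shape (0,),
# with no second dimension to index, so .shape[1] would raise IndexError.
site_corners = [np.zeros((0,)), np.ones((3, 4)), np.full((2, 4), 5.0)]

# Take the corner count from the first site that actually matched cells
num_corners = next((c.shape[1] for c in site_corners if c.ndim == 2), 0)
total_size = sum(c.shape[0] for c in site_corners if c.ndim == 2)

corner_lats = np.zeros((total_size, num_corners))
offset = 0
for c in site_corners:
    if c.ndim != 2 or c.shape[0] == 0:
        continue  # nothing to copy for sites with no matched cells
    n = c.shape[0]
    corner_lats[offset:offset+n, :] = c
    offset += n
```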

areas = np.zeros(total_size)
imasks = np.zeros(total_size, dtype=np.int32)
has_corners = any(len(s.corner_lats) > 0 for s in sites)
num_corners = sites[0].corner_lats.shape[1] if has_corners else 0
Copilot AI Jan 27, 2026


Potential IndexError if no sites are provided. The code accesses sites[0].corner_lats.shape[1] without first checking if the sites list is empty. Consider adding a check: if not sites: raise ValueError("No sites provided") at the beginning of the function, or add a guard before accessing sites[0].

Comment on lines +255 to +270
has_corners = any(len(s.corner_lats) > 0 for s in sites)
num_corners = sites[0].corner_lats.shape[1] if has_corners else 0
corner_lats = np.zeros((total_size, num_corners)) if has_corners else None
corner_lons = np.zeros((total_size, num_corners)) if has_corners else None

# Fill arrays from sites
offset = 0
for s in sites:
    n = s.length
    lats[offset:offset+n] = s.lats
    lons[offset:offset+n] = s.lons
    areas[offset:offset+n] = s.areas
    imasks[offset:offset+n] = s.imasks
    if has_corners:
        corner_lats[offset:offset+n,:] = s.corner_lats
        corner_lons[offset:offset+n,:] = s.corner_lons
Copilot AI Jan 27, 2026


The code assumes all sites have the same number of corners when has_corners is True, but doesn't validate this assumption. If sites have different corner counts (e.g., one site has 4 corners and another has 3), this could lead to shape mismatch errors when filling the corner arrays at lines 269-270. Consider adding validation to ensure all sites with corners have the same shape, or handle sites with different corner counts appropriately.
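
The validation could be as small as comparing the distinct corner-array widths before filling the combined buffer (a sketch with synthetic arrays, not the PR's data):

```python
import numpy as np

# Synthetic corner arrays: both sites here agree on 4 corners per cell
site_corner_lats = [np.ones((2, 4)), np.ones((5, 4))]

# Collect the distinct corner counts across non-empty sites; more than one
# distinct width means the sites disagree on grid_corners.
widths = {a.shape[1] for a in site_corner_lats if a.size > 0}
if len(widths) > 1:
    raise ValueError(f"inconsistent grid_corners across sites: {sorted(widths)}")
num_corners = widths.pop() if widths else 0
```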

Comment on lines +376 to +377
isite.lon_corner_lats = np.vstack([isite.lon_corner_lats, src_grid.corner_lat[lids[0],:]])
isite.lon_corner_lons = np.vstack([isite.lon_corner_lons, src_grid.corner_lon[lids[0],:]])
Copilot AI Jan 27, 2026


Performance issue: Using np.vstack in a loop (lines 376, 377, 393, 394) repeatedly reallocates and copies arrays, which is inefficient for large datasets. Since you're already using chunked reading for the main grid, consider either: (1) using a list and converting to array at the end with np.vstack(list) once, or (2) pre-allocating arrays and using index-based assignment, similar to how lons and lon_gids are handled with np.append.

@AaronDonahue AaronDonahue left a comment


This is a great new functionality! Thanks for putting it in the utility @mahf708 . Quick question: have you tested the new function? It occurs to me that if this utility will be used more broadly, we may want to eventually set up pytest tests for it.


mahf708 commented Jan 28, 2026

Yeah, I have. We should set up CI here soon.

feng045 added a commit to WACCEM/SCREAM_MCS that referenced this pull request Jan 28, 2026
Implemented complete workflow to remap E3SM SCREAM regional outputs (~3.25 km)
to HRRR Lambert Conformal grid (~3 km) over CONUS using ESMF/NCO tools.

SCREAM Regional Domain (CONUS):
- Lat: 24.0°N - 50.0°N
- Lon: 235.0°E - 294.0°E (125°W to 66°W)
- Native grid: 786,454 nodes (ne1024 spectral element)

Target HRRR Grid:
- Lambert Conformal projection
- Output dimensions: 1059 × 1799 × 128 levels × 12 timesteps

Remapping Workflow:
1. Generate SCRIP files for ESMF:
   - create_regional_remapper.py: SCREAM regional grid SCRIP (critical fix
     from Naser for regional spectral element grids)
   - make_HRRR_SCRIP.ncl: HRRR curvilinear grid SCRIP

2. Generate weight files with ESMF_RegridWeightGen:
   - run_module_RegridWeightGen.sh: Creates neareststod and bilinear weights
   - Note: Can run directly on login node with tgtconf="netcdf"

3. Remap and combine variables:
   - remap_dbz_zmid.sh: Remaps reflectivity (neareststod) and geopotential
     height/pressure (bilinear), combines reflectivity with z_mid
   - Outputs: Two files per hourly input
     * scream.z_mid_p_mid.*.nc (7.6 GB, z_mid + p_mid, bilinear)
     * scream.diag_equiv_reflectivity.*.nc (6.1 GB, reflectivity + z_mid)

Parallel Processing with TaskFarmer:
- generate_taskfarmer_list.py: Creates task list for parallel remapping
- slurm_regrid_taskfarmer.sh: TaskFarmer batch script for NERSC Perlmutter
- Recommended: 12-14 tasks/node (THREADS=12-14) on CPU nodes (512 GB RAM)
- Processing time: ~26 minutes per hourly file (12×5-min timesteps)

Performance:
- Step 1 (remap reflectivity, neareststod): ~140s
- Step 2 (remap z_mid/p_mid, bilinear + compress): ~740s
- Step 3 (combine variables): ~200s
- Step 4 (rename dims, remove vars, compress): ~470s
- Total: ~1560s (26 min) per file

Compression: netCDF4 deflate_level=1
Chunking: time=1, lev=128 (full), y=256, x=256
- Optimized for column-wise operations (vertical max, echo-top height)
- Storage savings: ~80% (6-8 GB vs 31 GB uncompressed)

References:
- SCRIP fix: E3SM-Project/eamxx-scripts#193
- ESMF_RegridWeightGen: https://earthsystemmodeling.org/regrid/

bartgol commented Jan 28, 2026

I lack the knowledge about this topic to give useful comments. Sorry.

@bartgol bartgol removed their request for review January 28, 2026 23:37