@mahf708 mahf708 commented Jan 27, 2026

This pull request adds support for generating a regional SCRIP-format grid file containing only the subset of grid cells relevant to the selected sites. It introduces new logic and data structures to track and filter SCRIP-specific variables, and provides a new command-line option to output the regional grid. The changes also improve the handling and filtering of grid data for regional remapping.

Regional SCRIP grid output:

  • Added the --output-regional-grid command-line option to output a SCRIP-format grid file containing only the regional subset of grid cells.
  • Implemented the construct_regional_grid function to generate a new netCDF SCRIP grid file with filtered grid cells and associated variables (area, imask, corners, etc.) for the regional subset.
  • Enhanced the Grid and Site classes to track SCRIP-specific variables (area, corner_lat, corner_lon, imask, dims) and to collect and filter these variables for the regional subset.
  • Modified the main workflow to collect SCRIP variables during grid chunk loading and to filter them appropriately for each site, ensuring that the output regional grid preserves all necessary metadata.
  • Ensured that the new functionality is only applied when the input grid is in SCRIP format, with appropriate warnings otherwise.

Copilot AI left a comment


Pull request overview

This PR adds functionality to output SCRIP format grid files containing only the regional subset when creating regional remapping files. This is useful for workflows that need the regional grid definition in addition to the remapping weights.

Changes:

  • Extended Grid and Site classes to load and store SCRIP-specific data (grid areas, corner coordinates, masks)
  • Added construct_regional_grid() function to generate SCRIP format grid files from filtered regional data
  • Added --output-regional-grid command-line flag to enable regional grid output
  • Replaced deprecated np.in1d with np.isin for better NumPy compatibility

Comment on lines +393 to +394
isite.lat_corner_lats = np.vstack([isite.lat_corner_lats, src_grid.corner_lat[lids[0],:]])
isite.lat_corner_lons = np.vstack([isite.lat_corner_lons, src_grid.corner_lon[lids[0],:]])
Copilot AI Jan 27, 2026


Performance issue: Using np.vstack in a loop repeatedly reallocates and copies arrays, which is inefficient for large datasets. Since you're already using chunked reading for the main grid, consider either: (1) using a list and converting to array at the end with np.vstack(list) once, or (2) pre-allocating arrays and using index-based assignment, similar to how lats and lat_gids are handled with np.append.
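
A minimal sketch of the suggested list-then-concatenate pattern (the chunk arrays below are synthetic stand-ins for the per-chunk corner slices, not data from the PR):

```python
import numpy as np

# Synthetic stand-ins for per-chunk corner slices from the chunked reader
chunks = [np.ones((3, 4)), np.full((2, 4), 2.0), np.zeros((1, 4))]

# Collect each chunk's slice in a plain Python list (cheap amortized append) ...
corner_lat_parts = []
for chunk in chunks:
    corner_lat_parts.append(chunk)

# ... and pay the reallocation/copy cost exactly once at the end
corner_lats = np.vstack(corner_lat_parts)
```

Calling np.vstack inside the loop instead would copy the accumulated rows on every iteration, giving quadratic total copy cost over the chunks.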

Comment on lines +370 to +387
isite.lon_imasks = np.append(isite.lon_imasks, src_grid.imask[lids[0]])
if len(src_grid.corner_lat) > 0:
    if len(isite.lon_corner_lats) == 0:
        isite.lon_corner_lats = src_grid.corner_lat[lids[0],:]
        isite.lon_corner_lons = src_grid.corner_lon[lids[0],:]
    else:
        isite.lon_corner_lats = np.vstack([isite.lon_corner_lats, src_grid.corner_lat[lids[0],:]])
        isite.lon_corner_lons = np.vstack([isite.lon_corner_lons, src_grid.corner_lon[lids[0],:]])

lids = np.where( (src_grid.lat >= isite.lat_bnds[0]) &
                 (src_grid.lat <= isite.lat_bnds[1]))
isite.lats = np.append(isite.lats, src_grid.lat[lids[0]])
isite.lat_gids = np.append(isite.lat_gids, lids[0]+chunk_idx)

# Store SCRIP data for lat matches
if load_scrip:
    isite.lat_areas = np.append(isite.lat_areas, src_grid.area[lids[0]])
    isite.lat_imasks = np.append(isite.lat_imasks, src_grid.imask[lids[0]])
Copilot AI Jan 27, 2026


Type safety concern: lon_imasks and lat_imasks are accumulated using np.append without explicit dtype, but the final imasks is expected to be int32 (line 254). If the source grid_imask has a different integer type, this could lead to type inconsistencies. Consider explicitly casting when appending: isite.lon_imasks = np.append(isite.lon_imasks, src_grid.imask[lids[0]]).astype(np.int32) or initializing the arrays with the correct dtype in the Site class.
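
The promotion behavior can be reproduced in isolation (a hedged sketch; np.int32 is the target dtype named in the comment):

```python
import numpy as np

# An accumulator created with np.array([]) defaults to float64, so every
# np.append silently promotes the integer mask values to float.
imasks = np.array([])
imasks = np.append(imasks, np.array([1, 1, 0], dtype=np.int64))
# imasks.dtype is now float64, not an integer type

# Fix (a): cast explicitly before handing the array to the SCRIP writer
imasks_cast = imasks.astype(np.int32)

# Fix (b): initialize the accumulator with the target dtype up front, so
# appends of matching integer data never promote
imasks_typed = np.array([], dtype=np.int32)
imasks_typed = np.append(imasks_typed, np.array([1, 0], dtype=np.int32))
```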

Comment on lines +262 to +267
for s in sites:
    n = s.length
    lats[offset:offset+n] = s.lats
    lons[offset:offset+n] = s.lons
    areas[offset:offset+n] = s.areas
    imasks[offset:offset+n] = s.imasks
Copilot AI Jan 27, 2026


Missing validation: The function doesn't validate that each site has the required SCRIP data (areas, imasks) before accessing them. If filter_scrip was False for a site (which shouldn't happen based on current code flow, but could if the API is used differently), accessing s.areas or s.imasks at lines 266-267 would use empty arrays, leading to silent data corruption. Consider adding a validation check or assertion at the start of the function to ensure all sites have populated SCRIP data.
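
One way to make that precondition explicit, sketched with a minimal stand-in class (the Site here is hypothetical, not the class from the PR):

```python
import numpy as np

class Site:
    """Minimal stand-in for the PR's Site class (hypothetical fields)."""
    def __init__(self, lats, areas, imasks):
        self.lats = np.asarray(lats)
        self.areas = np.asarray(areas)
        self.imasks = np.asarray(imasks)
        self.length = len(self.lats)

def check_scrip_populated(sites):
    """Fail fast if any site lacks SCRIP data for its matched cells."""
    for i, s in enumerate(sites):
        if len(s.areas) != s.length or len(s.imasks) != s.length:
            raise ValueError(
                f"site {i}: SCRIP data incomplete (areas={len(s.areas)}, "
                f"imasks={len(s.imasks)}, expected {s.length})")

good = Site([40.1, 40.2], [0.01, 0.01], [1, 1])
bad = Site([40.1, 40.2], [], [])   # filtered without SCRIP data
check_scrip_populated([good])      # passes silently
```

A check like this turns silent data corruption into an immediate, descriptive error at the top of the function.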

"""Create a SCRIP format grid file containing only the regional subset."""
# Combine all sites into a single regional grid
total_size = sum(s.length for s in sites)

Copilot AI Jan 27, 2026


Missing edge case handling: If total_size is 0 (no grid cells found in any site), the function will create an empty grid file. While this might be technically valid, it could indicate a problem with the site definitions or grid data. Consider adding a check and warning/error message if total_size == 0.

Suggested change
# If no grid cells are found, warn and do not create an empty grid file
if total_size == 0:
    print(f"No grid cells found for any site; regional grid file '{casename}_regional_grid.nc' will not be created.", file=sys.stderr)
    return

Comment on lines +337 to +338
p.add_argument('--output-regional-grid', action='store_true',
               help='Output a SCRIP format grid file containing only the regional subset')
Copilot AI Jan 27, 2026


Missing documentation: The file header documentation (lines 32-49) describes the outputs of this script but doesn't mention the new optional regional grid output file that can be generated with the --output-regional-grid flag. Consider adding a third output to the documentation describing the regional_grid.nc file and its format.


# Load additional SCRIP variables if requested
if load_full_scrip and self.grid_type == "scrip":
    self.area = f["grid_area"][self.chunk_idx:chunk_end].data
Copilot AI Jan 27, 2026


When load_scrip is True but grid_area variable doesn't exist in the netCDF file, this will raise a KeyError. While SCRIP format files should have this variable, consider adding a check similar to the optional variables (grid_corner_lat, grid_imask) to handle files that might be missing this field, or at minimum provide a more informative error message.

Suggested change
if "grid_area" in f.variables:
    self.area = f["grid_area"][self.chunk_idx:chunk_end].data
else:
    raise KeyError(
        "Required SCRIP variable 'grid_area' not found in file '{}'. "
        "Set load_full_scrip=False or provide a valid SCRIP grid file "
        "containing 'grid_area'.".format(self.filename)
    )

Comment on lines +268 to +270
if has_corners:
    corner_lats[offset:offset+n,:] = s.corner_lats
    corner_lons[offset:offset+n,:] = s.corner_lons
Copilot AI Jan 27, 2026


Inconsistent handling of empty corner data: At line 255, has_corners checks if any site has corner data. However, at lines 159-161 and 269-270, the code assumes that if one site has corners, the array will have a valid shape. If a site matches no grid cells (empty after filtering), s.corner_lats could be an empty array without a second dimension, causing an error when trying to access .shape[1] or when trying to assign to corner_lats[offset:offset+n,:]. Consider adding checks to handle sites with no matched cells gracefully.
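
A hedged sketch of one such guard: skip zero-length sites when filling, and derive the corner count from the first non-empty corner array rather than from sites[0] (the arrays below are synthetic):

```python
import numpy as np

# Synthetic per-site corner arrays: an empty site comes back as shape (0,),
# with no second dimension to index, so .shape[1] would raise IndexError.
site_corners = [np.zeros((0,)), np.ones((3, 4)), np.full((2, 4), 5.0)]

# Take the corner count from the first site that actually matched cells
num_corners = next((c.shape[1] for c in site_corners if c.ndim == 2), 0)
total_size = sum(c.shape[0] for c in site_corners if c.ndim == 2)

corner_lats = np.zeros((total_size, num_corners))
offset = 0
for c in site_corners:
    if c.ndim != 2 or c.shape[0] == 0:
        continue  # nothing to copy for sites with no matched cells
    n = c.shape[0]
    corner_lats[offset:offset+n, :] = c
    offset += n
```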

areas = np.zeros(total_size)
imasks = np.zeros(total_size, dtype=np.int32)
has_corners = any(len(s.corner_lats) > 0 for s in sites)
num_corners = sites[0].corner_lats.shape[1] if has_corners else 0
Copilot AI Jan 27, 2026


Potential IndexError if no sites are provided. The code accesses sites[0].corner_lats.shape[1] without first checking if the sites list is empty. Consider adding a check: if not sites: raise ValueError("No sites provided") at the beginning of the function, or add a guard before accessing sites[0].

Comment on lines +255 to +270
has_corners = any(len(s.corner_lats) > 0 for s in sites)
num_corners = sites[0].corner_lats.shape[1] if has_corners else 0
corner_lats = np.zeros((total_size, num_corners)) if has_corners else None
corner_lons = np.zeros((total_size, num_corners)) if has_corners else None

# Fill arrays from sites
offset = 0
for s in sites:
    n = s.length
    lats[offset:offset+n] = s.lats
    lons[offset:offset+n] = s.lons
    areas[offset:offset+n] = s.areas
    imasks[offset:offset+n] = s.imasks
    if has_corners:
        corner_lats[offset:offset+n,:] = s.corner_lats
        corner_lons[offset:offset+n,:] = s.corner_lons
Copilot AI Jan 27, 2026


The code assumes all sites have the same number of corners when has_corners is True, but doesn't validate this assumption. If sites have different corner counts (e.g., one site has 4 corners and another has 3), this could lead to shape mismatch errors when filling the corner arrays at lines 269-270. Consider adding validation to ensure all sites with corners have the same shape, or handle sites with different corner counts appropriately.
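
The validation could be as small as comparing the distinct corner-array widths before filling the combined buffer (a sketch with synthetic arrays, not the PR's data):

```python
import numpy as np

# Synthetic corner arrays: both sites here agree on 4 corners per cell
site_corner_lats = [np.ones((2, 4)), np.ones((5, 4))]

# Collect the distinct corner counts across non-empty sites; more than one
# distinct width means the sites disagree on grid_corners.
widths = {a.shape[1] for a in site_corner_lats if a.size > 0}
if len(widths) > 1:
    raise ValueError(f"inconsistent grid_corners across sites: {sorted(widths)}")
num_corners = widths.pop() if widths else 0
```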

Comment on lines +376 to +377
isite.lon_corner_lats = np.vstack([isite.lon_corner_lats, src_grid.corner_lat[lids[0],:]])
isite.lon_corner_lons = np.vstack([isite.lon_corner_lons, src_grid.corner_lon[lids[0],:]])
Copilot AI Jan 27, 2026


Performance issue: Using np.vstack in a loop (lines 376, 377, 393, 394) repeatedly reallocates and copies arrays, which is inefficient for large datasets. Since you're already using chunked reading for the main grid, consider either: (1) using a list and converting to array at the end with np.vstack(list) once, or (2) pre-allocating arrays and using index-based assignment, similar to how lons and lon_gids are handled with np.append.

@AaronDonahue AaronDonahue left a comment


This is a great new functionality! Thanks for putting it in the utility @mahf708 . Quick question: have you tested the new function? It occurs to me that if this utility will be used more broadly, we may want to eventually set up pytest tests for it.


mahf708 commented Jan 28, 2026

Yeah, I have. We should set up CI here soon.

feng045 added a commit to WACCEM/SCREAM_MCS that referenced this pull request Jan 28, 2026
Implemented complete workflow to remap E3SM SCREAM regional outputs (~3.25 km)
to HRRR Lambert Conformal grid (~3 km) over CONUS using ESMF/NCO tools.

SCREAM Regional Domain (CONUS):
- Lat: 24.0°N - 50.0°N
- Lon: 235.0°E - 294.0°E (125°W to 66°W)
- Native grid: 786,454 nodes (ne1024 spectral element)

Target HRRR Grid:
- Lambert Conformal projection
- Output dimensions: 1059 × 1799 × 128 levels × 12 timesteps

Remapping Workflow:
1. Generate SCRIP files for ESMF:
   - create_regional_remapper.py: SCREAM regional grid SCRIP (critical fix
     from Naser for regional spectral element grids)
   - make_HRRR_SCRIP.ncl: HRRR curvilinear grid SCRIP

2. Generate weight files with ESMF_RegridWeightGen:
   - run_module_RegridWeightGen.sh: Creates neareststod and bilinear weights
   - Note: Can run directly on login node with tgtconf="netcdf"

3. Remap and combine variables:
   - remap_dbz_zmid.sh: Remaps reflectivity (neareststod) and geopotential
     height/pressure (bilinear), combines reflectivity with z_mid
   - Outputs: Two files per hourly input
     * scream.z_mid_p_mid.*.nc (7.6 GB, z_mid + p_mid, bilinear)
     * scream.diag_equiv_reflectivity.*.nc (6.1 GB, reflectivity + z_mid)

Parallel Processing with TaskFarmer:
- generate_taskfarmer_list.py: Creates task list for parallel remapping
- slurm_regrid_taskfarmer.sh: TaskFarmer batch script for NERSC Perlmutter
- Recommended: 12-14 tasks/node (THREADS=12-14) on CPU nodes (512 GB RAM)
- Processing time: ~26 minutes per hourly file (12×5-min timesteps)

Performance:
- Step 1 (remap reflectivity, neareststod): ~140s
- Step 2 (remap z_mid/p_mid, bilinear + compress): ~740s
- Step 3 (combine variables): ~200s
- Step 4 (rename dims, remove vars, compress): ~470s
- Total: ~1560s (26 min) per file

Compression: netCDF4 deflate_level=1
Chunking: time=1, lev=128 (full), y=256, x=256
- Optimized for column-wise operations (vertical max, echo-top height)
- Storage savings: ~80% (6-8 GB vs 31 GB uncompressed)

References:
- SCRIP fix: E3SM-Project/eamxx-scripts#193
- ESMF_RegridWeightGen: https://earthsystemmodeling.org/regrid/

bartgol commented Jan 28, 2026

I lack the knowledge about this topic to give useful comments. Sorry.

@bartgol bartgol removed their request for review January 28, 2026 23:37