Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regridding section #9

Merged
merged 4 commits into from
Jan 25, 2023

Conversation

stephenworsley
Copy link
Contributor

No description provided.

Copy link
Contributor

@pp-mo pp-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some food for thought.
I may have more to add later, but I want to get these suggestions out
(and I want a link to reference in another issue!)

Comment on lines 106 to 109
```python
# Consider a bonus task to measure the relative time taken.
# import time... etc
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sensible way is probably to use the notebook "magic" for this %timeit

Comment on lines 143 to 144
**Step 4:** Plot the results and the difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth making clear here that we can (+expect to) use standard Iris/matplotlib tools, not VTK


**Step 4:** Create a regridder from `mesh_cube` to the single celled cube you created.

*Note:* ESMF represents all lines as sections of great circles rather than lines of constant latitude. It also cannot represent sections of great circles with angle equal or greater than 180 degrees. `MeshToGridESMFRegridder` would fail to properly handle the single celled cube without the `resolution` keyword.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section is effectively addressing two different points at once.

At least on first reading, the "cannot represent..." part seems irrelevant to the main point (though now I've read more, I see it is really not !).

I think, effectively, the latitude approximation and unrepresentable-cells problems are separate, even though the resolution key addresses both.
So perhaps the first example could have a smaller longitude range, to avoid the "unrepresentable" problem + only address the latitude-bounds question ?

Personally I also don't much like "would fail to properly handle .. without ..." bit.

  • what would happen without the resolution keyword
  • what difference does it make.

So, I think we should maybe state this in a more positive way, describing what we do, how + why.
E.G. something like

We provide a resolution keyword to divide each target cell into so many multiple sub-cells : this approximates a single cell whose upper+lower boundaries follow lines of latitude.

I guess you could then also say (something like)..

With no resolution key, the upper+lower boundaries of the (single) target cell are great circle lines, which gives a substantially different result (and not what was intended).

And also something like ..

In addition, ESMF cannot handle any cells covering >=180 degrees, e.g. longitudes -180..+180 : such cases will give errors : These cases can be fixed by specifying resolution=2.

It would also be great to link to a built-docs description here,
which would be here, but which is not currently up-to-date.
The source docstring tells it here.

Comment on lines +217 to +218
Note the difference in value. Also note that it takes more time to initialise a regridder with higher resolution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be too complicated, but it occurs to me that you might suggest the user tries several values for 'resolution' : A graph of time-required vs. residual error should demonstrate the issues quite nicely.

It definitely is worth simply stating that "the time/accuracy is a tradeoff, which the user needs to decide."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like it to clearly explain the existence of a time/accuracy tradeoff somewhere.

Comment on lines 225 to 229
**Step 7:** Repeat steps 1 - 6 for latitude bounds `[[-90, 90]]`, longitude bounds `[[-40, 40]]` and resolutions 2 and 10.

*Note:* Unlike lines of constant latitude, lines of constant longitude are already great circle arcs. Since these arcs are 180 degrees it is necessary to have a resolution argument. However, an increase in resolution will not affect the accuracy since a resolution of 2 will already have maximum accuracy. Note how the results are the equal.

```python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this.
It says "Since these arcs are 180 degrees it is necessary to have a resolution argument." But this now applys only to the longitudinal edges, so I guess it is saying we require a resolution>1 to resolve full-range cells in either lat or lon, which are unrepresentable in ESMF ??
But, that seems also to imply that resolution is dividing all (4) edges of target cells.
This is not clear, since the docstring suggests it is specific to latitude, as in

"represents the amount of latitude slices".

I guess this is not the place to explain it all, but we clearly need to be very careful not to say anything misleading or not-quite-true. Which may well apply to my suggested texts above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So internally, I believe the logic is that vertical resolution is only used in the specific case where the bounds are [-90, 90] in which case it is effectively set to 2 and in all other cases horizontal resolution is the only one used. I suppose the exercise here kind of suggests otherwise, but I think this might be a bit too deep into the internals to get into here.

plt.show()
```

We can then plot the difference between the UM data and the data regridded from LFRic. Since our data is now on a latlon grid we can do this with matplotlib as normal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be more important to say that "Since our data is now on a latlon grid" we can now subtract to get the difference between this and the UM equivalent data.

Copy link
Contributor

@pp-mo pp-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made further some comments + suggestions, but I think we could get on and merge this, and address those issues later.
Quick list -

Comment on lines +232 to +238
```python
lat_band_mean_10 = lat_band_mean_calculator_10(mesh_cube)
print(lat_band_mean_10.data)
iqplt.pcolormesh(lat_band_mean_10)
plt.gca().coastlines()
plt.show()
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't doing what the prompt asks for (plotting instead of printing).

Comment on lines +217 to +218
Note the difference in value. Also note that it takes more time to initialise a regridder with higher resolution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like it to clearly explain the existence of a time/accuracy tradeoff somewhere.

**Step 1:** Load a temperature cube using the `testdata_fetching` function `lfric_temp`. Extract a single pressure slice using the `cube.extract` method with a constraint `iris.Constraint(pressure=850)` as the argument (we choose this level because it has noticable details).


from testdata_fetching import fric_temp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code needs to go in a code cell.
In the current version, it is rendered in markdown, but we need the actual results below.

**Step 1:** Load a temperature cube using the `testdata_fetching` function `lfric_temp`. Extract a single pressure slice using the `cube.extract` method with a constraint `iris.Constraint(pressure=850)` as the argument (we choose this level because it has noticable details).


from testdata_fetching import fric_temp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misprint here too
from testdata_fetching import (missing - l)fric_temp

**Step 3:** Create a `MeshToGridESMFRegridder` regridder from the slice of the temperature cube onto the target cube. Set the resolution keyword to 2 (this should be sufficient since these are bands of constant longitude). Use this regridder to create a resulting cube.

```python
um_lon_band_mean_calculator = MeshToGridESMFRegridder(temp_slice, target_cube_lons, resolution=2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, this step is currently taking an extremely long time ~1m40, , and about half the VDI memory (~2Gb).
So I think that needs reviewing for practicality.
NB with latest code we can "switch" to C48-sized data, which is over 10x smaller (221k -> 14k points).
This may well be worth doing (we should probably make it standard).
But it will affect the current results + so may require some adjustment to be made (e.g. might then be too fast to demonstrate efficiency changes !)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like this problem is specific to longitude bands and performance is a lot better with latitude bands. I think this might have something to do with ESMF not liking long, thin cells. I've raised an issue here SciTools/iris-esmf-regrid#234 . Unfortunately, when taking the latitude band means there is a lot less visible detail in the resulting plot. I might try experimenting with other variables, but I think this is a reasonable trade off compared to minute long run times or C48 data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since merging this, I've also tested it with the newer C48 data.
Unfortunately, that is possibly too small, since the operations get so quick as to make the measurements less certain.
This op took about 1.6 seconds, instead of ~100.

@pp-mo pp-mo merged commit 795c4af into scitools-classroom:main Jan 25, 2023
@pp-mo pp-mo mentioned this pull request Jan 25, 2023
8 tasks
@pp-mo
Copy link
Contributor

pp-mo commented Jan 25, 2023

Outstanding suggestions here : #11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants