-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/density tracks #1003
base: develop
Are you sure you want to change the base?
Feature/density tracks #1003
Conversation
climada/hazard/tc_tracks.py
Outdated
"""Compute absolute and normalized tropical cyclone track density as the number of points per | ||
grid cell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting addition. I think though that even with the explanation in the pull request, I do not understand what this method does (and what is the use of it). Can you expand on the docstring to make it clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was exactly doing it in this moment ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the docstring which should be clearer now. The role of this function is to create the data for an other plotting function (which will follow soon) and should give something this (first approximation):

A plot like this allows you to compare TC activities between different climate scenarios, or different time periods, or models. It's a visualization for the hazard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the exact goal? Because the method now counts the number of track points per pixel. Thus, if a track is just staying a long time in the same place, it counts as high density. But, it is still just a single track in this pixel. Conversely, if a track is moving fast, it gets low density. I am not sure what this would indicate to be honest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true, but, the goal is to display TC activity, and if a track stays longer in a grid cell, this rightfully represent higher TC activity in my opinion. The question then becomes, how do you define TC activity? Is it how many different tracks cross a grid cell or how many tracks cross a grid, and how long they stay in it (that is the current version)?
If you consider two different cyclones with two different translational speeds but with the same angular wind speed, the slower one will have more impact because it stays longer and has the same intensity as the other. So in an impact context, track density (a measure of TC activity) should reflect this.
But I see your point, and it's worth discussing it with Kerry and Simona next week, frankly both methods would be valid, they just represent something slightly different.
For clarification, the image I sent represent the track density (defined as in this message) for 300 tracks in 2025 with CMIP5 data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the analysis. But then I would ask: why not look at wind intensity density? Because if the argument is that a track staying longer in a position creates more impact, then you would also have to consider that higher winds create more impact. Hence, I would say:
1- if you are interested in impacts, you should consider the windfield distributions
2- if you are interested in track density, then you should consider different track densities.
But maybe I am missing the point and the approach for the track point density is the way to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I agree. The point approach was just the practical way I thought to program it. I need to rethink it a bit then 🤓. I'll change it so that a maximum of one point per track per grid cell will be allowed to be counted in that cell.
And at a later stage, we can think of how to compute wind field densities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it can be an argument in the method (if it is easy to implement)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's easy
climada/hazard/tc_tracks.py
Outdated
lat_bins = np.arange(-90, 91, res) # 91 and not 90 for the bins (90 included) | ||
lon_bins = np.arange(-180, 181, res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would lead to values larger than 90 if say res=0.3. I think this is not what you want. Also, note that it probably is better to use np.linspace than np.arange for non-integer values (see https://numpy.org/doc/stable/reference/generated/numpy.arange.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, thanks for catching that!
climada/hazard/tc_tracks.py
Outdated
hist = hist / hist.sum() if mode == "normalized" else hist | ||
|
||
return hist, lat_bins, lon_bins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be better to work with sparse matrices and rasters? Because in general, the output will be very very sparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
@@ -2890,6 +2896,10 @@ def compute_track_density( | |||
density: bool (optional) default: False | |||
If False it returns the number of samples in each bin. If True, returns the | |||
probability density function at each bin computed as count_bin / tot_count. | |||
filter_tracks: bool (optional) default: True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps calling this argument something like count_tracks
would be more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, filter_track
it's a bit cryptic...
climada/hazard/tc_tracks.py
Outdated
""" | ||
|
||
# ensure equal time step | ||
# tc_track.equal_timestep(time_step_h=time_step) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a reasonable check on the track's temporal resolution versus res
to make sure all tracks will be counted (see my comment in the other thread).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, there should be a limit ratio between temporal resolution and grid cell resolution. Considering that the fastest recorded translational speed of a TC is around 110 km/h, a temporal resolution of 1h and grid res of 1° (~100km) should be the limit. In this case, I would make time_step a function of res and not the opposite.
If this slows down the code too much (probably not), given the low average speed of ~20km/h we could adjust it.
climada/hazard/tc_tracks.py
Outdated
track.lat.values, track.lon.values, bins=[lat_bins, lon_bins], density=False | ||
) | ||
hist_new = csr_matrix(hist_new) | ||
hist_new[hist_new > 1] = 1 if filter_tracks else hist_new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not something like this (seems easier to read):
if filter_tracks:
np.clip(hist_new, a_max=1, out=hist_new)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to do this before converting to a csr_matrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I like it, then I guess it doesn't really make sense to use sparse matrixes anymore. Since the core function np.histogram2d
doesn't exist with sparse, and the later operations are better off with numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmh actually np.clip creates some issues later on with the testing, I'll keep it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you keep it dense, then you should just make it clear in the docstring that this code does not work well at high resolution. It would be good to do some simple testing (res=1, 0.1, 0.01, 0.001) and see how much memory / time it takes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then Maybe:
if filter_tracks:
hist_new = np.minimum(hist_new, 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use np.where(x>1,1,x)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you keep it dense, then you should just make it clear in the docstring that this code does not work well at high resolution. It would be good to do some simple testing (res=1, 0.1, 0.01, 0.001) and see how much memory / time it takes.
At the moment, using sparse (and numpy), with 300 tracks at global scale it takes:
1°: 141ms
0.1°: 9.1s
0.01°: 17 minutes
0.001°: ...
If, and only if, it makes sense to use this method at really high resolutions (e.g., > 0.1°) I will try to optimize it. But it might be worth it to have faster method, for larger track datasets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know whether higher resolution is needed, but it would be good to mention in the docstring what kind of resolution is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my experience anything below 0.1 degrees is far too noisy, typically one uses something like 1, 5 or even 10 degrees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the overall idea a lot! We actually had implemented something quite similar to validate the new (still to be incorporated) TC stochastic tracks model. However we had done it a bit differently:
- First, there was a parameter for the "minimum intensity" threshold to consider (something like
min_ max_sustained_wind
), such that you can plot the density of tracks exceeding a certain wind speed (or Saffir-Simpson category). This enabled us to find out where there where too many or too few tracks in general (when doing for example the difference between probabilistic and historical tracks) versus how more intense events were over-/under-estimated. I'd suggest add this feature here as well. - Ideally there is a valid unit to the "track density". To do so, we counted not the number of points (which is kind of the total duration of tracks being there) but instead counted the number of tracks (after identifying points within each grid cell, we counted the unique cyc_id values of these points), which leads to units or "number of tracks per year" if you then scale by the number of years in your dataset. You might want to consider doing the same, or at least enabling the user to choose between both approaches.
- We had implemented that using lines, but it took ages to calculate (well, it was on a 10'000 years event set...), so using points might be good - and I am already relieved to see the validation of the new synthetic tracks model will be made easier thanks to this method 😄). In fact, there should then be a reasonable default for the temporal resolution of the tracks as a function of the raster cells size (i.e., if temporal resolution is too low there might be no point in a grid cell because it jumped from east to west of the cell but is not counted).
- Finally, just to chime on the "track versus windfield" discussion in earlier comments: I think both are very valuable and complementary. One probably want to be able to validate the tracks first, and validate wind fields in a second step. This is the approach we had taken back then and it enabled tuning tracks algorithm without doing the more expensive windfield computation for each test.
Good to hear! Is your version somewhere on a climada branch ? |
No, it was in our code base at CelsiusPro so I don't have it anymore... @ChrisFairless probably has access to it but I think your approach is easier so I'm not sure you'd gain much (?). There was a class called TcTracksValidation or so which one could use to also plot the fields and their differences if I recall well. The main main thing missing so far in your code is being able to filter points by minimum wind speed. |
I see, I have just added the feature 👍 now we can select above, below and in between wind speeds. |
Thanks Chris for the feedback! Please let me all know if code wise I shall change something, I made some changes from the first revision. Aside the weird pre-commit check failing here, the PR shall be basically ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additions, I think it is starting to looking quite good! It will definitely be useful, as stated in my previous review already 🙂
I have found a few inconsistencies in how norm
is handled, and I have a few suggestions too in my comments. I hope this helps to finalize the PR!
climada/hazard/tc_tracks.py
Outdated
# select according to wind speed | ||
wind_speed = track.max_sustained_wind.values | ||
if wind_min and wind_max: | ||
index = np.where((wind_speed >= wind_min) & (wind_speed <= wind_max))[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd use >= wind_min
but < wind_max
(here and below), such that if you call the function once with wind_min=10, wind_max=20
and then with wind_min=20
you don't get track points with a wind speed of exactly 20 counted in both. In other words I'd make wind_min
inclusive but wind_max
exclusive (i.e., [wind_min, wind_max)
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
climada/hazard/tc_tracks.py
Outdated
|
||
# define grid resolution and bounds for density computation | ||
lat_bins: np.ndarray = np.linspace(-90, 90, int(180 / res)) | ||
lon_bins: np.ndarray = np.linspace(-180, 180, int(360 / res)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I strongly suggest to enable the user to define a range of lon
and lat
(new input parameters lon_range
and lat_range
, default to None which leads to the current behavior of covering the Earth). If a user only has the North Atlantic covered by their track they will be happy to define a sub-domain. This also applies to the plotting function (plot_track_density
) of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Benoit, sorry for my late reply and thanks for the feedback! Good point, I just finished implementing that
return hist_count, lat_bins, lon_bins | ||
|
||
|
||
def compute_genesis_density( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition
climada/hazard/tc_tracks.py
Outdated
If true the function computes the track density of only the genesis location of tracks | ||
norm: bool (optional), default = False | ||
If False it returns the number of samples in each bin. If True, returns the | ||
specified normalization function at each bin computed as count_bin / grid_area. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring should be updated, since the norm
parameter now is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you might want to specify the possible values that norm
can take somewhere in the code (e.g., as a constant) and to check in this function whether the value passed by the user is allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thanks!
grid_area, _ = u_coord.compute_grid_cell_area(res=res) | ||
norm_hist: np.ndarray = hist_count / grid_area | ||
elif norm == "sum": | ||
norm_hist: np.ndarray = hist_count / hist_count.sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add
else:
raise ValueError("Invalid value for input parameter 'norm', it should be one of 'area' or 'sum'")
or something like this? It would also remove the pylint warning below ("Possibly using variable 'norm_hist' before assignment").
hist_abs, *_ = tc.compute_track_density( | ||
tc_tracks, | ||
res=10, | ||
norm=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this fail because norm=None
is not handled?
@bguillod Can I merge ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work Nico! Fixed a couple of things in the docstrings. As discussed let's use point bound tuples consistently across functions. Otherwise ready to merge!
climada/hazard/tc_tracks.py
Outdated
res: int (optional), default: 5° | ||
resolution in degrees of the grid bins in which the density will be computed | ||
bounds: tuple, (optional) dafault: None | ||
(lat_min,lat_max,lon_min,lon_max) latitude and longitude bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elswhere we use a tuple with the order (lon_min, lat_min, lon_max, lat_max)
for bounds (e.g. in Centroids.from_pnt_bounds()
). For consistency's sake, could be worthwile to change this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double check tests after making this change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! it is now adapted.
climada/util/coordinates.py
Outdated
res: float | ||
Grid resolution in degrees | ||
bounds: tuple, dafault: None | ||
(lat_min,lat_max,lon_min,lon_max) latitude and longitude bounds to compute grid cell area |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment re bounds ordering
return grid_area, lat_bins, lon_bins | ||
|
||
|
||
def compute_grid_cell_area( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use geopandas functionalities to project to equal-area crs and calculate areas?
Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>
Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>
Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>
Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>
Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>
Co-authored-by: Sarah Hülsen <49907095+sarah-hlsn@users.noreply.github.com>
…roject/climada_python into feature/density_tracks
I think this PR is ready to be merged, do I have the green light ? @chahank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the addition! I think this is getting closer to merging. There are still a few fundamental questions to solve before proceeding such as why the methods are where they are, and the quality of the tests.
Thanks for all the efforts. I know this is a longer process, but we really have to be careful to not fall in the feature bloating trap again as this has caused us a very large amount of refactoring work.
@@ -158,6 +161,136 @@ def plot_intensity( | |||
|
|||
raise ValueError("Provide one event id or one centroid id.") | |||
|
|||
@staticmethod | |||
def plot_track_density( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused now about this method. Why is it a Hazard method? This should be a TCTracks
method right? It has nothing to do with Hazards in general. At most maybe it could be a TropCyclone
method.
In general, we should not add methods that have only a very specific narrow application to a general basis class. This will lead back to the clutter of methods that we now spend years to reduce.
Where do you think this would fit best? TCTracks
or TropCyclones
? Since it is only about tracks and not about intensities, I would move this to TCTracks
.
-------- | ||
>>> tc_tracks = TCTrack.from_ibtracs_netcdf("path_to_file") | ||
>>> tc_tracks.equal_timestep(time_step_h = 1) | ||
>>> hist_count, *_ = compute_track_density(tc_track = tc_tracks, res = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is cleaner..
>>> hist_count, *_ = compute_track_density(tc_track = tc_tracks, res = 1) | |
>>> hist_count, _, _ = compute_track_density(tc_track = tc_tracks, res = 1) |
|
||
""" | ||
|
||
limit_ratio: float = 1.12 * 1.1 # record tc speed 112km/h -> 1.12°/h + 10% margin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would that become a problem if there is a future TC set with tracks with windspeeds above 112?
"""Compute the density of track genesis locations. This function works under the hood | ||
of :py:meth:`compute_track_density`. If it is called with the parameter genesis = True, | ||
the function return the number of genesis points per grid cell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is not intended to be accessible to the user, then please define as such _compute_genesis_density
.
tc_track: TCT track object | ||
TC track object containing a list of all tracks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a function instead of a class method since the first attribute seems to be self?
# Iterate over consecutive latitude and longitude edges | ||
for i, (lat1, lat2) in enumerate(zip(lat_edges[:-1], lat_edges[1:])): | ||
for j, (lon1, lon2) in enumerate(zip(lon_edges[:-1], lon_edges[1:])): | ||
|
||
# 5th point to close the loop | ||
poly_lons = [lon1, lon2, lon2, lon1, lon1] | ||
poly_lats = [lat1, lat1, lat2, lat2, lat1] | ||
|
||
# Compute the area of the grid cell | ||
poly_area, _ = geod.polygon_area_perimeter(poly_lons, poly_lats) | ||
area[i, j] = abs(poly_area) # Convert from m² to km² |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite inefficient. In general, nested loops in Python are not a good sign as these are often not vectorized.
area = u_coord.compute_grid_cell_area( | ||
res=res, projection="sphere", units="km^2" | ||
) | ||
area_test, *_ = u_coord.compute_grid_cell_area_validation(res=res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In tests you cannot simply ignore the outputs. Please test the functions fully.
self.assertEqual(area.shape, (179, 359)) | ||
self.assertEqual(area_test.shape, (179, 359)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear why this tests the methods.
def compute_grid_cell_area( | ||
res: float = None, | ||
bounds: tuple = None, | ||
projection: str = "WGS84", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also accept projections for instance in meters? Please make clear if not, and catch the error accordingly.
return grid_area, lat_bins, lon_bins | ||
|
||
|
||
def compute_grid_cell_area( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please compare this method to the one implemented in the Centroids class Centroids.get_pixel_area
https://github.com/CLIMADA-project/climada_python/blob/main/climada/hazard/centroids/centr.py#L571. If they do the same, please test and keep the most efficient one and consolidate in the utils.
Changes proposed in this PR:
This PR adds a function to compute track density. Tracks are considered lines but are in reality a succession of points. The function ensure equal time step for all tracks by calling
equal_timestep()
, then proceed to compute the number of points per grid cell with the desired resolution.compute_track_density()
computes the data forplot_track_density()
.Functions added:
compute_track_density()
test_compute_track_density()
compute_grid_area()
plot_track_density()
This PR fixes #
PR Author Checklist
develop
)PR Reviewer Checklist