Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query about interpolation in gridded_timeseries.py #140

Open
sspagnol opened this issue Apr 20, 2021 · 2 comments
Open

Query about interpolation in gridded_timeseries.py #140

sspagnol opened this issue Apr 20, 2021 · 2 comments
Labels
gridded_timeseries Issues relating to the gridded time series product question Further information is requested

Comments

@sspagnol
Copy link

Have a query about the interpolation done in grid_variable function in gridded_timeseries.py

Had a situation where in some timesteps the profile had nan data, which when interpolated seems to produce more nans that I expected. Consider the toy problem

print("np.interp test")
depth_bins = np.array([10, 20, 30, 40, 50])

depths = np.array([12, 27, 35, 46])
values = np.array([1, 2, np.nan, 5])

print(depth_bins)
interp_values = np.interp(depth_bins, depths, values, left=np.nan, right=np.nan)
print(interp_values)

# interpolate non-nan
nmask = np.logical_or(np.isnan(depths), np.isnan(values))
interp_values = np.interp(depth_bins, depths[~nmask], values[~nmask], left=np.nan, right=np.nan)
print(interp_values)

which produces

np.interp test
[10 20 30 40 50]
[       nan 1.53333333        nan        nan        nan]
[       nan 1.53333333 2.47368421 4.05263158        nan]

What is the reasoning to want depth_bins 30 and 40 to be set to nan?

@mhidas mhidas added question Further information is requested gridded_timeseries Issues relating to the gridded time series product labels Jun 7, 2022
@mphemming
Copy link

We are also having issues with these stray nans. See attached plot of missing gridded data in late 2021 between 16-24 m depth. I have detailed my debugging steps below and added a quick fix which seems to solve the problem.

figure

When running the for loop on line 190 in 'grid_variable' function in 'gridded_timeseries.py' for a date when there is missing gridded data, I get the following:

VoI_values = [16.7461, 18.769634, 17.33436, nan, 15.591092, 14.718737, 17.733799, 19.18304, 16.496675, 15.4011135, 15.960257, 13.950939]
depth = [48.674, 24.185463, 40.511154, nan, 72.88726, 96.549965, 32.34831, 16.022615, 56.836845, 80.774826, 64.999695, 109.65999]

then, after sorting on line 198 using function 'sort_depths' I get the following:

VoI_values = [18.769634, 17.33436, 16.7461, nan, 19.18304, 17.733799, 16.496675, 15.960257, 15.591092, 15.4011135, 14.718737, 13.950939]
depth = [24.185463, 40.511154, 48.674, nan, 16.022615, 32.34831, 56.836845, 64.999695, 72.88726, 80.774826, 96.549965, 109.65999]

The variable 'depth_mask' assigns the near surface data as False and the interpolated data is removed. Although I don't fully understand why this happens yet using the above VoI_values and depth as input. My guess is because of the nan and the incorrect sorting.

As a quick fix I added the following on lines 197-201 before the depth and VoI are sorted using function 'sort_depths':

# remove nans if present
if np.sum(np.logical_and(np.isnan(depth), np.isnan(VoI_values))) > 0:
    c = np.logical_and(np.isfinite(depth), np.isfinite(VoI_values)).transpose()
    VoI_values = list(np.array(VoI_values)[c])
    depth = list(np.array(depth)[c])

The hourly data is now interpolated throughout the water column during this time, with the 16-24 m white space in the plot attached now interpolated over.

@mhidas mhidas added this to the v1.7 Timeseries products milestone Jan 5, 2023
@mhidas
Copy link
Contributor

mhidas commented Jan 11, 2023

@sspagnol @mphemming Apologies for the long delay in responding to this query. I have finally had a brief look, and I agree that we could do better in the case illustrated by Simon's toy example. I'll try to spend some more time on this in the next month or two!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gridded_timeseries Issues relating to the gridded time series product question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants