Bugs when indexing a partly in memory FieldTimeSeries
#4077
Comments
Bug 1: Suggested fix: use
Nm = ifelse(start + length(fts.backend) - 1 > Nt, Nt - start + 1, length(fts.backend))
fts.backend = new_backend(fts.backend, start, Nm)
rather than
Nm = length(fts.backend)
fts.backend = new_backend(fts.backend, start, Nm)
(although this does permanently change the backend length; I'm not sure whether that is an issue, since we're at the end of the data).
Bug 2: Suggested fix: There's also the problem that the backend changes mid-computation in
# Otherwise, make a Field representing a linear interpolation in time
ψ₁ = fts[n₁]
ψ₂ = fts[n₂]
ψ̃ = Field(ψ₂ * ñ + ψ₁ * (1 - ñ))
If this is changed to
# Otherwise, make a Field representing a linear interpolation in time
ψ₂ = fts[n₂]
ψ₁ = fts[n₁]
ψ̃ = Field(ψ₂ * ñ + ψ₁ * (1 - ñ))
then (together with the previous change) the backend would update to include both indices.
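The mid-computation update can be reproduced with a small mock that does not use Oceananigans at all. This is only a sketch under a loud assumption: that indexing hands back a view into a shared in-memory buffer, so reloading the buffer changes what an earlier result points at. `MockSeries`, `reload!`, `in_memory`, `interpolate`, and the `guard` keyword are all hypothetical names made up for the illustration.

```julia
# Mock of a partly-in-memory series: `buffer` holds Nm consecutive time slices.
mutable struct MockSeries
    data::Vector{Float64}     # stands in for the file on disk
    buffer::Vector{Float64}   # the Nm values currently in memory
    start::Int                # time index stored in buffer[1]
end

MockSeries(data, Nm) = MockSeries(data, data[1:Nm], 1)

in_memory(s, n) = s.start <= n <= s.start + length(s.buffer) - 1

# Reload the buffer in place so that it starts at `start` (clamped to the data).
function reload!(s, start)
    Nm = length(s.buffer)
    s.start = clamp(start, 1, length(s.data) - Nm + 1)
    s.buffer .= s.data[s.start:s.start+Nm-1]
    return s
end

# ASSUMPTION: indexing returns a view into the buffer, reloading on a miss.
function Base.getindex(s::MockSeries, n::Int)
    in_memory(s, n) || reload!(s, n)
    return view(s.buffer, n - s.start + 1)
end

# Hypothetical guard: make sure n₁ and n₂ are both in memory before indexing.
function interpolate(s, n₁, n₂, ñ; guard)
    guard && !(in_memory(s, n₁) && in_memory(s, n₂)) && reload!(s, n₁)
    ψ₁ = s[n₁]
    ψ₂ = s[n₂]
    return ψ₂[] * ñ + ψ₁[] * (1 - ñ)
end

data = collect(10.0:10.0:100.0)   # the value at time index n is 10n

# Halfway between indices 5 and 6 the interpolated value should be 55.0.
unguarded = interpolate(MockSeries(data, 5), 5, 6, 0.5; guard=false)  # 80.0
guarded   = interpolate(MockSeries(data, 5), 5, 6, 0.5; guard=true)   # 55.0
```

In the unguarded call, `ψ₁` is taken while indices 1:5 are in memory, then the miss on index 6 overwrites the buffer, so `ψ₁` silently refers to a different time slice. Loading a window that contains both indices first avoids the reload between the two lookups.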
Bug 3: Suggested fix: On line 10 of `set_field_time_series.jl`, replace
findfirst(t -> t ≈ time, file_times)
with
findfirst(t -> (isapprox(t, time) || isapprox(t, time, atol=1e-14)), file_times)
(though that's a bit ugly and arbitrary), and on line 251 of `field_time_series.jl`,
if all(time_range .≈ times) # good enough for most
could be replaced by
if isapprox(time_range, times)
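The difference between these comparisons can be checked in plain Julia. The sketch below uses made-up values (`t_file`, `t_want`, and the perturbed `times` vector are illustrative, not taken from the data file) and relies only on the documented defaults of `isapprox`: `atol = 0`, and a relative tolerance of roughly `sqrt(eps)` for `Float64`.

```julia
using LinearAlgebra  # stdlib; the array form of isapprox compares norms

# `≈` is `isapprox` with default keywords, so the scalar test is
# |x - y| <= max(atol, rtol * max(|x|, |y|)) with atol = 0.
t_file = 1e-17   # hypothetical round-off-sized time stored in the file
t_want = 0.0     # the time being searched for

default_match = t_file ≈ t_want                       # false
atol_match    = isapprox(t_file, t_want; atol=1e-14)  # true

# The array form compares norms, so a single near-zero entry is absorbed.
time_range = 0.0:0.1:1.0
times = collect(time_range)
times[1] = 1e-17

elementwise = all(time_range .≈ times)     # false, because of the first entry
normwise    = isapprox(time_range, times)  # true
```

The absolute tolerance `1e-14` is as arbitrary here as in the suggestion above; the point is only that some nonzero `atol` is needed for comparisons against zero.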
Thank you, this is very helpful! PRs are very welcome. If possible, one PR at a time is best so we can hash out the solutions independently. I want to discuss just one bug at a time for clarity, so let's start with bug 1. I was able to reproduce the error by writing:
julia> f_fts = FieldTimeSeries(filename, "f"; backend = InMemory(5))
2×2×2×31 FieldTimeSeries{InMemory} located at (Center, Center, Center) of f at MWE_data_file.jld2
├── grid: 2×2×2 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 2×2×2 halo
├── indices: (:, :, :)
├── time_indexing: Linear()
├── backend: InMemory(1, 5)
├── path: MWE_data_file.jld2
├── name: f
└── data: 6×6×6×5 OffsetArray(::Array{Float64, 4}, -1:4, -1:4, -1:4, 1:5) with eltype Float64 with indices -1:4×-1:4×-1:4×1:5
└── max=0.743145, min=0.0, mean=0.0144117
julia> f_fts[28]
ERROR: BoundsError: attempt to access 31-element StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64} at index [32]
Stacktrace:
[1] throw_boundserror(A::StepRangeLen{Float64, Base.TwicePrecision{…}, Base.TwicePrecision{…}, Int64}, I::Tuple{Int64})
@ Base ./abstractarray.jl:737
[2] checkbounds
@ ./abstractarray.jl:702 [inlined]
[3] getindex
@ ./range.jl:955 [inlined]
[4] set!(fts::FieldTimeSeries{…}, path::String, name::String; warn_missing_data::Bool)
@ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/set_field_time_series.jl:25
[5] set!(fts::FieldTimeSeries{…}, path::String, name::String)
@ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/set_field_time_series.jl:13
[6] update_field_time_series!(fts::FieldTimeSeries{…}, n₁::Int64, n₂::Int64)
@ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl:269
[7] update_field_time_series!(fts::FieldTimeSeries{…}, n₁::Int64)
@ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl:261
[8] getindex(fts::FieldTimeSeries{…}, n::Int64)
@ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl:278
[9] top-level scope
@ REPL[44]:1
Some type information was truncated. Use `show(err)` to see complete types.
I think I see the issue that you're pointing out. Basically, there is an implicit assumption that the elements are loaded in order. In that case, no error would occur because after calling
julia> f_fts.backend
InMemory{Int64}(27, 5)
so
My opinion is that it might be more logical if the backend "length" always corresponds to the number of times in memory. To implement this, I think we can change this line:
to
start = min(n₁, backend.length - Nm + 1)
Does this solution work, @loisbaker? Also cc'ing @simone-silvestri here.
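The "move the start, keep the length" idea can be sketched with plain integers. `clamped_start` and `window` are hypothetical helpers, not Oceananigans functions, and the sketch assumes that the cap is the total number of saved times `Nt`.

```julia
# Hypothetical helper: choose the first in-memory index so that the window
# start:start+Nm-1 contains n and never runs past Nt (ASSUMPTION: cap is Nt).
clamped_start(n, Nm, Nt) = max(1, min(n, Nt - Nm + 1))

Nt, Nm = 31, 5   # sizes from the reproduction above
window(n) = (s = clamped_start(n, Nm, Nt); s:s+Nm-1)

window(6)    # 6:10, unchanged away from the end of the data
window(28)   # 27:31, start pulled back so index 32 is never requested
```

With this choice the in-memory length stays equal to `Nm` for every request, and only the start moves near the end of the data.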
Thanks! Good point, it's better to change the start than the length. Though, given bug 2, perhaps the following?
Note - this can also be an issue when elements are loaded in order - consider looping through data with time indices 1:10 with backend length 4, the backend evolution will look like:
|
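One way to look at the in-order case is to simulate the window bookkeeping with integers. This sketch assumes that a miss reloads `Nm` consecutive indices starting at the requested index, which is what the `InMemory(6, 5)` backend printed elsewhere in this thread suggests; `simulate_windows` is a hypothetical helper and not part of Oceananigans.

```julia
# Hypothetical bookkeeping: on a miss, reload Nm consecutive indices starting at n.
function simulate_windows(indices, Nm)
    start = first(indices)
    history = Tuple{Int, UnitRange{Int}}[]
    for n in indices
        if !(start <= n <= start + Nm - 1)   # n is not in memory
            start = n
        end
        push!(history, (n, start:start+Nm-1))
    end
    return history
end

Nt = 10
for (n, win) in simulate_windows(1:Nt, 4)
    flag = last(win) > Nt ? "  <- runs past Nt" : ""
    println("n = $n  in memory: $win$flag")
end
```

Under that assumption the windows are 1:4, then 5:8, then 9:12, and the last one asks for indices beyond the 10 available times even though every element was requested in order.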
Huh. I'm a little confused though because I thought that by prescription
Can we fix this by explicitly ensuring that both indices are in memory before trying to interpolate? Another consideration is that any behavior needs to work for |
Yes, the comment says this though so I guess that's what it's doing?
Do you think calling
Good point, the behaviour should stay the same at the endpoint for
|
Bug number 1 is specific to linearly time-indexed fts, so I think we are just missing a method for linear time indexing. For example, with
Oceananigans.jl/src/OutputReaders/field_time_series.jl Lines 187 to 191 in 2008559
I think a solution is capping the time index to Nt for linear time indexing. For example:
@inline function time_index(backend::PartlyInMemory, ::Linear, Nt, m)
    n = reverse_index(m, backend.start)
    ñ = ifelse(n > Nt, Nt, n)
    return ñ
end
I can open a PR for this.
Weirdly enough, I cannot reproduce bug 2:
julia> for i in [0,0.39,0.4,0.41,0.42,0.43]
println("\nTime = $i")
println(maximum(f_fts[Time(i)]))
println(f_fts.backend)
end
Time = 0.0
0.0
InMemory{Int64}(1, 5)
Time = 0.39
0.7276088681589021
InMemory{Int64}(1, 5)
Time = 0.4
0.7431448254773941
InMemory{Int64}(1, 5)
Time = 0.41
0.9425534050440822
InMemory{Int64}(6, 5)
Time = 0.42
0.7677209411388031
InMemory{Int64}(5, 5)
Time = 0.43
0.7800089989695074
InMemory{Int64}(5, 5)
As a sidenote, probably Also, I think the interpolation for |
You're right that this is the expected output, sorry about that! I've updated the original issue. But it still illustrates the same error, which is that the value at time 0.41 is wrong. It should be |
I've come across a few bugs when reading in a FieldTimeSeries from disk with a partly in-memory backend. I'll show them here with an MWE and suggest some fixes in a comment below. I'm happy to make a PR if the changes seem sensible, but I haven't contributed before so will start here!
Bug 1:
We can't access elements from `Nt - N_in_mem + 2` to `Nt` unless we first access a previous element that loads them into memory.
Code:
Output:
Bug 2:
When linearly interpolating in time from a partly in-memory `FieldTimeSeries` and going out of range of what's in memory, the update-backend logic doesn't seem right. Suppose we're interpolating between time indices 5 and 6 with `length(fts.backend) = 5`. Currently, only indices (1,2,3,4,5) are in memory (that is, `backend = InMemory(1,5)`). Then for the interpolation (line 146 of `field_time_series_indexing.jl`), `fts[5]` is called first (no new backend needed) before `fts[6]`, which requires a new backend. This new backend is `InMemory(6,5)`, which is a) not efficient, as we'll probably need to go back to `InMemory(5,5)` for the next timestep, and b) causes an update of what's in memory mid-computation, which gives a wrong value (see time 0.41 of the output below).
Code:
Output:
Bug 3:
The operator `≈` sets a relative tolerance that isn't good enough for values near zero (this caused errors for me on line 251 of `field_time_series.jl` and line 10 of `set_field_time_series.jl`).
Code:
Environment: Oceananigans.jl `main` branch.