Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugs when indexing a partly in memory FieldTimeSeries #4077

Open
loisbaker opened this issue Feb 3, 2025 · 12 comments
Open

Bugs when indexing a partly in memory FieldTimeSeries #4077

loisbaker opened this issue Feb 3, 2025 · 12 comments

Comments

@loisbaker
Copy link

loisbaker commented Feb 3, 2025

I've come across a few bugs when reading in a FieldTimeSeries from disk with a partly in memory backend. I'll show them here with a MWE and comment below some suggestions for fixing them, I'm happy to make a PR if the changes seem sensible but haven't contributed before so will start here!

using Oceananigans
using Oceananigans.Utils: Time 

# First we generate some test data and write to disk
grid = RectilinearGrid(size=(2, 2, 2), extent=(1, 1, 1))
times = 0:0.1:3
filename = "MWE_data_file.jld2"
f_tmp = Field{Center,Center,Center}(grid) 
f = FieldTimeSeries{Center, Center, Center}(grid, times; backend = OnDisk(), path = filename, name = "f")

for (it, time) in enumerate(f.times)
    set!(f_tmp, (x, y, z) -> sin(2* π * time /3) )
    set!(f,f_tmp,it)
end

# Now we load the FTS partly in memory
N_in_mem = 5
f_fts = FieldTimeSeries(filename, "f"; backend = InMemory(N_in_mem))
Nt = length(f_fts.times)

Bug 1:
We can't access elements from Nt - N_in_mem + 2 to Nt unless we first access a previous element that loads it in memory.

Code:

function try_accessing_element(FTS,n)
    println("Trying to access element $n")
    try
        println(FTS[n])
    catch e
        println(e)
    end
end

try_accessing_element(f_fts,Nt-N_in_mem +2 ) 
try_accessing_element(f_fts,Nt-N_in_mem +1 ) 
try_accessing_element(f_fts,Nt-N_in_mem +2 ) 

Output:

Trying to access element 28
BoundsError(0.0:0.1:3.0, (32,))
Trying to access element 27
2×2×2 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 2×2×2 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 2×2×2 halo
├── boundary conditions: Nothing 
└── data: 6×6×6 OffsetArray(view(::Array{Float64, 4}, :, :, :, 1), -1:4, -1:4, -1:4) with eltype Float64 with indices -1:4×-1:4×-1:4
    └── max=-0.743145, min=-0.743145, mean=-0.743145
Trying to access element 28
2×2×2 Field{Center, Center, Center} on RectilinearGrid on CPU
├── grid: 2×2×2 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 2×2×2 halo
├── boundary conditions: Nothing 
└── data: 6×6×6 OffsetArray(view(::Array{Float64, 4}, :, :, :, 2), -1:4, -1:4, -1:4) with eltype Float64 with indices -1:4×-1:4×-1:4
    └── max=-0.587785, min=-0.587785, mean=-0.587785

Bug 2:
When linearly interpolating in time from a partly in memory FieldTimeSeries and going out of range of what's in memory, the update backend logic doesn't seem right. Suppose we're interpolating between time indices 5 and 6 with length(fts.backend) = 5 . Currently, only indices (1,2,3,4,5) are in memory (that is, backend = InMemory(1,5)). Then for the interpolation (line 146 of field_time_series_indexing.jl) fts[5] is called first (no new backend needed) before fts[6], which requires a new backend. This new backend is InMemory(6,5) which is a) not efficient as we'll probably need to go back to InMemory(5,5) for the next timestep and b) causes an update of what's in memory mid-computation, which gives a wrong value (see time 0.41 of output below).

Code:

for i in [0,0.39,0.4,0.41,0.42,0.43]
    println("\nTime = $i")
    println(maximum(f_fts[Time(i)]))
    println(f_fts.backend)
end

Output:

Time = 0.0
0.0
InMemory{Int64}(1, 5)

Time = 0.39
0.7276088681589021
InMemory{Int64}(1, 5)

Time = 0.4
0.7431448254773941
InMemory{Int64}(1, 5)

Time = 0.41
0.9425534050440822 # Wrong value, should be 0.7569950556517563
InMemory{Int64}(6, 5)

Time = 0.42
0.7677209411388031
InMemory{Int64}(5, 5)

Time = 0.43
0.7800089989695074
InMemory{Int64}(5, 5)

Bug 3:
The operator ≈ sets a relative tolerance that isn't good enough for values near zero (caused errors for me in line 251 field_time_series.jl and line 10 set_field_time_series.jl)
Code:

time_range = range(0, 1, length=11)
time_arr = [0:0.1:1;]
time_arr[1] = 1e-16 
println(all(time_range .≈ time_arr)) #false
println(isapprox(time_arr,time_range)) #true

Environment: Oceananigans.jl main branch.

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Core(TM) i9-14900K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
@loisbaker
Copy link
Author

Bug 1: Suggested fix
In OutputReaders/field_time_series_indexing.jl, line 260 update_field_time_series!(fts::PartlyInMemoryFTS, n₁::Int, n₂=n₁) sets the new backend for the FTS when the required time index is out of range. It tries to make a backend of the same length as before, which causes a BoundsError when this takes the backend out of the time range of the data. This function could set

Nm = ifelse(start + length(fts.backend) - 1 > Nt , Nt-start+1, length(fts.backend))
fts.backend = new_backend(fts.backend, start, Nm)

rather than

Nm = length(fts.backend)
fts.backend = new_backend(fts.backend, start, Nm)

(although this does permanently change the backend length, not sure if this is an issue though as we're at the end of the data)

@loisbaker
Copy link
Author

Bug 2: Suggested fix
Similarly, the update of the backend in OutputReaders/field_time_series_indexing.jl, line 260 update_field_time_series!(fts::PartlyInMemoryFTS, n₁::Int, n₂=n₁) chooses the next start index to be the index n that's currently required. Since time usually goes forwards, the current interpolation is probably between indices n-1 and n, so the next start index should probably be n-1 (unless n=1). This would mean changing start = n₁ to start = ifelse(n₁>1,n₁-1,n₁).

There's also the problem that the backend changes mid computation in Base.getindex(fts::FieldTimeSeries, time_index::Time) (line 137) of OutputReaders/field_time_series_indexing.jl. If we were to change

# Otherwise, make a Field representing a linear interpolation in time
    ψ₁ = fts[n₁]
    ψ₂ = fts[n₂]
    ψ̃ = Field(ψ₂ * ñ + ψ₁ * (1 - ñ))

to

# Otherwise, make a Field representing a linear interpolation in time
    ψ₂ = fts[n₂]
    ψ₁ = fts[n₁]
    ψ̃ = Field(ψ₂ * ñ + ψ₁ * (1 - ñ))

then (together with the previous change) the backend would update to include both n₂ and n₁ during ψ₂ = fts[n₂], so ψ₁ = fts[n₁] wouldn't update the backend again, and both would be computed with the same backend.

@loisbaker
Copy link
Author

Bug 3: Suggested fix
This causes an issue in two places for me.

On line 10 of set_field_time_series.jl, replace

findfirst(t -> t  time, file_times)

with

findfirst(t -> (isapprox(t,time) || isapprox(t,time,atol=1e-14)), file_times)

(though that's a bit ugly and arbitrary)

and on line 251 of `field_time_series.jl'

if all(time_range .≈ times) # good enough for most

could be replaced by

if isapprox(time_range,times)

@glwagner
Copy link
Member

glwagner commented Feb 4, 2025

Thank you this is very helpful! PRs are very welcome. If possible, 1 PR at a time is best so we can hash out the solutions independently.

I want to discuss just one bug at a time for clarity so let's start with bug 1. I was able to reproduce the error by writing:

julia> f_fts = FieldTimeSeries(filename, "f"; backend = InMemory(5))
2×2×2×31 FieldTimeSeries{InMemory} located at (Center, Center, Center) of f at MWE_data_file.jld2
├── grid: 2×2×2 RectilinearGrid{Float64, Periodic, Periodic, Bounded} on CPU with 2×2×2 halo
├── indices: (:, :, :)
├── time_indexing: Linear()
├── backend: InMemory(1, 5)
├── path: MWE_data_file.jld2
├── name: f
└── data: 6×6×6×5 OffsetArray(::Array{Float64, 4}, -1:4, -1:4, -1:4, 1:5) with eltype Float64 with indices -1:4×-1:4×-1:4×1:5
    └── max=0.743145, min=0.0, mean=0.0144117

julia> f_fts[28]
ERROR: BoundsError: attempt to access 31-element StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64} at index [32]
Stacktrace:
 [1] throw_boundserror(A::StepRangeLen{Float64, Base.TwicePrecision{…}, Base.TwicePrecision{…}, Int64}, I::Tuple{Int64})
   @ Base ./abstractarray.jl:737
 [2] checkbounds
   @ ./abstractarray.jl:702 [inlined]
 [3] getindex
   @ ./range.jl:955 [inlined]
 [4] set!(fts::FieldTimeSeries{…}, path::String, name::String; warn_missing_data::Bool)
   @ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/set_field_time_series.jl:25
 [5] set!(fts::FieldTimeSeries{…}, path::String, name::String)
   @ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/set_field_time_series.jl:13
 [6] update_field_time_series!(fts::FieldTimeSeries{…}, n₁::Int64, n₂::Int64)
   @ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl:269
 [7] update_field_time_series!(fts::FieldTimeSeries{…}, n₁::Int64)
   @ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl:261
 [8] getindex(fts::FieldTimeSeries{…}, n::Int64)
   @ Oceananigans.OutputReaders ~/Projects/dev/Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl:278
 [9] top-level scope
   @ REPL[44]:1
Some type information was truncated. Use `show(err)` to see complete types.

I think I see the issue that you're pointing out. Basically there is an implicit assumption that the elements are loaded in order. In that case, no error would occur because after calling f_fts[27] we have all data up to n=31 in memory:

julia> f_fts.backend
InMemory{Int64}(27, 5)

so f_fts[31] works.

My opinion is that it might be more logical if the backend "length" always corresponds to the number of times in memory. To implement this I think we can change this line:

to

start = min(n₁, backend.length - Nm + 1)

Does this solution work @loisbaker ? Also ccing @simone-silvestri here

@loisbaker
Copy link
Author

loisbaker commented Feb 4, 2025

My opinion is that it might be more logical if the backend "length" always corresponds to the number of times in memory. To implement this I think we can change this line:

Oceananigans.jl/src/OutputReaders/field_time_series_indexing.jl

Line 267 in a8cf967

start = n₁
to

start = min(n₁, backend.length - Nm + 1)
Does this solution work @loisbaker ? Also ccing @simone-silvestri here

Thanks! Good point, it's better to change the start than the length. Though, given bug 2, perhaps the following?

Nt = length(fts.times)
Nm = length(fts.backend)
start = min(max(n₁-1,1), Nt - Nm + 1)

Note - this can also be an issue when elements are loaded in order - consider looping through data with time indices 1:10 with backend length 4, the backend evolution will look like:

InMemory(1,4)
InMemory(5,4)
InMemory(9,4) <- BoundsError

@glwagner
Copy link
Member

glwagner commented Feb 4, 2025

Huh. I'm a little confused though because I thought that by prescription n₁ is supposed to be "before" the intended time being interpolated.

Currently, only indices (1,2,3,4,5) are in memory (that is, backend = InMemory(1,5)). Then for the interpolation (line 146 of field_time_series_indexing.jl) fts[5] is called first (no new backend needed) before fts[6], which requires a new backend.

Can we fix this by explicitly ensuring that both indices are in memory before trying to interpolate?

Another consideration is that any behavior needs to work for Cyclical time indexing (eg "boundary conditions" for the interpolation). I think the default (and MWE) are using Linear.

@loisbaker
Copy link
Author

Huh. I'm a little confused though because I thought that by prescription n₁ is supposed to be "before" the intended time being interpolated.

Yes, the comment says this though so I guess that's what it's doing?

# If `n` is not in memory, getindex automatically updates the data in memory
# so that `n` is the first index available.

Can we fix this by explicitly ensuring that both indices are in memory before trying to interpolate?

Do you think calling ψ₂ = fts[n₂] first (together with setting n₂-1 as the backend start index) would always ensure this?

Another consideration is that any behavior needs to work for Cyclical time indexing (eg "boundary conditions" for the interpolation). I think the default (and MWE) are using Linear.

Good point, the behaviour should stay the same at the endpoint for Cyclical. How about the following?

# Update backend
Nt = length(fts.times)
Nm = length(fts.backend)
start = ifelse(fts.time_indexing isa Cyclical, n₁-1, min(max(n₁-1,1), Nt - Nm + 1))
fts.backend = new_backend(fts.backend, start, Nm)

@simone-silvestri
Copy link
Collaborator

bug number 1 is specifical to linearly time-indexed fts, so I think we are just missing a method for linear time indexing. For example, with time_indexing = Cyclical the bug does not appear because we have a specific method that wraps around the index:

@inline function time_index(backend::PartlyInMemory, ::Cyclical, Nt, m)
n = reverse_index(m, backend.start)
= mod1(n, Nt) # wrap index
return
end

I think a solution is capping the time-index to Nt for linear time indexing. For example:

@inline function time_index(backend::PartlyInMemory, ::Linear, Nt, m)
    n = reverse_index(m, backend.start)
    ñ = ifelse(n > Nt, Nt, n)
    returnend

I can open a PR for this

@simone-silvestri
Copy link
Collaborator

Weirdly enough I cannot reproduce bug 2:

julia> for i in [0,0.39,0.4,0.41,0.42,0.43]
                  println("\nTime = $i")
                  println(maximum(f_fts[Time(i)]))
                  println(f_fts.backend)
              end

Time = 0.0
0.0
InMemory{Int64}(1, 5)

Time = 0.39
0.7276088681589021
InMemory{Int64}(1, 5)

Time = 0.4
0.7431448254773941
InMemory{Int64}(1, 5)

Time = 0.41
0.9425534050440822
InMemory{Int64}(6, 5)

Time = 0.42
0.7677209411388031
InMemory{Int64}(5, 5)

Time = 0.43
0.7800089989695074
InMemory{Int64}(5, 5)

@simone-silvestri
Copy link
Collaborator

simone-silvestri commented Feb 9, 2025

As a sidenote, probably Linear is not the best default to have because if extrapolated on long times it might return very wrong values. Clamp or Cyclical are maybe best defaults to have. Probably Clamp is the one with the least assumptions.

Also, I think the interpolation for Linear is wrong at the moment if time > times[end]. In that case we need to perform linear extrapolation, which is not the case (I will try to verify it)

@loisbaker
Copy link
Author

Weirdly enough I cannot reproduce bug 2:

You're right that this is the expected output, sorry about that! I've updated the original issue. But it still illustrates the same error, which is that the value at time 0.41 is wrong. It should be sin (2*pi*0.41 /3) = 0.7569950557

@simone-silvestri
Copy link
Collaborator

Ok I see now, If I do

julia> s = []
Any[]

julia> for i in 0:0.01:10
              push!(s, f_fts[Time(i)][1, 1, 1])
           end

julia> lines(s)

I get this:

Image

So indeed, there is a problem with interpolating. However, linear extrapolation seems to be correct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants