Allow scientific notation in written numbers #70

ericphanson · 2023-10-13T13:48:44Z

Update to #30: attempts to allow writing out scientific notation where legal (i.e. not onset or duration fields).

Motivated by encountering

ErrorException("failed to fit number into EDF's 8 ASCII character limit: 4.180821e-7")

when writing out an EDF. We don't know why this happened or which field it was in, though.

This PR changes the semantics:

- If allow_scientific=false, which I have set as the default, then underflow to 0 is allowed. This was formerly disallowed in "[Bugfix] add option to truncate values with size larger than spec" (#30), although I did not see a discussion of why. The former semantics were a bit weird in that rounding/truncation was OK except when the result was truncated to 0, in which case it errored.
- If allow_scientific=true, then we use %G in sprintf to allow either scientific notation or decimal representation. This shouldn't really underflow with any Float64-representable number, but could if somehow one got a BigFloat or such in there.

I also set the code to allow scientific notation with all floating-point fields of the headers (but not with onsets/durations of annotations).
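As a rough illustration of the allow_scientific=true idea, here is a minimal sketch that formats a number with %G and lowers the number of significant digits until the result fits in 8 characters. The helper name `g_within_limit` and its `limit` keyword are hypothetical and are not the PR's actual implementation; only `Printf.Format` and `Printf.format` are real standard-library calls.

```julia
using Printf

# Hypothetical helper (not the PR's code): format `x` with `%G`, reducing the
# number of significant digits until the string fits within `limit` characters.
function g_within_limit(x::Real; limit::Int=8)
    for p in limit:-1:1
        # Build the format string at runtime, e.g. "%.3G"
        str = Printf.format(Printf.Format("%.$(p)G"), Float64(x))
        length(str) <= limit && return str
    end
    throw(ArgumentError("cannot fit $x into $limit ASCII characters"))
end

# The value from the error message above
s = g_within_limit(4.180821e-7)
println(s, " (", length(s), " characters)")
```

With this approach the value keeps a few significant digits instead of erroring or becoming 0, at the cost of precision.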

Would appreciate thoughts on this & careful review

ericphanson · 2023-10-13T14:18:05Z

Is this too slow?

- The allow_scientific=true path is only used for headers, so it should not be a perf concern.
- The allow_scientific=false path is used for annotations, so it could be a perf concern if it is slow to call for several million numbers.

Let us benchmark.

Benchmark

Try every positive normal Float16 (30720 values) 50 times, which is about 1.5 million numbers:

function all_floats(; n=50)
    out = Char[]
    sizehint!(out, 30720 * n)
    for _ = 1:n
        x = floatmin(Float16)
        while x <= floatmax(Float16)
            str = EDF._edf_repr(Float64(x))
            push!(out, str[1])
            x = nextfloat(x)
        end
    end
    return out
end

PR

julia> @time length(all_floats())
  0.731282 seconds (9.98 M allocations: 1.442 GiB, 7.50% gc time)
1536000

julia> @time length(all_floats())
  0.727198 seconds (9.98 M allocations: 1.442 GiB, 7.72% gc time)
1536000

julia> @time length(all_floats())
  0.736517 seconds (9.98 M allocations: 1.442 GiB, 7.93% gc time)
1536000

Main

julia> @time length(all_floats())
  0.344242 seconds (12.52 M allocations: 889.876 MiB, 6.08% gc time)
1536000

julia> @time length(all_floats())
  0.345693 seconds (12.52 M allocations: 889.876 MiB, 5.95% gc time)
1536000

julia> @time length(all_floats())
  0.347998 seconds (12.52 M allocations: 889.876 MiB, 6.80% gc time)
1536000

Conclusion

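So the PR is ~2x slower (about 0.73 s vs. 0.35 s for 1.5 million numbers) and allocates more memory (1.442 GiB vs. 889.876 MiB), but it is still fast enough that it doesn't seem like a big issue.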

ericphanson · 2023-10-13T14:51:16Z

Is allowing underflow to 0 dangerous?

We allow this now when allow_scientific=false. This is used for annotation onsets & durations, and for non-floating-point types.
- For onsets and durations, a sufficiently small time like 1e-7 does seem OK to round down to 0. If the onset is right at the start of the recording, marking that as time 0 seems OK. If an annotation has a very small duration, since EDF does allow 0 or even missing duration, rounding that down to 0 does seem OK too.
- When allow_scientific=true, we can represent even floatmin(Float64) with a few digits of precision. This is used for floating-point fields in the headers.

Therefore I think it is probably OK, given that we can see all the usages and check them individually.
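To make the underflow case concrete, here is a minimal sketch of fixed-point formatting under an 8-character budget. The helper `decimal_within_limit` is hypothetical and only approximates the allow_scientific=false behavior described above; it is not the package's implementation.

```julia
using Printf

# Hypothetical helper (not the package's code): format `x` in plain decimal
# notation, dropping decimal places until the string fits within `limit`.
function decimal_within_limit(x::Real; limit::Int=8)
    for p in (limit - 2):-1:0
        str = Printf.format(Printf.Format("%.$(p)f"), Float64(x))
        length(str) <= limit && return str
    end
    throw(ArgumentError("cannot fit $x into $limit ASCII characters"))
end

tiny = decimal_within_limit(1e-7)      # too small for six decimal places
normal = decimal_within_limit(123.456)
println(tiny, "  ", normal)
```

Here a value like 1e-7 has no nonzero digit within the available decimal places, so the written string parses back as 0, which is the underflow being discussed.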

ericphanson · 2023-10-13T14:52:15Z

Is scientific notation even allowed in the EDF+ spec?

from https://www.edfplus.info/specs/guidelines.html#:~:text=Numbers%20in%20EDF(%2B)%20headers,3%20(Okt%202004) they say

2a, E-notation (May 2009). Numbers in EDF(+) headers may have the scientific E notation as in 1E2345, +012E+34, -1.34E09 and +1.234E-5. Note, though, that the 8 characters of some EDF(+) numberfields are used more efficiently by -123.456 uV than by -1.23E-4 V.

So it seems it is. We do not use it for non-header fields such as annotation durations and onsets.

ararslan · 2023-10-13T16:04:56Z

src/write.jl

+# `allow_scientific` is only meaningful for `value::Number`. We allow passing it though,
+# so `edf_write` can be more generic.
+_edf_repr(value::Union{String,Char}; allow_scientific=nothing) = value
+_edf_repr(date::Date; allow_scientific=nothing) = uppercase(Dates.format(date, dateformat"dd-u-yyyy"))
+_edf_repr(date::DateTime; allow_scientific=nothing) = Dates.format(date, dateformat"dd\.mm\.yyHH\.MM\.SS")


I would actually keep the existing methods as they are and simply add a dispatch layer like so:

_edf_repr(value::Number, allow_scientific::Bool) = # implementation
_edf_repr(value, ::Any) = _edf_repr(value)

That way the methods that don't care about scientific notation don't need to include it in their signatures.
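A minimal runnable sketch of that dispatch-layer pattern, using a hypothetical function name `repr_sketch` and a placeholder numeric implementation (neither is from the PR):

```julia
using Dates, Printf

# Existing one-argument methods stay untouched (hypothetical stand-ins).
repr_sketch(value::Union{String,Char}) = value
repr_sketch(date::Date) = uppercase(Dates.format(date, dateformat"dd-u-yyyy"))
repr_sketch(x::Real) = string(x)

# Dispatch layer: only real numbers look at the flag (placeholder logic).
repr_sketch(x::Real, allow_scientific::Bool) =
    allow_scientific ? @sprintf("%G", Float64(x)) : repr_sketch(x)

# Everything else ignores the second argument and falls back.
repr_sketch(value, ::Any) = repr_sketch(value)

println(repr_sketch("label", true))
println(repr_sketch(Date(2023, 10, 13), false))
println(repr_sketch(1.5, true))
```

The two-argument numeric method is more specific than the `(value, ::Any)` fallback in both positions, so there is no method ambiguity, and callers can pass the flag uniformly for every field type.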

ararslan · 2023-10-13T16:13:22Z

src/write.jl

+_edf_repr(date::DateTime; allow_scientific=nothing) = Dates.format(date, dateformat"dd\.mm\.yyHH\.MM\.SS")
+
+"""
+    sprintf_G_under_8(x) -> String


The name should be all lowercase per the style guide, though it's generally rather odd; perhaps there's something more descriptive of what it does rather than how it's implemented?


kleinschmidt · 2023-10-13T17:40:37Z

Is scientific notation even allowed in the EDF+ spec?

from https://www.edfplus.info/specs/guidelines.html#:~:text=Numbers%20in%20EDF(%2B)%20headers,3%20(Okt%202004) they say

2a, E-notation (May 2009). Numbers in EDF(+) headers may have the scientific E notation as in 1E2345, +012E+34, -1.34E09 and +1.234E-5. Note, though, that the 8 characters of some EDF(+) numberfields are used more efficiently by -123.456 uV than by -1.23E-4 V.

So it seems it is. We do not use it for non-header fields such as annotation durations and onsets.

Whether it's mentioned in the spec and whether it's supported by other readers are different things. Maybe we don't care about compatibility with, say, MNE and friends, but we should at least try scientific notation exports and see if our preferred viewers can view the generated files.

kleinschmidt

I'm generally not opposed to this if we can check whether or not we're generating un-readable output, both for standard viewers (persyst and EDFBrowser) and for the python tools (MNE).


kleinschmidt · 2023-10-13T21:46:38Z

test/runtests.jl

-    # It is similiar to a `EDF Annotations` file except that 
+    # It is similiar to a `EDF Annotations` file except that
    # The `ANNOTATIONS_SIGNAL_LABEL` is `BDF Annotations`.
-    # The test data has 1081 trigger events, and 
-    # has 180 trials in total, and 
+    # The test data has 1081 trigger events, and
+    # has 180 trials in total, and


Revert these whitespace changes to keep blame clean.

I've configured vscode to remove trailing whitespace, so I will need to do this at the end (otherwise if I touch the file again it will re-introduce it). So your comment is noted but I won't address it until the PR is ready to merge.

Is there no autoformatting set up on this repo? It's possible, given that it's kind of ancient...

ericphanson · 2023-10-16T10:51:41Z

One thing I realized is that in TAL, there isn't the same 8-character limit (as far as I can tell from the spec). And that is the only place we currently force decimal printing of floating-point numbers. So we could try raising the limit there (or just do a regular @sprintf("%f", x)).

ericphanson · 2023-10-17T16:01:22Z

This actually does fix my internal bug, which now has an MWE at #74, and it results in EDFs that are loadable in EDFBrowser at least (and which pass its validation check). But I don't think this approach is actually necessary, because we should just allow rounding to 0 for fields like physical_min. So I think a different fix would be more direct here, as well as less concerning for portability.

ericphanson added 4 commits October 13, 2023 15:36

allow scientific notation

a9caf8c

add test for failing case

96dd042

add test

3147ee8

bump julia requirement to LTS

ff6fc63

ericphanson requested review from kleinschmidt and ararslan October 13, 2023 13:55

ararslan reviewed Oct 13, 2023

kleinschmidt reviewed Oct 13, 2023

ericphanson mentioned this pull request Oct 16, 2023

add Printf.Format and Printf.format to manual JuliaLang/julia#51723

Merged

ericphanson added 2 commits October 16, 2023 12:44

respond to review feedback

769fcda

test logs

4cb9911

ericphanson mentioned this pull request Oct 16, 2023

Write onsets and durations with more decimal digits #72

Open

ericphanson closed this Oct 17, 2023

This was referenced Oct 18, 2023

OndaEDF should use EDF-representable encoding parameters (physical min & max) beacon-biosignals/OndaEDF.jl#90

Open

RFC: More principled errors #75

Open
