-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow scientific notation in written numbers #70
Conversation
Is this too slow?
Let us benchmark. BenchmarkTry every Float16 50 times, which is 1.5 million numbers function all_floats(; n=50)
out = Char[]
sizehint!(out, 30720 * n)
for _ = 1:n
x = floatmin(Float16)
while x <= floatmax(Float16)
str = EDF._edf_repr(Float64(x))
push!(out, str[1])
x = nextfloat(x)
end
end
return out
end PRjulia> @time length(all_floats())
0.731282 seconds (9.98 M allocations: 1.442 GiB, 7.50% gc time)
1536000
julia> @time length(all_floats())
0.727198 seconds (9.98 M allocations: 1.442 GiB, 7.72% gc time)
1536000
julia> @time length(all_floats())
0.736517 seconds (9.98 M allocations: 1.442 GiB, 7.93% gc time)
1536000 Mainjulia> @time length(all_floats())
0.344242 seconds (12.52 M allocations: 889.876 MiB, 6.08% gc time)
1536000
julia> @time length(all_floats())
0.345693 seconds (12.52 M allocations: 889.876 MiB, 5.95% gc time)
1536000
julia> @time length(all_floats())
0.347998 seconds (12.52 M allocations: 889.876 MiB, 6.80% gc time)
1536000 ConclusionSo it is ~2x slower, but still fast enough that it doesn't seem like a big issue. |
Is allowing underflow to 0 dangerous?
Therefore I think it is probably OK, given the we can see all the usages and check them individually. |
Is scientific notation even allowed in the EDF+ spec?from https://www.edfplus.info/specs/guidelines.html#:~:text=Numbers%20in%20EDF(%2B)%20headers,3%20(Okt%202004) they say
So it seems it is. We do not use it for non-header fields such as annotation durations and onsets. |
src/write.jl
Outdated
# `allow_scientific` is only meaningful for `value::Number`. We allow passing it though, | ||
# so `edf_write` can be more generic. | ||
_edf_repr(value::Union{String,Char}; allow_scientific=nothing) = value | ||
_edf_repr(date::Date; allow_scientific=nothing) = uppercase(Dates.format(date, dateformat"dd-u-yyyy")) | ||
_edf_repr(date::DateTime; allow_scientific=nothing) = Dates.format(date, dateformat"dd\.mm\.yyHH\.MM\.SS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually keep the existing methods as they are and simply add a dispatch layer like so:
_edf_repr(value::Number, allow_scientific::Bool) = # implementation
_edf_repr(value, ::Any) = _edf_repr(value)
That way the methods that don't care about scientific notation don't need to include it in their signatures.
src/write.jl
Outdated
_edf_repr(date::DateTime; allow_scientific=nothing) = Dates.format(date, dateformat"dd\.mm\.yyHH\.MM\.SS") | ||
|
||
""" | ||
sprintf_G_under_8(x) -> String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name should be all lowercase per the style guide, though it's generally rather odd; perhaps there's something more descriptive of what it does rather than how it's implemented?
Whether it's mentioend in the spec and whether it's supported by other readers is a different thing. Maybe we don't care about compatibility with say MNE and friends but we should at least try scientific notation exports and see if our preferred viewers can view the generated files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally not opposed to this if we can check whether or not we're generating un-readable output, both for standard viewers (persyst and EDFBrowser) and for the python tools (MNE).
# It is similiar to a `EDF Annotations` file except that | ||
# It is similiar to a `EDF Annotations` file except that | ||
# The `ANNOTATIONS_SIGNAL_LABEL` is `BDF Annotations`. | ||
# The test data has 1081 trigger events, and | ||
# has 180 trials in total, and | ||
# The test data has 1081 trigger events, and | ||
# has 180 trials in total, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert these whitesapce changes to keep blame clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've configured vscode to remove trailing whitespace, so I will need to do this at the end (otherwise if I touch the file again it will re-introduce it). So your comment is noted but I won't address it until the PR is ready to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there no autoformatting set up on this repo? it's possible given that it's kind of ancient...
One thing I realized is that in TAL, there isn't the same 8 character limits (as far as I can tell in the spec). And that is the only place we currently force the decimal printing with floating-point numbers. So we could try raising the limit there (or just do a regular |
This actually does fix my internal bug, which now has an MWE at #74, and results in EDFs that are loadable in EDFBrowser at least (and which pass it's validation check) but I don't think this approach is actually necessary because we should just allow rounding to 0 for fields like |
Update to #30: attempts to allow writing out scientific notation where legal (i.e. not onset or duration fields).
motivated by encountering
when writing out an EDF. We don't know why or which field this was in though.
This PR changes the semantics:
allow_scientific=false
, which I have set by default, then underflow to 0 is allowed. This was formerly disallowed in [Bugfix] add option to truncate values with size larger than spec #30 (although I did not see a discussion why). The former semantics were a bit weird in that rounding/truncation was OK except when the result was truncated to 0, in which case it errored.allow_scientific=true
, then we use%G
in sprintf to allow either of scientific notation or decimal representation. This shouldn't really underflow nor with any kind of Float64 representable number, but could if somehow one got a BigFloat or such in there.I also set the code to allow scientific notation with all floating-point fields of the headers (but with onset/durations of annotations).
Would appreciate thoughts on this & careful review