Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better sin_sum #88

Merged
merged 3 commits into from
Apr 11, 2023
Merged

better sin_sum #88

merged 3 commits into from
Apr 11, 2023

Conversation

oscardssmith
Copy link
Member

This makes sin_sum work for summing an arbitrary number of elements and retain accuracy for large inputs (>1e15), and uses this improvement to remove the edge cases in bessely and besselj. Performance and accuracy information to come.

@heltonmc
Copy link
Member

heltonmc commented Apr 7, 2023

Probably also worth switching out

# we need to calculate sin(x - v*pi/2 - pi/4) and cos(x - v*pi/2 - pi/4)
# For improved accuracy this is expanded using the formula for sin(x+y+z)
s, c = sincos(PIO2(S) * v)
sα, cα = sincos(α)
CMS = c - s
CPS = c + s
s1 = CMS *
s2 = CPS *
s3 = CMS *
s4 = CPS *

I haven't tested thoroughly if that section is the cause or not but this is related to #37

@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Patch coverage: 64.06% and project coverage change: -1.18 ⚠️

Comparison is base (3276cf0) 97.30% compared to head (daf86ce) 96.12%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
- Coverage   97.30%   96.12%   -1.18%     
==========================================
  Files          20       20              
  Lines        2297     2299       +2     
==========================================
- Hits         2235     2210      -25     
- Misses         62       89      +27     
Impacted Files Coverage Δ
src/misc.jl 55.73% <57.40%> (-44.27%) ⬇️
src/besselj.jl 98.31% <100.00%> (-0.11%) ⬇️
src/bessely.jl 98.03% <100.00%> (-0.11%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@oscardssmith
Copy link
Member Author

To do that properly I would have to restructure this a little bit to make it possible to get the sin and cos sum simultaneously.

@heltonmc
Copy link
Member

heltonmc commented Apr 7, 2023

Ya I think the error is coming before that anyway but it's been a while since I've looked at that function. This is great on my tests the accuracy was similar and was faster.

I do think this function is generally useful enough to be in base because so many people naively will compute sin(x + y) which loses lots of precision.... but I'm obviously bias 😄

@oscardssmith
Copy link
Member Author

I understand where you're coming from, but the problem of errors not being composable isn't really solvable by adding random functions to base.

@heltonmc
Copy link
Member

heltonmc commented Apr 8, 2023

Ps. Totally fine to merge whenever you are ready!

@oscardssmith
Copy link
Member Author

I want to do some benchmarks on the 24 to 1e15 region first to make sure I haven't regressed anything.

@oscardssmith
Copy link
Member Author

I also think I can get rid of the two separate inputs which would be a bit cleaner.

@oscardssmith
Copy link
Member Author

I've done the benchmarking and cleanup that I wanted, so I think this is now good to go. The only thing I'm not 100% happy with in this version is that sin_sum of Float32s will now lose a little accuracy since it accumulates into a Float64 which can incur some inaccuracy since floatmax(Float32) is 3.4f38 (but doing the reduction this way is ~20x faster than the method this uses for Float64.

@oscardssmith
Copy link
Member Author

OK, I'm finally relatively happy with the Float32 version of the reduction. It still can lose some precision, but it's a lot better than before while maintaining relatively decent speed when you don't have large arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the Float32 implementations still use p = p * cos(xn + x). Should this be switched over to the better sum formula now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly. it will be a bit slower, so it probably needs a benchmark and accuracy plot to determine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can check it out tomorrow ! This weekend was busy with family but have time to look tomorrow more carefully

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeesh - the besselj0 implementation was some of my first julia code which has been improved but the tests aren't very robust.

julia> Bessels.besselj0(328049.34f0)
-0.0013240778f0

julia> Bessels.besselj0(Float64(328049.34f0))
-0.0013258623831875669

I'm imagining the better sin sum here could help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I posted an issue at #90 to track that in a separate PR.

@heltonmc
Copy link
Member

heltonmc commented Apr 11, 2023

@oscardssmith I was fixing the Float32 case (which drastically improves accuracy) but was noticing that the garbage collector was running.
For Float64 case.

# master
julia> @benchmark besselj0(x) setup=(x=rand()*50 + 15.0)
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
 Range (min  max):  28.635 ns   1.378 μs  ┊ GC (min  max): 0.00%  94.85%
 Time  (median):     65.963 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   60.570 ns ± 44.076 ns  ┊ GC (mean ± σ):  2.31% ±  3.27%

  ▇    ▄                          ▂▇█▅▂          ▁▁▁      ▁▁▁ ▂
  █▁▆▄▁█▁▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█████▇▆▃▃▁▁▄▁▁▇███▇▆▆▄▅████ █
  28.6 ns      Histogram: log(frequency) by time      93.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Then the Float32 case I was actually getting allocations.

# on my test branch
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*50 + 15.0))
BenchmarkTools.Trial: 10000 samples with 982 evaluations.
 Range (min  max):  60.338 ns   1.372 μs  ┊ GC (min  max): 0.00%  95.17%
 Time  (median):     63.087 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   66.210 ns ± 48.384 ns  ┊ GC (mean ± σ):  3.00% ±  3.91%

  ▁▄▅▇██▇▆▄▂▁                      ▁▁                   ▁▁▁   ▂
  ████████████▇▅▃▄▁▁▁▁▁▃▁▁▁▁▁▁▃▃▁▅█████▇▆▆▄▅▅▅▄▃▁▁▁▅▆▆██████▇ █
  60.3 ns      Histogram: log(frequency) by time      90.4 ns <

 Memory estimate: 80 bytes, allocs estimate: 4.

The benchmarks on tagged version

# Bessels v0.2.8
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*50 + 15.0))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min  max):  15.738 ns  28.151 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     16.074 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.054 ns ±  0.319 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▂▄▂                ▁▆█▄               
  ▁▁▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▃▅███▅▃▁▁▁▁▁▁▂▂▂▃▄▅▆▇████▆▃▂▂▃▄▅▆▆▆▄▃▂ ▃
  15.7 ns         Histogram: frequency by time        16.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark besselj0(x) setup=(x=rand()*50 + 15.0)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min  max):  16.520 ns  32.577 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     16.916 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.940 ns ±  1.976 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅ ▂    ▁                                    ▆               
  ██▂█▇▁▁▁█▆▂▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▃▂▁▃ ▃
  16.5 ns         Histogram: frequency by time        22.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Any idea what's going on?

@oscardssmith
Copy link
Member Author

Presumably you introduced a type instability somewhere.

@heltonmc
Copy link
Member

Ill look closer but I get the allocation pulling out the rem_pio2_sum

using Base.Math: sin_kernel, cos_kernel, sincos_kernel, rem_pio2_kernel, DoubleFloat64, DoubleFloat32
function rem_pio2_sum(xs::Vararg{Float64})
    n = 0
    hi, lo = 0.0, 0.0
    for x in xs
        if abs(x) <= pi/4
            s = x + hi
            lo += (x - (s - hi))
        else
            n_i, y = rem_pio2_kernel(x)
            n += n_i
            s = y.hi + hi
            lo += (y.hi - (s - hi)) + y.lo
        end
        hi = s
    end
    while hi > pi/4
        hi -= pi/2
        lo -= 6.123233995736766e-17
        n += 1
    end
    while hi < -pi/4
        hi += pi/2
        lo += 6.123233995736766e-17
        n -= 1
    end
    return n, DoubleFloat64(hi, lo)
end
julia> @benchmark rem_pio2_sum(x, y, z) setup=(x=rand(); y = rand(); z = rand())
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
 Range (min … max):  26.606 ns …  1.319 μs  ┊ GC (min … max): 0.00% … 95.97%
 Time  (median):     29.220 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   32.298 ns ± 47.517 ns  ┊ GC (mean ± σ):  6.02% ±  4.00%

   ▃▅████▄▃▅▅▃ ▁▁                     ▁                  ▁▁▁▁ ▂
  ███████████████▆▄▁▁▁▃▁▁▁▁▁▁▁▁▁▄▄▆▇▇█████▇▆▇▆▆▄▄▅▄▁▄▁▆▇▆████ █
  26.6 ns      Histogram: log(frequency) by time      54.7 ns <

 Memory estimate: 80 bytes, allocs estimate: 4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants