better sin_sum #88

oscardssmith · 2023-04-07T19:52:59Z

This makes sin_sum work for summing an arbitrary number of elements and retain accuracy for large inputs (>1e15), and uses this improvement to remove the edge cases in bessely and besselj. Performance and accuracy information to come.

heltonmc · 2023-04-07T19:57:08Z

Probably also worth switching out

Bessels.jl/src/asymptotics.jl

Lines 40 to 52 in a94bb7f

    
           # we need to calculate sin(x - v*pi/2 - pi/4) and cos(x - v*pi/2 - pi/4) 
        
           # For improved accuracy this is expanded using the formula for sin(x+y+z) 
        
           s, c = sincos(PIO2(S) * v) 
        
           sα, cα = sincos(α) 
        
           CMS = c - s 
        
           CPS = c + s 
        
           s1 = CMS * cα 
        
           s2 = CPS * sα 
        
           s3 = CMS * sα 
        
           s4 = CPS * cα

I haven't tested thoroughly if that section is the cause or not but this is related to #37

codecov · 2023-04-07T19:58:30Z

Codecov Report

Patch coverage: 64.06% and project coverage change: -1.18 ⚠️

Comparison is base (3276cf0) 97.30% compared to head (daf86ce) 96.12%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #88      +/-   ##
==========================================
- Coverage   97.30%   96.12%   -1.18%     
==========================================
  Files          20       20              
  Lines        2297     2299       +2     
==========================================
- Hits         2235     2210      -25     
- Misses         62       89      +27

Impacted Files	Coverage Δ
src/misc.jl	`55.73% <57.40%> (-44.27%)`	⬇️
src/besselj.jl	`98.31% <100.00%> (-0.11%)`	⬇️
src/bessely.jl	`98.03% <100.00%> (-0.11%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

oscardssmith · 2023-04-07T20:07:36Z

To do that properly I would have to restructure this a little bit to make it possible to get the sin and cos sum simultaneously.

heltonmc · 2023-04-07T20:16:00Z

Ya I think the error is coming before that anyway but it's been a while since I've looked at that function. This is great on my tests the accuracy was similar and was faster.

I do think this function is generally useful enough to be in base because so many people naively will compute sin(x + y) which loses lots of precision.... but I'm obviously bias 😄

oscardssmith · 2023-04-07T20:17:31Z

I understand where you're coming from, but the problem of errors not being composable isn't really solvable by adding random functions to base.

heltonmc · 2023-04-08T13:42:01Z

Ps. Totally fine to merge whenever you are ready!

oscardssmith · 2023-04-08T17:36:42Z

I want to do some benchmarks on the 24 to 1e15 region first to make sure I haven't regressed anything.

oscardssmith · 2023-04-08T17:39:15Z

I also think I can get rid of the two separate inputs which would be a bit cleaner.

oscardssmith · 2023-04-09T21:36:15Z

I've done the benchmarking and cleanup that I wanted, so I think this is now good to go. The only thing I'm not 100% happy with in this version is that sin_sum of Float32s will now lose a little accuracy since it accumulates into a Float64 which can incur some inaccuracy since floatmax(Float32) is 3.4f38 (but doing the reduction this way is ~20x faster than the method this uses for Float64.

oscardssmith · 2023-04-10T02:09:13Z

OK, I'm finally relatively happy with the Float32 version of the reduction. It still can lose some precision, but it's a lot better than before while maintaining relatively decent speed when you don't have large arguments.

heltonmc · 2023-04-10T02:25:06Z

src/besselj.jl

I noticed that the Float32 implementations still use p = p * cos(xn + x). Should this be switched over to the better sum formula now?

possibly. it will be a bit slower, so it probably needs a benchmark and accuracy plot to determine.

I can check it out tomorrow ! This weekend was busy with family but have time to look tomorrow more carefully

Yeesh - the besselj0 implementation was some of my first julia code which has been improved but the tests aren't very robust.

julia> Bessels.besselj0(328049.34f0) -0.0013240778f0 julia> Bessels.besselj0(Float64(328049.34f0)) -0.0013258623831875669

I'm imagining the better sin sum here could help.

LGTM. I posted an issue at #90 to track that in a separate PR.

heltonmc · 2023-04-11T16:38:56Z

@oscardssmith I was fixing the Float32 case (which drastically improves accuracy) but was noticing that the garbage collector was running.
For Float64 case.

# master
julia> @benchmark besselj0(x) setup=(x=rand()*50 + 15.0)
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
 Range (min … max):  28.635 ns …  1.378 μs  ┊ GC (min … max): 0.00% … 94.85%
 Time  (median):     65.963 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   60.570 ns ± 44.076 ns  ┊ GC (mean ± σ):  2.31% ±  3.27%

  ▇    ▄                          ▂▇█▅▂          ▁▁▁      ▁▁▁ ▂
  █▁▆▄▁█▁▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█████▇▆▃▃▁▁▄▁▁▇███▇▆▆▄▅████ █
  28.6 ns      Histogram: log(frequency) by time      93.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Then the Float32 case I was actually getting allocations.

# on my test branch
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*50 + 15.0))
BenchmarkTools.Trial: 10000 samples with 982 evaluations.
 Range (min … max):  60.338 ns …  1.372 μs  ┊ GC (min … max): 0.00% … 95.17%
 Time  (median):     63.087 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   66.210 ns ± 48.384 ns  ┊ GC (mean ± σ):  3.00% ±  3.91%

  ▁▄▅▇██▇▆▄▂▁                      ▁▁                   ▁▁▁   ▂
  ████████████▇▅▃▄▁▁▁▁▁▃▁▁▁▁▁▁▃▃▁▅█████▇▆▆▄▅▅▅▄▃▁▁▁▅▆▆██████▇ █
  60.3 ns      Histogram: log(frequency) by time      90.4 ns <

 Memory estimate: 80 bytes, allocs estimate: 4.

The benchmarks on tagged version

# Bessels v0.2.8
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*50 + 15.0))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  15.738 ns … 28.151 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.074 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.054 ns ±  0.319 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▂▄▂                ▁▆█▄               
  ▁▁▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▃▅███▅▃▁▁▁▁▁▁▂▂▂▃▄▅▆▇████▆▃▂▂▃▄▅▆▆▆▄▃▂ ▃
  15.7 ns         Histogram: frequency by time        16.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark besselj0(x) setup=(x=rand()*50 + 15.0)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  16.520 ns … 32.577 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.916 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.940 ns ±  1.976 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅ ▂    ▁                                    ▆               
  ██▂█▇▁▁▁█▆▂▁▂▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▃▂▁▃ ▃
  16.5 ns         Histogram: frequency by time        22.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Any idea what's going on?

oscardssmith · 2023-04-11T18:02:07Z

Presumably you introduced a type instability somewhere.

heltonmc · 2023-04-11T18:09:31Z

Ill look closer but I get the allocation pulling out the rem_pio2_sum

using Base.Math: sin_kernel, cos_kernel, sincos_kernel, rem_pio2_kernel, DoubleFloat64, DoubleFloat32
function rem_pio2_sum(xs::Vararg{Float64})
    n = 0
    hi, lo = 0.0, 0.0
    for x in xs
        if abs(x) <= pi/4
            s = x + hi
            lo += (x - (s - hi))
        else
            n_i, y = rem_pio2_kernel(x)
            n += n_i
            s = y.hi + hi
            lo += (y.hi - (s - hi)) + y.lo
        end
        hi = s
    end
    while hi > pi/4
        hi -= pi/2
        lo -= 6.123233995736766e-17
        n += 1
    end
    while hi < -pi/4
        hi += pi/2
        lo += 6.123233995736766e-17
        n -= 1
    end
    return n, DoubleFloat64(hi, lo)
end

julia> @benchmark rem_pio2_sum(x, y, z) setup=(x=rand(); y = rand(); z = rand())
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
 Range (min … max):  26.606 ns …  1.319 μs  ┊ GC (min … max): 0.00% … 95.97%
 Time  (median):     29.220 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   32.298 ns ± 47.517 ns  ┊ GC (mean ± σ):  6.02% ±  4.00%

   ▃▅████▄▃▅▅▃ ▁▁                     ▁                  ▁▁▁▁ ▂
  ███████████████▆▄▁▁▁▃▁▁▁▁▁▁▁▁▁▄▄▆▇▇█████▇▆▇▆▆▄▄▅▄▁▄▁▆▇▆████ █
  26.6 ns      Histogram: log(frequency) by time      54.7 ns <

 Memory estimate: 80 bytes, allocs estimate: 4.

better sin_sum

df002cc

heltonmc approved these changes Apr 7, 2023

View reviewed changes

sinsum improvements

5f1c77f

improve Float32 rem_pio2_sum

daf86ce

heltonmc reviewed Apr 10, 2023

View reviewed changes

heltonmc mentioned this pull request Apr 11, 2023

Large argument inaccuracy for Float32 in besselj0 #90

Open

heltonmc merged commit 4afbc6c into master Apr 11, 2023

This was referenced Apr 11, 2023

Improve performance for large arguments in besselj0 #76

Closed

Provide relative tolerances for besselj1(x) and besselj0(x) near zeros #22

Closed

oscardssmith deleted the better-sin_sum branch April 11, 2023 16:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

better sin_sum #88

better sin_sum #88

oscardssmith commented Apr 7, 2023

heltonmc commented Apr 7, 2023

codecov bot commented Apr 7, 2023 •

edited

Loading

oscardssmith commented Apr 7, 2023

heltonmc commented Apr 7, 2023

oscardssmith commented Apr 7, 2023

heltonmc commented Apr 8, 2023

oscardssmith commented Apr 8, 2023

oscardssmith commented Apr 8, 2023

oscardssmith commented Apr 9, 2023

oscardssmith commented Apr 10, 2023

heltonmc Apr 10, 2023

oscardssmith Apr 10, 2023

heltonmc Apr 10, 2023

heltonmc Apr 10, 2023

heltonmc Apr 11, 2023

heltonmc commented Apr 11, 2023 •

edited

Loading

oscardssmith commented Apr 11, 2023

heltonmc commented Apr 11, 2023

better sin_sum #88

better sin_sum #88

Conversation

oscardssmith commented Apr 7, 2023

heltonmc commented Apr 7, 2023

codecov bot commented Apr 7, 2023 • edited Loading

Codecov Report

oscardssmith commented Apr 7, 2023

heltonmc commented Apr 7, 2023

oscardssmith commented Apr 7, 2023

heltonmc commented Apr 8, 2023

oscardssmith commented Apr 8, 2023

oscardssmith commented Apr 8, 2023

oscardssmith commented Apr 9, 2023

oscardssmith commented Apr 10, 2023

heltonmc Apr 10, 2023

Choose a reason for hiding this comment

oscardssmith Apr 10, 2023

Choose a reason for hiding this comment

heltonmc Apr 10, 2023

Choose a reason for hiding this comment

heltonmc Apr 10, 2023

Choose a reason for hiding this comment

heltonmc Apr 11, 2023

Choose a reason for hiding this comment

heltonmc commented Apr 11, 2023 • edited Loading

oscardssmith commented Apr 11, 2023

heltonmc commented Apr 11, 2023

codecov bot commented Apr 7, 2023 •

edited

Loading

heltonmc commented Apr 11, 2023 •

edited

Loading