binomial_coefficient_log produces wrong results when N >> n #1592

martinmodrak · 2020-01-07T13:00:59Z

Description

For some values, the results from binomial_coefficient_log don't match those computed by R or Mathematica.

In #239, it is noted that we don't know, where the approximation we use for the case when N >> n comes from. I also don't understand it. R uses a quite a different approximation - their implementation is

double attribute_hidden lfastchoose(double n, double k)
{
    return -log(n + 1.) - lbeta(n - k + 1., k + 1.);
}

Stirling approximation produce good results for some of the cases, but also terrible results for others... The R's implementation seems to work neatly, but depends on quite a bunch of other code to compute the lbeta.

Example

In Stan:

binomial_coefficient_log(1e20, 100) = 4141.4308104325282
binomial_coefficient_log(845000, 8.45e-10) =  1.1925051146203358e-008

Expected Output

Compare to Mathematica, Rs lchoose and Stirling approximation implemented in R
via

lchoose_stirling <- function(n, k) {
  -(0.5) * (log(2) + log(pi) + log(k)) + k * (log(n) + 1 - log(k))
}

In both cases, there is a mismatch in second digit.
https://www.wolframalpha.com/input/?i=log%2810%5E20+choose+100%29

Stan:  4141.4308104325282
Math:  4241.430810432527877842402915999072777157961556194658677330
R:     4241.4308104325273
Stir:  4241.4316437630832

https://www.wolframalpha.com/input/?i=log%28845000+choose+845*10%5E-12%29

Stan:  1.1925051146203358e-008
Math:  1.2019540397111151911538109607610600259536643522634943 × 10^-8
R:     0 #Not great
R-cor: 1.2019542694474694e-08
Stir:  9.5269037411112816 #Even worse

Here R-cor is computed using the lfastchoose formula directly, e.g.:

lchoose_lbeta <- function(n, k) {
  - log1p(n) -lbeta(n - k + 1, k + 1) 
}

for some reason, lchoose calls a different computation method for those parameters

Current Version:

v3.0.0

The text was updated successfully, but these errors were encountered:

martinmodrak · 2020-01-07T16:09:27Z

This also suggests that binomial and beta densities could be subject to imprecisions :-(

bob-carpenter · 2020-01-07T22:40:03Z

Thanks for catching and debugging this. We should definitely fix. We can use any approximation that works.

…

On Jan 7, 2020, at 8:01 AM, Martin Modrák ***@***.***> wrote: Description For some values, the results from binomial_coefficient_log don't match those computed by R or Mathematica. In #239, it is noted that we don't know, where the approximation we use for the case when N >> n comes from. I also don't understand it. R uses a quite a different approximation - their implementation is double attribute_hidden lfastchoose(double n, double k) { return -log(n + 1.) - lbeta(n - k + 1., k + 1.); } Stirling approximation also produces the correct results in the cases I've found. Example In Stan: binomial_coefficient_log(1e20, 100) = 4141.4308104325282 binomial_coefficient_log(845000, 8.45e-10) = 1.1925051146203358e-008 Expected Output Compare to Mathematica, Rs lchoose https://www.wolframalpha.com/input/?i=log%2810%5E20+choose+100%29 Stan: 4141.4308104325282 Math: 4241.430810432527877842402915999072777157961556194658677330 R: 4241.4308104325273 https://www.wolframalpha.com/input/?i=log%28845000+choose+845*10%5E-12%29 Stan: 1.1925051146203358e-008 Math: 1.2019540397111151911538109607610600259536643522634943 × 10^-8 R: 0 #Not great Current Version: v3.0.0 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

martinmodrak · 2020-01-08T08:33:34Z

So I made a mistake in the initial description - Stirling approximation works reasonably only for some of the problematic cases (description now updated). Searching online, the links I found either say "use lgamma" (which we know is not enough) or "steal code from R", which we can't do as it is GPL. And the idea in R's code is (for me) non-trivial to adapt. The main workhorse in R is a function called lgamma_cor which computes the difference between Stirling's approximation to lgamma and actual lgamma value. They then proceed by computing the Stirling's approximation for lbeta and adding the corrections separately. Unfortunately, the code for lgamma_cor is opaque to me, so I can't reimplement it to avoid licensing issues. It uses something called chebyshev_eval which is probably related to Chebyshev polynomials, which have their boost implementation but I am totally out of my depth here and don't understand the surrounding math at all :-(

martinmodrak · 2020-01-08T08:45:30Z

Also, if I read the code correctly, Stan now relies on autodiff for derivatives of binomial_coefficient_log (I believe Stan's lchoose delegates to binomial_coefficient_log in math and there is no rev version of the function. But autodiffing through a complex approximation (as the one in R) might be undesirable, so analytic derivatives might need to be developed and tested as well. We do have analytic derivatives for lbeta implementation we have, but I would suspect they would fail where the current implementation fails.

martinmodrak mentioned this issue Jan 7, 2020

Fixing negative binomial phi cutoff #1497

Merged

5 tasks

martinmodrak added a commit to martinmodrak/math that referenced this issue Jan 13, 2020

Fixing lbeta and binomial_coefficient_log (stan-dev#1611, stan-dev#1592)

9e708af

martinmodrak mentioned this issue Jan 13, 2020

lbeta is unstable when the arguments differ a lot #1611

Closed

martinmodrak mentioned this issue Jan 14, 2020

Improved numerical stability of binomial_coefficient_log #1614

Merged

5 tasks

bbbales2 closed this as completed in 7991f40 Mar 13, 2020

mcol added this to the 3.1.0++ milestone Mar 13, 2020

mcol linked a pull request Mar 13, 2020 that will close this issue

Improved numerical stability of binomial_coefficient_log #1614

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

binomial_coefficient_log produces wrong results when N >> n #1592

binomial_coefficient_log produces wrong results when N >> n #1592

martinmodrak commented Jan 7, 2020 •

edited

Loading

martinmodrak commented Jan 7, 2020

bob-carpenter commented Jan 7, 2020 via email

martinmodrak commented Jan 8, 2020

martinmodrak commented Jan 8, 2020

binomial_coefficient_log produces wrong results when N >> n #1592

binomial_coefficient_log produces wrong results when N >> n #1592

Comments

martinmodrak commented Jan 7, 2020 • edited Loading

Description

Example

Expected Output

Current Version:

martinmodrak commented Jan 7, 2020

bob-carpenter commented Jan 7, 2020 via email

martinmodrak commented Jan 8, 2020

martinmodrak commented Jan 8, 2020

martinmodrak commented Jan 7, 2020 •

edited

Loading