This case study aims to provide a comprehensive example of implementing analytic derivatives using Stan's external C++ interface. By avoiding automatic differentiation, we can significantly accelerate computation. Moreover, by leveraging template functions, we achieve a level of abstraction that is not directly supported in Stan. The focus of this article is on demonstrating the process of implementing a custom probability distribution in Stan, covering both the mathematical and technical aspects. This article is generally intended for statisticians with some C++ experience.
Throughout this case study, we will also explain aspects of Stan's gradient interface. Stan's reverse-mode automatic differentiation completely eliminates the burden of manually implementing derivatives. However, as noted in Stan's developer wiki, there are still several probability functions in Stan that lack derivative implementations. These gaps can become performance bottlenecks for certain programs. With the involvement of individuals with strong mathematical backgrounds, the Stan Math Library has substantial potential for further improvement in this area.
Currently, using external C++ functions is the only method to implement a function with a known analytic gradient outside the Stan Math Library. Let's begin by introducing how to interact with external C++ code.
There are numerous examples available demonstrating how to integrate basic external C++ code in Stan. Some relevant official documentation includes the following:
These resources are spread across various documents. Here, we provide practical, working examples and highlight key considerations in practice.
Consider the following Stan model, based on the bernoulli example in the CmdStan documentation. Assume the following Stan file and C++ header sit in the same directory.
bernoulli_example.stan
functions {
real make_odds(data real theta);
}
data {
int<lower=0> N;
array[N] int<lower=0, upper=1> y;
}
parameters {
real<lower=0, upper=1> theta;
}
model {
theta ~ beta(1, 1); // uniform prior on interval 0, 1
y ~ bernoulli(theta);
}
generated quantities {
real odds;
odds = make_odds(theta);
}
external.hpp
#include <iostream>
namespace bernoulli_example_model_namespace {
double make_odds(const double& theta, std::ostream *pstream__) {
return theta / (1 - theta);
}
}
To use external.hpp in Stan, we basically need to do the following:

- In the Stan model, expose the function declaration in the functions block.
- When compiling, add --allow-undefined to STANCFLAGS and specify where the header file is through the user_header option.
Below is the code to drive the above model from different interfaces.
from cmdstanpy import CmdStanModel
model = CmdStanModel(stan_file='bernoulli_example.stan', compile=False)
model.compile(user_header='external.hpp')
fit = model.sample(data={'N':10, 'y':[0,1,0,0,0,0,0,0,0,1]})
fit.stan_variable('odds')
The code is basically from this post. In the latest version of cmdstanpy, we no longer have to pass the --allow-undefined flag to the Stan transpiler explicitly; we only specify user_header when compiling, and cmdstanpy adds the flag automatically.
library(cmdstanr)
model <- cmdstan_model('bernoulli_example.stan',
include_paths=getwd(),
cpp_options=list(USER_HEADER='external.hpp'),
stanc_options = list("allow-undefined")
)
fit <- model$sample(data=list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1)))
fit$draws('odds')
This feature is no longer supported since pystan 3.0.
For rstan, C++ functions do not need to be wrapped in any specific namespace.
external_rstan.hpp
#include <iostream>
double make_odds(const double& theta, std::ostream *pstream__) {
return theta / (1 - theta);
}
R
code
library(rstan)
model <- stan_model('bernoulli_example.stan',
allow_undefined = TRUE,
includes = paste0('\n#include "', file.path(getwd(), 'external_rstan.hpp'), '"\n'),
)
fit <- sampling(model, data = list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1)))
extract(fit)$odds
If there are errors during compilation or running, try troubleshooting the following issues:

- If the Stan source file is named bernoulli_example.stan, then the namespace name in all the header files (if there is more than one function to include and they are in different header files) must be bernoulli_example_model_namespace. This requirement applies if and only if you are using cmdstan.
- Always add the additional argument std::ostream *pstream__ to the function declaration. This allows print statements in the function body to be output properly.
- Check that the function signatures in the Stan functions block match the function declarations in the corresponding header files.
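The first pitfall can be checked mechanically: the namespace cmdstan expects is just the `.stan` file's base name with `_model_namespace` appended. A small Python sketch (the helper name `expected_namespace` is ours, not part of any interface):

```python
from pathlib import Path

def expected_namespace(stan_path):
    """Namespace cmdstan expects for functions declared in a given .stan file."""
    return Path(stan_path).stem + "_model_namespace"

print(expected_namespace("bernoulli_example.stan"))
# bernoulli_example_model_namespace
```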
In most projects, we often have multiple model files that rely on the same (or shared) C++ header files. To handle this, one typically needs to manually change the C++ code's namespace for each model during compilation, which can be both tedious and prone to errors. The following code addresses this issue by automating the process:
In Python
import re
def change_namespace(stan_name, user_header):
"""Change the namespace name in the user header file.
Args:
stan_name (str): desired Stan file name
user_header (str): path to user header file
"""
with open(user_header, 'r') as file:
content = file.read()
new_content = re.sub(r'^namespace \S+_model_namespace {',
f'namespace {stan_name}_model_namespace {{',
content, flags=re.MULTILINE)
with open(user_header, 'w') as file:
file.write(new_content)
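A quick round-trip check of `change_namespace` on a throwaway header (the file contents here are illustrative):

```python
import os
import re
import tempfile

def change_namespace(stan_name, user_header):
    """Rewrite the *_model_namespace name in a user header (same helper as above)."""
    with open(user_header, 'r') as file:
        content = file.read()
    new_content = re.sub(r'^namespace \S+_model_namespace {',
                         f'namespace {stan_name}_model_namespace {{',
                         content, flags=re.MULTILINE)
    with open(user_header, 'w') as file:
        file.write(new_content)

# Create a header carrying an out-of-date namespace, then rewrite it in place.
with tempfile.NamedTemporaryFile('w', suffix='.hpp', delete=False) as f:
    f.write('#include <iostream>\nnamespace old_example_model_namespace {\n}\n')
    path = f.name
change_namespace('bernoulli_example', path)
with open(path) as f:
    rewritten = f.read()
os.remove(path)
print(rewritten)
```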
In R
#' Change the namespace in the user header file
#'
#' @param stan_name The name of the stan file
#' @param user_header The path to the user header file
#' @return NULL
change_namespace <- function(stan_name, user_header) {
content <- readLines(user_header, warn = FALSE)
content <- paste(content, collapse="\n")
pattern <- "\nnamespace \\S+_model_namespace {"
replacement <- paste0("\nnamespace ", stan_name, "_model_namespace {")
new_content <- gsub(pattern, replacement, content, perl = TRUE)
writeLines(new_content, user_header)
}
Now that we understand how to call external C++ code from Stan, it's important to note that if we merely "translate" what is possible in Stan into C++, there will be minimal differences when the compiler processes it. The real benefit of using C++ lies in enabling functionalities not yet available in Stan, utilizing its extensive third-party libraries, and taking advantage of new language features. Below are some common feature requests:
-
C++ has a more mature ecosystem, providing better support for niche mathematical functions that are not yet fully integrated into Stan. Expecting these functions to be included in Stan's core would place an unreasonable burden on its developers.
-
C++ offers advanced abstractions, whereas Stan is more constrained linguistically. For instance, template functions allow the same algorithm to be applied to different data types, which can significantly reduce development time, especially in production environments.
-
Stan currently lacks gradient interfaces. If one has an analytical gradient and wants to avoid the overhead of automatic differentiation, using external C++ code is the only viable option.
To illustrate how these issues arise, are analyzed, and subsequently resolved, let's begin with a concrete example.
Reading the following case study alongside the document Adding A New Distribution is recommended.
The beta negative binomial (BNB) distribution is a generalization of the negative binomial distribution: it is the compound of a negative binomial distribution with a beta distribution. Assume $y \mid p \sim \operatorname{NegBinomial}(r, p)$ and $p \sim \operatorname{Beta}(\alpha, \beta)$, where we treat the probability of failure $p$ as a beta-distributed random variable rather than a fixed quantity.
The main advantage of using external C++ files is the flexibility to do things that cannot be done directly in the Stan language. But writing a distribution is something Stan can do.
Suppose we have N data points and scalar parameters r, a, and b.
data {
int<lower=0> N;
array[N] int<lower=0> y;
}
parameters {
real<lower=0> r;
real<lower=0> a;
real<lower=0> b;
}
This distribution can be coded directly in Stan
real beta_neg_binomial_lpmf(int y, real r, real a, real b) {
real lprobs = lgamma(y+1/b) + lbeta(y+r*b/a, 1/a+1/b+1)
- lgamma(y+1) - lgamma(1/b) - lbeta(r*b/a, 1/a+1);
return lprobs;
}
...
for (i in 1:N) {
target += beta_neg_binomial_lpmf(y[i], r, a, b);
}
Based on the Stan User's Guide, the following good practices can save time and memory in automatic differentiation.
real beta_neg_binomial_lpmf(array[] int y, real r, real a, real b) {
int N = size(y);
vector[N] lprobs;
for (i in 1:N) {
lprobs[i] = lgamma(y[i]+1/b) + lbeta(y[i]+r*b/a, 1/a+1/b+1) - lgamma(y[i]+1) - lgamma(1/b) - lbeta(r*b/a, 1/a+1);
}
return sum(lprobs);
}
...
target += beta_neg_binomial_lpmf(y, r, a, b);
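Before reaching for C++, the log-pmf above is worth a numerical sanity check: exponentiating it and summing over the support should give one. A Python sketch using only the standard library (the parameter values are arbitrary; `lbeta` is a small helper we define via `lgamma`):

```python
import math

def lbeta(x, y):
    # log Beta(x, y) via log-gamma
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def bnb_lpmf(y, r, a, b):
    # Mirrors the Stan function above, in the same (r, a, b) parametrization.
    return (math.lgamma(y + 1 / b) + lbeta(y + r * b / a, 1 / a + 1 / b + 1)
            - math.lgamma(y + 1) - math.lgamma(1 / b) - lbeta(r * b / a, 1 / a + 1))

r, a, b = 2.0, 0.2, 0.5
total = sum(math.exp(bnb_lpmf(y, r, a, b)) for y in range(5000))
print(total)  # should be very close to 1
```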
If any one of the parameters r, a, or b can be a vector, we would need to implement the function repeatedly to accommodate the following signatures:
real beta_neg_binomial_lpmf(array[] int y, real r, real a, real b)
real beta_neg_binomial_lpmf(array[] int y, vector r, real a, real b)
real beta_neg_binomial_lpmf(array[] int y, real r, vector a, vector b)
real beta_neg_binomial_lpmf(array[] int y, vector r, vector a, real b)
real beta_neg_binomial_lpmf(array[] int y, vector r, real a, vector b)
real beta_neg_binomial_lpmf(array[] int y, real r, vector a, vector b)
real beta_neg_binomial_lpmf(array[] int y, vector r, vector a, vector b)
However, this approach does not cover all possible cases, and errors are likely to occur during the coding process. This highlights the second issue mentioned at the end of the previous section.
Using external C++ allows us to write the function once and have it adapt automatically to different input types when dealing with new distributions. In the following, we explore how this works.
To fully implement a distribution in Stan, it is often desirable to derive certain derivatives mathematically and include them as well. Generally speaking, suppose we have a target distribution with density or mass function $f(y \mid \theta)$, whose log-likelihood over $N$ observations is $\ell(\theta) = \sum_{i=1}^{N} \log f(y_i \mid \theta)$. We aim to calculate the gradient of this log-likelihood function with respect to the distribution parameters,
$$\nabla_\theta \ell(\theta) = \sum_{i=1}^{N} \nabla_\theta \log f(y_i \mid \theta),$$
where $\theta$ collects all parameters of the distribution.
Next, we take the BNB distribution as an example to show the calculation process, in which case $\theta = (r, \alpha, \beta)$.
First, we state the first derivatives of the gamma and beta functions. Let $\psi(x) = \frac{\mathrm{d}}{\mathrm{d}x} \log \Gamma(x)$ denote the digamma function. The derivative of the logarithmic beta function is
$$\frac{\partial}{\partial a} \log B(a, b) = \psi(a) - \psi(a + b).$$
Similarly,
$$\frac{\partial}{\partial b} \log B(a, b) = \psi(b) - \psi(a + b).$$
The BNB logarithmic pmf can be expressed as a combination of log gamma and log beta functions:
$$\log f(n \mid r, \alpha, \beta) = \log B(n + r, \alpha + \beta) + \log \Gamma(n + \beta) - \log \Gamma(n + 1) - \log B(r, \alpha) - \log \Gamma(\beta).$$
Using the previous result, the partial derivatives with respect to the three parameters are
$$\frac{\partial \log f}{\partial r} = \psi(n + r) - \psi(n + r + \alpha + \beta) - \left[\psi(r) - \psi(r + \alpha)\right],$$
$$\frac{\partial \log f}{\partial \alpha} = \psi(\alpha + \beta) - \psi(n + r + \alpha + \beta) - \left[\psi(\alpha) - \psi(r + \alpha)\right],$$
$$\frac{\partial \log f}{\partial \beta} = \psi(\alpha + \beta) - \psi(n + r + \alpha + \beta) + \psi(n + \beta) - \psi(\beta).$$
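Digamma expressions like these are easy to get wrong, so it pays to cross-check them against finite differences of the log-pmf before porting them to C++. A Python sketch using only the standard library; `digamma` here is our own recurrence-plus-asymptotic-series approximation, accurate enough for a check at this tolerance:

```python
import math

def lbeta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def digamma(x):
    # Recurrence to push x >= 8, then the standard asymptotic series:
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result = 0.0
    while x < 8:
        result -= 1 / x
        x += 1
    inv2 = 1 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def lpmf(n, r, a, b):
    # log f(n | r, alpha, beta) as a combination of log-beta and log-gamma terms.
    return (lbeta(n + r, a + b) + math.lgamma(n + b)
            - math.lgamma(n + 1) - lbeta(r, a) - math.lgamma(b))

n, r, a, b = 7, 2.5, 3.0, 1.5
# Digamma-based partial derivatives.
d_r = digamma(n + r) - digamma(n + r + a + b) - (digamma(r) - digamma(r + a))
d_a = digamma(a + b) - digamma(n + r + a + b) - (digamma(a) - digamma(r + a))
d_b = digamma(a + b) - digamma(n + r + a + b) + digamma(n + b) - digamma(b)
# Central finite differences of the log-pmf itself.
h = 1e-5
fd_r = (lpmf(n, r + h, a, b) - lpmf(n, r - h, a, b)) / (2 * h)
fd_a = (lpmf(n, r, a + h, b) - lpmf(n, r, a - h, b)) / (2 * h)
fd_b = (lpmf(n, r, a, b + h) - lpmf(n, r, a, b - h)) / (2 * h)
print(d_r - fd_r, d_a - fd_a, d_b - fd_b)  # all near zero
```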
Then let's take a look at the ccdf. From Wolfram, the ccdf for the BNB distribution is
$$\Pr(Y > n) = \frac{\Gamma(r + n + 1)\, B(\alpha + r,\, \beta + n + 1)}{\Gamma(r)\, B(\alpha, \beta)\, \Gamma(n + 2)}\; {}_3F_2\!\left(1,\; \beta + n + 1,\; r + n + 1;\; n + 2,\; \alpha + r + \beta + n + 1;\; 1\right),$$
where ${}_3F_2$ denotes the generalized hypergeometric function.
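The hypergeometric form of the ccdf (the same quantity the C++ lccdf code below computes) can be validated numerically: evaluate the ${}_3F_2$ series by accumulating terms (at unit argument it converges here because the parameter excess equals $\alpha > 0$) and compare against an explicit tail sum of the pmf. A Python sketch with arbitrary parameter values:

```python
import math

def lbeta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def bnb_pmf(n, r, a, b):
    # BNB pmf in the (r, alpha, beta) parametrization.
    return math.exp(lbeta(n + r, a + b) + math.lgamma(n + b)
                    - math.lgamma(n + 1) - lbeta(r, a) - math.lgamma(b))

def hyper_3f2(a1, a2, a3, b1, b2, terms=5000):
    # Truncated series for 3F2(a1, a2, a3; b1, b2; 1).
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a1 + k) * (a2 + k) * (a3 + k) / ((b1 + k) * (b2 + k) * (k + 1))
    return total

r, a, b, n = 2.5, 6.0, 3.0, 4
F = hyper_3f2(1.0, b + n + 1, r + n + 1, n + 2, a + r + b + n + 1)
log_C = (math.lgamma(r + n + 1) + lbeta(a + r, b + n + 1)
         - math.lgamma(r) - lbeta(a, b) - math.lgamma(n + 2))
ccdf = F * math.exp(log_C)                                   # hypergeometric form
tail = 1.0 - sum(bnb_pmf(k, r, a, b) for k in range(n + 1))  # direct tail sum
print(ccdf, tail)  # the two should agree
```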
It's too lengthy to explicitly express
Remember now our task is to calculate
Let's first take a close look at the first element in this vector. After simply taking the logarithm and taking the partial derivatives, we have
The only term in this formula that is hard to express in closed form is the last one: the partial derivative of the ${}_3F_2$ function.
Although the notation may look complicated, all we need is the basic chain rule. We have
Hence
Now we are curious about
Note that
Based on the derivative rules for multivariate composite functions, given a function
Therefore
where the superscript
Expression
Finally, the partial derivative of the
Similarly, the partial derivative of the
where
The cdf for
The partial derivative of the
This is to say, to know
It is difficult to implement these functions directly in Stan. First, the
We've worked out
For the implementation, our aim is to complete four functions: beta_neg_binomial_lpmf, beta_neg_binomial_lcdf, beta_neg_binomial_lccdf, and beta_neg_binomial_rng. The existing probability functions in the Stan Math Library provide very high-quality examples, which can be found here.
The skeleton of each function is basically as follows.
template <typename T_x, typename T_params>
stan::return_type_t<T_x, T_params> {distribution}_lpmf(T_x&& x, T_params&& params) {
// Type Aliases
using stan::partials_return_t;
using T_partials_return = partials_return_t<T_x, T_params>;
// Error Handling
// Check sizes of input parameters
check_consistent_sizes(x, params);
if (size_zero(x) || size_zero(params)) {
return 0.0;
}
// Check domain of input parameters
check_positive_finite(x);
check_positive_finite(params);
// Add other domain checks as needed
// Initialization
T_partials_return logp = 0.0; // Initialize log probability
operands_and_partials<T_x, T_params> ops_partials(x, params); // Initialize partial derivatives
// Convert inputs to vector views to handle both scalars and vectors
scalar_seq_view<T_x> x_vec(x);
scalar_seq_view<T_params> params_vec(params);
// Determine sizes of input data
size_t size_x = stan::math::size(x);
size_t size_params = stan::math::size(params);
// Implementation Details
for (size_t i = 0; i < max_size(x, params); ++i) {
// Core logic for calculating the log probability
// Example: logp += lgamma(arg1) - lgamma(arg2) + ...;
// Here arg1 and arg2 are placeholders for actual arguments to the lgamma function
// Gradient calculations for automatic differentiation
if (!is_constant_all<T_x, T_params>::value) {
// Compute partial derivatives
// Example:
// ops_partials.edge1_.partials_[i] += ...;
// ops_partials.edge2_.partials_[i] += ...;
}
}
// Collect results and return
return ops_partials.build(logp);
}
Let's start with beta_neg_binomial_lpmf
. We only need to write one function instead of writing it for each input type permutation.
Identify the namespace and write the code within it.
namespace <THE_NAME_OF_STAN_MODEL>_model_namespace {
......
}
Declaration of the function.
template <bool propto, typename T_n, typename T_r, typename T_size1,
typename T_size2,
stan::require_all_not_nonscalar_prim_or_rev_kernel_expression_t<
T_n, T_r, T_size1, T_size2>* = nullptr>
stan::return_type_t<T_r, T_size1, T_size2> beta_neg_binomial_lpmf(const T_n& n,
const T_r& r,
const T_size1& alpha,
const T_size2& beta,
std::ostream* pstream__) {
......
}
Specify aliases.
using stan::partials_return_t;
using stan::ref_type_t;
using stan::ref_type_if_t;
using stan::is_constant;
using stan::is_constant_all;
using stan::VectorBuilder;
using stan::scalar_seq_view;
using stan::math::lgamma;
using stan::math::size;
using stan::math::max_size;
using T_partials_return = partials_return_t<T_r, T_size1, T_size2>;
using T_r_ref = ref_type_t<T_r>;
using T_alpha_ref = ref_type_t<T_size1>;
using T_beta_ref = ref_type_t<T_size2>;
Check whether the shape of the incoming data conforms to the specification. It throws a std::invalid_argument
if the sizes of the input containers don't match.
static const char* function = "beta_neg_binomial_lpmf";
check_consistent_sizes(function, "Failures variable", n,
"Number of failure parameter", r,
"Prior success parameter", alpha,
"Prior failure parameter", beta);
if (size_zero(n, r, alpha, beta)) {
return 0.0;
}
Check whether the values of the incoming parameter vectors lie within the parameter space. This throws a std::domain_error
if any of the parameters are not positive and finite.
T_r_ref r_ref = r;
T_alpha_ref alpha_ref = alpha;
T_beta_ref beta_ref = beta;
check_positive_finite(function, "Number of failure parameter", r_ref);
check_positive_finite(function, "Prior success parameter", alpha_ref);
check_positive_finite(function, "Prior failure parameter", beta_ref);
If propto is true and none of the parameter types require autodiff, the unnormalized log density is constant with respect to the parameters, so return zero.
if (!include_summand<propto, T_r, T_size1, T_size2>::value) {
return 0.0;
}
Initialize the return value, as well as some quantities that will be reused in subsequent calculations.
T_partials_return logp(0.0);
operands_and_partials<T_r_ref, T_alpha_ref, T_beta_ref> ops_partials(r_ref, alpha_ref, beta_ref);
scalar_seq_view<T_n> n_vec(n);
scalar_seq_view<T_r_ref> r_vec(r_ref);
scalar_seq_view<T_alpha_ref> alpha_vec(alpha_ref);
scalar_seq_view<T_beta_ref> beta_vec(beta_ref);
size_t size_n = stan::math::size(n);
size_t size_r = stan::math::size(r);
size_t size_alpha = stan::math::size(alpha);
size_t size_beta = stan::math::size(beta);
size_t size_n_r = max_size(n, r);
size_t size_r_alpha = max_size(r, alpha);
size_t size_n_beta = max_size(n, beta);
size_t size_alpha_beta = max_size(alpha, beta);
size_t max_size_seq_view = max_size(n, r, alpha, beta);
Check that the incoming observations lie within the distribution's support.
for (size_t i = 0; i < max_size_seq_view; i++) {
if (n_vec[i] < 0) {
return ops_partials.build(LOG_ZERO);
}
}
Compute the log pmf
// compute gamma(n+1)
VectorBuilder<include_summand<propto>::value, T_partials_return, T_n>
normalizing_constant(size_n);
for (size_t i = 0; i < size_n; i++)
if (include_summand<propto>::value)
normalizing_constant[i] = -lgamma(n_vec[i] + 1);
// compute lbeta denominator with size r and alpha
VectorBuilder<true, T_partials_return, T_r, T_size1> lbeta_denominator(size_r_alpha);
for (size_t i = 0; i < size_r_alpha; i++) {
lbeta_denominator[i] = lbeta(r_vec.val(i), alpha_vec.val(i));
}
// compute lgamma denominator with size beta
VectorBuilder<true, T_partials_return, T_size2> lgamma_denominator(size_beta);
for (size_t i = 0; i < size_beta; i++) {
lgamma_denominator[i] = lgamma(beta_vec.val(i));
}
// compute lgamma numerator with size n and beta
VectorBuilder<true, T_partials_return, T_n, T_size2> lgamma_numerator(size_n_beta);
for (size_t i = 0; i < size_n_beta; i++) {
lgamma_numerator[i] = lgamma(n_vec[i] + beta_vec.val(i));
}
// compute lbeta numerator with size n, r, alpha and beta
VectorBuilder<true, T_partials_return, T_n, T_r, T_size1, T_size2> lbeta_diff(max_size_seq_view);
for (size_t i = 0; i < max_size_seq_view; i++) {
lbeta_diff[i] = lbeta(n_vec[i] + r_vec.val(i),
alpha_vec.val(i) + beta_vec.val(i)) + lgamma_numerator[i]
- lbeta_denominator[i] - lgamma_denominator[i];
}
Compute the digamma quantities needed for the derivatives with respect to $r$, $\alpha$, and $\beta$.
// compute digamma(n+r+alpha+beta)
VectorBuilder<!is_constant_all<T_r, T_size1, T_size2>::value, T_partials_return,
T_n, T_r, T_size1, T_size2>
digamma_n_r_alpha_beta(max_size_seq_view);
if (!is_constant_all<T_r, T_size1, T_size2>::value) {
for (size_t i = 0; i < max_size_seq_view; i++) {
digamma_n_r_alpha_beta[i]
= digamma(n_vec[i] + r_vec.val(i) + alpha_vec.val(i) + beta_vec.val(i));
}
}
// compute digamma(alpha+beta)
VectorBuilder<!is_constant_all<T_size1, T_size2>::value, T_partials_return,
T_size1, T_size2>
digamma_alpha_beta(size_alpha_beta);
if (!is_constant_all<T_size1, T_size2>::value) {
for (size_t i = 0; i < size_alpha_beta; i++) {
digamma_alpha_beta[i] = digamma(alpha_vec.val(i) + beta_vec.val(i));
}
}
// compute digamma(n+r)
VectorBuilder<!is_constant_all<T_r>::value, T_partials_return, T_n, T_r>
digamma_n_r(size_n_r);
if (!is_constant_all<T_r>::value) {
for (size_t i = 0; i < size_n_r; i++) {
digamma_n_r[i] = digamma(n_vec[i] + r_vec.val(i));
}
}
// compute digamma(r+alpha)
VectorBuilder<!is_constant_all<T_r, T_size1>::value, T_partials_return, T_r, T_size1>
digamma_r_alpha(size_r_alpha);
if (!is_constant_all<T_r, T_size1>::value) {
for (size_t i = 0; i < size_r_alpha; i++) {
digamma_r_alpha[i] = digamma(r_vec.val(i) + alpha_vec.val(i));
}
}
// compute digamma(n+beta)
VectorBuilder<!is_constant_all<T_size2>::value, T_partials_return, T_n, T_size2>
digamma_n_beta(size_n_beta);
if (!is_constant_all<T_size2>::value) {
for (size_t i = 0; i < size_n_beta; i++) {
digamma_n_beta[i] = digamma(n_vec[i] + beta_vec.val(i));
}
}
// compute digamma(r)
VectorBuilder<!is_constant_all<T_r>::value, T_partials_return, T_r> digamma_r(size_r);
if (!is_constant_all<T_r>::value) {
for (size_t i = 0; i < size_r; i++) {
digamma_r[i] = digamma(r_vec.val(i));
}
}
// compute digamma(alpha)
VectorBuilder<!is_constant_all<T_size1>::value, T_partials_return, T_size1> digamma_alpha(size_alpha);
if (!is_constant_all<T_size1>::value) {
for (size_t i = 0; i < size_alpha; i++) {
digamma_alpha[i] = digamma(alpha_vec.val(i));
}
}
// compute digamma(beta)
VectorBuilder<!is_constant_all<T_size2>::value, T_partials_return, T_size2> digamma_beta(size_beta);
if (!is_constant_all<T_size2>::value) {
for (size_t i = 0; i < size_beta; i++) {
digamma_beta[i] = digamma(beta_vec.val(i));
}
}
Build the return value.
for (size_t i = 0; i < max_size_seq_view; i++) {
if (include_summand<propto>::value)
logp += normalizing_constant[i];
logp += lbeta_diff[i];
if (!is_constant_all<T_r>::value)
ops_partials.edge1_.partials_[i]
+= digamma_n_r[i] - digamma_n_r_alpha_beta[i] - (digamma_r[i] - digamma_r_alpha[i]);
if (!is_constant_all<T_size1>::value)
ops_partials.edge2_.partials_[i]
+= digamma_alpha_beta[i] - digamma_n_r_alpha_beta[i] - (digamma_alpha[i] - digamma_r_alpha[i]);
if (!is_constant_all<T_size2>::value)
ops_partials.edge3_.partials_[i]
+= digamma_alpha_beta[i] - digamma_n_r_alpha_beta[i] + digamma_n_beta[i] - digamma_beta[i];
}
return ops_partials.build(logp);
For pmf/pdf functions, we have to overload the template function defined above. This version of the function template does not include the propto parameter (defaulting to false). It provides a simpler interface for direct function calls.
template <typename T_n, typename T_r, typename T_size1, typename T_size2>
inline stan::return_type_t<T_r, T_size1, T_size2> beta_neg_binomial_lpmf(const T_n& n,
const T_r& r,
const T_size1& alpha,
const T_size2& beta,
std::ostream* pstream__) {
return beta_neg_binomial_lpmf<false>(n, r, alpha, beta, pstream__);
}
Since beta_neg_binomial_lccdf and beta_neg_binomial_lcdf are closely related in formulation, let's explain the structure of beta_neg_binomial_lccdf in detail.
Looking further at the specific expressions, we found two difficulties in the implementation of the lccdf: evaluating the generalized hypergeometric function and its gradient. Fortunately, both are already available in the Stan Math Library as hypergeometric_3F2 and grad_F32.
Likewise, first declare the function
template <typename T_n, typename T_r, typename T_size1, typename T_size2>
stan::return_type_t<T_r, T_size1, T_size2> beta_neg_binomial_lccdf(const T_n& n,
const T_r& r,
const T_size1& alpha,
const T_size2& beta,
std::ostream* pstream__) {
......
}
Specify aliases
using stan::partials_return_t;
using stan::ref_type_t;
using stan::ref_type_if_t;
using stan::is_constant;
using stan::is_constant_all;
using stan::VectorBuilder;
using stan::scalar_seq_view;
using stan::math::lgamma;
using stan::math::size;
using stan::math::max_size;
using T_partials_return = partials_return_t<T_n, T_r, T_size1, T_size2>;
using std::exp;
using std::log;
using T_r_ref = ref_type_t<T_r>;
using T_alpha_ref = ref_type_t<T_size1>;
using T_beta_ref = ref_type_t<T_size2>;
Check inputs
static const char* function = "beta_neg_binomial_lccdf";
check_consistent_sizes(function, "Failure variable", n,
"Number of failure parameter", r,
"Prior success parameter", alpha,
"Prior failure parameter", beta);
if (size_zero(n, r, alpha, beta)) {
return 0;
}
T_r_ref r_ref = r;
T_alpha_ref alpha_ref = alpha;
T_beta_ref beta_ref = beta;
check_positive_finite(function, "Number of failure parameter", r_ref);
check_positive_finite(function, "Prior success parameter", alpha_ref);
check_positive_finite(function, "Prior failure parameter", beta_ref);
Initialization
T_partials_return P(0.0);
operands_and_partials<T_r_ref, T_alpha_ref, T_beta_ref> ops_partials(r_ref, alpha_ref, beta_ref);
scalar_seq_view<T_n> n_vec(n);
scalar_seq_view<T_r_ref> r_vec(r_ref);
scalar_seq_view<T_alpha_ref> alpha_vec(alpha_ref);
scalar_seq_view<T_beta_ref> beta_vec(beta_ref);
size_t max_size_seq_view = max_size(n, r, alpha, beta);
Having previously checked the range of the parameters, don't forget to check the range of the observations:
for (size_t i = 0; i < stan::math::size(n); i++) {
if (n_vec.val(i) < 0) {
return ops_partials.build(0.0);  // Pr(Y > n) = 1 when n < 0, so the log ccdf is 0
}
}
Compute the log ccdf
for (size_t i = 0; i < max_size_seq_view; i++) {
const T_partials_return n_dbl = n_vec.val(i);
const T_partials_return r_dbl = r_vec.val(i);
const T_partials_return alpha_dbl = alpha_vec.val(i);
const T_partials_return beta_dbl = beta_vec.val(i);
const T_partials_return b_plus_n = beta_dbl + n_dbl;
const T_partials_return r_plus_n = r_dbl + n_dbl;
const T_partials_return a_plus_r = alpha_dbl + r_dbl;
const T_partials_return one = 1;
const T_partials_return F = hypergeometric_3F2({one, b_plus_n + 1, r_plus_n + 1},
{n_dbl + 2, a_plus_r + b_plus_n + 1}, one);
T_partials_return C = lgamma(r_plus_n + 1) + lbeta(a_plus_r, b_plus_n + 1)
- lgamma(r_dbl) - lbeta(alpha_dbl, beta_dbl) - lgamma(n_dbl + 2);
C = F * exp(C);
const T_partials_return P_i = C;
P += log(P_i);
And the derivatives, where dF will hold the gradient of the ${}_3F_2$ function with respect to each of its parameters (computed by grad_F32):
T_partials_return digamma_abrn
= is_constant_all<T_r, T_size1, T_size2>::value
? 0
: digamma(a_plus_r + b_plus_n + 1);
T_partials_return digamma_ab
= is_constant_all<T_size1, T_size2>::value
? 0
: digamma(alpha_dbl + beta_dbl);
T_partials_return dF[6];
if (!is_constant_all<T_r, T_size1, T_size2>::value) {
grad_F32(dF, one, b_plus_n + 1, r_plus_n + 1, n_dbl + 2, a_plus_r + b_plus_n + 1, one, 1e-3);
}
if (!is_constant_all<T_r>::value) {
// first edge: r
ops_partials.edge1_.partials_[i]
+= digamma(r_plus_n + 1) + (digamma(a_plus_r) - digamma_abrn) + (dF[2] + dF[4]) / F - digamma(r_dbl);
}
if (!is_constant_all<T_size1>::value) {
// second edge: alpha
ops_partials.edge2_.partials_[i]
+= digamma(a_plus_r) - digamma_abrn + dF[4] / F - (digamma(alpha_dbl) - digamma_ab);
}
if (!is_constant_all<T_size2>::value) {
// third edge: beta
ops_partials.edge3_.partials_[i]
+= digamma(b_plus_n + 1) - digamma_abrn + (dF[1] + dF[4]) / F - (digamma(beta_dbl) - digamma_ab);
}
}
Finally
return ops_partials.build(P);
In the following we describe the random number generator.
We first sample $p \sim \operatorname{Beta}(\alpha, \beta)$. In Stan, the negative binomial distribution is given by the shape and inverse scale parametrization
$$\operatorname{NegBinomial}(n \mid r, \eta) = \binom{n + r - 1}{n} \left(\frac{\eta}{1 + \eta}\right)^{r} \left(\frac{1}{1 + \eta}\right)^{n}.$$
In our case the success probability should equal $p$, that is, $\eta / (1 + \eta) = p$. Through observation, the meaning of $\eta$ is the odds ratio: $\eta = p / (1 - p)$. Therefore, provided $p \sim \operatorname{Beta}(\alpha, \beta)$, drawing $y \sim \operatorname{NegBinomial}(r, p / (1 - p))$ yields a draw from the BNB distribution.
In the implementation, because this is a random number generator, we don't need to handle Stan's autodiff types, only plain scalars and std::vector, so the code is relatively simple.
Routinely perform shape and range checks.
template <typename T_r, typename T_shape1, typename T_shape2, class RNG>
inline typename stan::VectorBuilder<true, int, T_r, T_shape1, T_shape2>::type
beta_neg_binomial_rng(const T_r &r, const T_shape1 &alpha, const T_shape2 &beta,
RNG &rng, std::ostream* pstream__) {
using stan::ref_type_t;
using stan::VectorBuilder;
using namespace stan::math;
using T_r_ref = ref_type_t<T_r>;
using T_alpha_ref = ref_type_t<T_shape1>;
using T_beta_ref = ref_type_t<T_shape2>;
static const char *function = "beta_neg_binomial_rng";
check_consistent_sizes(function, "Number of failure parameter", r,
"Prior success parameter", alpha,
"Prior failure parameter", beta);
T_r_ref r_ref = r;
T_alpha_ref alpha_ref = alpha;
T_beta_ref beta_ref = beta;
check_positive_finite(function, "Number of failure parameter", r_ref);
check_positive_finite(function, "Prior success parameter", alpha_ref);
check_positive_finite(function, "Prior failure parameter", beta_ref);
Use beta_rng to generate $p$ and convert it to the odds ratio $p / (1 - p)$:
using T_p = decltype(beta_rng(alpha_ref, beta_ref, rng));
T_p p = beta_rng(alpha_ref, beta_ref, rng);
T_p odds_ratio_p;
// 'if constexpr' (C++17) is required here: with a plain runtime 'if', both
// branches would have to compile for either type, but double has no resize()
// and std::vector has no operator/.
if constexpr (std::is_same<T_p, double>::value) {
// Scalar case
odds_ratio_p = p / (1 - p);
} else {
// Vector case
odds_ratio_p.resize(p.size());
for (size_t i = 0; i < p.size(); i++) {
odds_ratio_p[i] = p[i] / (1 - p[i]);
}
}
Use neg_binomial_rng to sample $y$ given the odds ratio:
return neg_binomial_rng(r_ref, odds_ratio_p, rng);
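The compound construction implemented above can be prototyped in plain Python to check that it produces the intended marginal distribution. The sketch below assumes Stan's shape and inverse-scale negative binomial parametrization, simulates it as a gamma-Poisson mixture, and compares the empirical mean with the analytic $r\beta/(\alpha - 1)$ (valid for $\alpha > 1$); the helper names are ours and only illustrative:

```python
import math
import random

def poisson_rng(lam, rng):
    # Inversion by sequential search; adequate for moderate lam.
    u = rng.random()
    k, p_k, cdf = 0, math.exp(-lam), math.exp(-lam)
    while u > cdf and p_k > 0.0:
        k += 1
        p_k *= lam / k
        cdf += p_k
    return k

def beta_neg_binomial_rng(r, alpha, beta, rng):
    p = rng.betavariate(alpha, beta)      # p ~ Beta(alpha, beta)
    eta = p / (1 - p)                     # odds ratio = inverse scale
    # A neg_binomial(r, eta) draw as a gamma-Poisson mixture:
    # lambda ~ Gamma(shape=r, rate=eta), then y | lambda ~ Poisson(lambda).
    lam = rng.gammavariate(r, 1.0 / eta)  # gammavariate takes (shape, scale)
    return poisson_rng(lam, rng)

rng = random.Random(42)
r, alpha, beta = 3.0, 8.0, 4.0
draws = [beta_neg_binomial_rng(r, alpha, beta, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
print(mean, r * beta / (alpha - 1))  # empirical mean vs analytic mean
```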
Then we compare the performance of precomputing gradients and using automatic differentiation on simulated datasets. We sampled
We ran 1000 iterations on one chain; the table below shows the parameter estimates and the performance comparison. Forward time is the time spent on the forward pass of the model calculation (statements without automatic differentiation), including evaluating the log pdf/pmf. Reverse time is the time spent computing the reverse pass of the gradient in the context of automatic differentiation. Total time is the sum of these two plus some overhead. Chain stacks is the number of objects allocated on the stack for chained automatic differentiation, while no-chain stacks counts storage consumed by computations that do not require differentiation. Autodiff calls (no-autodiff calls) is the number of executions of the profiled code block with (without) automatic differentiation.
| **Metric**         | **With C++**                  | **Pure Stan**                    |
|--------------------|-------------------------------|----------------------------------|
| *Likelihood*       |                               |                                  |
| Total time (s)     | 62.6521                       | 143.977                          |
| Forward time (s)   | 62.6488                       | 85.9782                          |
| Reverse time (s)   | [0.0033]{style="color: blue"} | [57.9992]{style="color: red"}    |
| Chain stacks       | [35042]{style="color: blue"}  | [3080921832]{style="color: red"} |
| No chain stacks    | 0                             | 32796                            |
| Autodiff calls     | 35042                         | 32796                            |
| No autodiff calls  | 1                             | 1                                |
| *Priors*           |                               |                                  |
| Total time (s)     | 0.0091                        | 0.0144                           |
| Forward time (s)   | 0.0059                        | 0.0101                           |
| Reverse time (s)   | 0.0031                        | 0.0042                           |
| Chain stacks       | 105126                        | 98388                            |
| No chain stacks    | 0                             | 0                                |
| Autodiff calls     | 35042                         | 32796                            |
| No autodiff calls  | 1                             | 1                                |
| *Posterior mean*   |                               |                                  |
| $\hat r$           | 4.136                         | 4.070                            |
| $\hat \alpha$      | 1.725                         | 1.714                            |
| $\hat \beta$       | 0.568                         | 0.573                            |

: Performance analysis comparison between the C++ analytic derivative implementation and Stan's built-in automatic differentiation implementation.