@SteveBronder
Collaborator

Summary

This PR makes the following changes for the laplace approximation:

  1. Adds a Wolfe line search to the Newton solver used in the Laplace approximation to improve convergence.
  • The example code from "Laplace Bug when passing Eigen::Map in tuple of functor arguments" (#3205) fails on develop. The failure arose because the initial value of 0 for theta started the model in the tail of the distribution. The quick line search we had, which only tested half of a Newton step, was not robust enough for this model to reach convergence. This PR adds a full Wolfe line search to the Newton solver used in the Laplace approximation to improve convergence in such cases.
    The graphic below shows the difference in estimates of the log likelihood for laplace relative to integrate_1d on the roach test data plotted along the mu and sigma estimates. There is still a bias relative to integrate_1d as mu becomes negative and sigma becomes larger, but it is much nicer than before.
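    [Image: difference in log likelihood estimates, laplace relative to integrate_1d, on the roach test data, plotted along the mu and sigma estimates]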
  • The main loop for laplace_marginal_density_est is expensive, as it requires calculating either a diagonal hessian or a block diagonal hessian with 2nd order autodiff. The Wolfe line search only requires the gradients of the likelihood with respect to theta. With that in mind, the Wolfe line search tries pretty aggressively to get the best step size. If our initial step size is successful, we keep doubling until we hit a step size where the strong Wolfe conditions fail, and then return the information for the step right before that failure. If our initial step size does not satisfy strong Wolfe, we do a bracketed zoom with cubic interpolation until we find a step size that satisfies the strong Wolfe conditions.
    Tests for the Wolfe line search are added to test/unit/math/laplace/wolfe_line_search.hpp.
  2. Fixes bugs in the Laplace approximation
  • Fix iteration mismatch between values when line search succeeds
    In the last iteration of the Laplace approximation we were returning the negative block diagonal hessian and derived matrices from the previous search. This is fine if the line search in that last step failed. But if the line search succeeds, we need to go back and recalculate the negative block diagonal hessian and its derived quantities.
  • Break up diagonal and block hessian functions
    Previously we had one block_hessian function that calculated either the block hessian or the diagonal hessian, with the choice made at runtime. But this function is only used in places where we know at compile time whether we want a block or diagonal hessian. So I split it into two functions to avoid unnecessary runtime branching.
  • barzilai_borwein_step_size
    Before each line search we use the Barzilai-Borwein method to get an initial step size estimate.
  • Adjoints of ll args only calculated once
    Previously we calculated them eagerly in each laplace iteration. But they are not needed within the inner loop, so we wait until the inner search finishes and then calculate their adjoints once afterwards.
  • Calculate covariance once at the start and reuse throughout.
    We were calculating the covariance matrix from inside of laplace_density_est, but this required us to then return it from that function and imo looked weird. So I pulled it out and now laplace_marginal_density_est is passed the covariance matrix.
  3. Fixes numerical stability in laplace distributions
    There were a few places where we could use log_sum_exp etc. so I made those changes.
  4. Fixes "bug" in finite difference step size calculation
  • Changed from cube root of epsilon to epsilon^(1/7) for 6th order
    The finite difference method in Stan was previously using a step size optimized for a 2nd order method, but the code is a 6th order method. I modified finite_diff_stepsize to use epsilon^(1/7) instead of cbrt(epsilon). With this change all of the laplace tests pass with a much tighter tolerance.
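
For readers less familiar with the strong Wolfe conditions, here is a minimal sketch of the accept test and the doubling phase described in item 1. It is not the code in this PR: `PhiEval`, `satisfies_strong_wolfe`, `expand_step`, and the constants `c1 = 1e-4` and `c2 = 0.9` are hypothetical names and textbook defaults. It is written for minimization along a descent direction (the Laplace solver maximizes, so the signs flip there), and the bracketed zoom with cubic interpolation is left out.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Value and slope of phi(alpha) = f(x + alpha * p) at one trial step.
// (Hypothetical helper type, not part of Stan Math.)
struct PhiEval {
  double value;
  double slope;
};

// Strong Wolfe test for a trial step alpha, assuming minimization:
//   sufficient decrease: phi(alpha) <= phi(0) + c1 * alpha * phi'(0)
//   curvature:           |phi'(alpha)| <= c2 * |phi'(0)|
inline bool satisfies_strong_wolfe(const PhiEval& at_zero,
                                   const PhiEval& at_alpha, double alpha,
                                   double c1 = 1e-4, double c2 = 0.9) {
  assert(at_zero.slope < 0.0);  // p must be a descent direction
  const bool sufficient_decrease
      = at_alpha.value <= at_zero.value + c1 * alpha * at_zero.slope;
  const bool curvature
      = std::abs(at_alpha.slope) <= c2 * std::abs(at_zero.slope);
  return sufficient_decrease && curvature;
}

// Doubling phase: keep doubling while strong Wolfe holds and return the
// last step that passed. Returns 0.0 if the initial step already fails,
// in which case a caller would fall back to a bracketed zoom.
inline double expand_step(const std::function<PhiEval(double)>& phi,
                          double alpha_init, int max_doublings = 10) {
  const PhiEval at_zero = phi(0.0);
  double accepted = 0.0;
  double alpha = alpha_init;
  for (int i = 0; i <= max_doublings; ++i) {
    if (!satisfies_strong_wolfe(at_zero, phi(alpha), alpha)) {
      break;
    }
    accepted = alpha;
    alpha *= 2.0;
  }
  return accepted;
}
```

With phi(alpha) = (alpha - 1)^2 and an initial step of 0.25, the doubling phase accepts 0.25, 0.5, and 1.0, rejects 2.0, and returns 1.0. Each trial only needs a value and a gradient, which is why being aggressive here is cheap compared to another hessian evaluation.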

Tests

All the AD tests now have a tighter tolerance for the Laplace approximation.
There are also tests for the Wolfe line search in test/unit/math/laplace/wolfe_line_search.hpp.

./runTests.py test/unit/math/laplace

Release notes

Improve Laplace approximation with Wolfe line search and bug fixes.

Checklist

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@SteveBronder SteveBronder changed the title from "Fix/wolfe zoom1" to "Add Wolfe line search to Laplace approximation" Oct 24, 2025
@charlesm93 charlesm93 self-assigned this Oct 28, 2025
@charlesm93
Contributor

charlesm93 commented Oct 28, 2025

I'll have a stab at reviewing the code but from the description this looks like an amazing PR! A few initial questions:

  • The Barzilai Borwein method is used only if line_search = TRUE, right?
  • Before this PR, was the covariance matrix being computed over and over?
  • What's the finite difference step size calculation? Is this for unit tests?
  • Do we have some performance tests for other problems to check how runtimes are affected by these changes?

@SteveBronder
Collaborator Author

SteveBronder commented Oct 28, 2025

> The Barzilai Borwein method is used only if line_search = TRUE, right?

So currently line search is always on. If we have line search off, should we always just take one full Newton step each iteration? My thought process was that we should always have it on, since it only requires calculating the gradient with respect to theta. Sometimes we can even get away with taking 2x a Newton step. And for some functions, like Aki's example, we need a very small step size at first but can then just go back to 1x steps after.
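
For reference, a minimal sketch of a Barzilai-Borwein step size estimate. This is not the barzilai_borwein_step_size function from the PR: the name `bb_step_sketch`, the use of `std::vector`, the choice of the "long" BB formula s's / s'y, and the `fallback` argument are all assumptions made for illustration, and the sketch uses the minimization convention where g is the gradient of the objective.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Barzilai-Borwein ("long" variant) step size estimate:
//   s = x_curr - x_prev, y = g_curr - g_prev, alpha = (s's) / (s'y).
// Returns `fallback` when s'y is not positive, since the estimate is
// meaningless without positive curvature along s.
// (Hypothetical helper, not the function added in the PR.)
inline double bb_step_sketch(const std::vector<double>& x_prev,
                             const std::vector<double>& x_curr,
                             const std::vector<double>& g_prev,
                             const std::vector<double>& g_curr,
                             double fallback = 1.0) {
  assert(x_prev.size() == x_curr.size());
  assert(g_prev.size() == g_curr.size());
  assert(x_curr.size() == g_curr.size());
  double s_dot_s = 0.0;
  double s_dot_y = 0.0;
  for (std::size_t i = 0; i < x_curr.size(); ++i) {
    const double s = x_curr[i] - x_prev[i];
    const double y = g_curr[i] - g_prev[i];
    s_dot_s += s * s;
    s_dot_y += s * y;
  }
  if (!(s_dot_y > 0.0)) {
    return fallback;
  }
  return s_dot_s / s_dot_y;
}
```

On a 1-D quadratic with curvature 4 (gradient 4x), moving from x = 1 to x = 0.5 gives an estimate of 0.25, the inverse curvature. The estimate only reuses iterates and gradients already in hand, so it adds essentially no cost before the line search.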

> Before this PR, was the covariance matrix being computed over and over?

No, before it was just being computed in laplace_marginal_density_est, which is pretty low level. I think it looks nicer to compute it before we go into laplace_marginal_density_est.

> What's the finite difference step size calculation? Is this for unit tests?

That was a change for the unit tests. It was breaking other tests though, so I'm going to revert it. I think that needs to be looked at inside of another PR.
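
For context on the step size question, a rough sketch of the reasoning: for a central difference scheme of order p, truncation error scales like h^p and roundoff like epsilon / h, so the balancing step size is about epsilon^(1/(p+1)). That gives cbrt(epsilon) for a 2nd order scheme and epsilon^(1/7) for a 6th order one. The function `fd_step_sketch` below is a hypothetical illustration, not Stan's finite_diff_stepsize, and the scaling by max(1, |x|) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Step size that roughly balances truncation and roundoff error for a
// central finite difference of the given order: h ~ eps^(1 / (order + 1)),
// scaled by the magnitude of x so large arguments get proportional steps.
// (Hypothetical helper, not Stan's finite_diff_stepsize.)
inline double fd_step_sketch(double x, int order) {
  assert(order >= 1);
  const double eps = std::numeric_limits<double>::epsilon();
  const double scale = std::max(1.0, std::abs(x));
  return std::pow(eps, 1.0 / (order + 1)) * scale;
}
```

For doubles this works out to roughly 6e-6 for order 2 and roughly 6e-3 for order 6, so the 6th order scheme wants a step about a thousand times larger than the cube root rule gives it.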

> Do we have some performance tests for other problems to check how runtimes are affected by these changes?

No, we do not. If I have time I'd like to add some logging and compare the previous and current implementations in terms of time spent. I have mostly just been looking at the unit tests and ballparking whether they seem to go slower or faster when I run them. But yes, actual performance tests would be nice.

…y values have small values that lose accuracy with finite diff
@SteveBronder
Collaborator Author

@charlesm93 another Q. If we fail to converge after N iterations, should we throw a hard error or return what we have so far with a warning? I'm thinking about how for Wolfe we just return with a warning instead of throwing away the results.
