updated Cmdstan to 2.37 and updated package version in project toml#30
updated Cmdstan to 2.37 and updated package version in project toml#30bparbhu merged 3 commits intorelease/stanvar-v4_5_3from
Conversation
|
CI failed at the bernoulli run. I'll check to see what's going on there. |
|
The old expectation running in the tests are -8.2 in terms of what's expected we might need to shift to -8.5 or -8.6 |
|
I'm further confused about why these are the parameters we're checking. These correspond to checks of the 'log_p__' and 'log_g__', which I suspect mean much more to people who aren't me, and we aren't considering |
|
Hey all, So I'm not too sure about why the atol values are set to those particular values. Maybe @goedman could you help give us an understanding of the tests written here. I just padded them with .7 as the values we were getting were deviating from -8.2 so just checking to understand this a bit. Again there are no issues I think with cmdstan 2.37 but just trying to understand the reason for the values. I set them that way to just get them to pass the tests and also based on the new values for cmdstan 2.37. |
|
I do want to hold off a little until we know what the idea behind these tests are. Maybe we should take notes from other interfaces... |
|
Hey all, I've been sitting on this for a bit. I have not seen the criteria that we test for in CmdStanPy or CmdStanR. Also something to keep in mind is this interface was made before the other two. So what they decided to test for might be different. In CmdStanPy this is the variational test that is run https://github.com/stan-dev/cmdstanpy/blob/develop/test/test_variational.py . In CmdStanR this is the test for variational here https://github.com/stan-dev/cmdstanr/blob/master/tests/testthat/test-model-variational.R In the future we can adopt something simliar to what is done in those interfaces but for now I would like to push this along. the values that we had previously can be reverted if necessary but we can also build towards tests that align with the other interfaces as well. I looked through the tests for these values to see if there were assertions and there were not any around atol values. @test ms[1, 2, 1] ≈ -8.2 atol=0.7 Again. We can revert back to these values if need be |
|
I still just don't quite know that these are the right tests. The R tests make sure things run or fail as expected and the py tests include some approximate value checks for good step size parameters and fits, and I think that makes enough sense to imitate; it's not clear what this test is getting us or how we'd use it to diagnose anything. |
|
Understood, even if we want to redesign what the test were to be I still think we should push forward. The only thing we really did was accommodate the newer version of CmdStan we didn't do anything that would warrant anything serious to be honest. I do agree that we should understand why we are using tests and what they should represent but given the circumstances we could bring this up in the Stan discourse and maybe get all of us to centralize our tests and what we test for. It's something that other Stan maintainers have been striving for anyway in terms of consistency across the interfaces. In such, I think we should proceed but follow up with others in terms of consolidating the types of tests we're doing. |
|
Given that extra note on the to-do list I'm in favor. |
Awesome, thanks @JeffreyBroll, much appreciated! |
|
merging this in now. |
Updating CmdStan to 2.37 and updating the package version in Project toml to 4.5.3