From 4c78c9244b3c5fd7ac856802c4e4b80364e08552 Mon Sep 17 00:00:00 2001 From: teddygroves Date: Mon, 25 Mar 2024 18:00:56 +0100 Subject: [PATCH] Update week 5 materials --- docs/metropolis-hastings.html | 44 +++---- docs/search.json | 4 +- docs/week3.html | 116 ++++++++--------- docs/week5.html | 230 ++++++++++++++++++---------------- materials/week5.qmd | 6 + 5 files changed, 209 insertions(+), 191 deletions(-) diff --git a/docs/metropolis-hastings.html b/docs/metropolis-hastings.html index 609708e..347ae5e 100644 --- a/docs/metropolis-hastings.html +++ b/docs/metropolis-hastings.html @@ -5012,7 +5012,7 @@

Markov Chains

For example:

The transitions can be measured as discrete time steps with the following matrix representation

-
+
import numpy as np
 T = np.matrix([[0.1, 0.1, 0.8], [0.5, 0.1, 0.4], [0.5, 0.2, 0.3]])
 print(T)
@@ -5023,7 +5023,7 @@

Markov Chains

Given an initial starting position

-
+
v0 = np.matrix([0.1, 0.4, 0.5])
 print(v0)
@@ -5031,7 +5031,7 @@

Markov Chains

We can simulate the probabilities of the next step given the transition matrix.

-
+
v1 = v0*T
 print(v1)
@@ -5039,7 +5039,7 @@

Markov Chains

Following this again we can simulate the states after two steps

-
+
v2 = v1*T
 print(v2)
@@ -5047,21 +5047,21 @@

Markov Chains

There’s a convenient way to calculate the next step given the starting condition.

-
+
print(v0*T**2)
[[0.316 0.139 0.545]]

What happens if we continue doing this for a long time?

-
+
print(v0*T**100)
[[0.35714286 0.14935065 0.49350649]]

And how does this change when taking the next step?

-
+
print(v0*T**101)
[[0.35714286 0.14935065 0.49350649]]
@@ -5126,7 +5126,7 @@

Define probability density function

-
+
import numpy as np
 from scipy.stats import norm
 
@@ -5136,21 +5136,21 @@ 

Define

Define proposal distribution

-
+
def proposal(x):
   return norm.rvs(x, 1, 1)[0]

Initialise sampler

-
+
current = 0.0
 samples = [current]

Sample from distribution

-
+
for i in range(10000):
     prop = proposal(current)
     accept_reject = prob(prop)/prob(current)
@@ -5166,21 +5166,21 @@ 

Sample from distr

Plot distribution

-
+
import matplotlib.pyplot as plt
 plt.hist(samples)
-
(array([   2.,   19.,  208.,  724., 1196., 1335., 1005.,  376.,   73.,
-          10.]),
- array([0.        , 0.38251406, 0.76502812, 1.14754218, 1.53005625,
-        1.91257031, 2.29508437, 2.67759843, 3.06011249, 3.44262655,
-        3.82514062]),
+
(array([   5.,   52.,  291.,  876., 1399., 1355.,  730.,  215.,   31.,
+           6.]),
+ array([0.        , 0.40351717, 0.80703434, 1.21055151, 1.61406869,
+        2.01758586, 2.42110303, 2.8246202 , 3.22813737, 3.63165454,
+        4.03517172]),
  <BarContainer object of 10 artists>)
-

+

@@ -5188,13 +5188,13 @@

Plot distribution

Trace plot

-
+
draw = [draw for draw, _ in enumerate(samples)]
 plt.plot(draw, samples)
-

+

@@ -5204,12 +5204,12 @@

Trace plot

Part 2: determining mean and standard deviation from data

I suggest using logs due to numerical issues. Here’s an example function which you can use to evaluate the probability of the data.

-
+
def eval_prob(data, mu, sigma):
     return np.log(norm.pdf(data,mu,sigma)).sum()

Here’s also a multivariate random number generator to generate proposals.

-
+
def proposal_multi(mu, sigma):
     mean = [mu, sigma]
     cov = [[0.2, 0], [0, 0.2]]  # diagonal covariance
diff --git a/docs/search.json b/docs/search.json
index 9b11f16..7d5bc10 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -15,7 +15,7 @@
     "href": "week5.html#example",
     "title": "Regression models for describing measurements",
     "section": "Example",
-    "text": "Example\nTeddy has never been in the lab and is bad at pipetting. Unfortunately, some label needed to be put in some Eppendorf tubes, and noone else was available to do it themselves or even supervise.\nEach tube had a required volume of label which Teddy tried to hit, but probably there was some inaccuracy due to bad eyesight, poor hand control or something.\nIn addition, there was also probably some consistent bias, as the pipette was consistently adding too much or too little liquid. It seemed pretty old.\nLuckily, someone was able to measure how much label ended up in 8 out of the 24 tubes with pretty good accuracy. Now we want to estimate how much label there is in the other 16 tubes, taking into account these measurements as well as what we know about the likely biased pipette and inconsistent pipetter.\nTo describe this situation we’ll first think of a regression model that describes the measurement setup, then use Python to simulate data from the model given some plausible parameter values. Next we’ll implement the model in Stan, then fit the simulated data using MCMC and then analyse the results.\n\nRegression model\nTo model the noise that Teddy introduced by being bad at pipetting and the bias introduced by the bad pipette, we need some parameters that connect the known target volumes with the unknown true volumes. Let’s call them \\(noise\\) and \\(bias\\). Since the volumes are constrained positive, a distribution that automatically excludes negative numbers is probably a good idea: the log-normal distribution is usually a good starting point. This equation describes a plausible relationship:\n\\[\nvolume \\sim LN(\\ln{(target\\cdot bias)}, noise)\n\\]\nTo model the helpful measurements, we use another log-normal distribution and assume the measuring device is unbiased and has known log-scale standard error \\(cal\\ error\\):1\n1 NB the scale parameter of a lognormal distribution represents multiplicative error\\[\nmeasurements \\sim LN(\\ln{volume}, cal\\ error)\n\\]\nTo round off the model we can think about the likely values of the unknown parameters \\(bias\\) and \\(noise\\). \\(bias\\) is likely not to be too far away from 1, otherwise someone would have probably thrown the pipette away already. A prior distribution that puts most of its mass between 0.75 and 1.25 therefore seems reasonable. Similarly, a prior for \\(noise\\) should probably not imply that Teddy’s pipetting errors regularly exceeded 30%. This consideration motivates a prior for \\(noise\\) that puts most of its mass below 0.15.\n\n\nSimulating fake data\nFirst some imports: as usual for this course we’ll be using arviz, matplotlib, cmdstanpy, pandas and numpy. stanio is a handy utility library for Stan: it is a dependency of cmdstanpy so you shouldn’t need to install it explicitly.\n\nimport arviz as az\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nimport stanio\n\nfrom cmdstanpy import CmdStanModel\n\nNow some hardcoded numbers, including true values for the parameters here: \\(bias\\) is 0.88 and \\(noise\\) is 0.1. Note that \\(cal\\ error\\) is much smaller than \\(bias\\).\n\nN = 24\nN_CAL = 8\nTARGET_VOLUMES = np.array(\n    [\n      *([200] * 8),\n      *([400] * 8),\n      *([800] * 8),\n    ]\n)\nMEASUREMENT_IX = np.array([1, 4, 9, 11, 15, 19, 21, 22])\nCAL_ERROR = 0.02\nBIAS_FACTOR = 0.88\nNOISE = 0.1\nRNG_SEED = 12345\n\nSimulate the true volumes\n\nrng = np.random.default_rng(seed=RNG_SEED)\nln_mean = [\n  np.log(target * BIAS_FACTOR)\n  for target in TARGET_VOLUMES\n]\nvolumes = rng.lognormal(mean=ln_mean, sigma=NOISE)\nvolumes\n\narray([152.64294349, 199.70810807, 161.32449302, 171.49715397,\n       174.67894069, 163.43175945, 153.50063823, 187.79919405,\n       364.94147086, 289.55488638, 445.13256694, 387.79655766,\n       326.2592979 , 385.23402356, 335.94110379, 349.87019831,\n       761.78380169, 620.86368119, 745.73037525, 809.71007835,\n       803.52489045, 683.21425317, 770.52360505, 598.61585552])\n\n\nPlot the volumes and the targets.\n\nf, ax = plt.subplots()\nbins = np.logspace(np.log10(100), np.log10(1000), 30)\nax.hist(volumes, bins=bins)\nfor t in (200, 400, 800):\n    ax.axvline(t, color=\"red\")\nax.semilogx()\nax.set_xticks([200, 400, 800], [200, 400, 800]);\nax.set(\n    xlabel=\"volume ($\\\\mu$l)\",\n    ylabel=\"Frequency\",\n    title=\"How much label ended up in the tubes\"\n);\n\n\n\n\n\n\n\n\nSimulate measurements for tubes in the MEASUREMENT_IX.\n\nmeasurements = [\n  rng.lognormal(np.log(vol), CAL_ERROR)\n  for vol in volumes[MEASUREMENT_IX]\n]\npd.DataFrame({\n  \"target volume\": np.array(TARGET_VOLUMES)[MEASUREMENT_IX], \n  \"actual volume\": volumes[MEASUREMENT_IX],\n  \"measured volume\": measurements\n})\n\n\n\n\n\n\n\n\n\ntarget volume\nactual volume\nmeasured volume\n\n\n\n\n0\n200\n199.708108\n199.077273\n\n\n1\n200\n174.678941\n176.256328\n\n\n2\n400\n289.554886\n281.877576\n\n\n3\n400\n387.796558\n387.163512\n\n\n4\n400\n349.870198\n362.149468\n\n\n5\n800\n809.710078\n853.238785\n\n\n6\n800\n683.214253\n693.919342\n\n\n7\n800\n770.523605\n783.399634\n\n\n\n\n\n\n\n\nI wrote up the implied statistical model in a Stan file at src/stan/ pipette.stan. This code loads this Stan file as a CmdStanModel object, checks its formatting and prints it out.\nNote that the model internally puts the data on log scale and then standardises it: this is a bit annoying but makes it way easier to set priors and can ultimately save you a lot of trouble.\n\nmodel = CmdStanModel(stan_file=\"../src/stan/pipette.stan\")\nmodel.format(overwrite_file=True, canonicalize=True)\nprint(model.code())\n\n17:52:36 - cmdstanpy - INFO - compiling stan file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette.stan to exe file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette\n17:52:41 - cmdstanpy - INFO - compiled model executable: /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette\n\n\nfunctions {\n  vector standardise(vector v, real m, real s) {\n    return (v - m) / s;\n  }\n  real standardise(real v, real m, real s) {\n    return (v - m) / s;\n  }\n  vector unstandardise(vector u, real m, real s) {\n    return m + u * s;\n  }\n  real unstandardise(real u, real m, real s) {\n    return m + u * s;\n  }\n}\ndata {\n  int<lower=1> N;\n  int<lower=0> N_cal;\n  vector<lower=0>[N] target_volume;\n  vector<lower=0>[N_cal] y;\n  array[N_cal] int<lower=1, upper=N> measurement_ix;\n  real<lower=0> cal_error;\n  int<lower=0, upper=1> likelihood;\n}\ntransformed data {\n  vector[N_cal] y_ls = standardise(log(y), mean(log(y)), sd(log(y)));\n  vector[N] target_volume_ls = standardise(log(target_volume), mean(log(y)),\n                                           sd(log(y)));\n  real cal_error_s = cal_error / sd(log(y));\n}\nparameters {\n  real<lower=0> volume_noise_s;\n  real bias_factor_l;\n  vector[N] volume_ls;\n}\nmodel {\n  volume_noise_s ~ lognormal(log(0.1), 0.5);\n  bias_factor_l ~ normal(0, 0.15);\n  volume_ls ~ normal(target_volume_ls + bias_factor_l, volume_noise_s);\n  if (likelihood) {\n    for (i in 1 : N_cal) {\n      y_ls[i] ~ normal(volume_ls[measurement_ix[i]], cal_error_s);\n    }\n  }\n}\ngenerated quantities {\n  real bias_factor = exp(bias_factor_l);\n  real volume_noise = volume_noise_s * sd(log(y));\n  vector[N] volume = exp(unstandardise(volume_ls, mean(log(y)), sd(log(y))));\n  vector[N_cal] y_rep;\n  vector[N_cal] llik;\n  for (i in 1 : N_cal) {\n    int ms_ix = measurement_ix[i];\n    y_rep[i] = lognormal_rng(log(volume[ms_ix]), cal_error);\n    llik[i] = lognormal_lpdf(y[i] | log(volume[ms_ix]), cal_error);\n  }\n}\n\n\n\n\nThis code loads some data into a dictionary that is compatible with Stan and carries out two MCMC runs, one in prior mode and one in posterior mode.\n\nstan_input_posterior = stanio.json.process_dictionary(\n    {\n      \"N\": N,\n      \"N_cal\": N_CAL,\n      \"target_volume\": TARGET_VOLUMES,\n      \"y\": measurements,\n      \"measurement_ix\": MEASUREMENT_IX + 1,\n      \"cal_error\": CAL_ERROR,\n      \"likelihood\": 1,\n  }\n)\nstan_input_prior = stan_input_posterior | {\"likelihood\": 0}\nmcmc_prior = model.sample(\n    data=stan_input_prior,\n    adapt_delta=0.999,\n    max_treedepth=12,\n    seed=RNG_SEED,\n)\nmcmc_posterior = model.sample(data=stan_input_posterior, seed=RNG_SEED)\nmcmc_prior.diagnose()\nmcmc_posterior.diagnose()\n\n17:52:41 - cmdstanpy - INFO - CmdStan start processing\n17:52:42 - cmdstanpy - INFO - CmdStan done processing.\n17:52:42 - cmdstanpy - INFO - CmdStan start processing\n17:52:42 - cmdstanpy - INFO - CmdStan done processing.\n17:52:42 - cmdstanpy - WARNING - Non-fatal error during sampling:\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nConsider re-running with show_console=True if the above output is unclear!\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                                                                                                                                                                                                                                                                                                                                \n                                                                                                                                                                                                                                                                                                                                \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'Processing csv files: /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_1.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_2.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_3.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_4.csv\\n\\nChecking sampler transitions treedepth.\\nTreedepth satisfactory for all transitions.\\n\\nChecking sampler transitions for divergences.\\nNo divergent transitions found.\\n\\nChecking E-BFMI - sampler transitions HMC potential energy.\\nE-BFMI satisfactory.\\n\\nEffective sample size satisfactory.\\n\\nSplit R-hat values satisfactory all parameters.\\n\\nProcessing complete, no problems detected.\\n'\n\n\nThe diagnostics seem ok, though interestingly the prior was pretty tricky to sample accurately.\nThis code loads both MCMC runs into an arviz InferenceData object.\n\ncoords = {\"obs\": MEASUREMENT_IX, \"tube\": range(N)}\ndims={\n    \"y\": [\"obs\"],\n    \"y_rep\": [\"obs\"],\n    \"target_volume\": [\"tube\"],\n    \"true_volume\": [\"tube\"],\n    \"volume\": [\"tube\"],\n    \"tube\": [\"tube\"]\n}\nidata = az.from_cmdstanpy(\n    posterior=mcmc_posterior,\n    prior=mcmc_prior,\n    log_likelihood=\"llik\",\n    observed_data=stan_input_posterior | {\n        \"true_volume\": volumes, \"tube\": range(N)\n    },\n    posterior_predictive={\"y\": \"y_rep\"},\n    coords=coords,\n    dims=dims\n)\nidata\n\n\n            \n              \n                arviz.InferenceData\n              \n              \n              \n            \n                  \n                  posterior\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 2MB\nDimensions:          (chain: 4, draw: 1000, volume_ls_dim_0: 24, tube: 24)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\n  * volume_ls_dim_0  (volume_ls_dim_0) int64 192B 0 1 2 3 4 5 ... 19 20 21 22 23\n  * tube             (tube) int64 192B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23\nData variables:\n    volume_noise_s   (chain, draw) float64 32kB 0.149 0.2378 ... 0.1396 0.1959\n    bias_factor_l    (chain, draw) float64 32kB -0.2014 -0.07906 ... -0.07839\n    volume_ls        (chain, draw, volume_ls_dim_0) float64 768kB -1.275 ... ...\n    bias_factor      (chain, draw) float64 32kB 0.8176 0.924 ... 0.9842 0.9246\n    volume_noise     (chain, draw) float64 32kB 0.09116 0.1455 ... 0.1199\n    volume           (chain, draw, tube) float64 768kB 182.9 192.1 ... 855.2\nAttributes:\n    created_at:                 2024-03-25T16:52:42.632627\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000volume_ls_dim_0: 24tube: 24Coordinates: (4)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])volume_ls_dim_0(volume_ls_dim_0)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])tube(tube)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])Data variables: (6)volume_noise_s(chain, draw)float640.149 0.2378 ... 0.1396 0.1959array([[0.148979, 0.237794, 0.122436, ..., 0.140801, 0.178431, 0.198993],\n       [0.158124, 0.141742, 0.157009, ..., 0.167521, 0.137968, 0.115493],\n       [0.168817, 0.188685, 0.138985, ..., 0.138549, 0.116906, 0.133163],\n       [0.151583, 0.148293, 0.18479 , ..., 0.158791, 0.139599, 0.195866]])bias_factor_l(chain, draw)float64-0.2014 -0.07906 ... -0.07839array([[-0.201351 , -0.0790569, -0.105641 , ..., -0.123153 , -0.0710773,\n        -0.0700885],\n       [-0.0982885, -0.0746649, -0.0754556, ..., -0.128265 , -0.163382 ,\n        -0.123952 ],\n       [ 0.0213591, -0.182919 , -0.17903  , ..., -0.134593 , -0.132122 ,\n        -0.131316 ],\n       [-0.0805919, -0.0926586, -0.117743 , ..., -0.190522 , -0.0158839,\n        -0.0783902]])volume_ls(chain, draw, volume_ls_dim_0)float64-1.275 -1.194 ... 1.099 1.246array([[[-1.2747  , -1.19448 , -1.22282 , ...,  0.879734,  1.03417 ,\n          0.762953],\n        [-1.40687 , -1.09754 , -1.14365 , ...,  0.936603,  1.15697 ,\n          0.907036],\n        [-0.935161, -1.16238 , -1.13239 , ...,  0.901988,  1.09336 ,\n          1.1396  ],\n        ...,\n        [-1.38442 , -1.08361 , -1.21375 , ...,  0.945476,  1.12959 ,\n          1.01907 ],\n        [-1.03524 , -1.11487 , -0.966034, ...,  0.964533,  1.11521 ,\n          1.0335  ],\n        [-1.29565 , -1.14228 , -1.38558 , ...,  0.843054,  1.08359 ,\n          1.15616 ]],\n\n       [[-1.28813 , -1.15007 , -1.30488 , ...,  0.863103,  1.06637 ,\n          1.34264 ],\n        [-1.17917 , -1.1655  , -1.16521 , ...,  0.874006,  1.07137 ,\n          1.25143 ],\n        [-1.20124 , -1.14406 , -1.0035  , ...,  0.908254,  1.06948 ,\n          1.04256 ],\n...\n        [-1.28454 , -1.16143 , -1.07814 , ...,  0.871924,  1.07582 ,\n          1.04087 ],\n        [-1.35706 , -1.15124 , -1.11947 , ...,  0.87177 ,  1.04045 ,\n          0.944458],\n        [-1.23871 , -1.14033 , -1.48308 , ...,  0.895505,  1.1146  ,\n          0.996843]],\n\n       [[-1.21437 , -1.11281 , -1.31588 , ...,  0.869607,  1.11421 ,\n          0.911445],\n        [-1.23622 , -1.12363 , -1.30974 , ...,  0.940612,  1.11077 ,\n          1.07039 ],\n        [-1.15205 , -1.14246 , -1.21216 , ...,  0.914147,  1.09383 ,\n          1.10271 ],\n        ...,\n        [-1.28744 , -1.10417 , -1.22327 , ...,  0.90872 ,  1.1949  ,\n          1.00884 ],\n        [-1.19644 , -1.16323 , -1.25646 , ...,  0.913467,  1.0723  ,\n          1.06841 ],\n        [-0.985007, -1.08883 , -1.03812 , ...,  0.941881,  1.09928 ,\n          1.24558 ]]])bias_factor(chain, draw)float640.8176 0.924 ... 0.9842 0.9246array([[0.817625, 0.923987, 0.899748, ..., 0.884128, 0.93139 , 0.932311],\n       [0.906387, 0.928054, 0.927321, ..., 0.87962 , 0.849267, 0.883422],\n       [1.02159 , 0.832835, 0.836081, ..., 0.874071, 0.876234, 0.876941],\n       [0.92257 , 0.911505, 0.888924, ..., 0.826528, 0.984242, 0.924604]])volume_noise(chain, draw)float640.09116 0.1455 ... 0.08542 0.1199array([[0.0911635, 0.145511 , 0.0749212, ..., 0.0861592, 0.109185 ,\n        0.121768 ],\n       [0.0967594, 0.0867349, 0.0960768, ..., 0.102509 , 0.0844255,\n        0.0706728],\n       [0.103303 , 0.11546  , 0.0850475, ..., 0.0847808, 0.0715373,\n        0.0814854],\n       [0.092757 , 0.0907435, 0.113077 , ..., 0.0971674, 0.0854233,\n        0.119854 ]])volume(chain, draw, tube)float64182.9 192.1 188.8 ... 782.0 855.2array([[[182.932, 192.136, 188.833, ..., 683.658, 751.418, 636.508],\n        [168.719, 203.878, 198.206, ..., 707.868, 810.057, 695.176],\n        [225.177, 195.948, 199.577, ..., 693.032, 779.132, 801.494],\n        ...,\n        [171.054, 205.624, 189.884, ..., 711.722, 796.6  , 744.505],\n        [211.801, 201.728, 220.963, ..., 720.07 , 789.621, 751.109],\n        [180.602, 198.373, 170.932, ..., 668.484, 774.49 , 809.654]],\n\n       [[181.435, 197.428, 179.585, ..., 676.736, 766.367, 907.524],\n        [193.944, 195.574, 195.609, ..., 681.266, 768.718, 858.258],\n        [191.343, 198.157, 215.955, ..., 695.694, 767.829, 755.286],\n        ...,\n        [179.235, 204.075, 162.611, ..., 709.551, 781.422, 723.426],\n        [189.975, 192.176, 176.443, ..., 714.348, 760.103, 754.809],\n        [186.767, 198.987, 204.561, ..., 705.813, 787.317, 706.511]],\n\n       [[198.651, 208.316, 215.697, ..., 682.152, 770.541, 821.472],\n        [175.866, 190.675, 167.71 , ..., 713.088, 788.866, 644.97 ],\n        [185.411, 195.48 , 169.569, ..., 690.813, 788.233, 665.731],\n        ...,\n        [181.834, 196.062, 206.313, ..., 680.399, 770.813, 754.506],\n        [173.941, 197.288, 201.161, ..., 680.335, 754.311, 711.279],\n        [187.005, 198.61 , 161.032, ..., 690.288, 789.322, 734.448]],\n\n       [[189.812, 201.983, 178.38 , ..., 679.435, 789.135, 697.054],\n        [187.291, 200.65 , 179.052, ..., 709.606, 787.476, 768.257],\n        [197.19 , 198.351, 190.069, ..., 698.207, 779.357, 783.604],\n        ...,\n        [181.511, 203.053, 188.781, ..., 695.893, 829.077, 739.861],\n        [191.905, 195.846, 184.986, ..., 697.917, 769.157, 767.329],\n        [218.412, 204.968, 211.428, ..., 710.158, 781.96 , 855.193]]])Indexes: (4)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))volume_ls_dim_0PandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='volume_ls_dim_0'))tubePandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='tube'))Attributes: (4)created_at :2024-03-25T16:52:42.632627arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  posterior_predictive\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 264kB\nDimensions:  (chain: 4, draw: 1000, obs: 8)\nCoordinates:\n  * chain    (chain) int64 32B 0 1 2 3\n  * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999\n  * obs      (obs) int64 64B 1 4 9 11 15 19 21 22\nData variables:\n    y_rep    (chain, draw, obs) float64 256kB 197.1 177.7 290.6 ... 693.9 779.7\nAttributes:\n    created_at:                 2024-03-25T16:52:42.638351\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000obs: 8Coordinates: (3)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])obs(obs)int641 4 9 11 15 19 21 22array([ 1,  4,  9, 11, 15, 19, 21, 22])Data variables: (1)y_rep(chain, draw, obs)float64197.1 177.7 290.6 ... 693.9 779.7array([[[197.116, 177.676, 290.624, ..., 817.525, 684.834, 768.599],\n        [207.123, 178.881, 276.029, ..., 837.783, 707.797, 799.808],\n        [193.412, 176.359, 312.075, ..., 779.207, 670.578, 748.349],\n        ...,\n        [205.396, 181.619, 282.37 , ..., 823.026, 693.464, 776.76 ],\n        [196.261, 179.431, 277.182, ..., 842.408, 693.157, 799.594],\n        [194.616, 187.719, 282.712, ..., 827.156, 681.762, 760.655]],\n\n       [[194.801, 174.147, 275.591, ..., 837.393, 662.098, 793.396],\n        [196.473, 175.265, 281.09 , ..., 849.397, 685.798, 799.947],\n        [188.093, 169.629, 275.093, ..., 851.105, 699.691, 758.693],\n        ...,\n        [199.97 , 181.119, 276.119, ..., 885.826, 721.481, 780.279],\n        [190.317, 177.701, 271.189, ..., 838.828, 746.589, 728.727],\n        [204.84 , 172.027, 285.739, ..., 834.172, 716.608, 793.402]],\n\n       [[208.713, 177.249, 294.043, ..., 808.862, 703.655, 795.465],\n        [195.615, 175.848, 270.427, ..., 868.339, 708.692, 818.016],\n        [196.822, 173.831, 290.472, ..., 826.401, 681.787, 797.722],\n        ...,\n        [193.524, 174.954, 287.664, ..., 903.736, 662.875, 770.548],\n        [198.162, 176.923, 293.263, ..., 855.101, 686.138, 768.147],\n        [195.309, 174.313, 283.913, ..., 858.254, 697.01 , 809.608]],\n\n       [[206.681, 178.897, 283.43 , ..., 877.47 , 671.592, 773.122],\n        [204.176, 182.882, 273.709, ..., 802.932, 712.307, 766.771],\n        [198.835, 175.814, 295.483, ..., 807.124, 708.253, 755.885],\n        ...,\n        [199.387, 166.869, 282.976, ..., 836.791, 687.756, 831.191],\n        [189.255, 178.416, 288.503, ..., 861.03 , 712.108, 784.301],\n        [207.587, 184.839, 288.502, ..., 839.977, 693.941, 779.685]]])Indexes: (3)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))obsPandasIndexPandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))Attributes: (4)created_at :2024-03-25T16:52:42.638351arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  log_likelihood\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 264kB\nDimensions:     (chain: 4, draw: 1000, llik_dim_0: 8)\nCoordinates:\n  * chain       (chain) int64 32B 0 1 2 3\n  * draw        (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999\n  * llik_dim_0  (llik_dim_0) int64 64B 0 1 2 3 4 5 6 7\nData variables:\n    llik        (chain, draw, llik_dim_0) float64 256kB -3.875 -2.193 ... -3.675\nAttributes:\n    created_at:                 2024-03-25T16:52:42.680048\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000llik_dim_0: 8Coordinates: (3)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])llik_dim_0(llik_dim_0)int640 1 2 3 4 5 6 7array([0, 1, 2, 3, 4, 5, 6, 7])Data variables: (1)llik(chain, draw, llik_dim_0)float64-3.875 -2.193 ... -4.218 -3.675array([[[-3.87496, -2.19298, -5.05717, ..., -4.10576, -3.82668,\n         -5.84213],\n        [-3.0104 , -2.18047, -2.95307, ..., -3.76203, -4.04436,\n         -5.07012],\n        [-2.61437, -2.19405, -4.63408, ..., -3.87112, -3.55132,\n         -3.70786],\n        ...,\n        [-3.60909, -2.48505, -2.6706 , ..., -3.91375, -4.35139,\n         -4.01956],\n        [-2.51933, -2.83159, -2.7908 , ..., -3.79013, -5.25984,\n         -3.74878],\n        [-2.31632, -4.73471, -3.46818, ..., -5.85992, -5.29236,\n         -3.8341 ]],\n\n       [[-2.38707, -2.3283 , -2.64878, ..., -3.83624, -4.33519,\n         -4.27453],\n        [-2.69467, -2.22771, -2.68636, ..., -3.96266, -3.97258,\n         -4.11793],\n        [-2.32746, -2.27031, -4.41154, ..., -4.53482, -3.55743,\n         -4.17434],\n...\n        [-2.59181, -2.25385, -2.67099, ..., -4.91015, -4.03322,\n         -3.99848],\n        [-2.40251, -2.43096, -4.67866, ..., -4.25938, -4.03786,\n         -5.46025],\n        [-2.30753, -3.00209, -2.83217, ..., -3.75844, -3.58368,\n         -3.74147]],\n\n       [[-2.56303, -2.25859, -3.31605, ..., -3.90128, -4.10547,\n         -3.73708],\n        [-2.37797, -2.74359, -4.55847, ..., -4.38751, -4.17394,\n         -3.70423],\n        [-2.31732, -2.66832, -4.37736, ..., -3.80292, -3.59671,\n         -3.70401],\n        ...,\n        [-2.78946, -2.22266, -2.67603, ..., -3.75926, -3.55935,\n         -7.68485],\n        [-2.63537, -2.2518 , -3.67463, ..., -4.03551, -3.59052,\n         -4.09134],\n        [-3.36342, -2.82215, -3.077  , ..., -3.84831, -4.21812,\n         -3.67479]]])Indexes: (3)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))llik_dim_0PandasIndexPandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='llik_dim_0'))Attributes: (4)created_at :2024-03-25T16:52:42.680048arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  sample_stats\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 204kB\nDimensions:          (chain: 4, draw: 1000)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\nData variables:\n    lp               (chain, draw) float64 32kB 21.01 18.71 ... 24.94 22.19\n    acceptance_rate  (chain, draw) float64 32kB 0.9988 0.7904 ... 0.8739 0.9229\n    step_size        (chain, draw) float64 32kB 0.3503 0.3503 ... 0.4368 0.4368\n    tree_depth       (chain, draw) int64 32kB 3 4 3 3 3 3 3 3 ... 3 3 3 3 3 3 4\n    n_steps          (chain, draw) int64 32kB 7 31 7 15 7 7 7 ... 7 7 7 7 7 7 15\n    diverging        (chain, draw) bool 4kB False False False ... False False\n    energy           (chain, draw) float64 32kB -6.192 -8.972 ... -15.39 -11.6\nAttributes:\n    created_at:                 2024-03-25T16:52:42.636778\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000Coordinates: (2)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])Data variables: (7)lp(chain, draw)float6421.01 18.71 28.67 ... 24.94 22.19array([[21.009 , 18.71  , 28.6744, ..., 24.9497, 21.3791, 18.4294],\n       [27.8086, 32.2878, 26.6081, ..., 26.1025, 28.0347, 28.8471],\n       [24.4864, 25.8458, 31.4533, ..., 31.4475, 30.7953, 30.7299],\n       [30.3395, 21.0069, 20.8335, ..., 24.7186, 24.9444, 22.194 ]])acceptance_rate(chain, draw)float640.9988 0.7904 1.0 ... 0.8739 0.9229array([[0.998827, 0.790351, 1.      , ..., 0.935358, 0.971577, 0.847244],\n       [0.979632, 0.984735, 0.751846, ..., 1.      , 0.901231, 0.83027 ],\n       [0.64411 , 0.901806, 0.983725, ..., 0.809004, 0.692338, 0.828195],\n       [0.931977, 0.618653, 0.940008, ..., 0.973274, 0.873885, 0.922928]])step_size(chain, draw)float640.3503 0.3503 ... 0.4368 0.4368array([[0.350264, 0.350264, 0.350264, ..., 0.350264, 0.350264, 0.350264],\n       [0.343679, 0.343679, 0.343679, ..., 0.343679, 0.343679, 0.343679],\n       [0.326612, 0.326612, 0.326612, ..., 0.326612, 0.326612, 0.326612],\n       [0.43677 , 0.43677 , 0.43677 , ..., 0.43677 , 0.43677 , 0.43677 ]])tree_depth(chain, draw)int643 4 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 4array([[3, 4, 3, ..., 4, 4, 4],\n       [3, 3, 4, ..., 4, 4, 3],\n       [4, 4, 4, ..., 3, 4, 3],\n       [3, 3, 3, ..., 3, 3, 4]])n_steps(chain, draw)int647 31 7 15 7 7 7 ... 7 7 7 7 7 7 15array([[ 7, 31,  7, ..., 15, 15, 15],\n       [ 7,  7, 15, ..., 15, 31,  7],\n       [15, 15, 15, ...,  7, 15, 15],\n       [ 7,  7,  7, ...,  7,  7, 15]])diverging(chain, draw)boolFalse False False ... False Falsearray([[False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False]])energy(chain, draw)float64-6.192 -8.972 ... -15.39 -11.6array([[ -6.19192,  -8.97176, -14.139  , ..., -11.2611 , -12.0468 ,\n         -1.39412],\n       [-13.6371 , -23.0408 , -15.4574 , ...,  -7.31748, -10.7674 ,\n        -13.1799 ],\n       [-14.2463 , -12.2545 , -20.9744 , ..., -15.1772 , -12.6979 ,\n        -18.2411 ],\n       [-17.2553 , -11.9235 ,  -9.46781, ..., -16.5739 , -15.3934 ,\n        -11.6042 ]])Indexes: (2)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))Attributes: (4)created_at :2024-03-25T16:52:42.636778arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  prior\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 2MB\nDimensions:          (chain: 4, draw: 1000, volume_ls_dim_0: 24, tube: 24,\n                      obs: 8, llik_dim_0: 8)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\n  * volume_ls_dim_0  (volume_ls_dim_0) int64 192B 0 1 2 3 4 5 ... 19 20 21 22 23\n  * tube             (tube) int64 192B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23\n  * obs              (obs) int64 64B 1 4 9 11 15 19 21 22\n  * llik_dim_0       (llik_dim_0) int64 64B 0 1 2 3 4 5 6 7\nData variables:\n    volume_noise_s   (chain, draw) float64 32kB 0.02974 0.02728 ... 0.2746\n    bias_factor_l    (chain, draw) float64 32kB -0.1025 -0.07485 ... -0.1921\n    volume_ls        (chain, draw, volume_ls_dim_0) float64 768kB -1.18 ... 0...\n    bias_factor      (chain, draw) float64 32kB 0.9026 0.9279 ... 0.9219 0.8252\n    volume_noise     (chain, draw) float64 32kB 0.0182 0.01669 ... 0.1236 0.168\n    volume           (chain, draw, tube) float64 768kB 193.8 185.0 ... 693.1\n    y_rep            (chain, draw, obs) float64 256kB 184.8 189.1 ... 1.002e+03\n    llik             (chain, draw, llik_dim_0) float64 256kB -8.981 ... -87.32\nAttributes:\n    created_at:                 2024-03-25T16:52:42.674317\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000volume_ls_dim_0: 24tube: 24obs: 8llik_dim_0: 8Coordinates: (6)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])volume_ls_dim_0(volume_ls_dim_0)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])tube(tube)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])obs(obs)int641 4 9 11 15 19 21 22array([ 1,  4,  9, 11, 15, 19, 21, 22])llik_dim_0(llik_dim_0)int640 1 2 3 4 5 6 7array([0, 1, 2, 3, 4, 5, 6, 7])Data variables: (8)volume_noise_s(chain, draw)float640.02974 0.02728 ... 0.2021 0.2746array([[0.0297445, 0.0272758, 0.0265095, ..., 0.178736 , 0.192878 ,\n        0.143125 ],\n       [0.0217743, 0.0275153, 0.0318496, ..., 0.187067 , 0.213967 ,\n        0.2151   ],\n       [0.14644  , 0.161344 , 0.154124 , ..., 0.222906 , 0.210747 ,\n        0.183849 ],\n       [0.196802 , 0.224444 , 0.332829 , ..., 0.104148 , 0.202053 ,\n        0.274571 ]])bias_factor_l(chain, draw)float64-0.1025 -0.07485 ... -0.1921array([[-0.102507 , -0.074851 , -0.071183 , ..., -0.0135833, -0.072937 ,\n        -0.0888987],\n       [-0.131403 , -0.137815 , -0.131853 , ..., -0.0773899,  0.0578722,\n        -0.0226735],\n       [ 0.134711 ,  0.0641626, -0.0368078, ..., -0.277545 , -0.22355  ,\n        -0.29311  ],\n       [-0.0320055, -0.127452 ,  0.0724585, ..., -0.0552708, -0.0813462,\n        -0.192088 ]])volume_ls(chain, draw, volume_ls_dim_0)float64-1.18 -1.256 ... 1.525 0.9022array([[[-1.18014 , -1.25596 , -1.21539 , ...,  1.06275 ,  1.04836 ,\n          1.03579 ],\n        [-1.24896 , -1.18338 , -1.21905 , ...,  1.03778 ,  1.05845 ,\n          1.05922 ],\n        [-1.18062 , -1.20861 , -1.20187 , ...,  1.06469 ,  1.05634 ,\n          1.01356 ],\n        ...,\n        [-1.06469 , -1.19225 , -1.39826 , ...,  0.878588,  1.39824 ,\n          1.34754 ],\n        [-1.43398 , -1.40853 , -1.21943 , ...,  1.50548 ,  0.928725,\n          0.986833],\n        [-1.19609 , -1.28651 , -1.26602 , ...,  0.972567,  1.07923 ,\n          1.10071 ]],\n\n       [[-1.24692 , -1.24356 , -1.21965 , ...,  1.00025 ,  0.983762,\n          1.02885 ],\n        [-1.28328 , -1.29047 , -1.25435 , ...,  1.07908 ,  0.979001,\n          0.99518 ],\n        [-1.24983 , -1.2417  , -1.28149 , ...,  0.913942,  1.01969 ,\n          1.0017  ],\n...\n        [-1.28413 , -1.1961  , -1.73514 , ...,  1.07113 ,  1.08167 ,\n          0.796746],\n        [-1.53629 , -1.42865 , -1.07195 , ...,  0.73767 ,  0.848461,\n          1.2325  ],\n        [-1.33974 , -1.18125 , -1.43554 , ...,  0.932523,  0.996232,\n          0.864258]],\n\n       [[-1.04915 , -0.949361, -0.929134, ...,  1.15517 ,  0.924563,\n          0.388474],\n        [-1.26699 , -1.06466 , -1.12933 , ...,  0.873924,  0.755863,\n          1.21534 ],\n        [-1.1652  , -1.53275 , -1.46456 , ...,  1.52952 ,  1.5088  ,\n          1.27139 ],\n        ...,\n        [-1.14013 , -1.28078 , -1.14028 , ...,  1.01727 ,  1.10426 ,\n          0.960508],\n        [-1.4791  , -1.09886 , -1.39305 , ...,  0.918405,  1.08038 ,\n          1.22328 ],\n        [-1.2026  , -1.02384 , -1.02024 , ...,  1.01766 ,  1.52503 ,\n          0.902215]]])bias_factor(chain, draw)float640.9026 0.9279 ... 0.9219 0.8252array([[0.902572, 0.927882, 0.931291, ..., 0.986509, 0.929659, 0.914938],\n       [0.876865, 0.87126 , 0.87647 , ..., 0.925529, 1.05958 , 0.977582],\n       [1.14421 , 1.06627 , 0.963861, ..., 0.757641, 0.799675, 0.74594 ],\n       [0.968501, 0.880336, 1.07515 , ..., 0.946229, 0.921874, 0.825234]])volume_noise(chain, draw)float640.0182 0.01669 ... 0.1236 0.168array([[0.0182013, 0.0166906, 0.0162217, ..., 0.109372 , 0.118026 ,\n        0.0875808],\n       [0.0133241, 0.0168372, 0.0194894, ..., 0.11447  , 0.130931 ,\n        0.131624 ],\n       [0.0896095, 0.0987295, 0.0943116, ..., 0.136401 , 0.12896  ,\n        0.112501 ],\n       [0.120427 , 0.137342 , 0.203665 , ..., 0.0637301, 0.12364  ,\n        0.168016 ]])volume(chain, draw, tube)float64193.8 185.0 ... 1.015e+03 693.1array([[[ 193.83 ,  185.043,  189.693, ...,  764.673,  757.97 ,\n          752.164],\n        [ 185.837,  193.446,  189.27 , ...,  753.078,  762.666,\n          763.023],\n        [ 193.772,  190.482,  191.27 , ...,  765.581,  761.682,\n          742.002],\n        ...,\n        [ 208.018,  192.399,  169.611, ...,  683.179,  938.933,\n          910.251],\n        [ 165.943,  168.549,  189.225, ..., 1002.61 ,  704.464,\n          729.963],\n        [ 191.947,  181.615,  183.906, ...,  723.619,  772.422,\n          782.645]],\n\n       [[ 186.069,  186.452,  189.2  , ...,  735.982,  728.593,\n          748.973],\n        [ 181.974,  181.175,  185.225, ...,  772.355,  726.473,\n          733.701],\n        [ 185.738,  186.664,  182.174, ...,  698.12 ,  744.787,\n          736.633],\n...\n        [ 181.879,  191.946,  138.015, ...,  768.607,  773.58 ,\n          649.808],\n        [ 155.873,  166.485,  207.096, ...,  626.737,  670.7  ,\n          848.373],\n        [ 175.795,  193.698,  165.786, ...,  706.103,  734.174,\n          677.215]],\n\n       [[ 210.005,  223.229,  226.009, ...,  809.167,  702.672,\n          506.157],\n        [ 183.797,  208.022,  199.951, ...,  681.232,  633.753,\n          839.513],\n        [ 195.61 ,  156.211,  162.867, ..., 1017.47 , 1004.65 ,\n          868.806],\n        ...,\n        [ 198.634,  182.253,  198.615, ...,  743.688,  784.348,\n          718.299],\n        [ 161.425,  203.714,  170.153, ...,  700.029,  772.966,\n          843.605],\n        [ 191.183,  213.283,  213.753, ...,  743.862, 1014.68 ,\n          693.128]]])y_rep(chain, draw, obs)float64184.8 189.1 ... 741.0 1.002e+03array([[[ 184.813,  189.054,  386.057, ...,  759.48 ,  754.107,\n          752.463],\n        [ 188.057,  194.853,  377.611, ...,  747.653,  741.944,\n          791.905],\n        [ 191.471,  191.019,  380.344, ...,  762.074,  750.234,\n          764.223],\n        ...,\n        [ 188.237,  217.168,  422.702, ...,  804.179,  687.31 ,\n          936.802],\n        [ 167.074,  207.593,  425.475, ...,  681.052, 1021.48 ,\n          705.951],\n        [ 177.321,  174.68 ,  365.093, ...,  756.83 ,  745.255,\n          747.625]],\n\n       [[ 194.119,  186.317,  365.097, ...,  722.354,  724.962,\n          745.609],\n        [ 183.759,  188.886,  368.915, ...,  750.265,  771.006,\n          708.641],\n        [ 188.523,  182.887,  388.006, ...,  759.472,  700.728,\n          735.546],\n...\n        [ 187.093,  217.335,  268.847, ...,  600.192,  760.567,\n          755.355],\n        [ 165.12 ,  131.573,  447.16 , ...,  730.651,  634.832,\n          670.844],\n        [ 194.342,  155.376,  278.762, ...,  557.058,  670.72 ,\n          756.81 ]],\n\n       [[ 227.482,  215.861,  369.136, ...,  876.322,  807.167,\n          691.492],\n        [ 204.28 ,  253.91 ,  468.703, ...,  851.316,  673.234,\n          629.031],\n        [ 158.707,  185.01 ,  421.974, ..., 1219.31 , 1041.13 ,\n          994.95 ],\n        ...,\n        [ 181.02 ,  216.746,  403.83 , ...,  735.791,  757.416,\n          787.206],\n        [ 204.988,  169.965,  321.8  , ...,  875.623,  693.14 ,\n          777.141],\n        [ 215.283,  205.664,  494.475, ...,  677.682,  740.974,\n         1001.79 ]]])llik(chain, draw, llik_dim_0)float64-8.981 -8.16 ... -9.587 -87.32array([[[  -8.98149,   -8.16041, -121.834  , ...,  -26.6511 ,\n          -15.333  ,   -5.03173],\n        [  -3.3298 ,   -6.74602, -120.175  , ...,  -18.3798 ,\n          -11.9159 ,   -4.56985],\n        [  -4.73572,  -11.8407 , -113.21   , ...,  -18.8408 ,\n          -15.6229 ,   -4.65854],\n        ...,\n        [  -3.75606,  -57.2464 , -186.1    , ...,   -4.7875 ,\n           -3.85341,  -44.6675 ],\n        [ -36.9401 ,  -34.6153 , -209.275  , ...,  -58.3687 ,\n         -172.836  ,  -17.7702 ],\n        [ -12.8355 ,   -2.24944,  -85.4265 , ...,  -17.943  ,\n           -5.74474,   -3.91947]],\n\n       [[  -7.66673,   -5.33243,  -94.05   , ...,  -33.5105 ,\n           -7.87838,  -10.2459 ],\n        [ -13.3995 ,   -6.43127,  -83.7495 , ...,  -37.2591 ,\n          -17.8842 ,  -10.7848 ],\n        [  -7.4818 ,   -3.62121,  -98.3658 , ...,  -24.9407 ,\n           -3.5948 ,   -6.86394],\n...\n        [  -3.96411,  -65.3276 ,   -6.23833, ..., -161.946  ,\n          -16.6114 ,   -3.86945],\n        [ -42.2559 ,  -75.7815 , -300.021  , ...,  -44.6827 ,\n          -16.5108 ,  -33.8265 ],\n        [  -3.23865,  -16.1718 ,   -4.7924 , ..., -249.053  ,\n           -3.92796,   -8.93514]],\n\n       [[ -18.6896 ,  -57.0662 , -102.733  , ...,   -4.63053,\n          -33.0595 ,  -18.4545 ],\n        [  -4.71529, -167.057  , -299.696  , ...,   -5.00925,\n           -3.97491,  -59.8419 ],\n        [ -75.8    ,   -3.83045, -201.039  , ..., -167.683  ,\n         -186.641  ,  -81.0174 ],\n        ...,\n        [ -12.046  ,  -44.6544 , -129.648  , ...,  -32.7041 ,\n           -9.54657,   -3.67239],\n        [  -2.96314,   -4.2949 ,  -27.2165 , ...,   -4.11057,\n           -3.64532,   -3.89527],\n        [  -8.2396 ,  -22.3636 , -384.246  , ...,  -74.6057 ,\n           -9.58703,  -87.318  ]]])Indexes: (6)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))volume_ls_dim_0PandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='volume_ls_dim_0'))tubePandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='tube'))obsPandasIndexPandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))llik_dim_0PandasIndexPandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='llik_dim_0'))Attributes: (4)created_at :2024-03-25T16:52:42.674317arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  sample_stats_prior\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 204kB\nDimensions:          (chain: 4, draw: 1000)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\nData variables:\n    lp               (chain, draw) float64 32kB 72.09 71.09 ... 23.28 14.77\n    acceptance_rate  (chain, draw) float64 32kB 0.9856 0.9735 ... 0.9808 0.99\n    step_size        (chain, draw) float64 32kB 0.03423 0.03423 ... 0.04274\n    tree_depth       (chain, draw) int64 32kB 4 4 4 4 4 4 4 4 ... 7 6 7 7 6 6 7\n    n_steps          (chain, draw) int64 32kB 15 31 15 15 15 ... 127 63 63 127\n    diverging        (chain, draw) bool 4kB False False False ... False False\n    energy           (chain, draw) float64 32kB -58.38 -61.21 ... -14.09 -3.941\nAttributes:\n    created_at:                 2024-03-25T16:52:42.676922\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000Coordinates: (2)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])Data variables: (7)lp(chain, draw)float6472.09 71.09 72.54 ... 23.28 14.77array([[72.0913, 71.0934, 72.5368, ..., 22.7741, 19.717 , 31.5704],\n       [78.6388, 67.3879, 66.6947, ..., 27.7493, 23.3811, 21.9996],\n       [32.2751, 29.5342, 31.5931, ..., 21.5172, 20.5746, 25.7496],\n       [19.4916, 19.317 , 10.9252, ..., 36.2183, 23.2776, 14.7731]])acceptance_rate(chain, draw)float640.9856 0.9735 ... 0.9808 0.99array([[0.985585, 0.973493, 0.926252, ..., 0.999961, 0.998114, 0.999975],\n       [0.987531, 0.826221, 0.990511, ..., 0.999879, 0.999778, 0.999777],\n       [0.989942, 0.996745, 0.999915, ..., 0.997515, 0.999232, 0.999938],\n       [0.999604, 0.99648 , 0.998834, ..., 0.999231, 0.980821, 0.99004 ]])step_size(chain, draw)float640.03423 0.03423 ... 0.04274 0.04274array([[0.0342255, 0.0342255, 0.0342255, ..., 0.0342255, 0.0342255,\n        0.0342255],\n       [0.0211665, 0.0211665, 0.0211665, ..., 0.0211665, 0.0211665,\n        0.0211665],\n       [0.0399343, 0.0399343, 0.0399343, ..., 0.0399343, 0.0399343,\n        0.0399343],\n       [0.0427424, 0.0427424, 0.0427424, ..., 0.0427424, 0.0427424,\n        0.0427424]])tree_depth(chain, draw)int644 4 4 4 4 4 4 4 ... 7 7 6 7 7 6 6 7array([[4, 4, 4, ..., 7, 7, 7],\n       [5, 5, 5, ..., 8, 8, 8],\n       [6, 7, 6, ..., 7, 7, 7],\n       [7, 7, 7, ..., 6, 6, 7]])n_steps(chain, draw)int6415 31 15 15 15 ... 127 63 63 127array([[ 15,  31,  15, ..., 127, 255, 127],\n       [ 31,  31,  31, ..., 255, 255, 255],\n       [ 63, 255,  63, ..., 127, 127, 127],\n       [127, 127, 127, ...,  63,  63, 127]])diverging(chain, draw)boolFalse False False ... False Falsearray([[False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False]])energy(chain, draw)float64-58.38 -61.21 ... -14.09 -3.941array([[-5.83823e+01, -6.12113e+01, -6.32211e+01, ..., -1.00715e+01,\n         3.57843e-02, -1.26730e+01],\n       [-7.05147e+01, -5.28132e+01, -5.87134e+01, ..., -1.31417e+01,\n        -1.17634e+01, -5.75708e+00],\n       [-2.44030e+01, -1.81596e+01, -2.17799e+01, ..., -9.25289e+00,\n        -6.95541e+00, -9.89278e+00],\n       [-5.78982e+00, -5.81860e+00, -6.76507e-01, ..., -1.92249e+01,\n        -1.40883e+01, -3.94102e+00]])Indexes: (2)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))Attributes: (4)created_at :2024-03-25T16:52:42.676922arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  observed_data\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 896B\nDimensions:               (N_dim_0: 1, N_cal_dim_0: 1, tube: 24, obs: 8,\n                           measurement_ix_dim_0: 8, cal_error_dim_0: 1,\n                           likelihood_dim_0: 1)\nCoordinates:\n  * N_dim_0               (N_dim_0) int64 8B 0\n  * N_cal_dim_0           (N_cal_dim_0) int64 8B 0\n  * tube                  (tube) int64 192B 0 1 2 3 4 5 6 ... 18 19 20 21 22 23\n  * obs                   (obs) int64 64B 1 4 9 11 15 19 21 22\n  * measurement_ix_dim_0  (measurement_ix_dim_0) int64 64B 0 1 2 3 4 5 6 7\n  * cal_error_dim_0       (cal_error_dim_0) int64 8B 0\n  * likelihood_dim_0      (likelihood_dim_0) int64 8B 0\nData variables:\n    N                     (N_dim_0) int64 8B 24\n    N_cal                 (N_cal_dim_0) int64 8B 8\n    target_volume         (tube) int64 192B 200 200 200 200 ... 800 800 800 800\n    y                     (obs) float64 64B 199.1 176.3 281.9 ... 693.9 783.4\n    measurement_ix        (measurement_ix_dim_0) int64 64B 2 5 10 12 16 20 22 23\n    cal_error             (cal_error_dim_0) float64 8B 0.02\n    likelihood            (likelihood_dim_0) int64 8B 1\n    true_volume           (tube) float64 192B 152.6 199.7 161.3 ... 770.5 598.6\nAttributes:\n    created_at:                 2024-03-25T16:52:42.678952\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:N_dim_0: 1N_cal_dim_0: 1tube: 24obs: 8measurement_ix_dim_0: 8cal_error_dim_0: 1likelihood_dim_0: 1Coordinates: (7)N_dim_0(N_dim_0)int640array([0])N_cal_dim_0(N_cal_dim_0)int640array([0])tube(tube)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])obs(obs)int641 4 9 11 15 19 21 22array([ 1,  4,  9, 11, 15, 19, 21, 22])measurement_ix_dim_0(measurement_ix_dim_0)int640 1 2 3 4 5 6 7array([0, 1, 2, 3, 4, 5, 6, 7])cal_error_dim_0(cal_error_dim_0)int640array([0])likelihood_dim_0(likelihood_dim_0)int640array([0])Data variables: (8)N(N_dim_0)int6424array([24])N_cal(N_cal_dim_0)int648array([8])target_volume(tube)int64200 200 200 200 ... 800 800 800 800array([200, 200, 200, 200, 200, 200, 200, 200, 400, 400, 400, 400, 400,\n       400, 400, 400, 800, 800, 800, 800, 800, 800, 800, 800])y(obs)float64199.1 176.3 281.9 ... 693.9 783.4array([199.07727295, 176.25632772, 281.87757567, 387.16351159,\n       362.14946827, 853.23878527, 693.91934176, 783.39963414])measurement_ix(measurement_ix_dim_0)int642 5 10 12 16 20 22 23array([ 2,  5, 10, 12, 16, 20, 22, 23])cal_error(cal_error_dim_0)float640.02array([0.02])likelihood(likelihood_dim_0)int641array([1])true_volume(tube)float64152.6 199.7 161.3 ... 770.5 598.6array([152.64294349, 199.70810807, 161.32449302, 171.49715397,\n       174.67894069, 163.43175945, 153.50063823, 187.79919405,\n       364.94147086, 289.55488638, 445.13256694, 387.79655766,\n       326.2592979 , 385.23402356, 335.94110379, 349.87019831,\n       761.78380169, 620.86368119, 745.73037525, 809.71007835,\n       803.52489045, 683.21425317, 770.52360505, 598.61585552])Indexes: (7)N_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='N_dim_0'))N_cal_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='N_cal_dim_0'))tubePandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='tube'))obsPandasIndexPandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))measurement_ix_dim_0PandasIndexPandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='measurement_ix_dim_0'))cal_error_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='cal_error_dim_0'))likelihood_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='likelihood_dim_0'))Attributes: (4)created_at :2024-03-25T16:52:42.678952arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n              \n            \n            \n\n\nNext we look at the summaries of both the posterior and prior.\n\nfor group_name in [\"prior\", \"posterior\"]:\n    group = idata.get(group_name)\n    group_summary = az.summary(\n        group,\n        var_names=[\n            \"volume_noise\", \"bias_factor\", \"volume_noise_s\", \"bias_factor_l\"\n        ]\n    )\n    display(group_summary)\n\n\n\n\n\n\n\n\n\nmean\nsd\nhdi_3%\nhdi_97%\nmcse_mean\nmcse_sd\ness_bulk\ness_tail\nr_hat\n\n\n\n\nvolume_noise\n0.066\n0.034\n0.016\n0.128\n0.002\n0.001\n224.0\n380.0\n1.01\n\n\nbias_factor\n0.994\n0.143\n0.731\n1.252\n0.015\n0.010\n80.0\n120.0\n1.06\n\n\nvolume_noise_s\n0.108\n0.055\n0.026\n0.210\n0.003\n0.002\n224.0\n380.0\n1.01\n\n\nbias_factor_l\n-0.016\n0.141\n-0.286\n0.242\n0.015\n0.011\n80.0\n120.0\n1.06\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nmean\nsd\nhdi_3%\nhdi_97%\nmcse_mean\nmcse_sd\ness_bulk\ness_tail\nr_hat\n\n\n\n\nvolume_noise\n0.113\n0.026\n0.068\n0.160\n0.001\n0.001\n1303.0\n1943.0\n1.0\n\n\nbias_factor\n0.887\n0.055\n0.781\n0.986\n0.001\n0.001\n2493.0\n2380.0\n1.0\n\n\nvolume_noise_s\n0.185\n0.043\n0.111\n0.261\n0.001\n0.001\n1303.0\n1943.0\n1.0\n\n\nbias_factor_l\n-0.122\n0.061\n-0.229\n0.002\n0.001\n0.001\n2493.0\n2380.0\n1.0\n\n\n\n\n\n\n\n\nThis plot compares the measurements with the observations.\n\naz.plot_lm(\n    y=idata.observed_data[\"y\"],\n    x=idata.observed_data[\"tube\"].sel(tube=MEASUREMENT_IX + 1),\n    y_hat=idata.posterior_predictive[\"y_rep\"]\n)\nax = plt.gca()\nax.semilogy()\n\n\n\n\n\n\n\n\nThis plot compares the volume_noise and bias_factor samples with the true values that we used to simulate the data.\n\naz.plot_posterior(\n  idata.prior,\n  var_names=[\"volume_noise\", \"bias_factor\"],\n  kind=\"hist\",\n  hdi_prob=\"hide\",\n  point_estimate=None,\n  figsize=[12, 4]\n)\nf = plt.gcf()\naxes = f.axes\naz.plot_posterior(\n  idata.posterior,\n  var_names=[\"volume_noise\", \"bias_factor\"],\n  kind=\"hist\",\n  hdi_prob=\"hide\",\n  point_estimate=None,\n  figsize=[12, 4],\n  ax=axes,\n  color=\"tab:orange\"\n)\nfor ax, truth in zip(f.axes, [NOISE, BIAS_FACTOR]):\n    ax.axvline(truth, color=\"red\")\n\n\n\n\n\n\n\n\nThis plot shows the samples for all the tubes’ volumes, including those that weren’t measured, alongside the true volumes.\n\naz.plot_lm(\n    x=idata.observed_data[\"tube\"],\n    y=idata.observed_data[\"true_volume\"],\n    y_hat=idata.posterior[\"volume\"],\n    grid=False,\n    y_kwargs={\"label\": \"true volume\"},\n    figsize=[10, 5],\n    legend=False,\n)\nax = plt.gca()\nfor i in MEASUREMENT_IX:\n    ax.text(i+0.1, volumes[i], \"obs\", zorder=1000)\nax.set(xlabel=\"tube\", ylabel=\"volume ($\\\\mu$l)\");\nax.semilogy()\n\n\n\n\n\n\n\n\nSo, what is the probability that Teddy put less than 350 \\(\\mu\\)l of label into tube 10, even though the target amount was 400\\(\\mu\\)l?\n\naz.plot_posterior(\n  idata.prior,\n  var_names=[\"volume\"],\n  coords={\"tube\": [10]},\n  kind=\"hist\",\n  hdi_prob=\"hide\",\n  point_estimate=None,\n  ref_val=350,\n  figsize=[12, 4],\n  bins=np.linspace(250, 600, 30),\n)\n\n\n\n\n\n\n\n\nPhew, only about 13%, that’s probably fine right?",
+    "text": "Example\nTeddy has never been in the lab and is bad at pipetting. Unfortunately, some label needed to be put in some Eppendorf tubes, and noone else was available to do it themselves or even supervise.\nEach tube had a required volume of label which Teddy tried to hit, but probably there was some inaccuracy due to bad eyesight, poor hand control or something.\nIn addition, there was also probably some consistent bias, as the pipette was consistently adding too much or too little liquid. It seemed pretty old.\nLuckily, someone was able to measure how much label ended up in 8 out of the 24 tubes with pretty good accuracy. Now we want to estimate how much label there is in the other 16 tubes, taking into account these measurements as well as what we know about the likely biased pipette and inconsistent pipetter.\nTo describe this situation we’ll first think of a regression model that describes the measurement setup, then use Python to simulate data from the model given some plausible parameter values. Next we’ll implement the model in Stan, then fit the simulated data using MCMC and then analyse the results.\n\nRegression model\nTo model the noise that Teddy introduced by being bad at pipetting and the bias introduced by the bad pipette, we need some parameters that connect the known target volumes with the unknown true volumes. Let’s call them \\(noise\\) and \\(bias\\). Since the volumes are constrained positive, a distribution that automatically excludes negative numbers is probably a good idea: the log-normal distribution is usually a good starting point. This equation describes a plausible relationship:\n\\[\nvolume \\sim LN(\\ln{(target\\cdot bias)}, noise)\n\\]\nTo model the helpful measurements, we use another log-normal distribution and assume the measuring device is unbiased and has known log-scale standard error \\(cal\\ error\\):1\n1 NB the scale parameter of a lognormal distribution represents multiplicative error\\[\nmeasurements \\sim LN(\\ln{volume}, cal\\ error)\n\\]\nTo round off the model we can think about the likely values of the unknown parameters \\(bias\\) and \\(noise\\). \\(bias\\) is likely not to be too far away from 1, otherwise someone would have probably thrown the pipette away already. A prior distribution that puts most of its mass between 0.75 and 1.25 therefore seems reasonable. Similarly, a prior for \\(noise\\) should probably not imply that Teddy’s pipetting errors regularly exceeded 30%. This consideration motivates a prior for \\(noise\\) that puts most of its mass below 0.15.\n\n\nSimulating fake data\nFirst some imports: as usual for this course we’ll be using arviz, matplotlib, cmdstanpy, pandas and numpy. stanio is a handy utility library for Stan: it is a dependency of cmdstanpy so you shouldn’t need to install it explicitly.\n\nimport arviz as az\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nimport stanio\n\nfrom cmdstanpy import CmdStanModel\n\nNow some hardcoded numbers, including true values for the parameters here: \\(bias\\) is 0.88 and \\(noise\\) is 0.1. Note that \\(cal\\ error\\) is much smaller than \\(bias\\).\n\nN = 24\nN_CAL = 8\nTARGET_VOLUMES = np.array(\n    [\n      *([200] * 8),\n      *([400] * 8),\n      *([800] * 8),\n    ]\n)\nMEASUREMENT_IX = np.array([1, 4, 9, 11, 15, 19, 21, 22])\nCAL_ERROR = 0.02\nBIAS_FACTOR = 0.88\nNOISE = 0.1\nRNG_SEED = 12345\n\nSimulate the true volumes\n\nrng = np.random.default_rng(seed=RNG_SEED)\nln_mean = [\n  np.log(target * BIAS_FACTOR)\n  for target in TARGET_VOLUMES\n]\nvolumes = rng.lognormal(mean=ln_mean, sigma=NOISE)\nvolumes\n\narray([152.64294349, 199.70810807, 161.32449302, 171.49715397,\n       174.67894069, 163.43175945, 153.50063823, 187.79919405,\n       364.94147086, 289.55488638, 445.13256694, 387.79655766,\n       326.2592979 , 385.23402356, 335.94110379, 349.87019831,\n       761.78380169, 620.86368119, 745.73037525, 809.71007835,\n       803.52489045, 683.21425317, 770.52360505, 598.61585552])\n\n\nPlot the volumes and the targets.\n\nf, ax = plt.subplots()\nbins = np.logspace(np.log10(100), np.log10(1000), 30)\nax.hist(volumes, bins=bins)\nfor t in (200, 400, 800):\n    ax.axvline(t, color=\"red\")\nax.semilogx()\nax.set_xticks([200, 400, 800], [200, 400, 800]);\nax.set(\n    xlabel=\"volume ($\\\\mu$l)\",\n    ylabel=\"Frequency\",\n    title=\"How much label ended up in the tubes\"\n);\n\n\n\n\n\n\n\n\nSimulate measurements for tubes in the MEASUREMENT_IX.\n\nmeasurements = [\n  rng.lognormal(np.log(vol), CAL_ERROR)\n  for vol in volumes[MEASUREMENT_IX]\n]\npd.DataFrame({\n  \"target volume\": np.array(TARGET_VOLUMES)[MEASUREMENT_IX], \n  \"actual volume\": volumes[MEASUREMENT_IX],\n  \"measured volume\": measurements\n})\n\n\n\n\n\n\n\n\n\ntarget volume\nactual volume\nmeasured volume\n\n\n\n\n0\n200\n199.708108\n199.077273\n\n\n1\n200\n174.678941\n176.256328\n\n\n2\n400\n289.554886\n281.877576\n\n\n3\n400\n387.796558\n387.163512\n\n\n4\n400\n349.870198\n362.149468\n\n\n5\n800\n809.710078\n853.238785\n\n\n6\n800\n683.214253\n693.919342\n\n\n7\n800\n770.523605\n783.399634\n\n\n\n\n\n\n\n\n\n\nWriting the model in Stan and sampling the simulated data\nI wrote up the implied statistical model in a Stan file at src/stan/ pipette.stan. This code loads this Stan file as a CmdStanModel object, checks its formatting and prints it out.\nNote that the model internally puts the data on log scale and then standardises it: this is a bit annoying but makes it way easier to set priors and can ultimately save you a lot of trouble.\n\nmodel = CmdStanModel(stan_file=\"../src/stan/pipette.stan\")\nmodel.format(overwrite_file=True, canonicalize=True)\nprint(model.code())\n\n18:00:12 - cmdstanpy - INFO - compiling stan file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette.stan to exe file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette\n18:00:17 - cmdstanpy - INFO - compiled model executable: /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette\n\n\nfunctions {\n  vector standardise(vector v, real m, real s) {\n    return (v - m) / s;\n  }\n  real standardise(real v, real m, real s) {\n    return (v - m) / s;\n  }\n  vector unstandardise(vector u, real m, real s) {\n    return m + u * s;\n  }\n  real unstandardise(real u, real m, real s) {\n    return m + u * s;\n  }\n}\ndata {\n  int<lower=1> N;\n  int<lower=0> N_cal;\n  vector<lower=0>[N] target_volume;\n  vector<lower=0>[N_cal] y;\n  array[N_cal] int<lower=1, upper=N> measurement_ix;\n  real<lower=0> cal_error;\n  int<lower=0, upper=1> likelihood;\n}\ntransformed data {\n  vector[N_cal] y_ls = standardise(log(y), mean(log(y)), sd(log(y)));\n  vector[N] target_volume_ls = standardise(log(target_volume), mean(log(y)),\n                                           sd(log(y)));\n  real cal_error_s = cal_error / sd(log(y));\n}\nparameters {\n  real<lower=0> volume_noise_s;\n  real bias_factor_l;\n  vector[N] volume_ls;\n}\nmodel {\n  volume_noise_s ~ lognormal(log(0.1), 0.5);\n  bias_factor_l ~ normal(0, 0.15);\n  volume_ls ~ normal(target_volume_ls + bias_factor_l, volume_noise_s);\n  if (likelihood) {\n    for (i in 1 : N_cal) {\n      y_ls[i] ~ normal(volume_ls[measurement_ix[i]], cal_error_s);\n    }\n  }\n}\ngenerated quantities {\n  real bias_factor = exp(bias_factor_l);\n  real volume_noise = volume_noise_s * sd(log(y));\n  vector[N] volume = exp(unstandardise(volume_ls, mean(log(y)), sd(log(y))));\n  vector[N_cal] y_rep;\n  vector[N_cal] llik;\n  for (i in 1 : N_cal) {\n    int ms_ix = measurement_ix[i];\n    y_rep[i] = lognormal_rng(log(volume[ms_ix]), cal_error);\n    llik[i] = lognormal_lpdf(y[i] | log(volume[ms_ix]), cal_error);\n  }\n}\n\n\n\n\nThis code loads some data into a dictionary that is compatible with Stan and carries out two MCMC runs, one in prior mode and one in posterior mode.\n\nstan_input_posterior = stanio.json.process_dictionary(\n    {\n      \"N\": N,\n      \"N_cal\": N_CAL,\n      \"target_volume\": TARGET_VOLUMES,\n      \"y\": measurements,\n      \"measurement_ix\": MEASUREMENT_IX + 1,\n      \"cal_error\": CAL_ERROR,\n      \"likelihood\": 1,\n  }\n)\nstan_input_prior = stan_input_posterior | {\"likelihood\": 0}\nmcmc_prior = model.sample(\n    data=stan_input_prior,\n    adapt_delta=0.999,\n    max_treedepth=12,\n    seed=RNG_SEED,\n)\nmcmc_posterior = model.sample(data=stan_input_posterior, seed=RNG_SEED)\nmcmc_prior.diagnose()\nmcmc_posterior.diagnose()\n\n18:00:17 - cmdstanpy - INFO - CmdStan start processing\n18:00:17 - cmdstanpy - INFO - CmdStan done processing.\n18:00:17 - cmdstanpy - INFO - CmdStan start processing\n18:00:18 - cmdstanpy - INFO - CmdStan done processing.\n18:00:18 - cmdstanpy - WARNING - Non-fatal error during sampling:\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nException: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)\nConsider re-running with show_console=True if the above output is unclear!\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                                                                                                                                                                                                                                                                                                                                \n                                                                                                                                                                                                                                                                                                                                \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'Processing csv files: /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_1.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_2.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_3.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_4.csv\\n\\nChecking sampler transitions treedepth.\\nTreedepth satisfactory for all transitions.\\n\\nChecking sampler transitions for divergences.\\nNo divergent transitions found.\\n\\nChecking E-BFMI - sampler transitions HMC potential energy.\\nE-BFMI satisfactory.\\n\\nEffective sample size satisfactory.\\n\\nSplit R-hat values satisfactory all parameters.\\n\\nProcessing complete, no problems detected.\\n'\n\n\nThe diagnostics seem ok, though interestingly the prior was pretty tricky to sample accurately.\n\n\nLoading the MCMC results with arviz\nThis code loads both MCMC runs into an arviz InferenceData object.\n\ncoords = {\"obs\": MEASUREMENT_IX, \"tube\": range(N)}\ndims={\n    \"y\": [\"obs\"],\n    \"y_rep\": [\"obs\"],\n    \"target_volume\": [\"tube\"],\n    \"true_volume\": [\"tube\"],\n    \"volume\": [\"tube\"],\n    \"tube\": [\"tube\"]\n}\nidata = az.from_cmdstanpy(\n    posterior=mcmc_posterior,\n    prior=mcmc_prior,\n    log_likelihood=\"llik\",\n    observed_data=stan_input_posterior | {\n        \"true_volume\": volumes, \"tube\": range(N)\n    },\n    posterior_predictive={\"y\": \"y_rep\"},\n    coords=coords,\n    dims=dims\n)\nidata\n\n\n            \n              \n                arviz.InferenceData\n              \n              \n              \n            \n                  \n                  posterior\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 2MB\nDimensions:          (chain: 4, draw: 1000, volume_ls_dim_0: 24, tube: 24)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\n  * volume_ls_dim_0  (volume_ls_dim_0) int64 192B 0 1 2 3 4 5 ... 19 20 21 22 23\n  * tube             (tube) int64 192B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23\nData variables:\n    volume_noise_s   (chain, draw) float64 32kB 0.149 0.2378 ... 0.1396 0.1959\n    bias_factor_l    (chain, draw) float64 32kB -0.2014 -0.07906 ... -0.07839\n    volume_ls        (chain, draw, volume_ls_dim_0) float64 768kB -1.275 ... ...\n    bias_factor      (chain, draw) float64 32kB 0.8176 0.924 ... 0.9842 0.9246\n    volume_noise     (chain, draw) float64 32kB 0.09116 0.1455 ... 0.1199\n    volume           (chain, draw, tube) float64 768kB 182.9 192.1 ... 855.2\nAttributes:\n    created_at:                 2024-03-25T17:00:18.417851\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000volume_ls_dim_0: 24tube: 24Coordinates: (4)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])volume_ls_dim_0(volume_ls_dim_0)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])tube(tube)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])Data variables: (6)volume_noise_s(chain, draw)float640.149 0.2378 ... 0.1396 0.1959array([[0.148979, 0.237794, 0.122436, ..., 0.140801, 0.178431, 0.198993],\n       [0.158124, 0.141742, 0.157009, ..., 0.167521, 0.137968, 0.115493],\n       [0.168817, 0.188685, 0.138985, ..., 0.138549, 0.116906, 0.133163],\n       [0.151583, 0.148293, 0.18479 , ..., 0.158791, 0.139599, 0.195866]])bias_factor_l(chain, draw)float64-0.2014 -0.07906 ... -0.07839array([[-0.201351 , -0.0790569, -0.105641 , ..., -0.123153 , -0.0710773,\n        -0.0700885],\n       [-0.0982885, -0.0746649, -0.0754556, ..., -0.128265 , -0.163382 ,\n        -0.123952 ],\n       [ 0.0213591, -0.182919 , -0.17903  , ..., -0.134593 , -0.132122 ,\n        -0.131316 ],\n       [-0.0805919, -0.0926586, -0.117743 , ..., -0.190522 , -0.0158839,\n        -0.0783902]])volume_ls(chain, draw, volume_ls_dim_0)float64-1.275 -1.194 ... 1.099 1.246array([[[-1.2747  , -1.19448 , -1.22282 , ...,  0.879734,  1.03417 ,\n          0.762953],\n        [-1.40687 , -1.09754 , -1.14365 , ...,  0.936603,  1.15697 ,\n          0.907036],\n        [-0.935161, -1.16238 , -1.13239 , ...,  0.901988,  1.09336 ,\n          1.1396  ],\n        ...,\n        [-1.38442 , -1.08361 , -1.21375 , ...,  0.945476,  1.12959 ,\n          1.01907 ],\n        [-1.03524 , -1.11487 , -0.966034, ...,  0.964533,  1.11521 ,\n          1.0335  ],\n        [-1.29565 , -1.14228 , -1.38558 , ...,  0.843054,  1.08359 ,\n          1.15616 ]],\n\n       [[-1.28813 , -1.15007 , -1.30488 , ...,  0.863103,  1.06637 ,\n          1.34264 ],\n        [-1.17917 , -1.1655  , -1.16521 , ...,  0.874006,  1.07137 ,\n          1.25143 ],\n        [-1.20124 , -1.14406 , -1.0035  , ...,  0.908254,  1.06948 ,\n          1.04256 ],\n...\n        [-1.28454 , -1.16143 , -1.07814 , ...,  0.871924,  1.07582 ,\n          1.04087 ],\n        [-1.35706 , -1.15124 , -1.11947 , ...,  0.87177 ,  1.04045 ,\n          0.944458],\n        [-1.23871 , -1.14033 , -1.48308 , ...,  0.895505,  1.1146  ,\n          0.996843]],\n\n       [[-1.21437 , -1.11281 , -1.31588 , ...,  0.869607,  1.11421 ,\n          0.911445],\n        [-1.23622 , -1.12363 , -1.30974 , ...,  0.940612,  1.11077 ,\n          1.07039 ],\n        [-1.15205 , -1.14246 , -1.21216 , ...,  0.914147,  1.09383 ,\n          1.10271 ],\n        ...,\n        [-1.28744 , -1.10417 , -1.22327 , ...,  0.90872 ,  1.1949  ,\n          1.00884 ],\n        [-1.19644 , -1.16323 , -1.25646 , ...,  0.913467,  1.0723  ,\n          1.06841 ],\n        [-0.985007, -1.08883 , -1.03812 , ...,  0.941881,  1.09928 ,\n          1.24558 ]]])bias_factor(chain, draw)float640.8176 0.924 ... 0.9842 0.9246array([[0.817625, 0.923987, 0.899748, ..., 0.884128, 0.93139 , 0.932311],\n       [0.906387, 0.928054, 0.927321, ..., 0.87962 , 0.849267, 0.883422],\n       [1.02159 , 0.832835, 0.836081, ..., 0.874071, 0.876234, 0.876941],\n       [0.92257 , 0.911505, 0.888924, ..., 0.826528, 0.984242, 0.924604]])volume_noise(chain, draw)float640.09116 0.1455 ... 0.08542 0.1199array([[0.0911635, 0.145511 , 0.0749212, ..., 0.0861592, 0.109185 ,\n        0.121768 ],\n       [0.0967594, 0.0867349, 0.0960768, ..., 0.102509 , 0.0844255,\n        0.0706728],\n       [0.103303 , 0.11546  , 0.0850475, ..., 0.0847808, 0.0715373,\n        0.0814854],\n       [0.092757 , 0.0907435, 0.113077 , ..., 0.0971674, 0.0854233,\n        0.119854 ]])volume(chain, draw, tube)float64182.9 192.1 188.8 ... 782.0 855.2array([[[182.932, 192.136, 188.833, ..., 683.658, 751.418, 636.508],\n        [168.719, 203.878, 198.206, ..., 707.868, 810.057, 695.176],\n        [225.177, 195.948, 199.577, ..., 693.032, 779.132, 801.494],\n        ...,\n        [171.054, 205.624, 189.884, ..., 711.722, 796.6  , 744.505],\n        [211.801, 201.728, 220.963, ..., 720.07 , 789.621, 751.109],\n        [180.602, 198.373, 170.932, ..., 668.484, 774.49 , 809.654]],\n\n       [[181.435, 197.428, 179.585, ..., 676.736, 766.367, 907.524],\n        [193.944, 195.574, 195.609, ..., 681.266, 768.718, 858.258],\n        [191.343, 198.157, 215.955, ..., 695.694, 767.829, 755.286],\n        ...,\n        [179.235, 204.075, 162.611, ..., 709.551, 781.422, 723.426],\n        [189.975, 192.176, 176.443, ..., 714.348, 760.103, 754.809],\n        [186.767, 198.987, 204.561, ..., 705.813, 787.317, 706.511]],\n\n       [[198.651, 208.316, 215.697, ..., 682.152, 770.541, 821.472],\n        [175.866, 190.675, 167.71 , ..., 713.088, 788.866, 644.97 ],\n        [185.411, 195.48 , 169.569, ..., 690.813, 788.233, 665.731],\n        ...,\n        [181.834, 196.062, 206.313, ..., 680.399, 770.813, 754.506],\n        [173.941, 197.288, 201.161, ..., 680.335, 754.311, 711.279],\n        [187.005, 198.61 , 161.032, ..., 690.288, 789.322, 734.448]],\n\n       [[189.812, 201.983, 178.38 , ..., 679.435, 789.135, 697.054],\n        [187.291, 200.65 , 179.052, ..., 709.606, 787.476, 768.257],\n        [197.19 , 198.351, 190.069, ..., 698.207, 779.357, 783.604],\n        ...,\n        [181.511, 203.053, 188.781, ..., 695.893, 829.077, 739.861],\n        [191.905, 195.846, 184.986, ..., 697.917, 769.157, 767.329],\n        [218.412, 204.968, 211.428, ..., 710.158, 781.96 , 855.193]]])Indexes: (4)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))volume_ls_dim_0PandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='volume_ls_dim_0'))tubePandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='tube'))Attributes: (4)created_at :2024-03-25T17:00:18.417851arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  posterior_predictive\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 264kB\nDimensions:  (chain: 4, draw: 1000, obs: 8)\nCoordinates:\n  * chain    (chain) int64 32B 0 1 2 3\n  * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999\n  * obs      (obs) int64 64B 1 4 9 11 15 19 21 22\nData variables:\n    y_rep    (chain, draw, obs) float64 256kB 197.1 177.7 290.6 ... 693.9 779.7\nAttributes:\n    created_at:                 2024-03-25T17:00:18.423820\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000obs: 8Coordinates: (3)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])obs(obs)int641 4 9 11 15 19 21 22array([ 1,  4,  9, 11, 15, 19, 21, 22])Data variables: (1)y_rep(chain, draw, obs)float64197.1 177.7 290.6 ... 693.9 779.7array([[[197.116, 177.676, 290.624, ..., 817.525, 684.834, 768.599],\n        [207.123, 178.881, 276.029, ..., 837.783, 707.797, 799.808],\n        [193.412, 176.359, 312.075, ..., 779.207, 670.578, 748.349],\n        ...,\n        [205.396, 181.619, 282.37 , ..., 823.026, 693.464, 776.76 ],\n        [196.261, 179.431, 277.182, ..., 842.408, 693.157, 799.594],\n        [194.616, 187.719, 282.712, ..., 827.156, 681.762, 760.655]],\n\n       [[194.801, 174.147, 275.591, ..., 837.393, 662.098, 793.396],\n        [196.473, 175.265, 281.09 , ..., 849.397, 685.798, 799.947],\n        [188.093, 169.629, 275.093, ..., 851.105, 699.691, 758.693],\n        ...,\n        [199.97 , 181.119, 276.119, ..., 885.826, 721.481, 780.279],\n        [190.317, 177.701, 271.189, ..., 838.828, 746.589, 728.727],\n        [204.84 , 172.027, 285.739, ..., 834.172, 716.608, 793.402]],\n\n       [[208.713, 177.249, 294.043, ..., 808.862, 703.655, 795.465],\n        [195.615, 175.848, 270.427, ..., 868.339, 708.692, 818.016],\n        [196.822, 173.831, 290.472, ..., 826.401, 681.787, 797.722],\n        ...,\n        [193.524, 174.954, 287.664, ..., 903.736, 662.875, 770.548],\n        [198.162, 176.923, 293.263, ..., 855.101, 686.138, 768.147],\n        [195.309, 174.313, 283.913, ..., 858.254, 697.01 , 809.608]],\n\n       [[206.681, 178.897, 283.43 , ..., 877.47 , 671.592, 773.122],\n        [204.176, 182.882, 273.709, ..., 802.932, 712.307, 766.771],\n        [198.835, 175.814, 295.483, ..., 807.124, 708.253, 755.885],\n        ...,\n        [199.387, 166.869, 282.976, ..., 836.791, 687.756, 831.191],\n        [189.255, 178.416, 288.503, ..., 861.03 , 712.108, 784.301],\n        [207.587, 184.839, 288.502, ..., 839.977, 693.941, 779.685]]])Indexes: (3)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))obsPandasIndexPandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))Attributes: (4)created_at :2024-03-25T17:00:18.423820arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  log_likelihood\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 264kB\nDimensions:     (chain: 4, draw: 1000, llik_dim_0: 8)\nCoordinates:\n  * chain       (chain) int64 32B 0 1 2 3\n  * draw        (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999\n  * llik_dim_0  (llik_dim_0) int64 64B 0 1 2 3 4 5 6 7\nData variables:\n    llik        (chain, draw, llik_dim_0) float64 256kB -3.875 -2.193 ... -3.675\nAttributes:\n    created_at:                 2024-03-25T17:00:18.465085\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000llik_dim_0: 8Coordinates: (3)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])llik_dim_0(llik_dim_0)int640 1 2 3 4 5 6 7array([0, 1, 2, 3, 4, 5, 6, 7])Data variables: (1)llik(chain, draw, llik_dim_0)float64-3.875 -2.193 ... -4.218 -3.675array([[[-3.87496, -2.19298, -5.05717, ..., -4.10576, -3.82668,\n         -5.84213],\n        [-3.0104 , -2.18047, -2.95307, ..., -3.76203, -4.04436,\n         -5.07012],\n        [-2.61437, -2.19405, -4.63408, ..., -3.87112, -3.55132,\n         -3.70786],\n        ...,\n        [-3.60909, -2.48505, -2.6706 , ..., -3.91375, -4.35139,\n         -4.01956],\n        [-2.51933, -2.83159, -2.7908 , ..., -3.79013, -5.25984,\n         -3.74878],\n        [-2.31632, -4.73471, -3.46818, ..., -5.85992, -5.29236,\n         -3.8341 ]],\n\n       [[-2.38707, -2.3283 , -2.64878, ..., -3.83624, -4.33519,\n         -4.27453],\n        [-2.69467, -2.22771, -2.68636, ..., -3.96266, -3.97258,\n         -4.11793],\n        [-2.32746, -2.27031, -4.41154, ..., -4.53482, -3.55743,\n         -4.17434],\n...\n        [-2.59181, -2.25385, -2.67099, ..., -4.91015, -4.03322,\n         -3.99848],\n        [-2.40251, -2.43096, -4.67866, ..., -4.25938, -4.03786,\n         -5.46025],\n        [-2.30753, -3.00209, -2.83217, ..., -3.75844, -3.58368,\n         -3.74147]],\n\n       [[-2.56303, -2.25859, -3.31605, ..., -3.90128, -4.10547,\n         -3.73708],\n        [-2.37797, -2.74359, -4.55847, ..., -4.38751, -4.17394,\n         -3.70423],\n        [-2.31732, -2.66832, -4.37736, ..., -3.80292, -3.59671,\n         -3.70401],\n        ...,\n        [-2.78946, -2.22266, -2.67603, ..., -3.75926, -3.55935,\n         -7.68485],\n        [-2.63537, -2.2518 , -3.67463, ..., -4.03551, -3.59052,\n         -4.09134],\n        [-3.36342, -2.82215, -3.077  , ..., -3.84831, -4.21812,\n         -3.67479]]])Indexes: (3)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))llik_dim_0PandasIndexPandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='llik_dim_0'))Attributes: (4)created_at :2024-03-25T17:00:18.465085arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  sample_stats\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 204kB\nDimensions:          (chain: 4, draw: 1000)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\nData variables:\n    lp               (chain, draw) float64 32kB 21.01 18.71 ... 24.94 22.19\n    acceptance_rate  (chain, draw) float64 32kB 0.9988 0.7904 ... 0.8739 0.9229\n    step_size        (chain, draw) float64 32kB 0.3503 0.3503 ... 0.4368 0.4368\n    tree_depth       (chain, draw) int64 32kB 3 4 3 3 3 3 3 3 ... 3 3 3 3 3 3 4\n    n_steps          (chain, draw) int64 32kB 7 31 7 15 7 7 7 ... 7 7 7 7 7 7 15\n    diverging        (chain, draw) bool 4kB False False False ... False False\n    energy           (chain, draw) float64 32kB -6.192 -8.972 ... -15.39 -11.6\nAttributes:\n    created_at:                 2024-03-25T17:00:18.422253\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000Coordinates: (2)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])Data variables: (7)lp(chain, draw)float6421.01 18.71 28.67 ... 24.94 22.19array([[21.009 , 18.71  , 28.6744, ..., 24.9497, 21.3791, 18.4294],\n       [27.8086, 32.2878, 26.6081, ..., 26.1025, 28.0347, 28.8471],\n       [24.4864, 25.8458, 31.4533, ..., 31.4475, 30.7953, 30.7299],\n       [30.3395, 21.0069, 20.8335, ..., 24.7186, 24.9444, 22.194 ]])acceptance_rate(chain, draw)float640.9988 0.7904 1.0 ... 0.8739 0.9229array([[0.998827, 0.790351, 1.      , ..., 0.935358, 0.971577, 0.847244],\n       [0.979632, 0.984735, 0.751846, ..., 1.      , 0.901231, 0.83027 ],\n       [0.64411 , 0.901806, 0.983725, ..., 0.809004, 0.692338, 0.828195],\n       [0.931977, 0.618653, 0.940008, ..., 0.973274, 0.873885, 0.922928]])step_size(chain, draw)float640.3503 0.3503 ... 0.4368 0.4368array([[0.350264, 0.350264, 0.350264, ..., 0.350264, 0.350264, 0.350264],\n       [0.343679, 0.343679, 0.343679, ..., 0.343679, 0.343679, 0.343679],\n       [0.326612, 0.326612, 0.326612, ..., 0.326612, 0.326612, 0.326612],\n       [0.43677 , 0.43677 , 0.43677 , ..., 0.43677 , 0.43677 , 0.43677 ]])tree_depth(chain, draw)int643 4 3 3 3 3 3 3 ... 3 3 3 3 3 3 3 4array([[3, 4, 3, ..., 4, 4, 4],\n       [3, 3, 4, ..., 4, 4, 3],\n       [4, 4, 4, ..., 3, 4, 3],\n       [3, 3, 3, ..., 3, 3, 4]])n_steps(chain, draw)int647 31 7 15 7 7 7 ... 7 7 7 7 7 7 15array([[ 7, 31,  7, ..., 15, 15, 15],\n       [ 7,  7, 15, ..., 15, 31,  7],\n       [15, 15, 15, ...,  7, 15, 15],\n       [ 7,  7,  7, ...,  7,  7, 15]])diverging(chain, draw)boolFalse False False ... False Falsearray([[False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False]])energy(chain, draw)float64-6.192 -8.972 ... -15.39 -11.6array([[ -6.19192,  -8.97176, -14.139  , ..., -11.2611 , -12.0468 ,\n         -1.39412],\n       [-13.6371 , -23.0408 , -15.4574 , ...,  -7.31748, -10.7674 ,\n        -13.1799 ],\n       [-14.2463 , -12.2545 , -20.9744 , ..., -15.1772 , -12.6979 ,\n        -18.2411 ],\n       [-17.2553 , -11.9235 ,  -9.46781, ..., -16.5739 , -15.3934 ,\n        -11.6042 ]])Indexes: (2)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))Attributes: (4)created_at :2024-03-25T17:00:18.422253arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  prior\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 2MB\nDimensions:          (chain: 4, draw: 1000, volume_ls_dim_0: 24, tube: 24,\n                      obs: 8, llik_dim_0: 8)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\n  * volume_ls_dim_0  (volume_ls_dim_0) int64 192B 0 1 2 3 4 5 ... 19 20 21 22 23\n  * tube             (tube) int64 192B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23\n  * obs              (obs) int64 64B 1 4 9 11 15 19 21 22\n  * llik_dim_0       (llik_dim_0) int64 64B 0 1 2 3 4 5 6 7\nData variables:\n    volume_noise_s   (chain, draw) float64 32kB 0.02974 0.02728 ... 0.2746\n    bias_factor_l    (chain, draw) float64 32kB -0.1025 -0.07485 ... -0.1921\n    volume_ls        (chain, draw, volume_ls_dim_0) float64 768kB -1.18 ... 0...\n    bias_factor      (chain, draw) float64 32kB 0.9026 0.9279 ... 0.9219 0.8252\n    volume_noise     (chain, draw) float64 32kB 0.0182 0.01669 ... 0.1236 0.168\n    volume           (chain, draw, tube) float64 768kB 193.8 185.0 ... 693.1\n    y_rep            (chain, draw, obs) float64 256kB 184.8 189.1 ... 1.002e+03\n    llik             (chain, draw, llik_dim_0) float64 256kB -8.981 ... -87.32\nAttributes:\n    created_at:                 2024-03-25T17:00:18.459295\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000volume_ls_dim_0: 24tube: 24obs: 8llik_dim_0: 8Coordinates: (6)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])volume_ls_dim_0(volume_ls_dim_0)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])tube(tube)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])obs(obs)int641 4 9 11 15 19 21 22array([ 1,  4,  9, 11, 15, 19, 21, 22])llik_dim_0(llik_dim_0)int640 1 2 3 4 5 6 7array([0, 1, 2, 3, 4, 5, 6, 7])Data variables: (8)volume_noise_s(chain, draw)float640.02974 0.02728 ... 0.2021 0.2746array([[0.0297445, 0.0272758, 0.0265095, ..., 0.178736 , 0.192878 ,\n        0.143125 ],\n       [0.0217743, 0.0275153, 0.0318496, ..., 0.187067 , 0.213967 ,\n        0.2151   ],\n       [0.14644  , 0.161344 , 0.154124 , ..., 0.222906 , 0.210747 ,\n        0.183849 ],\n       [0.196802 , 0.224444 , 0.332829 , ..., 0.104148 , 0.202053 ,\n        0.274571 ]])bias_factor_l(chain, draw)float64-0.1025 -0.07485 ... -0.1921array([[-0.102507 , -0.074851 , -0.071183 , ..., -0.0135833, -0.072937 ,\n        -0.0888987],\n       [-0.131403 , -0.137815 , -0.131853 , ..., -0.0773899,  0.0578722,\n        -0.0226735],\n       [ 0.134711 ,  0.0641626, -0.0368078, ..., -0.277545 , -0.22355  ,\n        -0.29311  ],\n       [-0.0320055, -0.127452 ,  0.0724585, ..., -0.0552708, -0.0813462,\n        -0.192088 ]])volume_ls(chain, draw, volume_ls_dim_0)float64-1.18 -1.256 ... 1.525 0.9022array([[[-1.18014 , -1.25596 , -1.21539 , ...,  1.06275 ,  1.04836 ,\n          1.03579 ],\n        [-1.24896 , -1.18338 , -1.21905 , ...,  1.03778 ,  1.05845 ,\n          1.05922 ],\n        [-1.18062 , -1.20861 , -1.20187 , ...,  1.06469 ,  1.05634 ,\n          1.01356 ],\n        ...,\n        [-1.06469 , -1.19225 , -1.39826 , ...,  0.878588,  1.39824 ,\n          1.34754 ],\n        [-1.43398 , -1.40853 , -1.21943 , ...,  1.50548 ,  0.928725,\n          0.986833],\n        [-1.19609 , -1.28651 , -1.26602 , ...,  0.972567,  1.07923 ,\n          1.10071 ]],\n\n       [[-1.24692 , -1.24356 , -1.21965 , ...,  1.00025 ,  0.983762,\n          1.02885 ],\n        [-1.28328 , -1.29047 , -1.25435 , ...,  1.07908 ,  0.979001,\n          0.99518 ],\n        [-1.24983 , -1.2417  , -1.28149 , ...,  0.913942,  1.01969 ,\n          1.0017  ],\n...\n        [-1.28413 , -1.1961  , -1.73514 , ...,  1.07113 ,  1.08167 ,\n          0.796746],\n        [-1.53629 , -1.42865 , -1.07195 , ...,  0.73767 ,  0.848461,\n          1.2325  ],\n        [-1.33974 , -1.18125 , -1.43554 , ...,  0.932523,  0.996232,\n          0.864258]],\n\n       [[-1.04915 , -0.949361, -0.929134, ...,  1.15517 ,  0.924563,\n          0.388474],\n        [-1.26699 , -1.06466 , -1.12933 , ...,  0.873924,  0.755863,\n          1.21534 ],\n        [-1.1652  , -1.53275 , -1.46456 , ...,  1.52952 ,  1.5088  ,\n          1.27139 ],\n        ...,\n        [-1.14013 , -1.28078 , -1.14028 , ...,  1.01727 ,  1.10426 ,\n          0.960508],\n        [-1.4791  , -1.09886 , -1.39305 , ...,  0.918405,  1.08038 ,\n          1.22328 ],\n        [-1.2026  , -1.02384 , -1.02024 , ...,  1.01766 ,  1.52503 ,\n          0.902215]]])bias_factor(chain, draw)float640.9026 0.9279 ... 0.9219 0.8252array([[0.902572, 0.927882, 0.931291, ..., 0.986509, 0.929659, 0.914938],\n       [0.876865, 0.87126 , 0.87647 , ..., 0.925529, 1.05958 , 0.977582],\n       [1.14421 , 1.06627 , 0.963861, ..., 0.757641, 0.799675, 0.74594 ],\n       [0.968501, 0.880336, 1.07515 , ..., 0.946229, 0.921874, 0.825234]])volume_noise(chain, draw)float640.0182 0.01669 ... 0.1236 0.168array([[0.0182013, 0.0166906, 0.0162217, ..., 0.109372 , 0.118026 ,\n        0.0875808],\n       [0.0133241, 0.0168372, 0.0194894, ..., 0.11447  , 0.130931 ,\n        0.131624 ],\n       [0.0896095, 0.0987295, 0.0943116, ..., 0.136401 , 0.12896  ,\n        0.112501 ],\n       [0.120427 , 0.137342 , 0.203665 , ..., 0.0637301, 0.12364  ,\n        0.168016 ]])volume(chain, draw, tube)float64193.8 185.0 ... 1.015e+03 693.1array([[[ 193.83 ,  185.043,  189.693, ...,  764.673,  757.97 ,\n          752.164],\n        [ 185.837,  193.446,  189.27 , ...,  753.078,  762.666,\n          763.023],\n        [ 193.772,  190.482,  191.27 , ...,  765.581,  761.682,\n          742.002],\n        ...,\n        [ 208.018,  192.399,  169.611, ...,  683.179,  938.933,\n          910.251],\n        [ 165.943,  168.549,  189.225, ..., 1002.61 ,  704.464,\n          729.963],\n        [ 191.947,  181.615,  183.906, ...,  723.619,  772.422,\n          782.645]],\n\n       [[ 186.069,  186.452,  189.2  , ...,  735.982,  728.593,\n          748.973],\n        [ 181.974,  181.175,  185.225, ...,  772.355,  726.473,\n          733.701],\n        [ 185.738,  186.664,  182.174, ...,  698.12 ,  744.787,\n          736.633],\n...\n        [ 181.879,  191.946,  138.015, ...,  768.607,  773.58 ,\n          649.808],\n        [ 155.873,  166.485,  207.096, ...,  626.737,  670.7  ,\n          848.373],\n        [ 175.795,  193.698,  165.786, ...,  706.103,  734.174,\n          677.215]],\n\n       [[ 210.005,  223.229,  226.009, ...,  809.167,  702.672,\n          506.157],\n        [ 183.797,  208.022,  199.951, ...,  681.232,  633.753,\n          839.513],\n        [ 195.61 ,  156.211,  162.867, ..., 1017.47 , 1004.65 ,\n          868.806],\n        ...,\n        [ 198.634,  182.253,  198.615, ...,  743.688,  784.348,\n          718.299],\n        [ 161.425,  203.714,  170.153, ...,  700.029,  772.966,\n          843.605],\n        [ 191.183,  213.283,  213.753, ...,  743.862, 1014.68 ,\n          693.128]]])y_rep(chain, draw, obs)float64184.8 189.1 ... 741.0 1.002e+03array([[[ 184.813,  189.054,  386.057, ...,  759.48 ,  754.107,\n          752.463],\n        [ 188.057,  194.853,  377.611, ...,  747.653,  741.944,\n          791.905],\n        [ 191.471,  191.019,  380.344, ...,  762.074,  750.234,\n          764.223],\n        ...,\n        [ 188.237,  217.168,  422.702, ...,  804.179,  687.31 ,\n          936.802],\n        [ 167.074,  207.593,  425.475, ...,  681.052, 1021.48 ,\n          705.951],\n        [ 177.321,  174.68 ,  365.093, ...,  756.83 ,  745.255,\n          747.625]],\n\n       [[ 194.119,  186.317,  365.097, ...,  722.354,  724.962,\n          745.609],\n        [ 183.759,  188.886,  368.915, ...,  750.265,  771.006,\n          708.641],\n        [ 188.523,  182.887,  388.006, ...,  759.472,  700.728,\n          735.546],\n...\n        [ 187.093,  217.335,  268.847, ...,  600.192,  760.567,\n          755.355],\n        [ 165.12 ,  131.573,  447.16 , ...,  730.651,  634.832,\n          670.844],\n        [ 194.342,  155.376,  278.762, ...,  557.058,  670.72 ,\n          756.81 ]],\n\n       [[ 227.482,  215.861,  369.136, ...,  876.322,  807.167,\n          691.492],\n        [ 204.28 ,  253.91 ,  468.703, ...,  851.316,  673.234,\n          629.031],\n        [ 158.707,  185.01 ,  421.974, ..., 1219.31 , 1041.13 ,\n          994.95 ],\n        ...,\n        [ 181.02 ,  216.746,  403.83 , ...,  735.791,  757.416,\n          787.206],\n        [ 204.988,  169.965,  321.8  , ...,  875.623,  693.14 ,\n          777.141],\n        [ 215.283,  205.664,  494.475, ...,  677.682,  740.974,\n         1001.79 ]]])llik(chain, draw, llik_dim_0)float64-8.981 -8.16 ... -9.587 -87.32array([[[  -8.98149,   -8.16041, -121.834  , ...,  -26.6511 ,\n          -15.333  ,   -5.03173],\n        [  -3.3298 ,   -6.74602, -120.175  , ...,  -18.3798 ,\n          -11.9159 ,   -4.56985],\n        [  -4.73572,  -11.8407 , -113.21   , ...,  -18.8408 ,\n          -15.6229 ,   -4.65854],\n        ...,\n        [  -3.75606,  -57.2464 , -186.1    , ...,   -4.7875 ,\n           -3.85341,  -44.6675 ],\n        [ -36.9401 ,  -34.6153 , -209.275  , ...,  -58.3687 ,\n         -172.836  ,  -17.7702 ],\n        [ -12.8355 ,   -2.24944,  -85.4265 , ...,  -17.943  ,\n           -5.74474,   -3.91947]],\n\n       [[  -7.66673,   -5.33243,  -94.05   , ...,  -33.5105 ,\n           -7.87838,  -10.2459 ],\n        [ -13.3995 ,   -6.43127,  -83.7495 , ...,  -37.2591 ,\n          -17.8842 ,  -10.7848 ],\n        [  -7.4818 ,   -3.62121,  -98.3658 , ...,  -24.9407 ,\n           -3.5948 ,   -6.86394],\n...\n        [  -3.96411,  -65.3276 ,   -6.23833, ..., -161.946  ,\n          -16.6114 ,   -3.86945],\n        [ -42.2559 ,  -75.7815 , -300.021  , ...,  -44.6827 ,\n          -16.5108 ,  -33.8265 ],\n        [  -3.23865,  -16.1718 ,   -4.7924 , ..., -249.053  ,\n           -3.92796,   -8.93514]],\n\n       [[ -18.6896 ,  -57.0662 , -102.733  , ...,   -4.63053,\n          -33.0595 ,  -18.4545 ],\n        [  -4.71529, -167.057  , -299.696  , ...,   -5.00925,\n           -3.97491,  -59.8419 ],\n        [ -75.8    ,   -3.83045, -201.039  , ..., -167.683  ,\n         -186.641  ,  -81.0174 ],\n        ...,\n        [ -12.046  ,  -44.6544 , -129.648  , ...,  -32.7041 ,\n           -9.54657,   -3.67239],\n        [  -2.96314,   -4.2949 ,  -27.2165 , ...,   -4.11057,\n           -3.64532,   -3.89527],\n        [  -8.2396 ,  -22.3636 , -384.246  , ...,  -74.6057 ,\n           -9.58703,  -87.318  ]]])Indexes: (6)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))volume_ls_dim_0PandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='volume_ls_dim_0'))tubePandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='tube'))obsPandasIndexPandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))llik_dim_0PandasIndexPandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='llik_dim_0'))Attributes: (4)created_at :2024-03-25T17:00:18.459295arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  sample_stats_prior\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 204kB\nDimensions:          (chain: 4, draw: 1000)\nCoordinates:\n  * chain            (chain) int64 32B 0 1 2 3\n  * draw             (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999\nData variables:\n    lp               (chain, draw) float64 32kB 72.09 71.09 ... 23.28 14.77\n    acceptance_rate  (chain, draw) float64 32kB 0.9856 0.9735 ... 0.9808 0.99\n    step_size        (chain, draw) float64 32kB 0.03423 0.03423 ... 0.04274\n    tree_depth       (chain, draw) int64 32kB 4 4 4 4 4 4 4 4 ... 7 6 7 7 6 6 7\n    n_steps          (chain, draw) int64 32kB 15 31 15 15 15 ... 127 63 63 127\n    diverging        (chain, draw) bool 4kB False False False ... False False\n    energy           (chain, draw) float64 32kB -58.38 -61.21 ... -14.09 -3.941\nAttributes:\n    created_at:                 2024-03-25T17:00:18.461900\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:chain: 4draw: 1000Coordinates: (2)chain(chain)int640 1 2 3array([0, 1, 2, 3])draw(draw)int640 1 2 3 4 5 ... 995 996 997 998 999array([  0,   1,   2, ..., 997, 998, 999])Data variables: (7)lp(chain, draw)float6472.09 71.09 72.54 ... 23.28 14.77array([[72.0913, 71.0934, 72.5368, ..., 22.7741, 19.717 , 31.5704],\n       [78.6388, 67.3879, 66.6947, ..., 27.7493, 23.3811, 21.9996],\n       [32.2751, 29.5342, 31.5931, ..., 21.5172, 20.5746, 25.7496],\n       [19.4916, 19.317 , 10.9252, ..., 36.2183, 23.2776, 14.7731]])acceptance_rate(chain, draw)float640.9856 0.9735 ... 0.9808 0.99array([[0.985585, 0.973493, 0.926252, ..., 0.999961, 0.998114, 0.999975],\n       [0.987531, 0.826221, 0.990511, ..., 0.999879, 0.999778, 0.999777],\n       [0.989942, 0.996745, 0.999915, ..., 0.997515, 0.999232, 0.999938],\n       [0.999604, 0.99648 , 0.998834, ..., 0.999231, 0.980821, 0.99004 ]])step_size(chain, draw)float640.03423 0.03423 ... 0.04274 0.04274array([[0.0342255, 0.0342255, 0.0342255, ..., 0.0342255, 0.0342255,\n        0.0342255],\n       [0.0211665, 0.0211665, 0.0211665, ..., 0.0211665, 0.0211665,\n        0.0211665],\n       [0.0399343, 0.0399343, 0.0399343, ..., 0.0399343, 0.0399343,\n        0.0399343],\n       [0.0427424, 0.0427424, 0.0427424, ..., 0.0427424, 0.0427424,\n        0.0427424]])tree_depth(chain, draw)int644 4 4 4 4 4 4 4 ... 7 7 6 7 7 6 6 7array([[4, 4, 4, ..., 7, 7, 7],\n       [5, 5, 5, ..., 8, 8, 8],\n       [6, 7, 6, ..., 7, 7, 7],\n       [7, 7, 7, ..., 6, 6, 7]])n_steps(chain, draw)int6415 31 15 15 15 ... 127 63 63 127array([[ 15,  31,  15, ..., 127, 255, 127],\n       [ 31,  31,  31, ..., 255, 255, 255],\n       [ 63, 255,  63, ..., 127, 127, 127],\n       [127, 127, 127, ...,  63,  63, 127]])diverging(chain, draw)boolFalse False False ... False Falsearray([[False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False],\n       [False, False, False, ..., False, False, False]])energy(chain, draw)float64-58.38 -61.21 ... -14.09 -3.941array([[-5.83823e+01, -6.12113e+01, -6.32211e+01, ..., -1.00715e+01,\n         3.57843e-02, -1.26730e+01],\n       [-7.05147e+01, -5.28132e+01, -5.87134e+01, ..., -1.31417e+01,\n        -1.17634e+01, -5.75708e+00],\n       [-2.44030e+01, -1.81596e+01, -2.17799e+01, ..., -9.25289e+00,\n        -6.95541e+00, -9.89278e+00],\n       [-5.78982e+00, -5.81860e+00, -6.76507e-01, ..., -1.92249e+01,\n        -1.40883e+01, -3.94102e+00]])Indexes: (2)chainPandasIndexPandasIndex(Index([0, 1, 2, 3], dtype='int64', name='chain'))drawPandasIndexPandasIndex(Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,\n       ...\n       990, 991, 992, 993, 994, 995, 996, 997, 998, 999],\n      dtype='int64', name='draw', length=1000))Attributes: (4)created_at :2024-03-25T17:00:18.461900arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n            \n                  \n                  observed_data\n                  \n                  \n                      \n                          \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<xarray.Dataset> Size: 896B\nDimensions:               (N_dim_0: 1, N_cal_dim_0: 1, tube: 24, obs: 8,\n                           measurement_ix_dim_0: 8, cal_error_dim_0: 1,\n                           likelihood_dim_0: 1)\nCoordinates:\n  * N_dim_0               (N_dim_0) int64 8B 0\n  * N_cal_dim_0           (N_cal_dim_0) int64 8B 0\n  * tube                  (tube) int64 192B 0 1 2 3 4 5 6 ... 18 19 20 21 22 23\n  * obs                   (obs) int64 64B 1 4 9 11 15 19 21 22\n  * measurement_ix_dim_0  (measurement_ix_dim_0) int64 64B 0 1 2 3 4 5 6 7\n  * cal_error_dim_0       (cal_error_dim_0) int64 8B 0\n  * likelihood_dim_0      (likelihood_dim_0) int64 8B 0\nData variables:\n    N                     (N_dim_0) int64 8B 24\n    N_cal                 (N_cal_dim_0) int64 8B 8\n    target_volume         (tube) int64 192B 200 200 200 200 ... 800 800 800 800\n    y                     (obs) float64 64B 199.1 176.3 281.9 ... 693.9 783.4\n    measurement_ix        (measurement_ix_dim_0) int64 64B 2 5 10 12 16 20 22 23\n    cal_error             (cal_error_dim_0) float64 8B 0.02\n    likelihood            (likelihood_dim_0) int64 8B 1\n    true_volume           (tube) float64 192B 152.6 199.7 161.3 ... 770.5 598.6\nAttributes:\n    created_at:                 2024-03-25T17:00:18.463872\n    arviz_version:              0.17.0\n    inference_library:          cmdstanpy\n    inference_library_version:  1.2.1xarray.DatasetDimensions:N_dim_0: 1N_cal_dim_0: 1tube: 24obs: 8measurement_ix_dim_0: 8cal_error_dim_0: 1likelihood_dim_0: 1Coordinates: (7)N_dim_0(N_dim_0)int640array([0])N_cal_dim_0(N_cal_dim_0)int640array([0])tube(tube)int640 1 2 3 4 5 6 ... 18 19 20 21 22 23array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23])obs(obs)int641 4 9 11 15 19 21 22array([ 1,  4,  9, 11, 15, 19, 21, 22])measurement_ix_dim_0(measurement_ix_dim_0)int640 1 2 3 4 5 6 7array([0, 1, 2, 3, 4, 5, 6, 7])cal_error_dim_0(cal_error_dim_0)int640array([0])likelihood_dim_0(likelihood_dim_0)int640array([0])Data variables: (8)N(N_dim_0)int6424array([24])N_cal(N_cal_dim_0)int648array([8])target_volume(tube)int64200 200 200 200 ... 800 800 800 800array([200, 200, 200, 200, 200, 200, 200, 200, 400, 400, 400, 400, 400,\n       400, 400, 400, 800, 800, 800, 800, 800, 800, 800, 800])y(obs)float64199.1 176.3 281.9 ... 693.9 783.4array([199.07727295, 176.25632772, 281.87757567, 387.16351159,\n       362.14946827, 853.23878527, 693.91934176, 783.39963414])measurement_ix(measurement_ix_dim_0)int642 5 10 12 16 20 22 23array([ 2,  5, 10, 12, 16, 20, 22, 23])cal_error(cal_error_dim_0)float640.02array([0.02])likelihood(likelihood_dim_0)int641array([1])true_volume(tube)float64152.6 199.7 161.3 ... 770.5 598.6array([152.64294349, 199.70810807, 161.32449302, 171.49715397,\n       174.67894069, 163.43175945, 153.50063823, 187.79919405,\n       364.94147086, 289.55488638, 445.13256694, 387.79655766,\n       326.2592979 , 385.23402356, 335.94110379, 349.87019831,\n       761.78380169, 620.86368119, 745.73037525, 809.71007835,\n       803.52489045, 683.21425317, 770.52360505, 598.61585552])Indexes: (7)N_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='N_dim_0'))N_cal_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='N_cal_dim_0'))tubePandasIndexPandasIndex(Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,\n       18, 19, 20, 21, 22, 23],\n      dtype='int64', name='tube'))obsPandasIndexPandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))measurement_ix_dim_0PandasIndexPandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='measurement_ix_dim_0'))cal_error_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='cal_error_dim_0'))likelihood_dim_0PandasIndexPandasIndex(Index([0], dtype='int64', name='likelihood_dim_0'))Attributes: (4)created_at :2024-03-25T17:00:18.463872arviz_version :0.17.0inference_library :cmdstanpyinference_library_version :1.2.1\n                      \n                  \n            \n            \n              \n            \n            \n\n\nNext we look at the summaries of both the posterior and prior.\n\nfor group_name in [\"prior\", \"posterior\"]:\n    group = idata.get(group_name)\n    group_summary = az.summary(\n        group,\n        var_names=[\n            \"volume_noise\", \"bias_factor\", \"volume_noise_s\", \"bias_factor_l\"\n        ]\n    )\n    display(group_summary)\n\n\n\n\n\n\n\n\n\nmean\nsd\nhdi_3%\nhdi_97%\nmcse_mean\nmcse_sd\ness_bulk\ness_tail\nr_hat\n\n\n\n\nvolume_noise\n0.066\n0.034\n0.016\n0.128\n0.002\n0.001\n224.0\n380.0\n1.01\n\n\nbias_factor\n0.994\n0.143\n0.731\n1.252\n0.015\n0.010\n80.0\n120.0\n1.06\n\n\nvolume_noise_s\n0.108\n0.055\n0.026\n0.210\n0.003\n0.002\n224.0\n380.0\n1.01\n\n\nbias_factor_l\n-0.016\n0.141\n-0.286\n0.242\n0.015\n0.011\n80.0\n120.0\n1.06\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nmean\nsd\nhdi_3%\nhdi_97%\nmcse_mean\nmcse_sd\ness_bulk\ness_tail\nr_hat\n\n\n\n\nvolume_noise\n0.113\n0.026\n0.068\n0.160\n0.001\n0.001\n1303.0\n1943.0\n1.0\n\n\nbias_factor\n0.887\n0.055\n0.781\n0.986\n0.001\n0.001\n2493.0\n2380.0\n1.0\n\n\nvolume_noise_s\n0.185\n0.043\n0.111\n0.261\n0.001\n0.001\n1303.0\n1943.0\n1.0\n\n\nbias_factor_l\n-0.122\n0.061\n-0.229\n0.002\n0.001\n0.001\n2493.0\n2380.0\n1.0\n\n\n\n\n\n\n\n\n\n\nInvestigating the results\nThis plot compares the measurements with the observations.\n\naz.plot_lm(\n    y=idata.observed_data[\"y\"],\n    x=idata.observed_data[\"tube\"].sel(tube=MEASUREMENT_IX + 1),\n    y_hat=idata.posterior_predictive[\"y_rep\"]\n)\nax = plt.gca()\nax.semilogy()\n\n\n\n\n\n\n\n\nThis plot compares the volume_noise and bias_factor samples with the true values that we used to simulate the data.\n\naz.plot_posterior(\n  idata.prior,\n  var_names=[\"volume_noise\", \"bias_factor\"],\n  kind=\"hist\",\n  hdi_prob=\"hide\",\n  point_estimate=None,\n  figsize=[12, 4]\n)\nf = plt.gcf()\naxes = f.axes\naz.plot_posterior(\n  idata.posterior,\n  var_names=[\"volume_noise\", \"bias_factor\"],\n  kind=\"hist\",\n  hdi_prob=\"hide\",\n  point_estimate=None,\n  figsize=[12, 4],\n  ax=axes,\n  color=\"tab:orange\"\n)\nfor ax, truth in zip(f.axes, [NOISE, BIAS_FACTOR]):\n    ax.axvline(truth, color=\"red\")\n\n\n\n\n\n\n\n\nThis plot shows the samples for all the tubes’ volumes, including those that weren’t measured, alongside the true volumes.\n\naz.plot_lm(\n    x=idata.observed_data[\"tube\"],\n    y=idata.observed_data[\"true_volume\"],\n    y_hat=idata.posterior[\"volume\"],\n    grid=False,\n    y_kwargs={\"label\": \"true volume\"},\n    figsize=[10, 5],\n    legend=False,\n)\nax = plt.gca()\nfor i in MEASUREMENT_IX:\n    ax.text(i+0.1, volumes[i], \"obs\", zorder=1000)\nax.set(xlabel=\"tube\", ylabel=\"volume ($\\\\mu$l)\");\nax.semilogy()\n\n\n\n\n\n\n\n\nSo, what is the probability that Teddy put less than 350 \\(\\mu\\)l of label into tube 10, even though the target amount was 400\\(\\mu\\)l?\n\naz.plot_posterior(\n  idata.prior,\n  var_names=[\"volume\"],\n  coords={\"tube\": [10]},\n  kind=\"hist\",\n  hdi_prob=\"hide\",\n  point_estimate=None,\n  ref_val=350,\n  figsize=[12, 4],\n  bins=np.linspace(250, 600, 30),\n)\n\n\n\n\n\n\n\n\nPhew, only about 13%, that’s probably fine right?",
     "crumbs": [
       "Course materials",
       "Regression models for describing measurements"
@@ -92,7 +92,7 @@
     "href": "metropolis-hastings.html#coding-the-metropolis-hastings-in-practice",
     "title": "Metropolis Hastings",
     "section": "Coding the Metropolis Hastings in practice",
-    "text": "Coding the Metropolis Hastings in practice\n\nPart 1: sampling from a normal distribution\nGiven a \\(p(x) = N(2, 0.5)\\) how would draw samples using the Metropolis-Hastings algorithm?\n\nChoose proposal value\nEvaluate probability ratio\nAccept-Reject\nincrement step\n\n\nDefine probability density function\n\nimport numpy as np\nfrom scipy.stats import norm\n\ndef prob(x):\n  return norm.pdf(x, 2, 0.5)\n\n\n\nDefine proposal distribution\n\ndef proposal(x):\n  return norm.rvs(x, 1, 1)[0]\n\n\n\nInitialise sampler\n\ncurrent = 0.0\nsamples = [current]\n\n\n\nSample from distribution\n\nfor i in range(10000):\n    prop = proposal(current)\n    accept_reject = prob(prop)/prob(current)\n    if accept_reject > 1:\n        samples.append(prop)\n        current = prop\n    else:\n        cutoff = np.random.rand(1)[0]\n        if accept_reject > cutoff:\n            samples.append(prop)\n            current = prop\n\n\n\nPlot distribution\n\nimport matplotlib.pyplot as plt\nplt.hist(samples)\n\n(array([   2.,   19.,  208.,  724., 1196., 1335., 1005.,  376.,   73.,\n          10.]),\n array([0.        , 0.38251406, 0.76502812, 1.14754218, 1.53005625,\n        1.91257031, 2.29508437, 2.67759843, 3.06011249, 3.44262655,\n        3.82514062]),\n <BarContainer object of 10 artists>)\n\n\n\n\n\n\n\n\n\n\n\nTrace plot\n\ndraw = [draw for draw, _ in enumerate(samples)]\nplt.plot(draw, samples)\n\n\n\n\n\n\n\n\n\n\n\nPart 2: determining mean and standard deviation from data\nI suggest using logs due to numerical issues. Here’s an example function which you can use to evaluate the probability of the data.\n\ndef eval_prob(data, mu, sigma):\n    return np.log(norm.pdf(data,mu,sigma)).sum()\n\nHere’s also a multivariate random number generator to generate proposals.\n\ndef proposal_multi(mu, sigma):\n    mean = [mu, sigma]\n    cov = [[0.2, 0], [0, 0.2]]  # diagonal covariance\n    return np.random.multivariate_normal(mean, cov, 1)[0]\n\nHere is how you’d call the proposal function\nprop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)\nthe accept_reject probability should also be updated to account for the log-probability\naccept_reject = np.exp(prob_prop - prob_current)\nYou should sample a 95% interval including a \\(\\mu = 5\\) and a \\(\\sigma = 0.2\\). This may be difficult at first to sample and I would recommend initialising at these values.",
+    "text": "Coding the Metropolis Hastings in practice\n\nPart 1: sampling from a normal distribution\nGiven a \\(p(x) = N(2, 0.5)\\) how would draw samples using the Metropolis-Hastings algorithm?\n\nChoose proposal value\nEvaluate probability ratio\nAccept-Reject\nincrement step\n\n\nDefine probability density function\n\nimport numpy as np\nfrom scipy.stats import norm\n\ndef prob(x):\n  return norm.pdf(x, 2, 0.5)\n\n\n\nDefine proposal distribution\n\ndef proposal(x):\n  return norm.rvs(x, 1, 1)[0]\n\n\n\nInitialise sampler\n\ncurrent = 0.0\nsamples = [current]\n\n\n\nSample from distribution\n\nfor i in range(10000):\n    prop = proposal(current)\n    accept_reject = prob(prop)/prob(current)\n    if accept_reject > 1:\n        samples.append(prop)\n        current = prop\n    else:\n        cutoff = np.random.rand(1)[0]\n        if accept_reject > cutoff:\n            samples.append(prop)\n            current = prop\n\n\n\nPlot distribution\n\nimport matplotlib.pyplot as plt\nplt.hist(samples)\n\n(array([   5.,   52.,  291.,  876., 1399., 1355.,  730.,  215.,   31.,\n           6.]),\n array([0.        , 0.40351717, 0.80703434, 1.21055151, 1.61406869,\n        2.01758586, 2.42110303, 2.8246202 , 3.22813737, 3.63165454,\n        4.03517172]),\n <BarContainer object of 10 artists>)\n\n\n\n\n\n\n\n\n\n\n\nTrace plot\n\ndraw = [draw for draw, _ in enumerate(samples)]\nplt.plot(draw, samples)\n\n\n\n\n\n\n\n\n\n\n\nPart 2: determining mean and standard deviation from data\nI suggest using logs due to numerical issues. Here’s an example function which you can use to evaluate the probability of the data.\n\ndef eval_prob(data, mu, sigma):\n    return np.log(norm.pdf(data,mu,sigma)).sum()\n\nHere’s also a multivariate random number generator to generate proposals.\n\ndef proposal_multi(mu, sigma):\n    mean = [mu, sigma]\n    cov = [[0.2, 0], [0, 0.2]]  # diagonal covariance\n    return np.random.multivariate_normal(mean, cov, 1)[0]\n\nHere is how you’d call the proposal function\nprop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)\nthe accept_reject probability should also be updated to account for the log-probability\naccept_reject = np.exp(prob_prop - prob_current)\nYou should sample a 95% interval including a \\(\\mu = 5\\) and a \\(\\sigma = 0.2\\). This may be difficult at first to sample and I would recommend initialising at these values.",
     "crumbs": [
       "Course materials",
       "Metropolis Hastings"
diff --git a/docs/week3.html b/docs/week3.html
index fc4b784..72fab06 100644
--- a/docs/week3.html
+++ b/docs/week3.html
@@ -5153,7 +5153,7 @@ 

Example

We’ll go through some diagnostics using arviz.

Step one is to load some data. Rather than going through a whole modelling workflow, we’ll just take one of the example MCMC outputs that arviz provides via the function load_arviz_data.

This particular MCMC output has to do with measurements of soil radioactivity in the USA. You can read more about it here.

-
+
import arviz as az
 import numpy as np  # numpy is for numerical programming
 import xarray as xr  # xarray is for manipulating n dimensional tables; arviz uses it a lot! 
@@ -5169,8 +5169,8 @@ 

Example

  • - - + +
      @@ -5497,7 +5497,7 @@

      Example

      inference_library: pymc3 inference_library_version: 3.9.2 sampling_time: 18.096983432769775 - tuning_steps: 1000
  • created_at :
    2020-07-24T18:15:12.191355
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2
    sampling_time :
    18.096983432769775
    tuning_steps :
    1000

  • - - + +
      @@ -5854,20 +5854,20 @@

      Example

      created_at: 2020-07-24T18:15:12.449843 arviz_version: 0.9.0 inference_library: pymc3 - inference_library_version: 3.9.2
  • created_at :
    2020-07-24T18:15:12.449843
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2

  • - - + +
      @@ -6185,20 +6185,20 @@

      Example

      created_at: 2020-07-24T18:15:12.448264 arviz_version: 0.9.0 inference_library: pymc3 - inference_library_version: 3.9.2
  • created_at :
    2020-07-24T18:15:12.448264
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2

  • - - + +
      @@ -6526,17 +6526,17 @@

      Example

      inference_library: pymc3 inference_library_version: 3.9.2 sampling_time: 18.096983432769775 - tuning_steps: 1000
  • created_at :
    2020-07-24T18:15:12.197697
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2
    sampling_time :
    18.096983432769775
    tuning_steps :
    1000

  • - - + +
      @@ -6863,7 +6863,7 @@

      Example

      created_at: 2020-07-24T18:15:12.454586 arviz_version: 0.9.0 inference_library: pymc3 - inference_library_version: 3.9.2
    • g_coef
      PandasIndex
      PandasIndex(Index(['intercept', 'slope'], dtype='object', name='g_coef'))
  • created_at :
    2020-07-24T18:15:12.454586
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2

  • - - + +
      @@ -7220,20 +7220,20 @@

      Example

      created_at: 2020-07-24T18:15:12.457652 arviz_version: 0.9.0 inference_library: pymc3 - inference_library_version: 3.9.2
  • created_at :
    2020-07-24T18:15:12.457652
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2

  • - - + +
      @@ -7549,17 +7549,17 @@

      Example

      created_at: 2020-07-24T18:15:12.458415 arviz_version: 0.9.0 inference_library: pymc3 - inference_library_version: 3.9.2
  • created_at :
    2020-07-24T18:15:12.458415
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2

  • - - + +
      @@ -7878,7 +7878,7 @@

      Example

      created_at: 2020-07-24T18:15:12.459832 arviz_version: 0.9.0 inference_library: pymc3 - inference_library_version: 3.9.2
  • created_at :
    2020-07-24T18:15:12.459832
    arviz_version :
    0.9.0
    inference_library :
    pymc3
    inference_library_version :
    3.9.2

  • @@ -8203,7 +8203,7 @@

    Example

    For example, if you click on the dropdown arrows above you will see that the group posterior contains a variable called a_county that has three dimensions called chain, draw and County. There are 85 counties and the first one is labelled 'AITKIN'.

    The function az.summary lets us look at some useful summary statistics, including \(\hat{R}\), divergent transitions and MCSE.

    The variable lp, which you can find in the group sample_stats is the model’s total log probability density. It’s not very meaningful on its own, but is useful for judging overall convergence. diverging counts the number of divergent transitions.

    -
    +
    az.summary(idata.sample_stats, var_names=["lp", "diverging"])
    /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/.venv/lib/python3.12/site-packages/arviz/stats/diagnostics.py:592: RuntimeWarning: invalid value encountered in scalar divide
    @@ -8263,7 +8263,7 @@ 

    Example

    In this case there were no post-warmup diverging transitions, and the \(\hat{R}\) statistic for the lp variable is pretty close to 1: great!

    Sometimes it’s useful to summarise individual parameters. This can be done by pointing az.summary at the group where the parameters of interest live. In this case the group is called posterior.

    -
    +
    az.summary(idata.posterior, var_names=["sigma", "g"])
    @@ -8330,7 +8330,7 @@

    Example

    Now we can start evaluating the model. First we check to see whether replicated measurements from the model’s posterior predictive distribution broadly agree with the observed measurements, using the arviz function plot_lm:

    -
    +
    az.style.use("arviz-doc")
     az.plot_lm(
         y=idata.observed_data["y"],
    @@ -8342,14 +8342,14 @@ 

    Example

    -

    +

    The function az.loo can quickly estimate a model’s out of sample log likelihood (which we saw above is a nice default loss function), allowing a nice numerical comparison between models.

    Watch out for the warning column, which can tell you if the estimation is likely to be incorrect. It’s usually a good idea to set the pointwise argument to True, as this allows for more detailed analysis at the per-observation level.

    -
    +
    az.loo(idata, var_name="y", pointwise=True)
    Computed from 2000 posterior samples and 919 observations log-likelihood matrix.
    @@ -8368,7 +8368,7 @@ 

    Example

    The function az.compare is useful for comparing different out of sample log likelihood estimates.

    -
    +
    idata.log_likelihood["fake"] = xr.DataArray(
         # generate some fake log likelihoods
         np.random.normal(0, 2, [4, 500, 919]),
    @@ -8417,21 +8417,21 @@ 

    Example

    -1027.176982 26.817161 0.000000 -0.985153 +0.984918 28.848998 -0.00000 +0.000000 False log fake 1 --1799.914726 -3629.136791 -772.737745 -0.014847 -3.405037 -28.87809 +-1802.060019 +3628.350686 +774.883037 +0.015082 +3.422640 +29.020661 True log @@ -8443,7 +8443,7 @@

    Example

    The function az.plot_compare shows these results on a nice graph:

    -
    +
    az.plot_compare(comparison);
    /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/.venv/lib/python3.12/site-packages/arviz/plots/backends/matplotlib/compareplot.py:87: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
    @@ -8452,7 +8452,7 @@ 

    Example

    -

    +

    diff --git a/docs/week5.html b/docs/week5.html index 603f4f9..57acdfc 100644 --- a/docs/week5.html +++ b/docs/week5.html @@ -5008,6 +5008,9 @@

    On this page

    @@ -5093,7 +5096,7 @@

    Regression model

    Simulating fake data

    First some imports: as usual for this course we’ll be using arviz, matplotlib, cmdstanpy, pandas and numpy. stanio is a handy utility library for Stan: it is a dependency of cmdstanpy so you shouldn’t need to install it explicitly.

    -
    +
    import arviz as az
     import matplotlib.pyplot as plt
     import numpy as np
    @@ -5103,7 +5106,7 @@ 

    Simulating fake data< from cmdstanpy import CmdStanModel

    Now some hardcoded numbers, including true values for the parameters here: \(bias\) is 0.88 and \(noise\) is 0.1. Note that \(cal\ error\) is much smaller than \(bias\).

    -
    +
    N = 24
     N_CAL = 8
     TARGET_VOLUMES = np.array(
    @@ -5120,7 +5123,7 @@ 

    Simulating fake data< RNG_SEED = 12345

    Simulate the true volumes

    -
    +
    rng = np.random.default_rng(seed=RNG_SEED)
     ln_mean = [
       np.log(target * BIAS_FACTOR)
    @@ -5138,7 +5141,7 @@ 

    Simulating fake data<

    Plot the volumes and the targets.

    -
    +
    f, ax = plt.subplots()
     bins = np.logspace(np.log10(100), np.log10(1000), 30)
     ax.hist(volumes, bins=bins)
    @@ -5160,7 +5163,7 @@ 

    Simulating fake data<

    Simulate measurements for tubes in the MEASUREMENT_IX.

    -
    +
    measurements = [
       rng.lognormal(np.log(vol), CAL_ERROR)
       for vol in volumes[MEASUREMENT_IX]
    @@ -5240,15 +5243,18 @@ 

    Simulating fake data<

    +
    +
    +

    Writing the model in Stan and sampling the simulated data

    I wrote up the implied statistical model in a Stan file at src/stan/ pipette.stan. This code loads this Stan file as a CmdStanModel object, checks its formatting and prints it out.

    Note that the model internally puts the data on log scale and then standardises it: this is a bit annoying but makes it way easier to set priors and can ultimately save you a lot of trouble.

    -
    +
    model = CmdStanModel(stan_file="../src/stan/pipette.stan")
     model.format(overwrite_file=True, canonicalize=True)
     print(model.code())
    -
    17:52:36 - cmdstanpy - INFO - compiling stan file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette.stan to exe file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette
    -17:52:41 - cmdstanpy - INFO - compiled model executable: /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette
    +
    18:00:12 - cmdstanpy - INFO - compiling stan file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette.stan to exe file /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette
    +18:00:17 - cmdstanpy - INFO - compiled model executable: /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/src/stan/pipette
    functions {
    @@ -5312,7 +5318,7 @@ 

    Simulating fake data<

    This code loads some data into a dictionary that is compatible with Stan and carries out two MCMC runs, one in prior mode and one in posterior mode.

    -
    +
    stan_input_posterior = stanio.json.process_dictionary(
         {
           "N": N,
    @@ -5335,11 +5341,11 @@ 

    Simulating fake data< mcmc_prior.diagnose() mcmc_posterior.diagnose()

    -
    17:52:41 - cmdstanpy - INFO - CmdStan start processing
    -17:52:42 - cmdstanpy - INFO - CmdStan done processing.
    -17:52:42 - cmdstanpy - INFO - CmdStan start processing
    -17:52:42 - cmdstanpy - INFO - CmdStan done processing.
    -17:52:42 - cmdstanpy - WARNING - Non-fatal error during sampling:
    +
    18:00:17 - cmdstanpy - INFO - CmdStan start processing
    +18:00:17 - cmdstanpy - INFO - CmdStan done processing.
    +18:00:17 - cmdstanpy - INFO - CmdStan start processing
    +18:00:18 - cmdstanpy - INFO - CmdStan done processing.
    +18:00:18 - cmdstanpy - WARNING - Non-fatal error during sampling:
     Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
     Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
     Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
    @@ -5348,22 +5354,22 @@ 

    Simulating fake data<

    @@ -5372,31 +5378,34 @@

    Simulating fake data<

    -
    'Processing csv files: /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_1.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_2.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_3.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpslkslp06/pipetteh6tolm1h/pipette-20240325175242_4.csv\n\nChecking sampler transitions treedepth.\nTreedepth satisfactory for all transitions.\n\nChecking sampler transitions for divergences.\nNo divergent transitions found.\n\nChecking E-BFMI - sampler transitions HMC potential energy.\nE-BFMI satisfactory.\n\nEffective sample size satisfactory.\n\nSplit R-hat values satisfactory all parameters.\n\nProcessing complete, no problems detected.\n'
    +
    'Processing csv files: /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_1.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_2.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_3.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpgm4waph_/pipette6vwzjauu/pipette-20240325180017_4.csv\n\nChecking sampler transitions treedepth.\nTreedepth satisfactory for all transitions.\n\nChecking sampler transitions for divergences.\nNo divergent transitions found.\n\nChecking E-BFMI - sampler transitions HMC potential energy.\nE-BFMI satisfactory.\n\nEffective sample size satisfactory.\n\nSplit R-hat values satisfactory all parameters.\n\nProcessing complete, no problems detected.\n'

    The diagnostics seem ok, though interestingly the prior was pretty tricky to sample accurately.

    +
    +
    +

    Loading the MCMC results with arviz

    This code loads both MCMC runs into an arviz InferenceData object.

    -
    +
    coords = {"obs": MEASUREMENT_IX, "tube": range(N)}
     dims={
         "y": ["obs"],
    @@ -5427,8 +5436,8 @@ 

    Simulating fake data<
    • - - + +
        @@ -5749,22 +5758,22 @@

        Simulating fake data< volume_noise (chain, draw) float64 32kB 0.09116 0.1455 ... 0.1199 volume (chain, draw, tube) float64 768kB 182.9 192.1 ... 855.2 Attributes: - created_at: 2024-03-25T16:52:42.632627 + created_at: 2024-03-25T17:00:18.417851 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1

  • created_at :
    2024-03-25T17:00:18.417851
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • - - + +
      @@ -6173,10 +6182,10 @@

      Simulating fake data< Data variables: y_rep (chain, draw, obs) float64 256kB 197.1 177.7 290.6 ... 693.9 779.7 Attributes: - created_at: 2024-03-25T16:52:42.638351 + created_at: 2024-03-25T17:00:18.423820 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1
    • obs
      PandasIndex
      PandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))
  • created_at :
    2024-03-25T17:00:18.423820
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • - - + +
      @@ -6531,10 +6540,10 @@

      Simulating fake data< Data variables: llik (chain, draw, llik_dim_0) float64 256kB -3.875 -2.193 ... -3.675 Attributes: - created_at: 2024-03-25T16:52:42.680048 + created_at: 2024-03-25T17:00:18.465085 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1
    • llik_dim_0
      PandasIndex
      PandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='llik_dim_0'))
  • created_at :
    2024-03-25T17:00:18.465085
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • - - + +
      @@ -6904,45 +6913,45 @@

      Simulating fake data< diverging (chain, draw) bool 4kB False False False ... False False energy (chain, draw) float64 32kB -6.192 -8.972 ... -15.39 -11.6 Attributes: - created_at: 2024-03-25T16:52:42.636778 + created_at: 2024-03-25T17:00:18.422253 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1

  • created_at :
    2024-03-25T17:00:18.422253
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • - - + +
      @@ -7268,26 +7277,26 @@

      Simulating fake data< y_rep (chain, draw, obs) float64 256kB 184.8 189.1 ... 1.002e+03 llik (chain, draw, llik_dim_0) float64 256kB -8.981 ... -87.32 Attributes: - created_at: 2024-03-25T16:52:42.674317 + created_at: 2024-03-25T17:00:18.459295 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1
    • obs
      PandasIndex
      PandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))
    • llik_dim_0
      PandasIndex
      PandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='llik_dim_0'))
  • created_at :
    2024-03-25T17:00:18.459295
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • - - + +
      @@ -7791,49 +7800,49 @@

      Simulating fake data< diverging (chain, draw) bool 4kB False False False ... False False energy (chain, draw) float64 32kB -58.38 -61.21 ... -14.09 -3.941 Attributes: - created_at: 2024-03-25T16:52:42.676922 + created_at: 2024-03-25T17:00:18.461900 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1

  • created_at :
    2024-03-25T17:00:18.461900
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • - - + +
      @@ -8161,20 +8170,20 @@

      Simulating fake data< likelihood (likelihood_dim_0) int64 8B 1 true_volume (tube) float64 192B 152.6 199.7 161.3 ... 770.5 598.6 Attributes: - created_at: 2024-03-25T16:52:42.678952 + created_at: 2024-03-25T17:00:18.463872 arviz_version: 0.17.0 inference_library: cmdstanpy - inference_library_version: 1.2.1
    • obs
      PandasIndex
      PandasIndex(Index([1, 4, 9, 11, 15, 19, 21, 22], dtype='int64', name='obs'))
    • measurement_ix_dim_0
      PandasIndex
      PandasIndex(Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64', name='measurement_ix_dim_0'))
    • cal_error_dim_0
      PandasIndex
      PandasIndex(Index([0], dtype='int64', name='cal_error_dim_0'))
    • likelihood_dim_0
      PandasIndex
      PandasIndex(Index([0], dtype='int64', name='likelihood_dim_0'))
  • created_at :
    2024-03-25T17:00:18.463872
    arviz_version :
    0.17.0
    inference_library :
    cmdstanpy
    inference_library_version :
    1.2.1

  • @@ -8463,7 +8472,7 @@

    Simulating fake data<

    Next we look at the summaries of both the posterior and prior.

    -
    +
    for group_name in ["prior", "posterior"]:
         group = idata.get(group_name)
         group_summary = az.summary(
    @@ -8624,8 +8633,11 @@ 

    Simulating fake data<

    + +
    +

    Investigating the results

    This plot compares the measurements with the observations.

    -
    +
    az.plot_lm(
         y=idata.observed_data["y"],
         x=idata.observed_data["tube"].sel(tube=MEASUREMENT_IX + 1),
    @@ -8636,13 +8648,13 @@ 

    Simulating fake data<
    -

    +

    This plot compares the volume_noise and bias_factor samples with the true values that we used to simulate the data.

    -
    +
    az.plot_posterior(
       idata.prior,
       var_names=["volume_noise", "bias_factor"],
    @@ -8674,7 +8686,7 @@ 

    Simulating fake data<

    This plot shows the samples for all the tubes’ volumes, including those that weren’t measured, alongside the true volumes.

    -
    +
    az.plot_lm(
         x=idata.observed_data["tube"],
         y=idata.observed_data["true_volume"],
    @@ -8692,13 +8704,13 @@ 

    Simulating fake data<
    -

    +

    So, what is the probability that Teddy put less than 350 \(\mu\)l of label into tube 10, even though the target amount was 400\(\mu\)l?

    -
    +
    az.plot_posterior(
       idata.prior,
       var_names=["volume"],
    @@ -8732,7 +8744,7 @@ 

    Simulating fake data<