document the use of custom response functions (#71)
LegrandNico authored Jul 19, 2023
1 parent f00d963 commit 4db62bd
Showing 18 changed files with 1,888 additions and 745 deletions.
Binary file added docs/source/images/response_models.png
2 changes: 1 addition & 1 deletion docs/source/notebooks/0-Creating_networks.ipynb
@@ -618,7 +618,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.9.16"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion docs/source/notebooks/0-Creating_networks.md
@@ -5,7 +5,7 @@ jupytext:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.14.7
kernelspec:
display_name: Python 3 (ipykernel)
language: python
120 changes: 49 additions & 71 deletions docs/source/notebooks/0-Theory.ipynb


66 changes: 30 additions & 36 deletions docs/source/notebooks/0-Theory.md
@@ -5,24 +5,22 @@ jupytext:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.14.7
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

+++ {"user_expressions": []}

(theory)=
# An introduction to the Hierarchical Gaussian Filter
In this notebook, we introduce the main concepts on which the Hierarchical Gaussian Filter (HGF) is based. We describe the main equations and illustrate the examples with Python code. We start with the generative model of the HGF, which describes how the model assumes that the data is being generated. This generative structure is then used to filter the observation (i.e. the sensory part of the model), which is then used by the agent to produce behaviours (i.e. the action part of the model). Next, we show how this model can be "inverted" and used by an agent to infer parameter values that generated the sensory inputs. From there, we discuss the notion of prediction error and how derivations of the model can be used to infer probability densities given observed behavioural outcomes.

+++ {"user_expressions": []}
+++

## The generative model

To illustrate the generative model on which the HGF is based, we will start with a simple two-level continuous HGF (see also the tutorial {ref}`continuous_hgf`). The generative model that underpins the continuous HGF is a generalisation of the [Gaussian Random Walk](https://en.wikipedia.org/wiki/Random_walk#Gaussian_random_walk) (GRW). A GRW generates a new observation $x_1^{(k)}$ at each time step $k$ from a normal distribution centred on the previous observation $x_1^{(k-1)}$, such that:

$$
x_1^{(k)} \sim \mathcal{N}(x_1^{(k-1)}, \sigma^2)
$$

@@ -50,18 +48,16 @@ plt.ylabel("$x_{1}$");
sns.despine()
```
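
The simulation cell for this random walk is collapsed in the diff view above. As a reference, here is a minimal stand-alone sketch of the same idea; the parameter values, number of time steps, and plotting details are illustrative and are not taken from the hidden cell.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(123)

sigma = 1.0        # fixed standard deviation of the walk (illustrative value)
n_timesteps = 200

# Gaussian Random Walk: each new value is drawn from a normal distribution
# centred on the previous value, with a constant spread.
x_1 = np.zeros(n_timesteps)
for k in range(1, n_timesteps):
    x_1[k] = np.random.normal(x_1[k - 1], sigma)

plt.figure(figsize=(12, 3))
plt.plot(x_1)
plt.xlabel("Time step (k)")
plt.ylabel("$x_{1}$")
sns.despine()
```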

+++ {"user_expressions": []}

This simple process will be our first building block. Importantly here, the variability of the sensory input is constant across time: even if we don't know exactly in which direction the time series is going to move in the future, we know that it is unlikely to make certain kinds of big jumps, because it is controlled by a fixed parameter, the variance $\sigma^2$.

### Volatility coupling
Now, we can also decide to change that and let the variance itself be a stochastic process generated by another random walk. The HGF fundamentally capitalises on this notion and generalises the standard GRW by letting the variance $\sigma^2$ be controlled by a higher-level node.

+++ {"user_expressions": []}
+++ {"editable": true, "slideshow": {"slide_type": ""}}

If we take as an example the two-level continuous HGF {cite:p}`2014:mathys`, the model is composed of two states of interest, $x_1$ and $x_2$. $x_1$ performs a GRW as previously defined, but it is also coupled with $x_2$ via *volatility coupling*. This means that for state $x_1$, the mean of the Gaussian random walk on trial $k$ is given by its previous value $x_1^{(k-1)}$, while the step size (or variance) depends on the current value of the higher-level state, $x_2^{(k)}$.

+++ {"user_expressions": []}
+++ {"editable": true, "slideshow": {"slide_type": ""}}

$$
x_1^{(k)} \sim \mathcal{N}(x_1^{(k-1)}, \, f(x_2^{(k)}))
$$

@@ -118,8 +114,6 @@ axs[1].set_ylabel("$x_{1}$");
sns.despine()
```
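
The cell that generates this two-level example is also collapsed. Below is a minimal sketch of volatility coupling, assuming the usual HGF parametrisation $f(x_2^{(k)}) = \exp(\kappa_1 x_2^{(k)} + \omega_1)$; the parameter values are illustrative rather than copied from the notebook.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(123)

omega_1, omega_2 = -6.0, -2.0   # tonic (log-scale) volatilities, illustrative values
kappa_1 = 1.0                   # volatility-coupling strength
n_timesteps = 1000

x_1 = np.zeros(n_timesteps)
x_2 = np.zeros(n_timesteps)
for k in range(1, n_timesteps):
    # higher-level state: plain Gaussian random walk with a fixed step size
    x_2[k] = np.random.normal(x_2[k - 1], np.sqrt(np.exp(omega_2)))
    # lower-level state: its variance is set by the current value of x_2
    x_1[k] = np.random.normal(x_1[k - 1], np.sqrt(np.exp(kappa_1 * x_2[k] + omega_1)))

fig, axs = plt.subplots(nrows=2, figsize=(12, 5), sharex=True)
axs[0].plot(x_2)
axs[0].set_ylabel("$x_{2}$")
axs[1].plot(x_1)
axs[1].set_ylabel("$x_{1}$")
axs[1].set_xlabel("Time step (k)")
sns.despine()
```

The exponential link keeps the implied variance positive for any value of $x_2$.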

+++ {"user_expressions": []}

In this example, it becomes apparent that the volatility of the observation is not constant in time anymore, but depends on the values observed at the level above.

### Value coupling
@@ -165,8 +159,6 @@ axs[0].set_ylabel("$x_{2}$");
sns.despine()
```
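
The equations and the simulation cell for value coupling are hidden by the fold above, so the exact coupling function is not visible here. The sketch below uses one common convention, in which the parent's current value shifts the mean of the child's random walk by a coupling factor $\alpha_1$; treat it as an illustration rather than a copy of the hidden cell.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(123)

alpha_1 = 0.2                  # value-coupling strength (illustrative)
omega_1, omega_2 = -4.0, -4.0  # fixed (log-scale) variances
n_timesteps = 1000

x_1 = np.zeros(n_timesteps)
x_2 = np.zeros(n_timesteps)
for k in range(1, n_timesteps):
    # higher-level state: plain Gaussian random walk
    x_2[k] = np.random.normal(x_2[k - 1], np.sqrt(np.exp(omega_2)))
    # lower-level state: the parent's current value adds a drift to the mean
    x_1[k] = np.random.normal(x_1[k - 1] + alpha_1 * x_2[k], np.sqrt(np.exp(omega_1)))

fig, axs = plt.subplots(nrows=2, figsize=(12, 5), sharex=True)
axs[0].plot(x_2)
axs[0].set_ylabel("$x_{2}$")
axs[1].plot(x_1)
axs[1].set_ylabel("$x_{1}$")
axs[1].set_xlabel("Time step (k)")
sns.despine()
```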

+++ {"user_expressions": []}

Finally, volatility and value coupling can operate at the same time on the same node, like in this example where $x_{1}$ has its values coupled with $x_{2}$ and its volatility coupled with $x_{3}$:

@@ -176,8 +168,12 @@

$$
x_3^{(k)} \sim \mathcal{N}(x_3^{(k)} | x_3^{(k-1)}, \, \exp(\omega_3))
$$

```{code-cell} ipython3
---
editable: true
slideshow:
  slide_type: ''
tags: [hide-input]
---
np.random.seed(123)
alpha_1 = 1.0
kappa_1 = 1.0
@@ -218,9 +214,7 @@ axs[0].legend()
sns.despine()
```
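
Because the cell above is only partially visible, here is a self-contained sketch of the same three-state example. It reuses the visible names `alpha_1` and `kappa_1`; the remaining parameter values are illustrative stand-ins for the hidden ones.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(123)

alpha_1 = 1.0                                 # value coupling between x_1 and x_2
kappa_1 = 1.0                                 # volatility coupling between x_1 and x_3
omega_1, omega_2, omega_3 = -6.0, -6.0, -6.0  # tonic (log-scale) volatilities
n_timesteps = 1000

x_1 = np.zeros(n_timesteps)
x_2 = np.zeros(n_timesteps)
x_3 = np.zeros(n_timesteps)
for k in range(1, n_timesteps):
    x_3[k] = np.random.normal(x_3[k - 1], np.sqrt(np.exp(omega_3)))
    x_2[k] = np.random.normal(x_2[k - 1], np.sqrt(np.exp(omega_2)))
    # x_1: mean shifted by its value parent x_2, variance set by its volatility parent x_3
    x_1[k] = np.random.normal(
        x_1[k - 1] + alpha_1 * x_2[k],
        np.sqrt(np.exp(kappa_1 * x_3[k] + omega_1)),
    )

fig, axs = plt.subplots(nrows=3, figsize=(12, 6), sharex=True)
for ax, x, label in zip(axs, (x_3, x_2, x_1), ("$x_{3}$", "$x_{2}$", "$x_{1}$")):
    ax.plot(x)
    ax.set_ylabel(label)
axs[-1].set_xlabel("Time step (k)")
sns.despine()
```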

+++ {"user_expressions": []}

Based on these principles, any given state in the world can be modelled as having a volatility parent state, a value parent state, both, or none. When the node is an orphan, it evolves as a Gaussian random walk around its previous value with a fixed step size. Consequently, when inferring on the evolution of these states, the exact belief update equations (which include the computation of new predictions, posterior values, and prediction errors, and represent an approximate inversion of this generative model, see {cite:p}`2011:mathys`) depend on the nature of the coupling of a given state with its parent and children states. In particular, the nodes that implement the belief updates will communicate with their value parents via value prediction errors, or **VAPE**s, and with their volatility parents via volatility prediction errors, or **VOPE**s.

```{figure} ../images/hgf.png
---
@@ -233,7 +227,7 @@
A one-level HGF for continuous input is a [Kalman Filter](https://en.wikipedia.org/wiki/Kalman_filter).
```

+++ {"user_expressions": []}
+++

Still assuming that Node $i$ is the value child of Node $i+1$, the prediction step consists of the following computations:

@@ -276,25 +270,25 @@

Note that in this example, all states that are value parents of other states (or outcomes) have their own volatility parent, while states that are volatility parents to other nodes either have a value parent (as state $\check{x}_1$), or no parents (as states $\check{x}_2$ and $\check{x}_3$). This is deliberately so, and we will see these two motifs - every state of a hierarchy has its own volatility estimation, and volatility states only have value parents - reappear in the following chapters.

+++ {"user_expressions": []}
+++

## Belief updates in the HGF: Computations of nodes

+++ {"user_expressions": []}
+++

```{note}
The update equations for volatility and value coupling in the generalized hierarchical Gaussian filter have been described in {cite:p}`weber:2023`.
```

+++ {"user_expressions": []}
+++

The coding examples introduced above illustrated generative models that can simulate data forward from a given volatility structure, with key parameters stochastically fluctuating. HGFs use this as a model of the environment to make sense of new observations, also referred to as the sensory part of the HGF, or the filtering part. In this situation, new observations are coming in and the model has to update the volatility structure accordingly (from bottom to top nodes).

In its first description, {cite:p}`2011:mathys` derived a set of simple, one-step update equations that represent changes in beliefs about the hidden states (i.e. the sufficient statistics of the nodes) specified in the generative model. For each state, a belief is held (and updated for every new input) by the agent and described as a Gaussian distribution, fully characterized by its mean $\mu_i^{(k)}$ and its inverse variance, or precision, $\pi_i^{(k)}$, on a given trial (this is the notation we have been using in the previous examples). We conceptualize each belief as a node in a network, where belief updates involve computations within nodes as well as message passing between nodes. The computations of any observation at each time point can be ordered in time as shown in the [belief update algorithm](#belief-update):

+++ {"user_expressions": []}
+++

```{note} Belief update
:name: belief-update
@@ -322,15 +316,15 @@ For $i$ a {term}`node` in a probabilistic network at time $k$, with children at
```
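
To make the ordering of these steps concrete, the sketch below filters noisy observations with the simplest case mentioned in the figure caption above: a single continuous node, which reduces to a Kalman filter. Variable names and parameter values are illustrative and are not part of any library API.

```python
import numpy as np

np.random.seed(123)

# Simulate noisy observations of a hidden Gaussian random walk
omega = -4.0                 # (log) variance of the hidden random walk
pi_u = 1e2                   # precision of the observation noise
x = np.cumsum(np.random.normal(0, np.sqrt(np.exp(omega)), size=500))
u = x + np.random.normal(0, np.sqrt(1 / pi_u), size=500)

# Belief about the hidden state: a Gaussian with mean mu and precision pi
mu, pi = 0.0, 1.0
mu_hat, pi_hat = mu, 1 / (1 / pi + np.exp(omega))   # initial prediction

filtered = []
for observation in u:
    # 1. prediction error: mismatch between the new input and the prediction
    delta = observation - mu_hat

    # 2. update: combine prediction and evidence, weighted by their precisions
    pi = pi_hat + pi_u
    mu = mu_hat + (pi_u / pi) * delta

    # 3. prediction: belief about the next observation (the random walk adds
    #    exp(omega) of variance between two time steps)
    mu_hat = mu
    pi_hat = 1 / (1 / pi + np.exp(omega))

    filtered.append(mu)
```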

+++ {"user_expressions": []}
+++ {"editable": true, "slideshow": {"slide_type": ""}}

The exact computations in each step depend on the nature of the coupling (via {term}`VAPE`s vs. {term}`VOPE`s) between the parent and children nodes.

```{important}
We have placed the {term}`Prediction` step at the end of the update loop. This is because usually, we think about the beginning of a timepoint trial as starting with receiving a new input, and of a prediction as being present before that input is received (this is especially relevant to model time points as a trial in an experiment). However, in some variants of the HGF the prediction also depends on the time that has passed in between trials, which is something that can only be evaluated once the new input arrives - hence the additional computation of the (current) prediction at the beginning of the trial. Conceptually, it makes most sense to think of the prediction as happening continuously between trials. For implementational purposes, it is however most convenient to only compute the prediction once the new input (and with it its arrival time) enters. This ensures both that the posterior means of parent nodes have had enough time to be sent back to their children for preparation for the new input, and that the arrival time of the new input can be taken into account appropriately.
```

+++ {"user_expressions": []}
+++

## Computations for VAPE coupling

@@ -367,7 +361,7 @@ All of these are available at the time of the update. Node $i$ therefore only ne
```

+++ {"user_expressions": []}
+++

```{admonition} Prediction error
:class: dropdown
@@ -387,7 +381,7 @@
```

+++ {"user_expressions": []}
+++

```{admonition} Prediction
:class: dropdown
@@ -418,7 +412,7 @@ In general, the prediction of the mean will depend only on whether Node $i$ has
Thus, the {ref}`vape-prediction` only depends on knowing the node's own posteriors and receiving the value parent's posterior in time before the new input arrives.
```
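
Putting the three VAPE steps together for a single trial, the sketch below applies value-coupling updates to a child node $i$ and its value parent $j$. The collapsed admonitions above contain the exact expressions used in this notebook; the formulas below follow the form commonly given for the generalized HGF, and the numbers are made up for illustration.

```python
# Sufficient statistics before the update (illustrative numbers)
mu_i, pi_i = 0.8, 2.0          # posterior of the child node i after seeing the input
mu_i_hat, pi_i_hat = 0.5, 1.5  # prediction the child held before the input
mu_j_hat, pi_j_hat = 0.2, 1.0  # prediction of the value parent j
alpha = 1.0                    # value-coupling strength

# Value prediction error (VAPE) sent from child i to its value parent j
vape = mu_i - mu_i_hat

# Update of the parent's precision and mean (precision-weighted prediction error)
pi_j = pi_j_hat + alpha**2 * pi_i_hat
mu_j = mu_j_hat + (alpha * pi_i_hat / pi_j) * vape

print(f"parent posterior: mu = {mu_j:.3f}, pi = {pi_j:.3f}")
```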

+++ {"user_expressions": []}
+++

## Computations for VOPE coupling

@@ -467,7 +461,7 @@

which will be computed as part of the [VOPE predictions](#vope-prediction) and only serves to simplify the equations and the corresponding message passing.

+++ {"user_expressions": []}
+++

```{admonition} Update
:class: dropdown
@@ -523,7 +517,7 @@ Therefore, at the time of the update, Node $i$ needs to have access to the follo
```

+++ {"user_expressions": []}
+++

```{admonition} Prediction-error
:class: dropdown
@@ -545,7 +539,7 @@
```

+++ {"user_expressions": []}
+++

```{admonition} Prediction
:class: dropdown
@@ -567,7 +561,7 @@ Thus, the prediction for trial $k+1$ depends again only on receiving the posteri
Note that if Node $i$ additionally has a {term}`VAPE` parent node, the prediction of the new mean, $\hat{\mu}_i^{(k+1)}$, would also depend on the posterior mean of that value parent (cf. {ref}`vape-prediction`).
```

+++ {"user_expressions": []}
+++

## Glossary

@@ -580,7 +574,7 @@
Prediction
At every time $k$, a continuous node $i$ is defined by its sufficient statistics, the mean $\mu_i^{(k)}$ and its inverse variance, or precision, $\pi_i^{(k)}$, and holds predictions about the next observed values, denoted $\hat{\mu}_i^{(k)}$ and $\hat{\pi}_i^{(k)}$.
Prediction error
Difference between the top-down predictions at node $i$ that are inherited from parents, and the bottom-up incoming observations passed by children nodes.
Update
At each time $k$, a new value is observed at the input node and the sufficient statistics of the nodes (i.e. beliefs) are updated accordingly from the lower part to the upper part of the structure.