
Commit f429452

Update documentation
1 parent 1476e50 commit f429452

23 files changed (+2314 / -255 lines)

_images/ad_example_question.png (66.4 KB; binary file not shown)

_images/ad_example_solution.png (89.7 KB; binary file not shown)

(Further binary files in this commit are not shown.)

_sources/exercise/gp.ipynb

Lines changed: 1077 additions & 119 deletions
Large diffs are not rendered by default.

_sources/lecture/gp.md

Lines changed: 5 additions & 5 deletions
@@ -643,13 +643,13 @@ $$
 Without discussing all the details, we will now briefly sketch how Gaussian Processes can be adapted to classification. Consider for simplicity the 2-class problem:

 $$
-0 < y < 1, \quad h(x) = \text{sigmoid}(\varphi(x))
+0 < y < 1, \quad h(x) = \text{sigmoid}(\varphi(x)).
 $$

-the PDF for $y$ conditioned on the feature $\varphi(x)$ then follows a Bernoulli-distribution:
+The PDF for $y$ conditioned on the feature $\varphi(x)$ then follows a Bernoulli-distribution:

 $$
-p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n
+p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n.
 $$

 Finding the predictive PDF for unseen data $p(y^{(n+1)}|y)$, given the training data
@@ -679,13 +679,13 @@ $$
 hence giving us

 $$
-p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I)
+p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I),
 $$

 where $K_{ij} = k(x^{(i)}, x^{(j)})$, i.e., a Grammian matrix generated by the kernel functions from the feature map $\varphi(x)$.

 > - Note that we do **NOT** include an explicit noise term in the data covariance as we assume that all sample data have been correctly classified.
-> - For numerical reasons, we introduce a noise-like form that improves the conditioning of $K + \mu I$
+> - For numerical reasons, we introduce a noise-like form $\nu I$ that improves the conditioning of $K$
 > - For two-class classification it is sufficient to predict $p(y^{(n+1)} = 1 | y)$ as $p(y^{(n+1)} = 0 | y) = 1 - p(y^{(n+1)} = 1 | y)$

 Using the PDF $p(y=1|\varphi) = \text{sigmoid}(\varphi(x))$ we obtain the predictive PDF:
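
The point made by the amended bullet can be illustrated in a few lines of NumPy: the jitter term $\nu I$ carries no modelling meaning (there is no observation noise), it only stabilizes the Cholesky factorization of $K$, after which the sigmoid maps latent GP values to Bernoulli class probabilities. This is a minimal sketch, not code from the course material; the RBF kernel, the value of `nu`, and all variable names are assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-0.5 * sq_dists / length_scale**2)

def sigmoid(phi):
    return 1.0 / (1.0 + np.exp(-phi))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))   # hypothetical training inputs
K = rbf_kernel(X, X)                        # Gram matrix K_ij = k(x^(i), x^(j))

# No explicit noise term in the data covariance; nu * I is only a
# numerical jitter that improves the conditioning of K.
nu = 1e-6
L = np.linalg.cholesky(K + nu * np.eye(len(X)))   # can fail on the bare K

# Draw latent function values phi ~ N(0, K + nu I) and push them through
# the sigmoid to obtain Bernoulli class probabilities p(y = 1 | phi).
phi = L @ rng.standard_normal(len(X))
p_class1 = sigmoid(phi)
y = rng.binomial(1, p_class1)               # sampled binary labels
```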

_sources/lecture/gradients.md

Lines changed: 92 additions & 25 deletions
Large diffs are not rendered by default.

exercise/bayes.html

Lines changed: 2 additions & 2 deletions
@@ -664,8 +664,8 @@ <h3><span class="section-number">2.1.4. </span>Bayesian Linear Regression Model<
 <p>The model below essentially makes the following prior assumptions:</p>
 <div class="math notranslate nohighlight">
 \[y \approx h(x) = wx + b + \epsilon, \quad \text{with:}\]</div>
-<div class="amsmath math notranslate nohighlight" id="equation-0db22e97-6b0a-46e0-a476-53d8ecd3575d">
-<span class="eqno">(2.30)<a class="headerlink" href="#equation-0db22e97-6b0a-46e0-a476-53d8ecd3575d" title="Permalink to this equation">#</a></span>\[\begin{align}
+<div class="amsmath math notranslate nohighlight" id="equation-b60eee56-96e4-4e84-8e5b-6e3421e3a6c3">
+<span class="eqno">(2.30)<a class="headerlink" href="#equation-b60eee56-96e4-4e84-8e5b-6e3421e3a6c3" title="Permalink to this equation">#</a></span>\[\begin{align}
 y_i &amp;\sim \mathcal{N}(\mu, \sigma^2)\\
 \mu &amp;= w \cdot x_i + b\\
 w &amp;\sim \mathcal{N}(0,1^2)\\
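
The context lines above spell out the prior assumptions of the Bayesian linear regression model. As a minimal NumPy sketch of drawing data from those priors (not taken from the notebook; the prior on b and the fixed noise scale sigma are assumptions, since the hunk only shows the prior for w):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)      # illustrative inputs

w = rng.normal(0.0, 1.0)           # w ~ N(0, 1^2), as in the hunk above
b = rng.normal(0.0, 1.0)           # prior on b assumed analogous (not shown in the hunk)
sigma = 0.5                        # noise scale assumed fixed for this sketch

mu = w * x + b                     # mu = w * x_i + b
y = rng.normal(mu, sigma)          # y_i ~ N(mu, sigma^2)
```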

exercise/gp.html

Lines changed: 1040 additions & 59 deletions
Large diffs are not rendered by default.

lecture/gp.html

Lines changed: 5 additions & 5 deletions
@@ -999,12 +999,12 @@ <h2><span class="section-number">7.4. </span>GP for Classification<a class="head
 <p>Without discussing all the details, we will now briefly sketch how Gaussian Processes can be adapted to classification. Consider for simplicity the 2-class problem:</p>
 <div class="math notranslate nohighlight">
 \[
-0 &lt; y &lt; 1, \quad h(x) = \text{sigmoid}(\varphi(x))
+0 &lt; y &lt; 1, \quad h(x) = \text{sigmoid}(\varphi(x)).
 \]</div>
-<p>the PDF for <span class="math notranslate nohighlight">\(y\)</span> conditioned on the feature <span class="math notranslate nohighlight">\(\varphi(x)\)</span> then follows a Bernoulli-distribution:</p>
+<p>The PDF for <span class="math notranslate nohighlight">\(y\)</span> conditioned on the feature <span class="math notranslate nohighlight">\(\varphi(x)\)</span> then follows a Bernoulli-distribution:</p>
 <div class="math notranslate nohighlight">
 \[
-p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n
+p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n.
 \]</div>
 <p>Finding the predictive PDF for unseen data <span class="math notranslate nohighlight">\(p(y^{(n+1)}|y)\)</span>, given the training data</p>
 <div class="math notranslate nohighlight">
@@ -1031,13 +1031,13 @@ <h2><span class="section-number">7.4. </span>GP for Classification<a class="head
 <p>hence giving us</p>
 <div class="math notranslate nohighlight">
 \[
-p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I)
+p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I),
 \]</div>
 <p>where <span class="math notranslate nohighlight">\(K_{ij} = k(x^{(i)}, x^{(j)})\)</span>, i.e., a Grammian matrix generated by the kernel functions from the feature map <span class="math notranslate nohighlight">\(\varphi(x)\)</span>.</p>
 <blockquote>
 <div><ul class="simple">
 <li><p>Note that we do <strong>NOT</strong> include an explicit noise term in the data covariance as we assume that all sample data have been correctly classified.</p></li>
-<li><p>For numerical reasons, we introduce a noise-like form that improves the conditioning of <span class="math notranslate nohighlight">\(K + \mu I\)</span></p></li>
+<li><p>For numerical reasons, we introduce a noise-like form <span class="math notranslate nohighlight">\(\nu I\)</span> that improves the conditioning of <span class="math notranslate nohighlight">\(K\)</span></p></li>
 <li><p>For two-class classification it is sufficient to predict <span class="math notranslate nohighlight">\(p(y^{(n+1)} = 1 | y)\)</span> as <span class="math notranslate nohighlight">\(p(y^{(n+1)} = 0 | y) = 1 - p(y^{(n+1)} = 1 | y)\)</span></p></li>
 </ul>
 </div></blockquote>
