
Commit f429452

Update documentation
1 parent 1476e50 commit f429452

23 files changed (+2314 / -255 lines)

_images/ad_example_question.png (66.4 KB; binary file not shown)

_images/ad_example_solution.png (89.7 KB; binary file not shown)

(Further binary files in this commit are not shown.)

_sources/exercise/gp.ipynb

Lines changed: 1077 additions & 119 deletions
Large diffs are not rendered by default.

_sources/lecture/gp.md

Lines changed: 5 additions & 5 deletions
@@ -643,13 +643,13 @@ $$
 Without discussing all the details, we will now briefly sketch how Gaussian Processes can be adapted to classification. Consider for simplicity the 2-class problem:

 $$
-0 < y < 1, \quad h(x) = \text{sigmoid}(\varphi(x))
+0 < y < 1, \quad h(x) = \text{sigmoid}(\varphi(x)).
 $$

-the PDF for $y$ conditioned on the feature $\varphi(x)$ then follows a Bernoulli-distribution:
+The PDF for $y$ conditioned on the feature $\varphi(x)$ then follows a Bernoulli-distribution:

 $$
-p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n
+p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n.
 $$

 Finding the predictive PDF for unseen data $p(y^{(n+1)}|y)$, given the training data
@@ -679,13 +679,13 @@ $$
 hence giving us

 $$
-p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I)
+p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I),
 $$

 where $K_{ij} = k(x^{(i)}, x^{(j)})$, i.e., a Grammian matrix generated by the kernel functions from the feature map $\varphi(x)$.

 > - Note that we do **NOT** include an explicit noise term in the data covariance as we assume that all sample data have been correctly classified.
-> - For numerical reasons, we introduce a noise-like form that improves the conditioning of $K + \mu I$
+> - For numerical reasons, we introduce a noise-like form $\nu I$ that improves the conditioning of $K$
 > - For two-class classification it is sufficient to predict $p(y^{(n+1)} = 1 | y)$ as $p(y^{(n+1)} = 0 | y) = 1 - p(y^{(n+1)} = 1 | y)$

 Using the PDF $p(y=1|\varphi) = \text{sigmoid}(\varphi(x))$ we obtain the predictive PDF:
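
The point made by the amended bullet can be illustrated in a few lines of NumPy: the jitter term $\nu I$ carries no modelling meaning (there is no observation noise), it only stabilizes the Cholesky factorization of $K$, after which the sigmoid maps latent GP values to Bernoulli class probabilities. This is a minimal sketch, not code from the course material; the RBF kernel, the value of `nu`, and all variable names are assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-0.5 * sq_dists / length_scale**2)

def sigmoid(phi):
    return 1.0 / (1.0 + np.exp(-phi))

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))   # hypothetical training inputs
K = rbf_kernel(X, X)                        # Gram matrix K_ij = k(x^(i), x^(j))

# No explicit noise term in the data covariance; nu * I is only a
# numerical jitter that improves the conditioning of K.
nu = 1e-6
L = np.linalg.cholesky(K + nu * np.eye(len(X)))   # can fail on the bare K

# Draw latent function values phi ~ N(0, K + nu I) and push them through
# the sigmoid to obtain Bernoulli class probabilities p(y = 1 | phi).
phi = L @ rng.standard_normal(len(X))
p_class1 = sigmoid(phi)
y = rng.binomial(1, p_class1)               # sampled binary labels
```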

_sources/lecture/gradients.md

Lines changed: 92 additions & 25 deletions
Large diffs are not rendered by default.

exercise/bayes.html

Lines changed: 2 additions & 2 deletions
@@ -664,8 +664,8 @@ <h3><span class="section-number">2.1.4. </span>Bayesian Linear Regression Model<
 <p>The model below essentially makes the following prior assumptions:</p>
 <div class="math notranslate nohighlight">
 \[y \approx h(x) = wx + b + \epsilon, \quad \text{with:}\]</div>
-<div class="amsmath math notranslate nohighlight" id="equation-0db22e97-6b0a-46e0-a476-53d8ecd3575d">
-<span class="eqno">(2.30)<a class="headerlink" href="#equation-0db22e97-6b0a-46e0-a476-53d8ecd3575d" title="Permalink to this equation">#</a></span>\[\begin{align}
+<div class="amsmath math notranslate nohighlight" id="equation-b60eee56-96e4-4e84-8e5b-6e3421e3a6c3">
+<span class="eqno">(2.30)<a class="headerlink" href="#equation-b60eee56-96e4-4e84-8e5b-6e3421e3a6c3" title="Permalink to this equation">#</a></span>\[\begin{align}
 y_i &amp;\sim \mathcal{N}(\mu, \sigma^2)\\
 \mu &amp;= w \cdot x_i + b\\
 w &amp;\sim \mathcal{N}(0,1^2)\\
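
The context lines above spell out the prior assumptions of the Bayesian linear regression model. As a minimal NumPy sketch of drawing data from those priors (not taken from the notebook; the prior on b and the fixed noise scale sigma are assumptions, since the hunk only shows the prior for w):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)      # illustrative inputs

w = rng.normal(0.0, 1.0)           # w ~ N(0, 1^2), as in the hunk above
b = rng.normal(0.0, 1.0)           # prior on b assumed analogous (not shown in the hunk)
sigma = 0.5                        # noise scale assumed fixed for this sketch

mu = w * x + b                     # mu = w * x_i + b
y = rng.normal(mu, sigma)          # y_i ~ N(mu, sigma^2)
```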

exercise/gp.html

Lines changed: 1040 additions & 59 deletions
Large diffs are not rendered by default.

lecture/gp.html

Lines changed: 5 additions & 5 deletions
@@ -999,12 +999,12 @@ <h2><span class="section-number">7.4. </span>GP for Classification<a class="head
 <p>Without discussing all the details, we will now briefly sketch how Gaussian Processes can be adapted to classification. Consider for simplicity the 2-class problem:</p>
 <div class="math notranslate nohighlight">
 \[
-0 &lt; y &lt; 1, \quad h(x) = \text{sigmoid}(\varphi(x))
+0 &lt; y &lt; 1, \quad h(x) = \text{sigmoid}(\varphi(x)).
 \]</div>
-<p>the PDF for <span class="math notranslate nohighlight">\(y\)</span> conditioned on the feature <span class="math notranslate nohighlight">\(\varphi(x)\)</span> then follows a Bernoulli-distribution:</p>
+<p>The PDF for <span class="math notranslate nohighlight">\(y\)</span> conditioned on the feature <span class="math notranslate nohighlight">\(\varphi(x)\)</span> then follows a Bernoulli-distribution:</p>
 <div class="math notranslate nohighlight">
 \[
-p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n
+p(y | \varphi) = \text{sigmoid}(\varphi)^{y} \left( 1 - \text{sigmoid}(\varphi) \right)^{1-y}, \quad i=1, \ldots, n.
 \]</div>
 <p>Finding the predictive PDF for unseen data <span class="math notranslate nohighlight">\(p(y^{(n+1)}|y)\)</span>, given the training data</p>
 <div class="math notranslate nohighlight">
@@ -1031,13 +1031,13 @@ <h2><span class="section-number">7.4. </span>GP for Classification<a class="head
 <p>hence giving us</p>
 <div class="math notranslate nohighlight">
 \[
-p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I)
+p(\tilde{\varphi}) = \mathcal{N}( \tilde{\varphi}; 0, K + \nu I),
 \]</div>
 <p>where <span class="math notranslate nohighlight">\(K_{ij} = k(x^{(i)}, x^{(j)})\)</span>, i.e., a Grammian matrix generated by the kernel functions from the feature map <span class="math notranslate nohighlight">\(\varphi(x)\)</span>.</p>
 <blockquote>
 <div><ul class="simple">
 <li><p>Note that we do <strong>NOT</strong> include an explicit noise term in the data covariance as we assume that all sample data have been correctly classified.</p></li>
-<li><p>For numerical reasons, we introduce a noise-like form that improves the conditioning of <span class="math notranslate nohighlight">\(K + \mu I\)</span></p></li>
+<li><p>For numerical reasons, we introduce a noise-like form <span class="math notranslate nohighlight">\(\nu I\)</span> that improves the conditioning of <span class="math notranslate nohighlight">\(K\)</span></p></li>
 <li><p>For two-class classification it is sufficient to predict <span class="math notranslate nohighlight">\(p(y^{(n+1)} = 1 | y)\)</span> as <span class="math notranslate nohighlight">\(p(y^{(n+1)} = 0 | y) = 1 - p(y^{(n+1)} = 1 | y)\)</span></p></li>
 </ul>
 </div></blockquote>
