
Commit

feat: some more clarifications
AshishKumar4 committed Jul 6, 2024
1 parent 86908a7 commit cd24183
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions example notebooks/ddpm + ddim flax.ipynb
@@ -22,9 +22,11 @@
 "\n",
 "### How do we do this?\n",
 "\n",
-"The way score based models tackle this is that they train a model that, given a sample from some time step in the diffusion process, can predict how much noise is still present in it aka predict the time step in a way. Grossly oversimplified, we can basically then find the gradient of the model's output (or score) with respect to the input, and kind of do a gradient descent to remove the noise. \n",
+"The way score-based models tackle this is by training a model that, given a sample from some time step of the diffusion process, can predict how much noise is still present in it, i.e., implicitly predict the time step. Grossly oversimplified, we can then take the gradient of the model's output (or score) with respect to the input and perform a kind of gradient descent to remove the noise. Thus, the model is trained to estimate the gradient of the log-likelihood at the current sample.\n",
 "\n",
-"And the way diffusion models work is by simply training a model that, given an intermediate noised sample, try to denoise the sample a bit at each time step. The time step itself can also be provided to the model to assist it in the process. It has been shown that both score based and diffusion principles are equivalent.\n",
+"This gradient points from the current sample toward the data distribution, so we can take a step in that direction to remove some of the noise. Repeating this until the final time step ideally leaves no noise at all. This is the basic idea behind score-based models.\n",
+"\n",
+"Diffusion models, in turn, simply train a model that, given an intermediate noised sample, tries to denoise it a bit at each time step in an image-to-image manner, rather than explicitly learning to estimate the gradient. The time step itself can also be provided to the model to assist it. It has been shown that the score-based and diffusion formulations are equivalent.\n",
 "\n",
 "But we need to train the model as well, right? How do we do that? It's pretty intuitive given what we have understood so far: we simply take samples from our actual data distribution and construct a set of samples with gradually increasing noise levels by adding Gaussian noise at each time step, until we basically have samples that are pure noise. This gradual noising process is very similar to diffusion in thermodynamics, where particles move from low entropy to high entropy, hence the name. Score-based models arrived at the same idea independently.\n",
 "\n",
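The mechanics described in the diff above — gradually noising data with Gaussian steps, then following the score (the gradient of the log-likelihood) back toward the data — can be illustrated with a toy one-dimensional Gaussian, where the score has a closed form. This is a minimal NumPy sketch for illustration, not code from the notebook; the step size `eps`, the noise scale, and the step counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "data distribution": a Gaussian with mean mu and std sigma.
mu, sigma = 3.0, 0.5

def score(x):
    # For N(mu, sigma^2) the score is analytic:
    # d/dx log p(x) = (mu - x) / sigma^2.
    return (mu - x) / sigma**2

# Forward (noising) process: start from a clean data sample and add a
# small amount of Gaussian noise at each of T steps, as described above.
T = 50
x = mu + sigma * rng.standard_normal()
for _ in range(T):
    x = x + 0.3 * rng.standard_normal()

# Reverse (denoising) process: repeatedly step along the score, i.e. a
# gradient ascent on the log-likelihood, pulling the sample back toward
# the data distribution.
eps = 0.05
for _ in range(200):
    x = x + eps * score(x)

print(round(x, 2))  # converges to mu = 3.0
```

In a real score-based model the analytic `score` is replaced by a trained neural network, and a small amount of noise is usually re-injected at each reverse step (Langevin dynamics) rather than taking purely deterministic steps.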
