Merge pull request #4 from virtual-labs/dev
Dev
sravanthimodepu authored Oct 11, 2021
2 parents f12812a + 4fe4a44 commit 2ff94e1
Showing 32 changed files with 2,132 additions and 277 deletions.
53 changes: 53 additions & 0 deletions experiment-descriptor.json
@@ -0,0 +1,53 @@
{
"unit-type": "lu",
"label": "",
"basedir": ".",
"units": [
{
"unit-type": "aim"
},
{
"target": "theory.html",
"source": "theory.md",
"label": "Theory",
"unit-type": "task",
"content-type": "text"
},
{
"target": "objective.html",
"source": "objective.md",
"label": "Objective",
"unit-type": "task",
"content-type": "text"
},
{
"target": "procedure.html",
"source": "procedure.md",
"label": "Procedure",
"unit-type": "task",
"content-type": "text"
},
{
"target": "simulation.html",
"source": "simulation/index.html",
"label": "Simulation",
"unit-type": "task",
"content-type": "simulation"
},
{
"target": "assignment.html",
"source": "assignment.md",
"label": "Assignment",
"unit-type": "task",
"content-type": "text"
},
{
"target": "references.html",
"source": "references.md",
"label": "References",
"unit-type": "task",
"content-type": "text"
}
]
}

7 changes: 6 additions & 1 deletion experiment/aim.md
@@ -1 +1,6 @@
### Aim of the experiment
One major problem with standard N-gram models is that they must be trained on some corpus, and because any particular training corpus is finite, some perfectly acceptable N-grams are bound to be missing from it. The bigram matrix for any given training corpus is therefore sparse: it contains a large number of zero-probability bigrams that should really have some non-zero probability. Such models tend to underestimate the probability of strings that happen not to occur in their training corpus.

There are techniques for assigning a non-zero probability to these 'zero-probability bigrams'. This task of re-evaluating some of the zero-probability and low-probability N-grams and assigning them non-zero values is called smoothing.

<img src="images/a.jpg">
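
A minimal sketch of the idea in Python, using an illustrative toy corpus rather than the experiment's data: add-one (Laplace) smoothing replaces the unsmoothed estimate C(w1 w2) / C(w1) with (C(w1 w2) + 1) / (C(w1) + V), where V is the vocabulary size, so every unseen bigram receives a small non-zero probability.

```python
from collections import Counter

# Illustrative toy corpus; not the experiment's data set.
tokens = "john read a book mary read a different book".split()
bigrams = list(zip(tokens, tokens[1:]))

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)
V = len(unigram_counts)  # vocabulary size (6 distinct words here)

def p_mle(w1, w2):
    # Unsmoothed maximum-likelihood estimate: zero for any unseen bigram.
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def p_add_one(w1, w2):
    # Add-one (Laplace) smoothing: every bigram gets non-zero probability.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(p_mle("john", "book"))      # 0.0 -- "john book" never occurs
print(p_add_one("john", "book"))  # 1/7, small but non-zero
```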

25 changes: 25 additions & 0 deletions experiment/assignment.md
@@ -0,0 +1,25 @@
**Q1**. Add-one smoothing works poorly in practice because it gives too much probability mass to unseen n-grams. Demonstrate this with an example.

**Q2**. In Add-&delta; smoothing, we add a small value '&delta;' to the counts instead of one. Apply Add-&delta; smoothing to the bigram count table below, where &delta; = 0.02.

| |(eos)|John|Read|Fountainhead|Mary|a|Different|Book|She|By|Dickens|
|---|---|---|---|---|---|---|---|---|---|---|---|
|(eos)|0 |300 |0 |0 |300 |0 |0 |0 |300 |0 |0 |
|John |0 |0 |300 |0 |0 |0 |0 |0 |0 |0 |0 |
|Read |0 |0 |0 |300 |0 |600 |0 |0 |0 |0 |0 |
|Fountainhead|300 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |
|Mary |0 |0 |300|0 |0 |0 |0 |0 |0 |0 |0 |
|a | 0 |0 |0 |0 |0 |0 |300 |300 |0 |0 |0 |
|Different|0 |0 |0 |0 |0 |0 |0 |300 |0 |0 |0 |
|Book |300 |0 |0 |0 |0 |0 |0 |0 |0 |300 |0 |
|She |0 |0 |300 |0 |0 |0 |0 |0 |0 |0 |0 |
|By |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |300 |
|Dickens|300 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |

N = 5100 (total bigram count), V = 11 (vocabulary size)


**Q3**. Given S = "Dickens read a book", find P(S):<br/>
**(a)** using unsmoothed probabilities<br/>
**(b)** applying Add-one smoothing<br/>
**(c)** applying Add-&delta; smoothing
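
As a starting point for Q2 and Q3, here is a minimal sketch in Python of the add-&delta; estimate P(w2 | w1) = (C(w1 w2) + &delta;) / (C(w1) + &delta;V) and of computing P(S) as the product of smoothed bigram probabilities; the count pairs in it are hypothetical placeholders, to be replaced with values read off the Q2 table.

```python
delta, V = 0.02, 11  # values given in Q2

def p_add_delta(c_bigram, c_w1):
    # Add-delta smoothing: P(w2 | w1) = (C(w1 w2) + delta) / (C(w1) + delta * V)
    return (c_bigram + delta) / (c_w1 + delta * V)

# Hypothetical (bigram count, first-word unigram count) pairs for the
# bigrams of a sentence S; substitute the actual values from the Q2
# table when answering Q3.
pairs = [(0, 300), (600, 900), (300, 600)]

p_s = 1.0  # P(S) is the product of the smoothed bigram probabilities
for c_bigram, c_w1 in pairs:
    p_s *= p_add_delta(c_bigram, c_w1)
print(f"P(S) = {p_s:.3e}")
```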
2 changes: 1 addition & 1 deletion experiment/experiment-name.md
@@ -1 +1 @@
## Experiment name
## N-Grams Smoothing
Binary file added experiment/images/a.jpg
1 change: 1 addition & 0 deletions experiment/objective.md
@@ -0,0 +1 @@
- The objective of this experiment is to learn how to apply add-one smoothing to a sparse bigram table.
135 changes: 0 additions & 135 deletions experiment/posttest.js

This file was deleted.

135 changes: 0 additions & 135 deletions experiment/pretest.js

This file was deleted.

6 changes: 5 additions & 1 deletion experiment/procedure.md
@@ -1 +1,5 @@
### Procedure
STEP 1: Select a corpus.

STEP 2: Apply add-one smoothing and calculate the bigram probabilities from the given bigram counts, N, and V. Fill in the table and hit `Submit` (a sketch of this computation appears after these steps).

STEP 3: If an answer is incorrect (shown in red), view the correct answer by clicking on show answer, or repeat Step 2.
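
A sketch of the Step 2 computation, assuming the add-one estimate P(w2 | w1) = (C(w1 w2) + 1) / (C(w1) + V) with the row total standing in for the unigram count C(w1); the words and counts here are placeholders, not the simulation's corpus.

```python
# Fill each cell of the smoothed table with (count + 1) / (row_total + V).
words = ["john", "read", "a", "book"]
counts = [
    [0, 300, 0, 0],  # row: john
    [0, 0, 600, 0],  # row: read
    [0, 0, 0, 300],  # row: a
    [0, 0, 0, 0],    # row: book
]
V = len(words)  # vocabulary size

for row_word, row in zip(words, counts):
    c_w1 = sum(row)  # unigram count of the row word, taken as the row total
    probs = [(c + 1) / (c_w1 + V) for c in row]
    print(row_word, [round(p, 4) for p in probs])
```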
3 changes: 2 additions & 1 deletion experiment/references.md
@@ -1 +1,2 @@
### Link your references in here
**Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition**<br/>
By: Daniel Jurafsky and James H. Martin, Chapter 6