Merge pull request #4 from virtual-labs/dev
Dev
sravanthimodepu authored Oct 11, 2021
2 parents f12812a + 4fe4a44 commit 2ff94e1
Showing 32 changed files with 2,132 additions and 277 deletions.
53 changes: 53 additions & 0 deletions experiment-descriptor.json
@@ -0,0 +1,53 @@
{
"unit-type": "lu",
"label": "",
"basedir": ".",
"units": [
{
"unit-type": "aim"
},
{
"target": "theory.html",
"source": "theory.md",
"label": "Theory",
"unit-type": "task",
"content-type": "text"
},
{
"target": "objective.html",
"source": "objective.md",
"label": "Objective",
"unit-type": "task",
"content-type": "text"
},
{
"target": "procedure.html",
"source": "procedure.md",
"label": "Procedure",
"unit-type": "task",
"content-type": "text"
},
{
"target": "simulation.html",
"source": "simulation/index.html",
"label": "Simulation",
"unit-type": "task",
"content-type": "simulation"
},
{
"target": "assignment.html",
"source": "assignment.md",
"label": "Assignment",
"unit-type": "task",
"content-type": "text"
},
{
"target": "references.html",
"source": "references.md",
"label": "References",
"unit-type": "task",
"content-type": "text"
}
]
}

7 changes: 6 additions & 1 deletion experiment/aim.md
@@ -1 +1,6 @@
### Aim of the experiment
One major problem with standard N-gram models is that they must be trained on some corpus, and because any particular training corpus is finite, some perfectly acceptable N-grams are bound to be missing from it. The bigram matrix for any given training corpus is therefore sparse: it contains a large number of zero-probability bigrams that should really have some non-zero probability. Such models tend to underestimate the probability of strings that happen not to occur in their training corpus.

There are techniques for assigning a non-zero probability to these 'zero-probability bigrams'. This task of re-evaluating some of the zero-probability and low-probability N-grams and assigning them non-zero values is called smoothing.

<img src="images/a.jpg">
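
A minimal sketch of the idea in Python, using an illustrative toy corpus rather than the experiment's data: add-one (Laplace) smoothing replaces the unsmoothed estimate C(w1 w2) / C(w1) with (C(w1 w2) + 1) / (C(w1) + V), where V is the vocabulary size, so every unseen bigram receives a small non-zero probability.

```python
from collections import Counter

# Illustrative toy corpus; not the experiment's data set.
tokens = "john read a book mary read a different book".split()
bigrams = list(zip(tokens, tokens[1:]))

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)
V = len(unigram_counts)  # vocabulary size (6 distinct words here)

def p_mle(w1, w2):
    # Unsmoothed maximum-likelihood estimate: zero for any unseen bigram.
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def p_add_one(w1, w2):
    # Add-one (Laplace) smoothing: every bigram gets non-zero probability.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(p_mle("john", "book"))      # 0.0 -- "john book" never occurs
print(p_add_one("john", "book"))  # 1/7, small but non-zero
```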

25 changes: 25 additions & 0 deletions experiment/assignment.md
@@ -0,0 +1,25 @@
**Q1**. Add-one smoothing works poorly in practice because it gives too much probability mass to unseen n-grams. Demonstrate this with an example.

**Q2**. In Add-&delta; smoothing, we add a small value '&delta;' to the counts instead of one. Apply Add-&delta; smoothing to the bigram count table below, where &delta; = 0.02.

| |(eos)|John|Read|Fountainhead|Mary|a|Different|Book|She|By|Dickens|
|---|---|---|---|---|---|---|---|---|---|---|---|
|(eos)|0 |300 |0 |0 |300 |0 |0 |0 |300 |0 |0 |
|John |0 |0 |300 |0 |0 |0 |0 |0 |0 |0 |0 |
|Read |0 |0 |0 |300 |0 |600 |0 |0 |0 |0 |0 |
|Fountainhead|300 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |
|Mary |0 |0 |300|0 |0 |0 |0 |0 |0 |0 |0 |
|a | 0 |0 |0 |0 |0 |0 |300 |300 |0 |0 |0 |
|Different|0 |0 |0 |0 |0 |0 |0 |300 |0 |0 |0 |
|Book |300 |0 |0 |0 |0 |0 |0 |0 |0 |300 |0 |
|She |0 |0 |300 |0 |0 |0 |0 |0 |0 |0 |0 |
|By |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |300 |
|Dickens|300 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |

N = 5100 (total bigram count), V = 11 (vocabulary size)


**Q3**. Given S = "Dickens read a book", find P(S):<br/>
**(a)** using unsmoothed probabilities<br/>
**(b)** applying Add-one smoothing<br/>
**(c)** applying Add-&delta; smoothing
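
As a starting point for Q2 and Q3, here is a minimal sketch in Python of the add-&delta; estimate P(w2 | w1) = (C(w1 w2) + &delta;) / (C(w1) + &delta;V) and of computing P(S) as the product of smoothed bigram probabilities; the count pairs in it are hypothetical placeholders, to be replaced with values read off the Q2 table.

```python
delta, V = 0.02, 11  # values given in Q2

def p_add_delta(c_bigram, c_w1):
    # Add-delta smoothing: P(w2 | w1) = (C(w1 w2) + delta) / (C(w1) + delta * V)
    return (c_bigram + delta) / (c_w1 + delta * V)

# Hypothetical (bigram count, first-word unigram count) pairs for the
# bigrams of a sentence S; substitute the actual values from the Q2
# table when answering Q3.
pairs = [(0, 300), (600, 900), (300, 600)]

p_s = 1.0  # P(S) is the product of the smoothed bigram probabilities
for c_bigram, c_w1 in pairs:
    p_s *= p_add_delta(c_bigram, c_w1)
print(f"P(S) = {p_s:.3e}")
```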
2 changes: 1 addition & 1 deletion experiment/experiment-name.md
@@ -1 +1 @@
## Experiment name
## N-Grams Smoothing
Binary file added experiment/images/a.jpg
1 change: 1 addition & 0 deletions experiment/objective.md
@@ -0,0 +1 @@
- The objective of this experiment is to learn how to apply add-one smoothing to a sparse bigram table.
135 changes: 0 additions & 135 deletions experiment/posttest.js

This file was deleted.

135 changes: 0 additions & 135 deletions experiment/pretest.js

This file was deleted.

6 changes: 5 additions & 1 deletion experiment/procedure.md
@@ -1 +1,5 @@
### Procedure
STEP 1: Select a corpus.

STEP 2: Apply add-one smoothing and calculate the bigram probabilities from the given bigram counts, N, and V. Fill in the table and hit `Submit` (a sketch of this computation appears after these steps).

STEP 3: If an answer is incorrect (shown in red), view the correct answer by clicking on show answer, or repeat Step 2.
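
A sketch of the Step 2 computation, assuming the add-one estimate P(w2 | w1) = (C(w1 w2) + 1) / (C(w1) + V) with the row total standing in for the unigram count C(w1); the words and counts here are placeholders, not the simulation's corpus.

```python
# Fill each cell of the smoothed table with (count + 1) / (row_total + V).
words = ["john", "read", "a", "book"]
counts = [
    [0, 300, 0, 0],  # row: john
    [0, 0, 600, 0],  # row: read
    [0, 0, 0, 300],  # row: a
    [0, 0, 0, 0],    # row: book
]
V = len(words)  # vocabulary size

for row_word, row in zip(words, counts):
    c_w1 = sum(row)  # unigram count of the row word, taken as the row total
    probs = [(c + 1) / (c_w1 + V) for c in row]
    print(row_word, [round(p, 4) for p in probs])
```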
3 changes: 2 additions & 1 deletion experiment/references.md
@@ -1 +1,2 @@
### Link your references in here
**Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition**<br/>
By: Daniel Jurafsky and James H. Martin, Chapter 6