generated from virtual-labs/ph3-exp-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #4 from virtual-labs/dev
Dev
Showing 32 changed files with 2,132 additions and 277 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
{ | ||
"unit-type": "lu", | ||
"label": "", | ||
"basedir": ".", | ||
"units": [ | ||
{ | ||
"unit-type": "aim" | ||
}, | ||
{ | ||
"target": "theory.html", | ||
"source": "theory.md", | ||
"label": "Theory", | ||
"unit-type": "task", | ||
"content-type": "text" | ||
}, | ||
{ | ||
"target": "objective.html", | ||
"source": "objective.md", | ||
"label": "Objective", | ||
"unit-type": "task", | ||
"content-type": "text" | ||
}, | ||
{ | ||
"target": "procedure.html", | ||
"source": "procedure.md", | ||
"label": "Procedure", | ||
"unit-type": "task", | ||
"content-type": "text" | ||
}, | ||
{ | ||
"target": "simulation.html", | ||
"source": "simulation/index.html", | ||
"label": "Simulation", | ||
"unit-type": "task", | ||
"content-type": "simulation" | ||
}, | ||
{ | ||
"target": "assignment.html", | ||
"source": "assignment.md", | ||
"label": "Assignment", | ||
"unit-type": "task", | ||
"content-type": "text" | ||
}, | ||
{ | ||
"target": "references.html", | ||
"source": "references.md", | ||
"label": "References", | ||
"unit-type": "task", | ||
"content-type": "text" | ||
} | ||
] | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,6 @@ | ||
### Aim of the experiment | ||
One major problem with standard N-gram models is that they must be trained from some corpus, and because any particular training corpus is finite, some perfectly acceptable N-grams are bound to be missing from it. As a result, the bigram matrix for any given training corpus is sparse: it contains a large number of zero-probability bigrams that should really have some non-zero probability. Unsmoothed models therefore tend to underestimate the probability of strings that happen not to have occurred in their training corpus. | ||
|
||
Several techniques can be used to assign a non-zero probability to these zero-probability bigrams. The task of re-evaluating some of the zero-probability and low-probability N-grams and assigning them non-zero values is called smoothing. | ||
|
||
<img src="images/a.jpg"> | ||
|
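To make the sparsity problem concrete, here is a minimal sketch on a two-sentence toy corpus. The corpus, the `(eos)` boundary convention, and the function names `mle` and `add_one` are assumptions chosen for illustration; add-one (Laplace) smoothing is shown as one common smoothing technique, not necessarily the only one the experiment covers.

```python
from collections import Counter

# Toy corpus (an assumption for illustration); "(eos)" marks sentence boundaries.
sentences = [
    "john read a book",
    "mary read a different book",
]

bigram_counts = Counter()   # C(prev, word)
context_counts = Counter()  # C(prev), counted as the first word of a bigram
vocab = set()

for sentence in sentences:
    tokens = ["(eos)"] + sentence.split() + ["(eos)"]
    vocab.update(tokens)
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

V = len(vocab)  # vocabulary size, including "(eos)"


def mle(prev, word):
    """Unsmoothed (maximum-likelihood) bigram probability."""
    return bigram_counts[(prev, word)] / context_counts[prev]


def add_one(prev, word):
    """Add-one (Laplace) smoothed bigram probability."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + V)


# Share of the V x V bigram matrix whose count is zero.
zero_cells = V * V - len(bigram_counts)
print(f"{zero_cells} of {V * V} bigram cells are empty")

# "mary book" never occurs in the corpus, so its unsmoothed estimate is zero.
print("unsmoothed P(book | mary) =", mle("mary", "book"))
print("add-one    P(book | mary) =", add_one("mary", "book"))
```

Even on this tiny corpus most cells of the bigram matrix are empty, and every one of them receives a small non-zero probability after smoothing while each row still sums to one.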
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
**Q1**. Add-one smoothing works poorly in practice because it gives too much probability mass to unseen n-grams. Demonstrate this with an example. | ||
|
||
**Q2**. In Add-δ smoothing, we add a small value δ to the counts instead of one. Apply Add-δ smoothing to the bigram count table below, with δ = 0.02. | ||
|
||
| |(eos)|John|Read|Fountainhead|Mary|a|Different|Book|She|By|Dickens| | ||
|---|---|---|---|---|---|---|---|---|---|---|---| | ||
|(eos)|0 |300 |0 |0 |300 |0 |0 |0 |300 |0 |0 | | ||
|John |0 |0 |300 |0 |0 |0 |0 |0 |0 |0 |0 | | ||
|Read |0 |0 |0 |300 |0 |600 |0 |0 |0 |0 |0 | | ||
|Fountainhead|300 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 | | ||
|Mary |0 |0 |300|0 |0 |0 |0 |0 |0 |0 |0 | | ||
|a | 0 |0 |0 |0 |0 |0 |300 |300 |0 |0 |0 | | ||
|Different|0 |0 |0 |0 |0 |0 |0 |300 |0 |0 |0 | | ||
|Book |300 |0 |0 |0 |0 |0 |0 |0 |0 |300 |0 | | ||
|She|0 |0 |300 |0 |0 |0 |0 |0 |0 |0 |0 | | ||
|By |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |300 | | ||
|Dickens|300 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 | | ||
|
||
N = 5100, V = 11 | ||
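As a rough sketch of how the table above could be smoothed programmatically, the snippet below applies the usual Add-δ estimate, P(w | prev) = (C(prev w) + δ) / (C(prev) + δV). The dictionary layout and the name `add_delta` are assumptions made for illustration. The `She` and `By` rows are read as `She Read` = 300 and `By Dickens` = 300, which assumes each of those two rows carries one stray leading cell; that reading keeps the total at N = 5100.

```python
words = ["(eos)", "John", "Read", "Fountainhead", "Mary", "a",
         "Different", "Book", "She", "By", "Dickens"]

# Non-zero bigram counts from the table: (previous word, next word) -> count.
nonzero = {
    ("(eos)", "John"): 300, ("(eos)", "Mary"): 300, ("(eos)", "She"): 300,
    ("John", "Read"): 300,
    ("Read", "Fountainhead"): 300, ("Read", "a"): 600,
    ("Fountainhead", "(eos)"): 300,
    ("Mary", "Read"): 300,
    ("a", "Different"): 300, ("a", "Book"): 300,
    ("Different", "Book"): 300,
    ("Book", "(eos)"): 300, ("Book", "By"): 300,
    ("She", "Read"): 300,      # assumed column alignment (see lead-in)
    ("By", "Dickens"): 300,    # assumed column alignment (see lead-in)
    ("Dickens", "(eos)"): 300,
}

DELTA = 0.02
V = len(words)

# Row totals C(prev): how often each word occurs as the first word of a bigram.
row_total = {w: 0 for w in words}
for (prev, _), count in nonzero.items():
    row_total[prev] += count
N = sum(row_total.values())


def add_delta(prev, word, delta=DELTA):
    """Add-delta smoothed bigram probability P(word | prev)."""
    return (nonzero.get((prev, word), 0) + delta) / (row_total[prev] + delta * V)


smoothed = {prev: {w: add_delta(prev, w) for w in words} for prev in words}

# Print the smoothed table, one row per previous word.
for prev in words:
    print(prev.ljust(13), " ".join(f"{smoothed[prev][w]:.5f}" for w in words))
```

Because δ is small relative to the counts, seen bigrams keep almost all of their original probability, and each row still sums to one.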
|
||
|
||
**Q3**. Given S = "Dickens read a book", find P(S)</br>
**(a)** Using unsmoothed probability</br> | ||
**(b)** Applying Add-One smoothing.</br> | ||
**(c)** Applying Add-δ smoothing</br> |
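One way to set up this computation is sketched below. It rests on several assumptions that the question does not state: the sentence is padded with `(eos)` on both sides, δ = 0.02 is carried over from Q2, the `She` and `By` rows of the table are read as `She Read` and `By Dickens`, and the helper name `sentence_prob` is hypothetical. A single function covers all three cases because k = 0 gives the unsmoothed estimate, k = 1 gives add-one, and k = δ gives Add-δ.

```python
from math import prod

V = 11  # vocabulary size given with the table

# Non-zero bigram counts from the table: (previous word, next word) -> count.
counts = {
    ("(eos)", "John"): 300, ("(eos)", "Mary"): 300, ("(eos)", "She"): 300,
    ("John", "Read"): 300, ("Read", "Fountainhead"): 300, ("Read", "a"): 600,
    ("Fountainhead", "(eos)"): 300, ("Mary", "Read"): 300,
    ("a", "Different"): 300, ("a", "Book"): 300, ("Different", "Book"): 300,
    ("Book", "(eos)"): 300, ("Book", "By"): 300,
    ("She", "Read"): 300, ("By", "Dickens"): 300,  # assumed column alignment
    ("Dickens", "(eos)"): 300,
}

# Row totals C(prev).
row_total = {}
for (prev, _), count in counts.items():
    row_total[prev] = row_total.get(prev, 0) + count


def sentence_prob(tokens, k):
    """Product of bigram probabilities with k added to every count."""
    return prod(
        (counts.get((prev, word), 0) + k) / (row_total[prev] + k * V)
        for prev, word in zip(tokens, tokens[1:])
    )


# Assumption: the sentence is padded with "(eos)" at both ends.
S = ["(eos)", "Dickens", "Read", "a", "Book", "(eos)"]

p_unsmoothed = sentence_prob(S, 0)
p_add_one = sentence_prob(S, 1)
p_add_delta = sentence_prob(S, 0.02)

print("unsmoothed:", p_unsmoothed)
print("add-one:   ", p_add_one)
print("add-delta: ", p_add_delta)
```

The unsmoothed product collapses to zero because the bigrams `(eos) Dickens` and `Dickens Read` never occur in the table, which is exactly the failure that smoothing is meant to avoid.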
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
## Experiment name | ||
## N-Grams Smoothing |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
- The objective of this experiment is to learn how to apply add-one smoothing to a sparse bigram table. |
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,5 @@ | ||
### Procedure | ||
STEP1: Select a corpus | ||
|
||
STEP2: Apply add-one smoothing and calculate the bigram probabilities using the given bigram counts, N, and V. Fill in the table and click `Submit` | ||
|
||
STEP3: If an answer is incorrect (shown in red), click Show Answer to see the correct value, or repeat Step 2 |
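For Step 2, the calculation for a single cell can be written out as follows. This is a worked illustration that assumes the standard add-one (Laplace) estimate, which needs only the bigram count, the count of the preceding word, and V. The numbers are borrowed from the `Read` row of the assignment's count table (C(Read a) = 600, C(Read) = 900, V = 11); the simulation's own corpora may use different values.

```latex
P_{\text{add-1}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V}

P_{\text{add-1}}(\text{a} \mid \text{Read}) = \frac{600 + 1}{900 + 11} = \frac{601}{911} \approx 0.6597

P_{\text{add-1}}(\text{John} \mid \text{Read}) = \frac{0 + 1}{900 + 11} = \frac{1}{911} \approx 0.0011
```

The second line shows a seen bigram losing a little probability (from 600/900 ≈ 0.6667), and the third shows an unseen bigram gaining a small non-zero value.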
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
### Link your references in here | ||
**Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition** </br> | ||
By Daniel Jurafsky and James H. Martin, Chapter 6 |