Commit c2c2b4e

Add Table of Contents and Instructions
1 parent bf1a905 commit c2c2b4e

File tree

1 file changed: +7 −0 lines changed

README.md

+7
@@ -11,5 +11,12 @@ Both algorithms have achieved very similar cross-validation scores, so we can co
#### 4. How would you compare selected classification methods if the dataset was imbalanced?
If the class labels in the dataset were imbalanced, I would have to use a performance metric capable of handling that situation, since plain accuracy can be inflated by the majority class. A widely accepted approach is to use the [Precision and Recall](https://en.wikipedia.org/wiki/Precision_and_recall) metrics (two ratios based on True Positive predictions for each label). If it were appropriate to give the two equal importance, they would be combined into a single score via their harmonic mean, i.e. the [F1-score](https://en.wikipedia.org/wiki/F1_score). This would constitute a proper handling of an imbalanced dataset.
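As a minimal sketch of this point, the snippet below uses made-up toy labels (not this project's actual predictions) to show how accuracy can look excellent on an imbalanced dataset while precision, recall, and their harmonic mean (the F1-score) expose a useless classifier:

```python
# Toy imbalanced data (hypothetical): 95% negative labels.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

# Accuracy: fraction of matching predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Confusion-matrix counts for the positive class.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp) if tp + fp else 0.0  # TP / (TP + FP)
recall = tp / (tp + fn) if tp + fn else 0.0     # TP / (TP + FN)
# F1 = harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy)  # 0.95 -- looks great, but the model never finds a positive
print(f1)        # 0.0  -- F1 reveals the failure
```

In practice the same quantities come from `sklearn.metrics` (`precision_score`, `recall_score`, `f1_score`); the manual version is shown here only to make the harmonic-mean definition explicit.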

# Project Structure and Instructions
Runnable code is available in these folders:
- `notebook` - Detailed exploration steps and performance evaluation.
- `src/modeling` - Run the public scripts in this folder to train the models.
Install the required dependencies by running `pip install -r requirements.txt` in the shell.
# Dataset
[sentence polarity dataset v1.0](https://www.cs.cornell.edu/people/pabo/movie-review-data/) (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences / snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.
