This project aims to develop an advanced system that integrates Automatic Speech Recognition (ASR), Speech Emotion Recognition (SER), and Text Summarization.

## Intermediate Goals

- [x] **Baseline Model for ASR:** CNN-BiLSTM
- [x] **Baseline Model for SER:** XGBoost
- [x] **Baseline Model for Text Summarizer:** T5-Small, T5-Base
- [x] **Final Model for ASR:** Conformer
- [ ] **Final Model for SER**
- [x] **Final Model for Text Summarizer:** BART Large

## Goals

- [ ] **Accurate ASR System:** Handle diverse accents and operate effectively in noisy environments.
- [ ] **Emotion Analysis:** Detect emotions from the tone of speech.
- [ ] **Meaningful Text Summarizer:** Preserve critical information without loss.
- [ ] **Integrated System:** Combine all components to provide real-time transcription and summaries.

## Contributors <img src="https://user-images.githubusercontent.com/74038190/213844263-a8897a51-32f4-4b3b-b5c2-e1528b89f6f3.png" width="25px" />
<a href="https://github.com/LuluW8071/ASR-with-Speech-Sentiment-and-Text-Summarizer/graphs/contributors">
</a>

If you have other packages installed in the environment that are no longer needed…

#### 1. Install Required Dependencies

> [!IMPORTANT]
> Before installing dependencies from `requirements.txt`, make sure you have installed:
- [**CUDA ToolKit v11.8/12.1**](https://developer.nvidia.com/cuda-toolkit-archive)
- [**PyTorch**](https://pytorch.org/)
- [**SOX**](https://sourceforge.net/projects/sox/)
Then install the remaining dependencies:

```bash
pip install -r requirements.txt
```

#### 2. Configure [**Comet-ML**](https://www.comet.com/site/) Integration

> [!NOTE]
> Replace `dummy_key` with your actual Comet-ML API key and project name in the `.env` file to enable real-time loss curve plotting, system metrics tracking, and confusion matrix visualization.
```python
API_KEY = "dummy_key"
PROJECT_NAME = "dummy_key"
```
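
As a rough illustration only (not part of the repository), these values could be consumed in a training script with `python-dotenv` and the Comet-ML SDK; the logged parameter below is just an example:

```python
# Illustrative sketch: load the .env values and start a Comet-ML experiment.
# Assumes python-dotenv and comet_ml are installed; names mirror the .env above.
import os

from comet_ml import Experiment
from dotenv import load_dotenv

load_dotenv()  # reads API_KEY and PROJECT_NAME from the .env file

experiment = Experiment(
    api_key=os.getenv("API_KEY"),
    project_name=os.getenv("PROJECT_NAME"),
)
experiment.log_parameter("batch_size", 128)  # logged values show up on the Comet dashboard
```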

## Usage Instructions

### ASR (Automatic Speech Recognition)

#### 1. Audio Conversion

> [!NOTE]
> Pass `--not-convert` if you don't want audio conversion.
```bash
py common_voice.py --file_path "file_path/to/validated.tsv" \
--save_json_path "file_path/to/save/json" \
-w 4 \
--percent 10 \
--output_format 'wav'  # or 'flac'
```
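
For reference, the conversion step amounts to re-encoding each Common Voice clip and writing a JSON manifest of audio paths and transcripts. The sketch below is illustrative, not the actual `common_voice.py`; the column names follow the Common Voice `validated.tsv` layout and the manifest keys are assumptions:

```python
# Illustrative sketch of the conversion step (not the actual common_voice.py):
# re-encode each clip as WAV and build a JSON manifest of {audio path, transcript}.
import json
import os

import pandas as pd
import torchaudio

tsv_path = "cv-corpus/validated.tsv"   # hypothetical location of the Common Voice TSV
clips_dir = os.path.join(os.path.dirname(tsv_path), "clips")
out_dir = "converted"
os.makedirs(out_dir, exist_ok=True)

manifest = []
for _, row in pd.read_csv(tsv_path, sep="\t").iterrows():
    src = os.path.join(clips_dir, row["path"])                             # original .mp3 clip
    dst = os.path.join(out_dir, os.path.splitext(row["path"])[0] + ".wav")
    waveform, sample_rate = torchaudio.load(src)
    torchaudio.save(dst, waveform, sample_rate)                            # write as WAV
    manifest.append({"key": dst, "text": row["sentence"]})

with open("train.json", "w") as f:
    json.dump(manifest, f, indent=2)
```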

#### 2. Train Model

> [!NOTE]
> Pass `--checkpoint_path "path/to/checkpoint_file"` to load a pre-trained model and fine-tune it.
```bash
py train.py --train_json "path/to/train.json" \
--valid_json "path/to/test.json" \
-w 4 \
--batch_size 128 \
-lr 2e-4 \
--epochs 20
```
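
The baseline architecture listed under Intermediate Goals is a CNN-BiLSTM. A minimal PyTorch sketch of that kind of acoustic model with a CTC head is shown below; layer sizes and the character vocabulary size are illustrative, not the project's actual configuration:

```python
# Minimal sketch of a CNN-BiLSTM acoustic model with a CTC head.
# Shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    def __init__(self, n_mels=80, hidden=256, n_classes=29):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),  # halves both freq and time
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(32 * (n_mels // 2), hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)  # per-frame character logits for nn.CTCLoss

    def forward(self, x):            # x: (batch, 1, n_mels, time)
        x = self.cnn(x)              # (batch, 32, n_mels // 2, time // 2)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, time // 2, features)
        x, _ = self.lstm(x)
        return self.fc(x).log_softmax(dim=-1)

logits = CNNBiLSTM()(torch.randn(4, 1, 80, 200))  # -> (4, 100, 29)
```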

#### 3. Sentence Extraction

```bash
py extract_sentence.py --file_path "file_path/to/validated.tsv" \
--save_txt_path "file_path/to/save/txt"
```

### Speech Sentiment

#### 1. Audio Downsample and Augment

> [!NOTE]
> Run `Speech_Sentiment.ipynb` first to get the *path* and *emotions* table in CSV format and downsample all clips.
```bash
py downsample.py --file_path "path/to/audio_file.csv" \
--save_csv_path "output/path" \
-w 4 \
--output_format 'wav'  # or 'flac'
```
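
The downsampling itself can be pictured as a simple resample to a fixed rate. The snippet below is a rough sketch, not the actual `downsample.py`; it assumes torchaudio, a CSV with a `path` column produced by the notebook above, and a 16 kHz target rate:

```python
# Rough sketch of the downsampling step (not the actual downsample.py).
# Assumes a CSV with a "path" column and a 16 kHz target rate.
import pandas as pd
import torchaudio
from torchaudio.transforms import Resample

TARGET_SR = 16_000  # assumed target sample rate

for src in pd.read_csv("emotion_dataset.csv")["path"]:
    waveform, sr = torchaudio.load(src)
    if sr != TARGET_SR:
        waveform = Resample(orig_freq=sr, new_freq=TARGET_SR)(waveform)
    torchaudio.save(src.rsplit(".", 1)[0] + "_16k.wav", waveform, TARGET_SR)
```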

```bash
py augment.py --file_path "path/to/emotion_dataset.csv" \
--save_csv_path "output/path" \
-w 4 \
--percent 20
```
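
Augmentation adds perturbed copies of a fraction of the clips. As an illustration only (the real `augment.py` may use different transforms), a simple noise-and-gain augmentation could look like this:

```python
# Illustrative augmentation: additive white noise plus a random gain.
# File names here are hypothetical.
import torch
import torchaudio

def augment(waveform: torch.Tensor, noise_level: float = 0.005) -> torch.Tensor:
    noisy = waveform + noise_level * torch.randn_like(waveform)  # additive white noise
    gain = torch.empty(1).uniform_(0.8, 1.2)                     # random volume change
    return (noisy * gain).clamp(-1.0, 1.0)

waveform, sr = torchaudio.load("clip_16k.wav")
torchaudio.save("clip_16k_aug.wav", augment(waveform), sr)
```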

#### 2. Train the Model

```bash
py neuralnet/train.py --train_csv "path/to/train.csv" \
--test_csv "path/to/test.csv" \
-w 4 \
--batch_size 256 \
--epochs 25 \
-lr 1e-3
```
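
For context, the SER baseline listed under Intermediate Goals is XGBoost. A hedged sketch of such a baseline, using mean MFCC vectors as features, is shown below; the feature choice and CSV column names are assumptions, and `neuralnet/train.py` itself presumably trains the neural model instead:

```python
# Hedged sketch of an XGBoost SER baseline: mean MFCC vectors as features,
# emotion labels as targets. Column names and feature choice are assumptions.
import pandas as pd
import torchaudio
from sklearn.model_selection import train_test_split
from torchaudio.transforms import MFCC
from xgboost import XGBClassifier

df = pd.read_csv("train.csv")                    # assumed "path" and "emotion" columns
mfcc = MFCC(sample_rate=16_000, n_mfcc=40)

def features(path):
    waveform, _ = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)           # force mono
    return mfcc(waveform).mean(dim=-1).flatten().numpy()    # average MFCCs over time

X = [features(p) for p in df["path"]]
y = df["emotion"].astype("category").cat.codes              # encode labels as integers

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBClassifier(n_estimators=300, learning_rate=0.1)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
```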

### Text Summarization

> [!NOTE]
> Just run the notebook file in the `src/Text_Summarizer` directory.
> You may need a 🤗 Hugging Face token with write permission to upload your trained model directly to the 🤗 HF Hub.
<!-- 1. To Export hugging face models to ONNX runtime
> Example
```bash
!python3 -m optimum.exporters.onnx --model=luluw/t5-base-finetuned-billsum base-onnx/
``` -->
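
Once fine-tuned, the summarizer checkpoints linked in the Results table can be loaded directly with 🤗 Transformers. A minimal usage sketch (the generation settings here are illustrative defaults, not the project's):

```python
# Minimal inference sketch using one of the fine-tuned checkpoints linked below.
# max_length/min_length are illustrative, not the project's settings.
from transformers import pipeline

summarizer = pipeline("summarization", model="luluw/t5-base-finetuned-billsum")

text = "The bill establishes a grant program to expand rural broadband access ..."
summary = summarizer(text, max_length=60, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```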

## Data Source

| Project | Dataset Source | |
|--------------------|-------------------------------------------|-|
| __ASR__ | [Mozilla Common Voice](https://commonvoice.mozilla.org/en/datasets) | <img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS1rPYa2Q9zPtwLUeZJP3pWeNwmJjRpcLlpdQ&s" width="30px" /> |
| __SER__ | [RAVDESS](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio), [CremaD](https://www.kaggle.com/datasets/ejlok1/cremad), [TESS](https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess), [SAVEE](https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee) | <img src="https://go-skill-icons.vercel.app/api/icons?i=kaggle" width="30px"/> |
| __Text Summarizer__ | [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum), [BillSum](https://huggingface.co/datasets/FiscalNote/billsum) | <img src="https://go-skill-icons.vercel.app/api/icons?i=hf" width="30px"/> |

## Code Structure

The code styling adheres to `autopep8` formatting.

## Artifacts Location

## Results

| Project | Base Model Link | Final Model Link |
|--------------------|---------------------------------------|---------------------|
| __ASR__ | [CNN-BiLSTM](https://img.shields.io/badge/status-in_progress-red.svg) | [Conformer](https://img.shields.io/badge/status-in_progress-red.svg) |
| __SER__ | [XGBoost](https://img.shields.io/badge/status-in_progress-red.svg) | ![Train in Progress](https://img.shields.io/badge/status-in_progress-red.svg) |
| __Text Summarizer__ | [T5 Small-FineTune](https://huggingface.co/luluw/t5-small-finetuned-xsum), [T5 Base-FineTune](https://huggingface.co/luluw/t5-base-finetuned-billsum) | [BART](https://img.shields.io/badge/status-in_progress-red.svg) |


## Metrics Used

| Project | Metrics Used |
|--------------------|---------------------------------------|
| __ASR__ | WER, CER |
| __SER__ | Accuracy, F1-Score, Precision, Recall |
| __Text Summarizer__ | ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-Lsum, Gen Len |
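
As a rough illustration of how these metrics can be computed (the project's evaluation scripts may use different libraries), `torchmetrics` covers WER/CER and 🤗 `evaluate` covers ROUGE:

```python
# Sketch of metric computation; the library choice is an assumption.
import evaluate
from torchmetrics.text import CharErrorRate, WordErrorRate

preds, refs = ["the cat sat"], ["the cat sat down"]
print("WER:", WordErrorRate()(preds, refs).item())
print("CER:", CharErrorRate()(preds, refs).item())

rouge = evaluate.load("rouge")
print(rouge.compute(predictions=["a short summary"], references=["a short reference summary"]))
```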

### Loss Curve Evaluation

