
Commit 6d2e162: Update README.md
yuguochencuc authored Oct 22, 2021 (1 parent: 238d182)
This is the repo of the manuscript "Dual-branch Attention-In-Attention Transformer for Single-channel Speech Enhancement" (DB-AIAT).
The source code will be released soon!


Abstract: Curriculum learning has begun to thrive in the speech enhancement area, decoupling the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by this, we propose a dual-branch attention-in-attention transformer-based module, dubbed DB-AIAT, to handle both coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitude masking branch is proposed to estimate the overall spectral magnitude, while a complex refining branch is designed to compensate for the missing complex spectral details and implicitly derive phase information. Within each branch, we propose a novel attention-in-attention transformer-based module to replace conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module, which can capture long-term time-frequency dependencies and further aggregate global hierarchical contextual information. Experimental results on the VoiceBank + DEMAND dataset show that DB-AIAT yields state-of-the-art performance (e.g., 3.31 PESQ, 95.6% STOI and 10.79 dB SSNR) over previous advanced systems with a relatively light model size (2.81 M parameters).
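
Since the source code has not been released yet, the following is only a minimal PyTorch sketch of the dual-branch reconstruction described above. Everything here is an assumption for illustration: `mag_branch` and `complex_branch` are simple placeholders standing in for the attention-in-attention transformer stacks, and the exact masking/refinement formulation may differ from the eventual implementation.

```python
import torch
import torch.nn as nn

class DualBranchSketch(nn.Module):
    """Hypothetical sketch of DB-AIAT's dual-branch spectrum reconstruction.

    The magnitude masking branch produces a coarse estimate by scaling the
    noisy magnitude (reusing the noisy phase); the complex refining branch
    adds a residual complex spectrum that implicitly corrects the phase.
    """

    def __init__(self, feat_dim: int = 161):
        super().__init__()
        # Placeholders for the two AIA-transformer branches (not the real ones).
        self.mag_branch = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.complex_branch = nn.Linear(2 * feat_dim, 2 * feat_dim)

    def forward(self, noisy_stft: torch.Tensor) -> torch.Tensor:
        # noisy_stft: complex tensor of shape (batch, time, freq)
        mag = noisy_stft.abs()
        phase = torch.angle(noisy_stft)

        # Coarse branch: mask the noisy magnitude, keep the noisy phase.
        mask = self.mag_branch(mag)
        coarse = mask * mag * torch.exp(1j * phase)

        # Fine branch: predict a residual complex spectrum from real/imag parts.
        ri = torch.cat([noisy_stft.real, noisy_stft.imag], dim=-1)
        res_real, res_imag = self.complex_branch(ri).chunk(2, dim=-1)
        residual = torch.complex(res_real, res_imag)

        # Enhanced spectrum = coarse magnitude estimate + complex refinement.
        return coarse + residual
```

For example, `DualBranchSketch()(torch.randn(1, 100, 161, dtype=torch.cfloat))` returns an enhanced complex spectrum of the same shape, which would then be inverted with an iSTFT.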

### Network architecture:


### Comparison with SOTA:

![image](https://user-images.githubusercontent.com/51236251/138376964-86f1b0b5-9564-4ca4-a536-5b125e462809.png)
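
For reference, the PESQ, STOI and SSNR numbers quoted above can be computed with standard open-source tools. Below is a minimal evaluation sketch, assuming the third-party `pesq` and `pystoi` PyPI packages and 16 kHz clean/enhanced waveforms as NumPy arrays; the segmental-SNR helper is our own illustrative implementation, not necessarily the one used in the paper.

```python
import numpy as np
from pesq import pesq    # pip install pesq   (ITU-T P.862 implementation)
from pystoi import stoi  # pip install pystoi

def segmental_snr(clean, enhanced, frame=512, hop=256, eps=1e-8):
    # Frame-wise SNR in dB, clipped to [-10, 35] dB and averaged (common SSNR recipe).
    scores = []
    for i in range(0, len(clean) - frame + 1, hop):
        c, e = clean[i:i + frame], enhanced[i:i + frame]
        snr = 10.0 * np.log10(np.sum(c ** 2) / (np.sum((c - e) ** 2) + eps) + eps)
        scores.append(np.clip(snr, -10.0, 35.0))
    return float(np.mean(scores))

def evaluate(clean: np.ndarray, enhanced: np.ndarray, fs: int = 16000):
    pesq_score = pesq(fs, clean, enhanced, "wb")   # wide-band PESQ
    stoi_score = stoi(clean, enhanced, fs) * 100.0  # STOI as a percentage
    ssnr_score = segmental_snr(clean, enhanced)
    return pesq_score, stoi_score, ssnr_score
```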

### Ablation study:

![image](https://user-images.githubusercontent.com/51236251/138376989-a773f56e-a124-4b5a-830a-13f8ac608a8c.png)

![image](https://user-images.githubusercontent.com/51236251/135372322-c0968258-6935-4f8e-bcf6-7d303c310d04.png)

