Harmonic Lowering

This is the official implementation of Harmonic Lowering, a method for computing harmonic convolution, proposed in "Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals".

Note that this is not the official implementation of the original paper that introduced harmonic convolution.

Build

Requirements

Python 3.*
PyTorch newer than v1.0 with CUDA

Install

cd src
python setup.py install
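
After installing, it can be useful to confirm that the build is visible to Python before wiring it into a model. The following is a minimal sketch, assuming the installed package is importable as harmonic_conv (the module name used in the usage examples below). The helper is_importable is hypothetical and not part of this repository.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `torch` is a stated requirement; `harmonic_conv` is assumed to be
    # the module name that `python setup.py install` makes available.
    for name in ("torch", "harmonic_conv"):
        status = "found" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```

This only checks that the modules can be located. It does not verify that the CUDA extension loads or runs correctly.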

Usage

You can easily replace normal convolution with harmonic convolution.

Harmonic Convolution example

Replace Conv2d as shown below. Note that padding_mode is restricted to "zero" and padding[0] (frequency-axis padding) must be 0. The anchor parameter defaults to 1. The defaults of the other parameters (stride, padding, dilation, groups, bias, padding_mode) are the same as for Conv2d.

# import torch
# conv_module = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias, padding_mode)
import harmonic_conv
# Same arguments as Conv2d plus anchor; the frequency-axis padding is fixed to 0.
conv_module = harmonic_conv.SingleHarmonicConv2d(in_channels, out_channels, kernel_size, anchor=1, stride=stride, padding=(0, padding[1]), dilation=dilation, groups=groups, bias=bias, padding_mode="zero")
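
The restrictions above (padding_mode fixed to "zero", frequency-axis padding fixed to 0) can be applied mechanically when porting existing Conv2d arguments. The following is a minimal sketch of such a translation. The helper to_harmonic_kwargs is hypothetical and not part of this library, and it assumes the constructor accepts the parameter names listed above as keywords.

```python
def to_harmonic_kwargs(stride=1, padding=0, dilation=1, groups=1, bias=True,
                       padding_mode="zero", anchor=1):
    """Translate Conv2d-style arguments into keyword arguments that respect
    the stated restrictions (hypothetical helper, not a library function)."""
    if padding_mode != "zero":
        raise ValueError('padding_mode is restricted to "zero"')
    # Conv2d accepts an int or a pair; normalise to a (freq, time) pair.
    if isinstance(padding, int):
        padding = (padding, padding)
    # padding[0] (frequency axis) must be 0, so keep only the time-axis value.
    return dict(anchor=anchor, stride=stride, padding=(0, padding[1]),
                dilation=dilation, groups=groups, bias=bias,
                padding_mode=padding_mode)
```

Under these assumptions, a layer could then be built with harmonic_conv.SingleHarmonicConv2d(in_channels, out_channels, kernel_size, **to_harmonic_kwargs(padding=3)). Raising on an unsupported padding_mode, rather than silently overriding it, keeps a ported model from changing behaviour unnoticed.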

Logarithmic Harmonic Convolution example

Replace Conv2d as shown below. The parameters out_log_scale (A), in_log_scale (B), and radix (C) define the logarithmic function f(x) = A log_C (Bx). The default radix is None, which selects base e.

# import torch
# conv_module = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias, padding_mode)
import harmonic_conv
# radix=None selects the natural logarithm; the frequency-axis padding is fixed to 0.
conv_module = harmonic_conv.SingleLogHarmonicConv2d(in_channels, out_channels, kernel_size, out_log_scale=1000, in_log_scale=0.001, radix=None, anchor=1, stride=stride, padding=(0, padding[1]), dilation=dilation, groups=groups, bias=bias, padding_mode="zero")
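
To make the roles of the three logarithm parameters concrete, the function f(x) = A log_C (Bx) described above can be evaluated in plain Python. This is only an illustrative sketch of the formula. The function log_scale is hypothetical, and how the library applies the mapping internally is not shown here.

```python
import math


def log_scale(x, out_log_scale=1000.0, in_log_scale=0.001, radix=None):
    """Evaluate f(x) = A * log_C(B * x), where A is out_log_scale,
    B is in_log_scale and C is radix (None means the natural logarithm)."""
    if radix is None:
        return out_log_scale * math.log(in_log_scale * x)
    return out_log_scale * math.log(in_log_scale * x, radix)
```

With the values used in the example (A = 1000, B = 0.001), x = 1000 maps to 0 because Bx = 1, and with radix=2 each doubling of x raises f(x) by A.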

Benchmark

Harmonic Lowering is a faster method of computing harmonic convolution. The benchmark settings are listed in the tables below.

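| Setting | n | Cin | Cout | S | K | P |
|---|---|---|---|---|---|---|
| Setting1 | 1 | 16 | 32 | (256,256) | (7,7) | (3,3) |
| Setting2 | 1 | 16 | 32 | (256,256) | (5,5) | (2,2) |
| Setting3 | 1 | 16 | 32 | (256,256) | (3,3) | (1,1) |
| Setting1a | 7 | 16 | 32 | (256,256) | (7,7) | (3,3) |
| Setting2a | 5 | 16 | 32 | (256,256) | (5,5) | (2,2) |
| Setting3a | 3 | 16 | 32 | (256,256) | (3,3) | (1,1) |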

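| Setting | n | Cin | Cout | S | K | P |
|---|---|---|---|---|---|---|
| Setting4 | 1 | 16 | 32 | (512,512) | (3,3) | (1,1) |
| Setting5 | 1 | 16 | 32 | (256,256) | (3,3) | (1,1) |
| Setting6 | 1 | 16 | 32 | (128,128) | (3,3) | (1,1) |
| Setting7 | 1 | 16 | 32 | (64,64) | (3,3) | (1,1) |
| Setting8 | 1 | 16 | 32 | (32,32) | (3,3) | (1,1) |
| Setting9 | 1 | 16 | 32 | (16,16) | (3,3) | (1,1) |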

These benchmarks were measured on an NVIDIA GeForce GTX 1080 Ti with a batch size of 16 and dilation = stride = groups = 1. In the tables above, n, Cin, Cout, S, K, and P denote the anchor, number of input channels, number of output channels, input spectrogram (image) size, kernel size, and padding size, respectively.
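
For anyone reproducing these runs, the settings can be written down as data and the input tensor shape derived from them. The sketch below transcribes the two tables and assumes the usual Conv2d input layout (batch, channels, height, width). The names Setting, SETTINGS, and input_shape are hypothetical and not part of this repository.

```python
from collections import namedtuple

# One row of the settings tables: anchor n, channels, input size S,
# kernel size K and padding size P.
Setting = namedtuple("Setting", "name n c_in c_out size kernel padding")

BATCH_SIZE = 16  # stated batch size for all benchmarks

SETTINGS = [
    Setting("Setting1", 1, 16, 32, (256, 256), (7, 7), (3, 3)),
    Setting("Setting2", 1, 16, 32, (256, 256), (5, 5), (2, 2)),
    Setting("Setting3", 1, 16, 32, (256, 256), (3, 3), (1, 1)),
    Setting("Setting1a", 7, 16, 32, (256, 256), (7, 7), (3, 3)),
    Setting("Setting2a", 5, 16, 32, (256, 256), (5, 5), (2, 2)),
    Setting("Setting3a", 3, 16, 32, (256, 256), (3, 3), (1, 1)),
    Setting("Setting4", 1, 16, 32, (512, 512), (3, 3), (1, 1)),
    Setting("Setting5", 1, 16, 32, (256, 256), (3, 3), (1, 1)),
    Setting("Setting6", 1, 16, 32, (128, 128), (3, 3), (1, 1)),
    Setting("Setting7", 1, 16, 32, (64, 64), (3, 3), (1, 1)),
    Setting("Setting8", 1, 16, 32, (32, 32), (3, 3), (1, 1)),
    Setting("Setting9", 1, 16, 32, (16, 16), (3, 3), (1, 1)),
]


def input_shape(setting, batch_size=BATCH_SIZE):
    """Input tensor shape, assuming (batch, channels, height, width) layout."""
    return (batch_size, setting.c_in) + setting.size
```

The first table varies the kernel size and anchor at a fixed input size, while the second varies the input size at a fixed 3x3 kernel, so iterating over SETTINGS covers both sweeps.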

Reference

If you use the code, please cite:

    @InProceedings{Hirotoshi_2020_Interspeech,
        author = {Takeuchi, Hirotoshi and Kashino, Kunio and Ohishi, Yasunori and Saruwatari, Hiroshi},
        title = {Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals},
        booktitle = {Interspeech},
        month = {},
        year = {2020}
    }

License

Check this file.