DSP optimizations #3

Open
2 tasks
danomatika opened this issue Nov 19, 2020 · 7 comments
Labels: feature (New feature or request), optimization

Comments

@danomatika
Contributor

danomatika commented Nov 19, 2020

This external performs DSP convolution, basically a lot of matrix math on a 368x2x128 data set, and can get a bit CPU heavy when multiple objects are in use.

Some optimizations could be:

  • basic: change the main for loops to use pointer increments instead of indexing
  • advanced: use platform/arch-specific optimized matrix & vector math libs (for Apple platforms, the Accelerate framework)
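A minimal sketch of the "basic" item, assuming the 128-tap ring buffer with the `& 127` index mask used in the convolution loop. `dot_indexed` and `dot_pointer` are hypothetical helpers, not earplug~ code: both compute one channel's tap sum, but the pointer version splits the loop at the wrap-around point so each run only steps pointers. Whether this is measurably faster depends on the compiler, which may already generate similar code for the indexed form.

```c
#include <assert.h>

#define N_TAPS 128              /* impulse / ring-buffer length in the quoted loop */
#define TAP_MASK (N_TAPS - 1)   /* the "& 127" wrap-around mask */

/* Indexed form, as in the original loop: sum of h[i] * buf[(pin - i) & 127]. */
static float dot_indexed(const float *h, const float *buf, unsigned pin)
{
    float sum = 0.0f;
    for (unsigned i = 0; i < N_TAPS; i++)
        sum += h[i] * buf[(pin - i) & TAP_MASK];
    return sum;
}

/* Pointer form (hypothetical): the ring buffer is walked backwards from the
   newest sample in two straight runs, buf[pin]..buf[0] and then
   buf[127]..buf[pin + 1], so the loops never mask an index. */
static float dot_pointer(const float *h, const float *buf, unsigned pin)
{
    assert(pin < N_TAPS);
    float sum = 0.0f;
    const float *hp = h;
    const float *bp = buf + pin + 1;          /* one past the newest sample */
    for (const float *end = h + pin + 1; hp < end; hp++)
        sum += *hp * *--bp;                   /* taps 0..pin */
    bp = buf + N_TAPS;                        /* wrap to the top of the buffer */
    for (const float *end = h + N_TAPS; hp < end; hp++)
        sum += *hp * *--bp;                   /* taps pin+1..127 */
    return sum;
}
```

Pre-decrementing `bp` keeps the pointer inside the array (or one past its end) at all times, which avoids forming an out-of-range pointer when the walk reaches `buf[0]`.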

ping @SylvainPDR

danomatika added the feature label Nov 19, 2020
@chikashimiyama
Collaborator

chikashimiyama commented Mar 30, 2021

vDSP for macOS, MTL for Windows, KissFFT for Linux? A platform/arch-specific approach costs a lot of dev time.
Platform-independent options:
  • FFTW: http://fftw.org/
  • KissFFT, which could be the easiest.

@danomatika
Contributor Author

Whatever is cross-platform and easiest. Pierre (intern) looked into this and said there wasn't much he could change to make it faster, but it's worth a second look just in case.

@chikashimiyama
Collaborator

ok.

@chikashimiyama
Collaborator

chikashimiyama commented Mar 30, 2021

earplug/earplug~.c

Lines 151 to 178 in 4ac1938

while (blocksize--)
{
    convSum[0] = 0;
    convSum[1] = 0;
    inSample = *(in++);
    x->convBuffer[x->bufferPin] = inSample;
    unsigned scaledBlocksize = blocksize * blockScale;
    unsigned blocksizeDelta = 8191 - scaledBlocksize;
    for (i = 0; i < 128; i++)
    {
        convSum[0] += (x->previousImpulse[0][i] * x->crossCoef[blocksizeDelta] +
                       x->currentImpulse[0][i] * x->crossCoef[scaledBlocksize]) *
                      x->convBuffer[(x->bufferPin - i) & 127];
        convSum[1] += (x->previousImpulse[1][i] * x->crossCoef[blocksizeDelta] +
                       x->currentImpulse[1][i] * x->crossCoef[scaledBlocksize]) *
                      x->convBuffer[(x->bufferPin - i) & 127];
        x->previousImpulse[0][i] = x->currentImpulse[0][i];
        x->previousImpulse[1][i] = x->currentImpulse[1][i];
    }
    x->bufferPin = (x->bufferPin + 1) & 127;
    *left_out++ = convSum[0];
    *right_out++ = convSum[1];
}
return w + 6;

convolution code
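A rough per-object count of the arithmetic in the inner `for` loop above, assuming a 44.1 kHz sample rate (the thread does not state one). Each tap does three multiplies and two additions per channel, for two channels, on every output sample; the 368 stored positions do not enter this per-sample cost, only the previous and current 2x128 impulses do:

$$
128\ \text{taps} \times 2\ \text{channels} \times 3\ \text{multiplies} = 768\ \text{multiplies/sample},
\qquad
768 \times 44\,100 = 33\,868\,800 \approx 3.4 \times 10^{7}\ \text{multiplies/s}
$$

plus $128 \times 2 \times 2 = 512$ additions per sample, all multiplied again by the number of objects running.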

@chikashimiyama
Collaborator

chikashimiyama commented Mar 31, 2021

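Lines 170-171 (the previousImpulse = currentImpulse copies) are totally redundant, and possibly not intended. Since the copy is done inside the nested while (blocksize--) and for loops, previousImpulse already equals currentImpulse after the first sample of each block, so the intended crossfading may not be working at all.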
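A minimal sketch of one possible fix, assuming the redundant lines are the per-tap `previousImpulse[ch][i] = currentImpulse[ch][i]` copies: move the copy out of both loops so it runs once per block, after the fade has finished. `earplug_sketch` is a hypothetical, trimmed-down struct with only the fields the loop uses; the 8192-entry linear fade table and `blockScale = 8192 / blocksize` are also assumptions, and the real external's table and scaling may differ. The two `crossCoef` lookups are hoisted out of the tap loop, since they depend only on the sample position.

```c
#include <assert.h>
#include <string.h>

#define N_TAPS 128
#define TAP_MASK (N_TAPS - 1)
#define N_COEF 8192

/* Hypothetical, trimmed-down state: only what the convolution loop touches. */
typedef struct {
    float previousImpulse[2][N_TAPS];
    float currentImpulse[2][N_TAPS];
    float convBuffer[N_TAPS];
    unsigned bufferPin;
    float crossCoef[N_COEF];
} earplug_sketch;

/* Assumed fade table: falls linearly from 1 to 0, so that
   crossCoef[k] + crossCoef[N_COEF - 1 - k] is 1 up to rounding. */
static void earplug_sketch_init(earplug_sketch *x)
{
    memset(x, 0, sizeof *x);
    for (unsigned k = 0; k < N_COEF; k++)
        x->crossCoef[k] = 1.0f - (float)k / (float)(N_COEF - 1);
}

static void earplug_sketch_perform(earplug_sketch *x, const float *in,
                                   float *left_out, float *right_out,
                                   unsigned blocksize)
{
    assert(blocksize > 0 && blocksize <= N_COEF);
    unsigned blockScale = N_COEF / blocksize;  /* assumption */
    while (blocksize--) {
        unsigned scaledBlocksize = blocksize * blockScale;
        /* fade weights depend only on the sample position: look up once */
        float wPrev = x->crossCoef[(N_COEF - 1) - scaledBlocksize];
        float wCurr = x->crossCoef[scaledBlocksize];
        float sumL = 0.0f, sumR = 0.0f;
        x->convBuffer[x->bufferPin] = *in++;
        for (unsigned i = 0; i < N_TAPS; i++) {
            float s = x->convBuffer[(x->bufferPin - i) & TAP_MASK];
            sumL += (x->previousImpulse[0][i] * wPrev +
                     x->currentImpulse[0][i] * wCurr) * s;
            sumR += (x->previousImpulse[1][i] * wPrev +
                     x->currentImpulse[1][i] * wCurr) * s;
        }
        x->bufferPin = (x->bufferPin + 1) & TAP_MASK;
        *left_out++ = sumL;
        *right_out++ = sumR;
    }
    /* the fade is complete: the current impulse becomes the previous one */
    memcpy(x->previousImpulse, x->currentImpulse, sizeof x->previousImpulse);
}
```

With `previousImpulse` all zero and a unit impulse in `currentImpulse`, a block of constant input now ramps from near 0 up to 1 across the block, instead of jumping to roughly full level from the second sample on, as the per-tap copy makes it do.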

@chikashimiyama
Collaborator

@danomatika

I tried frequency-domain convolution using uFFT, but the result is not very good: it isn't significantly faster and it generates some artifacts. See the optimization branch.
I can investigate further, but I'm not sure I should spend more time on this...
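Two common sources of artifacts in block-based frequency-domain convolution are circular wrap-around (a transform shorter than block + taps - 1) and dropping the tail that each block's convolution spills into the following blocks (the overlap-add step). This thread doesn't say whether either applies to the optimization branch. As a hypothetical reference for checking the bookkeeping, the sketch below streams blocks with overlap-add, and its output should match direct convolution of the whole signal. The per-block convolution is done directly for brevity, where an FFT version would use the zero-padded transform product; `ola_state` and `ola_process` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAPS 128
#define MAX_BLOCK 64

/* Hypothetical overlap-add state: the impulse and the saved tail. */
typedef struct {
    const float *h;
    size_t taps;
    float tail[MAX_TAPS - 1];
} ola_state;

/* Convolve one block, add the tail saved from earlier blocks, emit the
   first `block` samples and keep the remaining taps - 1 as the new tail. */
static void ola_process(ola_state *s, const float *in, float *out, size_t block)
{
    assert(block <= MAX_BLOCK && s->taps > 0 && s->taps <= MAX_TAPS);
    float full[MAX_BLOCK + MAX_TAPS - 1];
    size_t nfull = block + s->taps - 1;
    memset(full, 0, nfull * sizeof *full);
    /* stand-in for the FFT product: direct linear convolution of one block */
    for (size_t n = 0; n < block; n++)
        for (size_t k = 0; k < s->taps; k++)
            full[n + k] += in[n] * s->h[k];
    /* overlap-add: fold in what earlier blocks spilled into this one */
    for (size_t n = 0; n < s->taps - 1; n++)
        full[n] += s->tail[n];
    memcpy(out, full, block * sizeof *out);
    /* save the part that spills past this block for the next call */
    memcpy(s->tail, full + block, (nfull - block) * sizeof *full);
}
```

Skipping the tail addition (or the save) produces a discontinuity at every block boundary, which is audible as periodic clicks.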

@danomatika
Contributor Author

This refers to PR #16
