Parallelism for image stacks #21
Comments
I think you've gotten the main route to parallelization. We do the same thing internally. The 250% you see is really due to FFTW. With the new threading architecture in Julia 1.3 and updates to FFTW to use it, the two strategies should play together nicely.
Actually, one thing you can do is profile to find out how much of the time is spent in |
Wow, that's interesting. The RegisterQD.jl/src/translations.jl Lines 22 to 24 in e22e980
You could try passing in |
I tried I'm trying to write something fairly generic, so can't predict the global minimum very well. I think I might just have to keep it set to a fixed number of evals, unless there's some other way to tell if it's converged on a solution? |
Yep, capping the number of evaluations seems likely to be the best strategy. If you're in 2d and only care about 0.1 accuracy, then 100 should let it try each option.
Ah, that makes sense!
You could also perhaps pass |
Edit: Erroneous results, check next comment.
Oh.. this parallel approach is much MUCH faster if I set
That's with:
|
@timholy I've just gone back and re-tested more systematically, and I think I was wrapping two changes into one. The results were a bit exaggerated! These should be more representative.
Video 1
Video 2
|
I was surprised at the magnitude of the effect, this makes more sense! We are contemplating whether any of the parameter changes you mention in #21 (comment) deserve to be made official. The |
Indeed. The logic seems good, but I guess the default will need a tweak to not break anisotropy? There may be a more elegant way than this, but something like:
which would handle cases where |
Firstly, thanks again for this package. I'm getting really nice results with it.
Given I'm registering an image stack relative to its first image, I was wondering whether there are any easy opportunities for further parallelization? I say further because I've seen that qd_translate() does seem to exceed a single thread, which is great, but it only reaches ~250% on my 6 core cpu, which restricts the opportunity for per-image for loop threading. Also, might there be any easy way to use CuArrays & GPUs?
For instance, the simplest per-image thread parallelization I could imagine would be something like this. Note that if tforms has already been calculated for the previous frame, it's used for initial_tfm, otherwise the default pre-populated (0.0, 0.0) is used (that's the idea at least):
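A minimal sketch of this kind of per-image threading, for illustration only and not the snippet originally posted. It assumes a hypothetical callback `register_frame(fixed, moving, initial)` standing in for the real registration call (it is not a RegisterQD function), and the helper name `register_stack` is likewise made up. Because loop iterations run concurrently, "has the previous frame been calculated yet?" is tracked with an explicit flag behind a lock rather than by inspecting `tforms` directly.

```julia
using Base.Threads: @threads

# `register_frame` is a hypothetical callback, not a RegisterQD function.
# It is called as register_frame(fixed, moving, initial) and must return a
# value of the same type as `default` (here a 2-tuple of Float64 shifts).
function register_stack(register_frame, frames; default=(0.0, 0.0))
    n = length(frames)
    tforms = fill(default, n)   # frame 1 is the reference and keeps `default`
    done = falses(n)            # which entries of `tforms` hold a real result
    done[1] = true
    lk = ReentrantLock()        # guards `tforms` and `done` across threads
    fixed = frames[1]
    @threads for i in 2:n
        # Use the previous frame's result as the starting guess only if
        # another iteration has already finished it; otherwise use `default`.
        initial = lock(lk) do
            done[i-1] ? tforms[i-1] : default
        end
        t = register_frame(fixed, frames[i], initial)
        lock(lk) do
            tforms[i] = t
            done[i] = true
        end
    end
    return tforms
end

# Toy usage: each "frame" is a constant array and the stand-in "registration"
# just reports the intensity difference as a fake x-shift.
frames = [fill(Float64(i), 4, 4) for i in 1:8]
toy_register(fixed, moving, initial) = (moving[1] - fixed[1], 0.0)
tforms = register_stack(toy_register, frames)
```

The loop only runs in parallel if Julia was started with more than one thread, for example by setting the `JULIA_NUM_THREADS` environment variable. Which frames receive a warm start depends on scheduling, so if the optimizer is sensitive to its initial guess, results may differ slightly from run to run.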