Improve DFTFringe speed on zernike related computation #206
atsju merged 9 commits into githubdoe:master
I see you moved the class zernikePolar to its own .cpp and .h files. I also see you moved a lot of the zernikePolar code that was in the zernike() function into the init() function. I'm not quite sure exactly where the time is saved, but I'm not surprised, as the init() function is called less often (49 times less often, I think) than the zernike() function. I think I understand the changes correctly. Let me know what I missed.
There is a very small speed improvement, but it's really not large.
Correct. This is only to clean up the architecture. I also cleaned up some unused functions and other stuff.
Correct. It's better because instead of doing the computation twice when …
You missed nothing. But I need to test further before merging; there is a flaw somewhere.
Yes, it's broken somehow. It's not subtracting off the zernikes that are unchecked, i.e. defocus and coma. I think just those 2. Defocus is definitely not getting subtracted off. I think the spherical null is being removed properly, I think, and tilt is being removed. But not defocus.
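To make the expected behaviour concrete: every unchecked term should be subtracted from each wavefront sample using its fitted coefficient, so a term that is "not getting subtracted off" leaves its full contribution in the result. The following is a minimal sketch, assuming the per-sample basis values are already computed; `removeDisabledTerms` is a hypothetical helper, not DFTFringe's code, and it ignores the separate defocus offset handled in zernikeprocess.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not DFTFringe code): remove every disabled term from one
// wavefront sample. zerns[z] is the fitted coefficient of term z, enabled[z] is
// false when the term is unchecked, and basis[z] is the value of term z at this sample.
double removeDisabledTerms(double sample,
                           const std::vector<double>& zerns,
                           const std::vector<bool>& enabled,
                           const std::vector<double>& basis) {
    assert(zerns.size() == enabled.size() && zerns.size() == basis.size());
    double result = sample;
    for (std::size_t z = 0; z < zerns.size(); ++z) {
        if (!enabled[z]) {
            result -= zerns[z] * basis[z];  // subtract the unchecked term's contribution
        }
    }
    return result;
}
```

A quick check for the bug described above is to build a sample from known coefficients, uncheck defocus, and confirm its contribution disappears.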
I will check tomorrow.
At this time I'm not going to spend any time reviewing the changes, but I will give you a little history. There were two approaches to generating the zernike polynomials. One was memory-intensive (derived by Mike Peck) and one was compute-intensive (derived by Dave Rowe), which is what I used to begin with: each term is written out as an equation in software. That version starts to get cumbersome if you want more than 48 terms. So once we were able to use 64-bit CPUs and memory was no longer an issue, I switched a lot of the code to Mike Peck's way, where the rho and theta arrays are generated; to save computation time I tried to make that happen only once and placed it in the init function. You will of course find some of that old code still around. I like to keep it there because seeing how the terms got computed helps me remember various things about them. Mike's code uses an iterative approach that is hard to understand, but it is needed when computing the higher-order terms. Years ago, wavefronts were derived from the zernike terms themselves when using fringe tracing. Once the DFT method was created, only a few terms were needed unless the user wants to disable specific terms other than the defaults. So we kept that disable function, but only for the first (48?) terms. There are some calls that tell the processes how many terms to use.
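The trade-off between writing each term out by hand and generating terms programmatically can be illustrated with the radial part of a zernike term. A minimal sketch: `defocusExplicit` and `sphericalExplicit` are hand-written formulas in the spirit of the per-term equations described above, while `radialPolynomial` uses the standard closed-form sum for the radial polynomial R_n^m(rho). This is not Mike Peck's iterative scheme (which is not shown here) nor DFTFringe's code; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Factorial as a double; adequate for the small orders used here.
double factorialD(int n) {
    double f = 1.0;
    for (int i = 2; i <= n; ++i) f *= i;
    return f;
}

// Closed-form radial polynomial R_n^m(rho) for n >= m >= 0 with n - m even.
double radialPolynomial(int n, int m, double rho) {
    assert(n >= m && m >= 0 && (n - m) % 2 == 0);
    double sum = 0.0;
    for (int k = 0; k <= (n - m) / 2; ++k) {
        double c = factorialD(n - k) /
                   (factorialD(k) * factorialD((n + m) / 2 - k) * factorialD((n - m) / 2 - k));
        if (k % 2 == 1) c = -c;  // alternating sign (-1)^k
        sum += c * std::pow(rho, n - 2 * k);
    }
    return sum;
}

// Hand-written style: one equation per term.
double defocusExplicit(double rho) { return 2.0 * rho * rho - 1.0; }  // R_2^0
double sphericalExplicit(double rho) {                                 // R_4^0
    const double rho2 = rho * rho;
    return 6.0 * rho2 * rho2 - 6.0 * rho2 + 1.0;
}
```

The general form makes a new order a matter of passing different (n, m), at the cost of factorials and `pow` calls per evaluation; the hand-written form is cheap but has to be typed and checked term by term.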
To add to Dale's history: I believe there are 3 active ways to compute the zernikes.
If we only speed up one way, it should be (1), which is the one used the most.
OK now. It gives the same results as v7.3.4. I will check some compile flags too.
zernikeprocess.cpp (outdated diff):
```diff
 if ((z == 3) && doDefocus){
-    nz += defocus * zpolar.zernike(z,rho, theta);
-    nz -= zerns[z] * zpolar.zernike(z,rho,theta);
+    nz += defocus * zpolar.zernike(z);
+    nz -= zerns[z] * zpolar.zernike(z);
```
Unless I missed something obscure, this one seems unsafe:
zpolar.zernike can be called without the zpolar.init function having been called.
Is it really useful to have 2 computation methods?
I don't know DFTFringe well enough to know where this choice is made or what effect it has.
Yes, it can be called without the init, and that is on purpose. The init is expensive, and once it has been called it does not need to be called again. At least that was the original plan, and I think it worked well. The trick was to catch every case where it really needed to be called, which is any time the dimensions of the input matrix change.
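The "initialise once, re-initialise only when the matrix dimensions change" pattern, with an explicit guard for the unsafe case raised in the review (evaluating a term before init), could look like the following. A minimal sketch: `ZernikeGridCache` and its methods are hypothetical, not DFTFringe's zernikePolar, and the term numbering (0 piston, 1 and 2 tilts, 3 defocus) is assumed from the `z == 3` defocus check in the zernikeprocess.cpp snippet.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cache of per-pixel polar coordinates (not DFTFringe's zernikePolar).
class ZernikeGridCache {
public:
    // Expensive step: recompute rho/theta only when the grid dimensions change.
    void ensureInit(int width, int height) {
        assert(width >= 2 && height >= 2);
        if (width == width_ && height == height_) return;  // cache still valid
        width_ = width;
        height_ = height;
        rho_.assign(static_cast<std::size_t>(width) * height, 0.0);
        theta_.assign(rho_.size(), 0.0);
        const double cx = (width - 1) / 2.0, cy = (height - 1) / 2.0;
        const double radius = std::min(cx, cy);  // unit circle inscribed in the grid
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                const double dx = (x - cx) / radius, dy = (y - cy) / radius;
                const std::size_t i = index(x, y);
                rho_[i] = std::sqrt(dx * dx + dy * dy);
                theta_[i] = std::atan2(dy, dx);
            }
        }
        ++initCount_;
    }

    bool initialized() const { return width_ > 0; }
    int initCount() const { return initCount_; }

    // Cheap step: evaluate term z at pixel (x, y) from the cached values.
    double zernike(int z, int x, int y) const {
        assert(initialized() && "ensureInit() must be called before zernike()");
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        const double r = rho_[index(x, y)], t = theta_[index(x, y)];
        switch (z) {
            case 0: return 1.0;                // piston
            case 1: return r * std::cos(t);    // x tilt
            case 2: return r * std::sin(t);    // y tilt
            case 3: return 2.0 * r * r - 1.0;  // defocus
            default:
                assert(false && "term not implemented in this sketch");
                return 0.0;
        }
    }

private:
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width_) +
               static_cast<std::size_t>(x);
    }

    int width_ = 0, height_ = 0, initCount_ = 0;
    std::vector<double> rho_, theta_;
};
```

The asserts document the "init before use" precondition in debug builds without adding a check to every call in release builds (where `NDEBUG` disables them).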
To use the other 2 methods do:
Filling in 1000 zernike formulas by hand would be onerous and difficult to test. So yeah, I think it's worth having 3 different methods of calculating the zernike shapes.
This is because there is no big bottleneck, and most of the "slow" responsiveness probably comes from Qt or OpenCV, not DFTFringe's own code.
Yes, I'm glad you found what I had discovered years ago when I did my manual timing analysis. The different zernike methods exist because the "most efficient" one (also the one used the most) cannot be used for some of the desired cases. I don't want to penalize the normal user with the needs of a special process.
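For the kind of manual timing analysis mentioned here, a small RAII timer around a suspect call is often enough to see whether DFTFringe's own code or a library call dominates. A minimal sketch; `ScopedTimer` is a hypothetical helper, not part of DFTFringe or Qt.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical RAII helper (not part of DFTFringe or Qt): measures wall-clock
// time from construction to destruction and reports it on stderr.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label, double* elapsedMs = nullptr)
        : label_(std::move(label)),
          elapsedMs_(elapsedMs),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        if (elapsedMs_ != nullptr) *elapsedMs_ = ms;  // optionally hand the result back
        std::fprintf(stderr, "%s: %.3f ms\n", label_.c_str(), ms);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    double* elapsedMs_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage: wrap the block of interest as `{ ScopedTimer t("unwrap"); /* work */ }`. `steady_clock` is used because it never jumps backwards. For call counts across the whole program, a profiler such as callgrind gives a fuller picture.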
I hope to test this tomorrow.
Oops. I forgot about this PR somehow. I would have remembered if you had assigned it to me, but that shouldn't be necessary, as I can see above that I said I would test it. Will test soon, probably today.
gr5 left a comment:
I realize I had tested all this before, but I tested again anyway, including the 3 different types of zernike creation, and it all works well. I also looked over the code much more carefully.
There's one thing I don't particularly like, which is that you do a new and delete for every point in the wavefront, so for a 600x600 pixel image that would be 360,000 times (actually a bit less, because it only does it for the inscribed circle, so very roughly 300,000 times). This creates the zernTerms[] array on the heap and deletes it roughly 300k times for a typical image. Dale, by contrast, used an "init" function. Even with all those new and delete calls, I'm not surprised it's a little faster than before. I do see quite a few improvements you made with fewer multiplications (more use of rho2, etc.). I double-checked every formula as well (you only changed a few, but I checked them all to be absolutely sure).
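The per-point allocation can be avoided without going back to a global init: the caller owns one buffer and reuses it for every pixel, so there is one allocation instead of roughly 300,000. A minimal sketch, assuming the first nine terms in Fringe ordering; `fillTerms` is hypothetical and not the PR's code. It also shows the kind of saving mentioned above: rho2 is computed once, and cos 2θ and sin 2θ come from double-angle identities instead of extra trig calls.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical: fill the first nine terms (Fringe ordering assumed) at one point.
// The caller owns `terms` and reuses it, so no allocation happens after the first call.
void fillTerms(double rho, double theta, std::vector<double>& terms) {
    assert(rho >= 0.0);
    terms.resize(9);                                // no reallocation once size is 9
    const double c = std::cos(theta), s = std::sin(theta);
    const double rho2 = rho * rho;                  // reused by several terms
    const double cos2 = c * c - s * s;              // cos(2*theta)
    const double sin2 = 2.0 * s * c;                // sin(2*theta)
    const double coma = (3.0 * rho2 - 2.0) * rho;   // 3*rho^3 - 2*rho
    terms[0] = 1.0;                                 // piston
    terms[1] = rho * c;                             // x tilt
    terms[2] = rho * s;                             // y tilt
    terms[3] = 2.0 * rho2 - 1.0;                    // defocus
    terms[4] = rho2 * cos2;                         // astigmatism 0/90
    terms[5] = rho2 * sin2;                         // astigmatism 45
    terms[6] = coma * c;                            // x coma
    terms[7] = coma * s;                            // y coma
    terms[8] = (6.0 * rho2 - 6.0) * rho2 + 1.0;     // primary spherical
}
```

In the per-pixel loop, the vector would be declared once outside the loop and passed in on every iteration.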
I'm not sure this is any easier to read or understand, but it's certainly no worse than the "init" method.
There does not seem to be any way to point back to previous comments. I made several comments but there was no reply. I also made comments that I thought were in this pull request, or at least about code changes in this pull request. Those comments are not there now. Those comments were in addition to the ones about auto invert, and asked why some changes were made. In particular, there were code changes that added temporary variables for the wavefront list instead of just using the wavefront list as pointed to by a class that had the list pointer. I asked why. I can't find those comments now anywhere. I thought they were there yesterday.
Hi @githubdoe, do not hesitate to post the comments again, even in closed pull requests. That would help a lot in answering your thoughts.
That's interesting. I didn't think about it this way.
I found the comments, and they are in the just-closed pull request "some code improvements" #112. Sorry for having to put this here in this pull request, but I don't know how to prompt you to go back and look at those. I also did not think I had approved that pull request.
@githubdoe I think you mean #211, not #112. I think you did not publish your comments. And yes, Georges approved the pull request; I didn't think it was a problem to merge it, sorry. If you want to approve all pull requests, let me know.
To clarify, if my understanding of the problem is correct: you see the comments, but we can't see them.
I think I clicked on "Review changes" in order to see the code changes in the first place. Then I just commented. Otherwise, how did I see the code changes?
Here is a screenshot of what I see if I click on closed PRs and then select #112. I don't know what to do after that.
@githubdoe Again, it's #211, not #112. Click THIS link, not another one: https://github.com/githubdoe/DFTFringe/pull/211/files
To answer your question: yes, I think there is a "Review changes" button at the beginning, but you can also access the files with the "Files changed" tab.
You can click on the picture to make it larger.
To come back to this pull request: it's been approved by George. Do I merge? I think so.
|
|
Dale, I checked this over quite carefully. I think I spent 2 hours looking at it, and I also built and tested it.
The problem is that it has a different architecture than the one I developed, and it is not how I remembered it working. That means that, without digging into the design, I won't have an inkling of what might cause future bugs or where the roads to enhancements lie. Learning it requires more study than I'm willing to give it. I will have lost the feeling of control of the code. I suppose that is inevitable, but it is not something I give away lightly. It would be easier if there were a substantial improvement in processing, efficiency, or code simplicity. So let me think about it.
Remind me how I can get a copy of the source into a branch on my machine that I can play with using Qt Creator.
For this branch it will be somewhat delicate. I have no access rights to this repo, so I work in a fork. Getting it locally would mean you need to add a remote. I fear this would mess up your local git setup and make everything more difficult for you; I do it sometimes and find it somewhat difficult myself. Here is a direct link to download the zip, though: https://github.com/atsju/DFTFringe/archive/refs/heads/JST/performance.zip
Here is my summary of the changes:




As discussed in #204.
I mainly moved some things around. No big change in the zernike code, to keep it safe and readable.
However, measurements showed a 50% gain on unwrap_to_zernikes, which represents some 5% total speedup. Those are callgrind numbers. I have not yet timed it on an installed build to get a feel for the user-visible difference.
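The relation between the two numbers follows Amdahl's law. As a rough reading, assuming "50% gain" means the time spent in unwrap_to_zernikes was halved, and writing f for that function's share of the total run time T before the change:

```latex
T_\text{new} = (1 - f)\,T + \frac{f\,T}{2} = \left(1 - \frac{f}{2}\right) T,
\qquad 1 - \frac{f}{2} = 0.95 \;\Rightarrow\; f \approx 0.10
```

Under that assumption, a 5% overall saving is consistent with unwrap_to_zernikes taking roughly 10% of the profiled run time, which also caps what further work on that function alone could gain at about 10%.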