
Improve DFTFringe speed on zernike related computation #206

Merged
atsju merged 9 commits into githubdoe:master from atsju:JST/performance on Aug 4, 2025

Conversation

@atsju (Collaborator) commented Jul 19, 2025

As discussed in #204.

I mainly moved some things around; there are no big changes in the zernike code itself, to keep it safe and readable.

However, measurements showed a 50% gain on unwrap_to_zernikes, which represents about a 5% total speedup. Those are callgrind numbers; I have not yet timed an installed build to get a feel for the user experience.

atsju marked this pull request as draft July 19, 2025 17:25
@gr5 (Collaborator) commented Jul 19, 2025

I see you moved the zernikePolar class to its own .cpp and .h files.

Also, I see you moved a lot of the zernikePolar code that was in the zernike() function into the init() function. I'm not quite sure exactly where the time is saved, but I'm not surprised, since the init() function is called far less often (49 times less often, I think) than the zernike() function.

I think I understand the changes correctly. Let me know what I missed.

@atsju (Collaborator, Author) commented Jul 19, 2025

There is a speed improvement, but it's really not large.
More importantly, the result is wrong. I will need to review the code.

@atsju (Collaborator, Author) commented Jul 19, 2025

> I see you moved the zernikePolar class to its own .cpp and .h files.

Correct. This is only to clean up the architecture. I also removed some unused functions and other leftovers.

> Also, I see you moved a lot of the zernikePolar code that was in the zernike() function into the init() function. I'm not quite sure exactly where the time is saved, but I'm not surprised, since the init() function is called far less often (49 times less often, I think) than the zernike() function.

Correct. It's better because, instead of repeating the computation each time zernike() is called for very similar terms (differing only by sin/cos), the shared parts of those terms are computed once in init(), where the compiler can also optimize them. Dale already did this with some sin/cos and pow terms; I only pushed the idea further.
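
To make that concrete, here is a minimal sketch of the idea (hypothetical names, only a couple of terms, and not the actual DFTFringe code): init() computes the sub-expressions shared by several terms once per point, and each per-term accessor only combines those precomputed values.

```cpp
#include <cmath>

// Hypothetical illustration only -- not the real zernikePolar class.
struct ZernSketch {
    double rho2 = 0.0, cos2t = 0.0, sin2t = 0.0;

    // called once per (rho, theta) point
    void init(double rho, double theta) {
        rho2  = rho * rho;              // shared by defocus and both astigmatism terms
        cos2t = std::cos(2.0 * theta);  // the sin/cos pair of one radial polynomial
        sin2t = std::sin(2.0 * theta);  //   reuses the same precomputed radial part
    }

    // called once per term: just combine what init() prepared
    double defocus()  const { return 2.0 * rho2 - 1.0; }
    double astigCos() const { return rho2 * cos2t; }
    double astigSin() const { return rho2 * sin2t; }
};
```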

> I think I understand the changes correctly. Let me know what I missed.

You missed nothing. But I need to test further before merging; there is a flaw somewhere.

@gr5 (Collaborator) commented Jul 19, 2025

> there is a flaw somewhere.

Yes, it's broken somehow. It's not subtracting off the zernikes that are unchecked, so defocus and coma; I think just those two. Defocus for sure is not getting subtracted off. Spherical null seems to be removed properly, I think, and tilt is being removed. But not defocus.

@atsju (Collaborator, Author) commented Jul 19, 2025

> there is a flaw somewhere.
>
> Yes, it's broken somehow. It's not subtracting off the zernikes that are unchecked, so defocus and coma; I think just those two. Defocus for sure is not getting subtracted off. Spherical null seems to be removed properly, I think, and tilt is being removed. But not defocus.

I will check tomorrow.

@githubdoe (Owner) commented:

At this time I'm not going to spend any time reviewing the changes, but I will give you a little history.

There were two approaches to generating the zernike polynomials. One was memory intensive (derived by Mike Peck) and one was compute intensive, derived by Dave Rowe, which is what I used to begin with: each term is written out as an equation in software. That version starts to get cumbersome if you want more than 48 terms. So once we were able to use a 64-bit CPU and memory was no longer an issue, I switched a lot of the code to Mike Peck's way, where the rho and theta arrays are generated; but to save computation time I tried to make that happen only once and placed it in the init function. You will of course find some of that old code still around. I like to keep it there because being able to see how the terms were computed helps me remember various things about them. Mike's code uses an iterative approach that is hard to understand, but it is needed when computing the higher-order terms.

Years ago wave fronts were derived from the zernike terms themselves when using fringe tracing. Once the DFT method was created, only a few terms were needed unless the user wants to disable specific terms other than the defaults. So we left that disable function, but only for the first (48?) terms. There are some calls that tell the processes how many terms to use.

@gr5 (Collaborator) commented Jul 19, 2025

To add to Dale's history: I believe there are 3 active ways to compute the zernikes.

  1. The way that Julien saw and edited. I assume that code is still used, but I don't know for sure.
  2. The way that Mike Peck does it, which is used when you do wavefront smoothing and can handle much more than 49 terms.
  3. The annulus zernike method, which is used if the annulus box is checked in mirror config. Also a variation on the Mike Peck method.
  4. I think there is a 4th way, in code that never runs. Hopefully that's not what Julien edited.

If we only speed up one way, it should be (1), which is the one used the most.

atsju marked this pull request as ready for review July 20, 2025 05:49
@atsju (Collaborator, Author) commented Jul 20, 2025

OK, fixed now. It gives the same results as v7.3.4.
All my modifications are based on measurements with callgrind, so there are real improvements in performance.
However, from the user's point of view it's not much. This is because there is no single big bottleneck, and most of the "slow" responsiveness probably comes from Qt or OpenCV, not DFTFringe's own code.

I will check some compile flags too. fast-math is probably unsafe as a whole, but some parts of it are safe and could increase performance globally.

Comment on lines 638 to 640:

```diff
  if ((z == 3) && doDefocus){
-     nz += defocus * zpolar.zernike(z,rho, theta);
-     nz -= zerns[z] * zpolar.zernike(z,rho,theta);
+     nz += defocus * zpolar.zernike(z);
+     nz -= zerns[z] * zpolar.zernike(z);
```
@atsju (Collaborator, Author) commented:

Unless I missed something obscure, this one seems unsafe: zpolar.zernike() can be called without zpolar.init() having been called.

Is it really useful to have two computation methods? I don't know DFTFringe well enough to know where this choice is made or what effect it has.

@githubdoe (Owner) replied:

Yes, it can be called without the init, and that is on purpose. The init is expensive and, once it has been called, does not need to be called again. At least that was the original plan, and I think it worked well. The trick was to catch every time it really needed to be called, which is any time the dimensions of the input matrix change.
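
A small sketch of that guard, just to make the idea concrete (a hypothetical wrapper with assumed names; the real code makes this decision elsewhere and the real init() signature is not shown):

```cpp
#include <functional>
#include <opencv2/core.hpp>

// Hypothetical helper: re-run the expensive init only when the
// dimensions of the input matrix change.
struct InitGuard {
    cv::Size lastSize{0, 0};

    void maybeInit(const cv::Mat& surface, const std::function<void()>& expensiveInit) {
        if (surface.size() != lastSize) {
            expensiveInit();            // e.g. rebuild the rho/theta grids
            lastSize = surface.size();
        }
    }
};
```

Usage would then be something like `guard.maybeInit(wavefront, [&]{ zpolar.init(/* whatever the real init needs */); });`.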

@gr5 (Collaborator) commented Jul 20, 2025

To use the other 2 methods:

  1. Go to mirror configuration and check the annulus box. This will use a completely different set of zernikes: the "annular" zernikes.

  2. To use the other method, which is mostly identical to the one you are speeding up, do "Tools" -> "smooth current wavefront". Note that you can set the terms with the "max order" option in the upper right corner: from "max order" subtract 2, divide by 2, and then square that number to get the number of zernike terms (see the worked example below). This is slower than the version you have been working on, which only goes up to 49 terms, whereas with zernike smoothing it's common to go up to 200 terms or much more.
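
For example, by that rule a max order of 22 gives ((22 - 2) / 2)^2 = 100 zernike terms, and a max order of 30 gives 14^2 = 196.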

Filling in 1000 zernike formulas by hand would be onerous and difficult to test.

So yeah, I think it's worth it to have 3 different methods of calculating the zernike shapes.

@githubdoe (Owner) commented:

> This is because there is no single big bottleneck, and most of the "slow" responsiveness probably comes from Qt or OpenCV, not DFTFringe's own code.

Yes I'm glad you found what I had discovered years ago when I did my manual timing analysis.

The different zernike methods exist because the "most efficient" one (also the one used the most) cannot be used for some of the desired cases. I don't want to penalize the normal user with the needs of a special process.

@gr5 (Collaborator) commented Jul 20, 2025

I hope to test this tomorrow.

@gr5 (Collaborator) commented Jul 24, 2025

Oops, I forgot about this PR somehow. I would have remembered if you had assigned it to me, but that shouldn't be necessary since I can see above that I said I would test it.

Will test soon. Probably today.

@gr5 (Collaborator) left a comment:

I realize I had tested all of this before, but I tested it again anyway, covering the 3 different types of zernike creation, and it all works well. I also looked over the code much more carefully.

There's one thing I don't particularly like, which is that you do a new and delete for every point in the wavefront, so for a 600x600 pixel image that would be 360,000 times (actually a bit less, because it only does it for the inscribed circle, so very roughly 300,000 times). This creates the zernTerms[] array on the stack and deletes it, typically about 300k times, whereas Dale used an "init" function. Even with all those new and delete, I'm not surprised it's a little faster than before. I do see quite a few improvements you made with fewer multiplications (more use of rho2 etc.). I double-checked every formula as well (you only changed a few, but I checked them all to be absolutely sure).

I'm not sure this is any easier to read or understand, but it's certainly no worse than the "init" method.
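
For what it's worth, here is a purely illustrative sketch of that trade-off (all names are assumed, and it is simplified: in the real code the table contents depend on each point's rho and theta, which is why the original init approach precomputes whole-grid arrays up front):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-point term table mentioned above.
struct TermTable {
    std::vector<double> zernTerms;
    explicit TermTable(std::size_t nTerms) : zernTerms(nTerms, 1.0) {}
};

// Style A: a fresh table is built and torn down for every wavefront point
// (very roughly 300k construct/destroy cycles for a 600x600 image).
double perPointStyle(int nPixels, std::size_t nTerms) {
    double sum = 0.0;
    for (int i = 0; i < nPixels; ++i) {
        TermTable t(nTerms);
        sum += t.zernTerms[0];
    }
    return sum;
}

// Style B: one table is built once in an "init" step and reused for every point.
double initStyle(int nPixels, std::size_t nTerms) {
    TermTable t(nTerms);
    double sum = 0.0;
    for (int i = 0; i < nPixels; ++i)
        sum += t.zernTerms[0];
    return sum;
}
```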

@githubdoe (Owner) commented:

There does not seem to be any way to point back to previous comments. I made several comments but there was no reply. I also made comments that I thought were in this pull request, or at least about code changes in this pull request. Those comments are not there now. Those comments were besides the ones about auto invert, and asked why some changes were made.

In particular, there were code changes that added temporary variables for the wave front list instead of just using the wave front list as pointed to by a class that holds the list pointer. I asked why do that. I can't find those comments anywhere now; I thought they were there yesterday.

@atsju (Collaborator, Author) commented Jul 24, 2025

> There does not seem to be any way to point back to previous comments. I made several comments but there was no reply. I also made comments that I thought were in this pull request, or at least about code changes in this pull request. Those comments are not there now. Those comments were besides the ones about auto invert, and asked why some changes were made.
>
> In particular, there were code changes that added temporary variables for the wave front list instead of just using the wave front list as pointed to by a class that holds the list pointer. I asked why do that. I can't find those comments anywhere now; I thought they were there yesterday.

Hi @githubdoe
If you comment in the review pane on the code, you can either "start a review" or "add single comment". If you used "start a review" and did not click on "review changes" afterwards, then the comments have not been published. I'm sorry for that; it would explain why we did not answer.

Do not hesitate to make the comments again, even in closed pull requests. It would help a lot in answering your thoughts.

[screenshot: the "start a review" / "add single comment" options in the review pane]

@atsju (Collaborator, Author) commented Jul 24, 2025

> There's one thing I don't particularly like, which is that you do a new and delete for every point in the wavefront, so for a 600x600 pixel image that would be 360,000 times (actually a bit less, because it only does it for the inscribed circle, so very roughly 300,000 times). This creates the zernTerms[] array on the stack and deletes it, typically about 300k times, whereas Dale used an "init" function.

That's interesting; I didn't think about it that way.
It's still more efficient because the computation is better grouped: probably better register and cache utilization in this version than in the original one.
However, one could think about how to avoid unnecessary copies and compute everything in place. Just thinking out loud; I propose to keep it as is.

@githubdoe (Owner) commented:

I found the comments; they are in the just-closed pull request "some code improvements" #112. Sorry for having to put this here in this pull request, but I don't know how else to get you to go back and look at those. I also did not think I had approved that pull request.

@atsju (Collaborator, Author) commented Jul 24, 2025

> I found the comments; they are in the just-closed pull request "some code improvements" #112. Sorry for having to put this here in this pull request, but I don't know how else to get you to go back and look at those. I also did not think I had approved that pull request.

@githubdoe I think you mean #211, not #112.

I think you did not publish your comments.
Kindly go here https://github.com/githubdoe/DFTFringe/pull/211/files and click on "review changes" at the top right. This WILL probably publish your comments so that we can see them.

And yes, George approved the pull request; I didn't think it was a problem to merge it, sorry. If you want to approve all pull requests yourself, let me know.

@atsju (Collaborator, Author) commented Jul 24, 2025

To clarify, if my understanding of the problem is correct: you see the comments but we can't see them.
They stay local to your machine as long as you do not click on "review changes".

@githubdoe (Owner) commented:

I think I clicked on "review changes" in order to see the code changes in the first place. Then I just commented. Otherwise how did I see the code changes?

@githubdoe (Owner) commented:

Here is a screenshot of what I get if I click on closed PRs and then select #112. I don't know what to do after that.

[screenshot of the closed pull request page]

@atsju (Collaborator, Author) commented Jul 24, 2025

@githubdoe Again, it's #211, not #112.
The comments you see are marked as PENDING; they are local to your machine.

Click THIS link, not another one: https://github.com/githubdoe/DFTFringe/pull/211/files
and hit the top-right button.

[screenshot: the "Files changed" tab with the "Review changes" button circled]

To answer:

> I think I clicked on "review changes" in order to see the code changes in the first place. Then I just commented. Otherwise how did I see the code changes?

You clicked on "file changes", not "review changes" (both circled in my screen capture). That's why.

Yes, I think there is a "review changes" button at the beginning, but you can also access the files with the "file changes" tab.
Anyway: you need to click "review changes" again and "submit review" at the end of the review to share the comments.

@atsju (Collaborator, Author) commented Jul 24, 2025

You can click on the picture to make it larger.

@atsju (Collaborator, Author) commented Jul 25, 2025

To come back to this pull request: it has been approved by George. Do I merge? I think so.

  • It's slightly more performant.
  • The formulas and results have been checked to be 100% identical.
  • There is no more old-style singleton showing leaks in memory-checking tools (singletons were already addressed/discussed for other classes in [bug] fix some memory leaks #76).

@gr5 (Collaborator) commented Jul 25, 2025

Dale, I checked this over quite carefully. I think I spent 2 hours looking at it, and I also built and tested it.

@githubdoe (Owner) commented Jul 25, 2025

The problem is that it has a different architecture than the one I developed, and it is not how I remember it working. That means that, without digging into the design, I will not have an inkling of what might be a future cause of bugs or a road to enhancements.

It requires more study than I'm willing to give it. I will have lost the feeling of control over the code. I suppose that is inevitable, but it is not something I give away lightly. It would be easier if there were a substantial improvement in processing, efficiency, or code simplicity.

So let me think about it.

@githubdoe (Owner) commented:

Remind me how I can get a copy of the source into a branch on my machine that I can play with using Qt Creator.

@atsju (Collaborator, Author) commented Jul 25, 2025

> Remind me how I can get a copy of the source into a branch on my machine that I can play with using Qt Creator.

For this branch it will be somewhat delicate. I have no access rights to this repo, so I work in a fork; getting the branch locally would mean you need to add a remote. I fear this would mess up your local git and make everything more difficult for you. I do it sometimes and find it somewhat difficult myself.

Here is a direct link to download the zip though https://github.com/atsju/DFTFringe/archive/refs/heads/JST/performance.zip

@gr5 (Collaborator) commented Jul 25, 2025

The way I do it is with the GitHub Desktop application; I downloaded and installed that. Then note that there is a "Code" button in the upper right corner of this PR: you click that and it can tell you a few ways to get the code, but if you do the GitHub app one it's this command: gh pr checkout 206, as you can see circled below.

[screenshot: the "Code" button menu with "gh pr checkout 206" circled]

@gr5 (Collaborator) commented Jul 25, 2025

Here is my summary of the changes:

  1. Julien moved your existing zernikePolar class into its own .h and .cpp files. Pretty simple change there.
  2. Then he basically moved your code that was in the member function init() into the constructor, which he calls a lot (hundreds of thousands of times, for the hundreds of thousands of "pixels" in the wavefront). It's fine really.
  3. He made better use of rho2 and similar intermediates in about 3 of the 49 equations.
  4. He got rid of your pow() calls and used multiplication instead (faster); see the sketch after this list.
  5. Lots more simple changes that are harder to summarize.
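
To make points 3 and 4 concrete, here is a small sketch using the primary spherical radial polynomial 6*rho^4 - 6*rho^2 + 1 as the example (illustrative only; not copied from the DFTFringe source):

```cpp
#include <cmath>

// before: repeated pow() calls for each power of rho
double sphericalWithPow(double rho) {
    return 6.0 * std::pow(rho, 4) - 6.0 * std::pow(rho, 2) + 1.0;
}

// after: rho2 is computed once, and higher powers come from plain multiplication
double sphericalWithMul(double rho) {
    const double rho2 = rho * rho;
    return 6.0 * (rho2 * rho2) - 6.0 * rho2 + 1.0;
}
```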

atsju merged commit 008d750 into githubdoe:master on Aug 4, 2025
8 checks passed
atsju deleted the JST/performance branch August 4, 2025 17:36
