Skip to content

Fix test stand astig removal crash#251

Merged
atsju merged 1 commit intogithubdoe:masterfrom
atsju:JST/fixTestStandAstig800
Oct 20, 2025
Merged

Fix test stand astig removal crash#251
atsju merged 1 commit intogithubdoe:masterfrom
atsju:JST/fixTestStandAstig800

Conversation

@atsju
Copy link
Collaborator

@atsju atsju commented Oct 19, 2025

Fix #246

This does fix the crash (#250 (comment)).

However I do not like this fix because it doesn't explain why there was no crash in 7.4.0.
Maybe it's only masking something and it should be fixed at the root. I'm unsure if it's normal that m_tools can be null in this case.

@atsju atsju marked this pull request as ready for review October 19, 2025 07:58
@atsju atsju requested review from githubdoe and gr5 October 19, 2025 07:58
@gr5
Copy link
Collaborator

gr5 commented Oct 19, 2025

Great. I'll look at this today.

@gr5
Copy link
Collaborator

gr5 commented Oct 19, 2025

Wow. That makes a lot of sense to me at this point. I need to look some more.

I think the basic problem is that there is some (sort of) multi threading going on with a race condition. Because of this line of code from surfacemanager.cpp (probably the one at line 2516)
QApplication::processEvents();

Which runs gui events including updating the gui dispaly while it's processing the test stand wavefronts. If I'm right then you could probably avoid the bug by having a wavefront processed such that a contourplot is on display before starting the test stand astig removal tool.

Also this explains why we never saw the bug before - it's a race condition.

Also why I saw it happening when loading the file (maybe) - when you do io you have wait situations (waiting for hard drive) and so tend to run other threads. However it's not true threading so maybe this ishn't the reason but I can certainly see that it may be a race condition where sometimes it creates the tools in time and sometimes it's a null when processing the first wavefront.

@atsju
Copy link
Collaborator Author

atsju commented Oct 19, 2025

Race condition is my fear too as I have been warned about multithreading multiple times by github copilot (GPT 5.0) while debugging this.

However I don't know the architecture of the code good enough to do more than this "patch". If you know how it should really work it would be great to have real fix (or to confirm for sure if this is a good fix).

Copy link
Owner

@githubdoe githubdoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We saw something like this before. Yes it is a race but I forget the exact cause. George found it. I think this is a good work around.

@atsju
Copy link
Collaborator Author

atsju commented Oct 19, 2025

Hi @githubdoe
When you have some time, please also review #244 and #248. No pressure.

@gr5
Copy link
Collaborator

gr5 commented Oct 19, 2025

Okay so I'm re-evaluating my earlier thoughts. I stepped through the code and the value of m_tools is also null when running in QT5.

This is actually a temporary/custom contourPlot instance used for making the report (and pdf) to get all the images that go into the report of all the wavefronts. So it shouldn't need to point to the tools I think? Anyway I need to see why this won't crash in QT5 as well and what might have been different in version 7.4 of DFTF.

I will look at this some more tomorrow.

Here is where contourPlot is created and a 0 is passed in for the pointer. I don't see how it could ever be not null but I'll look at it more tomorrow and look at ver 7.4 as well.

ContourPlot *plot =new ContourPlot(0,0,true);//m_contourPlot;

@gr5
Copy link
Collaborator

gr5 commented Oct 20, 2025

So I looked at the code and stepped through both this version and 7.4.0 in the debugger.

So the old code actually does pass a null! There is no change in QT6 except that evidently QT6 throws a null pointer exception and QT5 doesn't?

So this is an old bug. I stepped through the code in V7.4.0 and it calls the enableTools(true) function on a null pointer and generates no errors and continues like nothing bad happened!?

Then a few lines of code later it properly setups up m_tools and calls this again. So the first call to setSurface() could possibly be deleted? I don't know. I don't understand it.

I looked at this code several times and I don't understand it. I don't understand which line of code creates the contourplot image. I think it's a png file? I see where the name of the image is generated and inserted into html but I don't understand where the image itself is created. What does it need from m_tools? Is that where it gets the color pallette for the contour plot?

I have a vague idea of what's going on but don't quite understand it 100%.

I strongly recommend we just keep this PR as is and we merge it and then we stop worrying about this particular crash. Unless Dale wants to explain it better. I can tell you that when m_tools is not null milliseconds later, it is called from loadWavefront() which calls generateSurfacefromWavefront() which calls surfaceGenFinished() which calls loadComplete() which calls sendSurface() which calls contourView::setSurface() which calls our line of code that doesn't crash anymore.

Also, seemingly redundantly, loadWavefront() is called again from in the loop in computeStandAstig().

Anyway I think we should just merge this PR as is. Dale approved it. I'll approve it now.

@atsju atsju merged commit 9762819 into githubdoe:master Oct 20, 2025
14 checks passed
@atsju atsju deleted the JST/fixTestStandAstig800 branch October 20, 2025 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

version 8.0.0 crashes when doing test stand astig removal

3 participants

Comments