Fix test stand astig removal crash#251
Conversation
|
Great. I'll look at this today. |
|
Wow. That makes a lot of sense to me at this point. I need to look some more. I think the basic problem is that there is some (sort of) multi threading going on with a race condition. Because of this line of code from surfacemanager.cpp (probably the one at line 2516) Which runs gui events including updating the gui dispaly while it's processing the test stand wavefronts. If I'm right then you could probably avoid the bug by having a wavefront processed such that a contourplot is on display before starting the test stand astig removal tool. Also this explains why we never saw the bug before - it's a race condition. Also why I saw it happening when loading the file (maybe) - when you do io you have wait situations (waiting for hard drive) and so tend to run other threads. However it's not true threading so maybe this ishn't the reason but I can certainly see that it may be a race condition where sometimes it creates the tools in time and sometimes it's a null when processing the first wavefront. |
|
Race condition is my fear too as I have been warned about multithreading multiple times by github copilot (GPT 5.0) while debugging this. However I don't know the architecture of the code good enough to do more than this "patch". If you know how it should really work it would be great to have real fix (or to confirm for sure if this is a good fix). |
githubdoe
left a comment
There was a problem hiding this comment.
We saw something like this before. Yes it is a race but I forget the exact cause. George found it. I think this is a good work around.
|
Hi @githubdoe |
|
Okay so I'm re-evaluating my earlier thoughts. I stepped through the code and the value of m_tools is also null when running in QT5. This is actually a temporary/custom contourPlot instance used for making the report (and pdf) to get all the images that go into the report of all the wavefronts. So it shouldn't need to point to the tools I think? Anyway I need to see why this won't crash in QT5 as well and what might have been different in version 7.4 of DFTF. I will look at this some more tomorrow. Here is where contourPlot is created and a 0 is passed in for the pointer. I don't see how it could ever be not null but I'll look at it more tomorrow and look at ver 7.4 as well. Line 2496 in 38e95ad |
|
So I looked at the code and stepped through both this version and 7.4.0 in the debugger. So the old code actually does pass a null! There is no change in QT6 except that evidently QT6 throws a null pointer exception and QT5 doesn't? So this is an old bug. I stepped through the code in V7.4.0 and it calls the enableTools(true) function on a null pointer and generates no errors and continues like nothing bad happened!? Then a few lines of code later it properly setups up m_tools and calls this again. So the first call to setSurface() could possibly be deleted? I don't know. I don't understand it. I looked at this code several times and I don't understand it. I don't understand which line of code creates the contourplot image. I think it's a png file? I see where the name of the image is generated and inserted into html but I don't understand where the image itself is created. What does it need from m_tools? Is that where it gets the color pallette for the contour plot? I have a vague idea of what's going on but don't quite understand it 100%. I strongly recommend we just keep this PR as is and we merge it and then we stop worrying about this particular crash. Unless Dale wants to explain it better. I can tell you that when m_tools is not null milliseconds later, it is called from loadWavefront() which calls generateSurfacefromWavefront() which calls surfaceGenFinished() which calls loadComplete() which calls sendSurface() which calls contourView::setSurface() which calls our line of code that doesn't crash anymore. Also, seemingly redundantly, loadWavefront() is called again from in the loop in computeStandAstig(). Anyway I think we should just merge this PR as is. Dale approved it. I'll approve it now. |
Fix #246
This does fix the crash (#250 (comment)).
However I do not like this fix because it doesn't explain why there was no crash in 7.4.0.
Maybe it's only masking something and it should be fixed at the root. I'm unsure if it's normal that
m_toolscan be null in this case.