Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LinkamCMS segfaults in Linux #273

Open
carandraug opened this issue Mar 20, 2023 · 21 comments
Open

LinkamCMS segfaults in Linux #273

carandraug opened this issue Mar 20, 2023 · 21 comments

Comments

@carandraug
Copy link
Collaborator

As reported in https://forum.image.sc/t/driving-a-linkam-cms196-with-python-microscope/78773/2 LinkamCMS fails to initialize in Linux:

Using microscope we do not succeed in completing init_sdk() – it manages to load the DLL (although we hard to write the path in hard)

__class__._lib = ctypes.CDLL("libLinkamSDK.so")

We’re then able to read the license correctly, however when we get to:

   cfunc = ctypes.CFUNCTYPE(_uint32_t, _CommsHandle, _ControllerStatus)(
       __class__._on_new_value
   )

We get this output:

*** stack smashing detected ***: terminated
Aborted

I've tried in Linux, even without hardware, and can reproduce the issue so the problem is not on the SDK.

@carandraug carandraug changed the title LinkCMS segfaults in Linux LinkamCMS segfaults in Linux Mar 20, 2023
@carandraug
Copy link
Collaborator Author

Here's a minimal code example showing the issue:

$ cat reproduce-273.py 
import ctypes

class _ControllerStatusFlags(ctypes.Structure):
    """ControllerStatus.flags struct from C headers"""

    _fields_ = [
        ("controllerError", ctypes.c_uint, 1),
        ("heater1RampSetPoint", ctypes.c_uint, 1),
        ("heater1Started", ctypes.c_uint, 1),
        ("heater2RampSetPoint", ctypes.c_uint, 1),
        ("heater2Started", ctypes.c_uint, 1),
        ("vacuumRampSetPoint", ctypes.c_uint, 1),
        ("vacuumCtrlStarted", ctypes.c_uint, 1),
        ("vacuumValveClosed", ctypes.c_uint, 1),
        ("vacuumValveOpen", ctypes.c_uint, 1),
        ("humidityRampSetPoint", ctypes.c_uint, 1),
        ("humidityCtrlStarted", ctypes.c_uint, 1),
        ("lnpCoolingPumpOn", ctypes.c_uint, 1),
        ("lnpCoolingPumpAuto", ctypes.c_uint, 1),
    ]

cfunc = ctypes.CFUNCTYPE(ctypes.c_uint64, _ControllerStatusFlags)
cfunc(lambda x: 1)
$ python3 reproduce-273.py 
*** stack smashing detected ***: <unknown> terminated
Aborted

Looking at ctypes docs I see this warning:

Warning ctypes does not support passing unions or structures with bit-fields to functions by value. While this may work on 32-bit x86, it’s not guaranteed by the library to work in the general case. Unions and structures with bit-fields should always be passed to functions by pointer.

And indeed that's we are doing in:

        cfunc = ctypes.CFUNCTYPE(_uint32_t, _CommsHandle, _ControllerStatus)(
            __class__._on_new_value
        )

_ControllerStatus is a union and one of its fields is the _ControllerStatusFlags bit-field struct. This only crashes if the _ControllerStatusFlags struct has more than 12 fields.

@carandraug
Copy link
Collaborator Author

This issue is happening when setting the callback for status events:

typedef void (*EventNewValueCallback)(CommsHandle hDevice, LinkamSDK::ControllerStatus status);

We can't pass unions and structures and bit-fields by value. But that's the interface for the callback so we can't just change it pointer.

But I think we can drop the bit-fields ourselves. ControllerStatus is a union where one member is a struct with access to each bit and a 64bit int that holds all the flags:

    union ControllerStatus
    {
        struct
        {
            unsigned    controllerError                 : 1;
            // ... 63 other bit-fields
        }               flags;                                  ///< Accessor to the flags.
        uint64_t        value;                                  ///< Flags as a single value;
    };

If the problem is the bit-field, we should be able to remove the flags struct from the ControllerStatus union and simply pass a uint64_t and manually access the bits we care about ourselves.

Unfortunately, the Linkam API does this a lot so we probably need to do the same in a bunch of other places

I do not have access the hardware to do this though.

carandraug pushed a commit to carandraug/microscope that referenced this issue Mar 21, 2023
@carandraug
Copy link
Collaborator Author

Test fix in my 273-controller-status-value. Needs testing in real hardware and probably the same hack needs to be applied to all other unions in LinkamCMS. A nicer fix is likely to be possible but without hardware access I won't be making any deep changes.

@iandobbie
Copy link
Collaborator

Any real hardware tests will need either Tom or Jingyu to contribute. However I'm not sure either of them runs Linux anywhere. Do you think this is limited to Linux or will it also be an issue on Windows?

@edwardando
Copy link

I will try it as soon as I get access to the stage again and will keep you posted

@edwardando
Copy link

edwardando commented Mar 24, 2023

Hi there, this is a little complex but:

  1. On line 430...
    _ControllerStatusValue = _uint64_t
    should probably be
    _ControllerStatusValue = _uint64_t

  2. We have an error when making this correction, but it's due to recent developments and not the commit in itself (i.e., we don't have this error with the 0.6 version from pip, but we do have it with the latest code from master):

TypeError: __init__() missing 1 required positional argument: 'index'
Exception ignored in: <function _LinkamBase.__del__ at 0x7f098ebfef70>
Traceback (most recent call last):
  File "/home/lnd/LaserMeltingMicroscopeControl/python-microscope-carandraug-venv/lib/python3.8/site-packages/microscope/stages/linkam.py", line 1107, in __del__
    self._process_msg(Msg.CloseComms)
  File "/home/lnd/LaserMeltingMicroscopeControl/python-microscope-carandraug-venv/lib/python3.8/site-packages/microscope/stages/linkam.py", line 1119, in _process_msg
    if not self._lib.linkamProcessMessage(
AttributeError: 'NoneType' object has no attribute 'linkamProcessMessage'

@edwardando
Copy link

Starting again from the 0.6.0 tag, and applying your changes we definitely get further, but we're getting a segfault in open_comms():

    self._process_msg(
        Msg.OpenComms,
        byref(self._commsinfo),
        byref(self._h),
        result=self._connectionstatus,
    )

...with both versions of the .so file.
Any thoughts? Thanks for your help!

@carandraug
Copy link
Collaborator Author

Before fixing that, I'd prefer we fix the reason why you can't use the current development version (otherwise I will need to fix the segfault in 0.6.0 and then "forwardport" it to the development). From the error message, I think the error comes because you don't specify the index argument when you construct the stage.

Can you try to construct it with LinkamCMS(uid="", index=0) and see if that moves it forward?

@edwardando
Copy link

Yes, will do as soon as I can access the linkam stage again!

@edwardando
Copy link

OK the index=0 seems to improve things, we are now able to work from your branch with the fix to line 430 mentioned above.

We're hacking the paths to the .so file and the .lsk file but this is fine for testing.

Now we're able to get further, and run your script, but the status we get back from the stage seems to say "connected=False", what could that be a symptom of?

@edwardando
Copy link

guys?

@iandobbie
Copy link
Collaborator

Can we help?

@mickp
Copy link
Collaborator

mickp commented Apr 24, 2023

  • Is the stage running the correct firmware version for the library you are using?
  • Take a look at the other flags in the _connectionstatus on the stage object to see which error, if any, they are indicating.

@carandraug
Copy link
Collaborator Author

Adding to what @mickp said: we're redirect the logs to devnull, can you change it to some filepath (writable) and check if there's some hints there?

@iandobbie
Copy link
Collaborator

Are we sure this isn't a license issue? The SDK does require a license file to work.

@edwardando
Copy link

It's not a license issue, this we've checked carefully.
OK we'll look into the firmware version...

@iandobbie
Copy link
Collaborator

Is there any way to borrow a windows machine and test on that. The system is usb so a loan of a laptop might help. I suggest this as the current code works well on the Diamond system and was definitely working on the Oxford one when I left 18 months ago. These both use windows and that dll. I know this may not suit your purposes but should so that the firmware etc are genarlaly working.

@edwardando
Copy link

I have borrowed a windows machine and confirmed that I can drive the stage. The state of affairs is as follows:

  • On windows: microscope works as expected (both reading values, and moving stage), but uid needs to be passed
  • On Ubuntu 20.04: reading values mostly works (uid not required), stage move gives the following error:

stack smashing detected ***: terminated
Aborted

Is this normal?

@carandraug
Copy link
Collaborator Author

The UID being required is normal. in the case of Ubuntu we don't know whether that is normal since we never tested in Linux:

else: # assuming Linux. Not tested.

However, it is certainly not the desired outcome. I do not have access to a Linkam stage to fix that. Now do you know it should work in Windows, if you could fix it to work in Linux too that would be great.

@edwardando
Copy link

sure, but debugging a segfault like this is quite hard, I don't know where to start...

@carandraug
Copy link
Collaborator Author

I recommend creating a minimal example using ctypes only and without Python-Microscope. The script shouldn't be too long since there's a lot of stuff you won't need (such as setting all the callbacks and adding device settings) and you can mostly copy and paste from Python-Microscope. I would work from there to identify what it is that we're doing wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants