Improve mDNS browsing to find operational nodes #1120
Use the fabric specific DNS-SD service subtype instead of the generic "_matter._tcp" service type. This avoids querying for operational nodes of other fabrics. This is closer to the recommendations in "4.3.2.6. Performance Recommendations" of the Matter spec. While at it, remove the commissionable node browsing as it is not used currently.
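For illustration, a minimal sketch of what browsing the fabric-specific subtype could look like. It assumes the Matter convention of an `_I` prefix followed by the 64-bit compressed fabric ID as 16 upper-case hex digits. The helper names `operational_subtype` and `browse_fabric` are hypothetical and not taken from this PR, and `browse_fabric` needs the third-party `zeroconf` package.

```python
MATTER_OPERATIONAL_TYPE = "_matter._tcp.local."


def operational_subtype(compressed_fabric_id: int) -> str:
    """Return the fabric-specific DNS-SD subtype for a compressed fabric ID."""
    if not 0 <= compressed_fabric_id < 2**64:
        raise ValueError("compressed fabric ID must fit in 64 bits")
    # 16 upper-case hex digits after the "_I" prefix, then the "_sub" label.
    return f"_I{compressed_fabric_id:016X}._sub.{MATTER_OPERATIONAL_TYPE}"


def browse_fabric(compressed_fabric_id: int, on_change):
    """Browse operational nodes of one fabric only (hypothetical helper)."""
    # Imported lazily so the helper above works without python-zeroconf.
    from zeroconf import ServiceBrowser, Zeroconf

    zc = Zeroconf()
    browser = ServiceBrowser(
        zc,
        operational_subtype(compressed_fabric_id),
        handlers=[on_change],
    )
    return zc, browser
```

Browsing `operational_subtype(fabric_id)` instead of `MATTER_OPERATIONAL_TYPE` means responders only answer for nodes on that fabric.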
There is a slight chance that some misbehaving devices do not announce themselves using the subtype. In my testing with Thread border routers from Apple, Google and our OTBR Add-on, as well as all the WiFi devices I have at home, things seem to work as usual. Since Python Zeroconf only returns the matching devices, and we rely on the browsing result to connect to devices in the first place, verifying that subtype DNS-SD works for all devices in a home is rather easy: run the Python Matter server with this code and see if all devices come online.
The wrapper is how we call the CHIP SDK. We don't actually want to call the SDK, so let's start mocking on the wrapper level.
@agners The other question is why you query mDNS at all without a need ;-) Ideally you should try to connect to the last known address of commissioned nodes and not need to query anything at all. Only when the last known address does not work do you discover that device specifically. That's at least how I understood it to ideally work on the controller side.
It's for fast operational node discovery. By only trying to connect to a device at some interval, you will not be able to deliver that user experience.
nvm, HA is not using the event from Matter server but just listens to mDNS directly.
Besides the address, some of the protocol parameters are part of the TXT section. TBH, I think using the SDK and asking it to connect to an IP directly is actually not possible. I think we have to create a device controller with the fabric settings, and then ask it to establish a session to a particular node ID. The SDK then queries mDNS (with its native C++ minimal mDNS implementation) to get the IP and handles all that. You might now ask why we use another mDNS implementation in the Python Matter Server. The main reason is that the SDK doesn't have a mechanism to listen continuously for devices coming online. In theory we only need that part to be implemented on the Python side. I am actually looking into using Python Zeroconf to only passively listen for devices coming back online and leave all the rest to the SDK.
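A rough sketch of the "passively listen, leave the rest to the SDK" idea. It assumes operational instance names follow the Matter pattern `<compressed fabric ID>-<node ID>`, each 16 hex digits, and that the browser handler is called with the keyword arguments python-zeroconf uses for `handlers` callbacks. `make_handler` and `on_node_seen` are hypothetical names; what `on_node_seen` does (for example, asking the SDK to re-establish a session) is left open.

```python
import re

# <16 hex digits fabric>-<16 hex digits node>._matter._tcp.local.
_INSTANCE_RE = re.compile(
    r"^([0-9A-Fa-f]{16})-([0-9A-Fa-f]{16})\._matter\._tcp\.local\.$"
)


def parse_operational_instance(name: str):
    """Return (compressed_fabric_id, node_id), or None if the name does not match."""
    match = _INSTANCE_RE.match(name)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def make_handler(on_node_seen):
    """Build a browser callback that only reports nodes (re)appearing."""

    def handler(zeroconf, service_type, name, state_change):
        # Ignore removals; only announcements are interesting here.
        if state_change.name not in ("Added", "Updated"):
            return
        parsed = parse_operational_instance(name)
        if parsed is not None:
            on_node_seen(*parsed)

    return handler
```

The handler does no address resolution itself, so address lookup and session setup stay with the SDK.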
There have been several conversations about operational node discovery within the SDK. Did none of that get implemented in the meantime?
@marcelveldt the OpenThread people seem to be working on this, if I am not mistaken. There seems to be a major overhaul of the mDNS/DNS-SD/SRP stuff ongoing, loosely tracked here
See here: openthread/openthread#11191 and here: https://github.com/openthread/openthread/blob/15553e6dd22368204342844ee101a4b7fc2c0b00/examples/platforms/posix_mdns/README.md Not sure how much of this can actually be used in other contexts though.
Yes, I'm aware of that, but we're also talking about general Matter operational node discovery, not only Thread. Let's hope we can get to one general mDNS implementation, because the current situation is just hopeless, with every implementation having its own quirks.
@marcelveldt Re your answer above: OK, if that's mainly about listening and not active querying, then it all makes sense :-)
Use the fabric specific DNS-SD service subtype instead of the generic "_matter._tcp" service type. This avoids querying for operational nodes of other fabrics. This is closer to the recommendations in "4.3.2.6. Performance Recommendations" of the Matter spec.
Furthermore, always request multicast responses (QM). Unicast responses won't get reliably delivered back to Python Zeroconf in almost all installation types. This is because multiple mDNS responders are listening on all interfaces on port 5353 ([::]:5353/0.0.0.0:5353). Python Zeroconf sends out unicast (QU) requests with source port 5353, which are answered with a unicast response to the source IP/port. However, if multiple sockets listen on all interfaces on the same port, and a UDP packet is sent to a specific IP/port combination, it only gets delivered to one of them (which one is random). Responses to the multicast address get delivered to all listening sockets (multicasts are multicast locally too). The default behavior sends a unicast request (QU) and, after 200ms, a multicast request (QM). So the current situation likely just leads to a slight delay.
While at it, remove the commissionable node browsing as it is not used currently. This too lowers the unnecessary mDNS traffic on the network.
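To make the QU/QM distinction concrete, here is a small stdlib-only sketch of how an mDNS question is encoded on the wire. In mDNS the top bit of the question's class field asks for a unicast response (QU); leaving it clear asks for a multicast response (QM). `encode_question` is a hypothetical helper for illustration, not part of Python Zeroconf.

```python
import struct

_TYPE_PTR = 12     # PTR record type, used for DNS-SD browsing
_CLASS_IN = 0x0001
_QU_BIT = 0x8000   # top bit of qclass: unicast response requested


def encode_question(name: str, unicast_response: bool = False) -> bytes:
    """Encode one PTR question section entry (no header, no name compression)."""
    labels = name.rstrip(".").split(".")
    # Each label is length-prefixed; a zero byte terminates the name.
    encoded = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    )
    encoded += b"\x00"
    qclass = _CLASS_IN | (_QU_BIT if unicast_response else 0)
    return encoded + struct.pack("!HH", _TYPE_PTR, qclass)
```

With Python Zeroconf itself, the equivalent choice should be expressible by passing `question_type=DNSQuestionType.QM` when creating the service browser; treat that parameter name as an assumption to verify against the installed version.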