@agners
Collaborator

@agners agners commented Apr 3, 2025

Use the fabric-specific DNS-SD service subtype instead of the generic "_matter._tcp" service type. This avoids querying for operational nodes of other fabrics and is closer to the recommendations in "4.3.2.6. Performance Recommendations" of the Matter spec.
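For illustration, a minimal sketch of how the two browse names differ. This is not the code from this PR: the helper name `operational_subtype` and the example fabric ID are hypothetical, and the `_I<compressed fabric ID>` subtype form (16 uppercase hex digits) is assumed from the operational discovery section of the Matter spec.

```python
# Generic operational service type: matches nodes of every fabric.
GENERIC_OPERATIONAL_TYPE = "_matter._tcp.local."


def operational_subtype(compressed_fabric_id: int) -> str:
    """Return the fabric-specific DNS-SD subtype name to browse for.

    Assumption: the subtype label is "_I" followed by the 64-bit
    compressed fabric ID as 16 uppercase hex digits.
    """
    if not 0 <= compressed_fabric_id < 2**64:
        raise ValueError("compressed fabric ID must fit in 64 bits")
    return f"_I{compressed_fabric_id:016X}._sub.{GENERIC_OPERATIONAL_TYPE}"


# Hypothetical fabric ID, for demonstration only.
print(operational_subtype(0x87E1B004E235A130))
# → _I87E1B004E235A130._sub._matter._tcp.local.
```

Browsing the subtype name instead of `GENERIC_OPERATIONAL_TYPE` should make responders answer only for nodes that advertise that fabric.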

Furthermore, always request multicast responses (QM). Unicast responses won't get reliably delivered back to Python Zeroconf in almost all installation types. This is because multiple mDNS responders are usually listening on all interfaces on port 5353 ([::]:5353/0.0.0.0:5353). Python Zeroconf sends out unicast-response (QU) queries with source port 5353, which are answered with a unicast response to the source IP/port. However, if multiple sockets are bound to all interfaces on the same port and a UDP packet is sent to a specific IP/port combination, the packet is delivered to only one of them (which one is random). Responses sent to the multicast address are delivered to all listening sockets (multicast packets are fanned out locally as well).

The default behavior sends a unicast-response query (QU) first and a multicast-response query (QM) after 200 ms. So the current behavior most likely only led to a slight delay.
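On the wire, the only difference between a QU and a QM question is the top bit of the question's class field (RFC 6762). The following is a minimal, standard-library-only sketch of that difference; the helper names `encode_name` and `build_ptr_query` are hypothetical and this is not how Python Zeroconf builds its packets.

```python
import struct

TYPE_PTR = 12    # DNS record type PTR
CLASS_IN = 1     # DNS class IN
QU_BIT = 0x8000  # top bit of the class field: unicast response requested


def encode_name(name: str) -> bytes:
    """Encode a DNS name as length-prefixed labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ptr_query(name: str, unicast_response: bool) -> bytes:
    """Build a single-question mDNS PTR query, either QU or QM."""
    # Header: ID=0, flags=0, QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    qclass = CLASS_IN | (QU_BIT if unicast_response else 0)
    return header + encode_name(name) + struct.pack("!2H", TYPE_PTR, qclass)


qm = build_ptr_query("_matter._tcp.local.", unicast_response=False)
qu = build_ptr_query("_matter._tcp.local.", unicast_response=True)
print(qm[-2:].hex(), qu[-2:].hex())  # → 0001 8001
```

A responder that sees the QU bit may reply directly to the sender's IP/port, which is exactly the packet that can end up at a different socket bound to port 5353 on the same host.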

While at it, remove the commissionable node browsing as it is not used currently. This too lowers the unnecessary mDNS traffic on the network.

@agners
Collaborator Author

agners commented Apr 3, 2025

There is a slight chance that some misbehaving devices do not announce themselves using the subtype. In my testing with Thread border routers from Apple, Google and our OTBR Add-on, as well as all the WiFi devices I have at home, things seem to work as usual. Since Python Zeroconf only returns the matching devices, and we rely on the browsing result to connect to devices in the first place, verifying that subtype DNS-SD works for all devices in a home is rather easy: run the Python Matter server with this code and see if all devices come online.

The wrapper is how we call the CHIP SDK. We don't actually want to call
the SDK, so let's start mocking on the wrapper level.
@agners agners added the maintenance label (Code (quality) improvement or small enhancement which is not a new feature) Apr 3, 2025
@Apollon77
Contributor

@agners The other question is why you query mDNS at all without a need? ;-)

Ideally you should try to connect to the last known address of commissioned nodes and not need to query anything at all. Only when the last known address does not work do you discover this device specifically. That's at least how I understood it to ideally work on the controller side.

@marcelveldt
Collaborator

> @agners The other question is why you query mDNS at all without a need? ;-)
>
> Ideally you should try to connect to the last known address of commissioned nodes and not need to query anything at all. Only when the last known address does not work do you discover this device specifically. That's at least how I understood it to ideally work on the controller side.

It's for fast operational node discovery.
Imagine somebody cutting the power to a lightbulb. When power is re-applied, the expectation is that the bulb is rediscovered (and so working again) within seconds.

By only trying to connect to a device at some interval, you will not be able to deliver that user experience.
So it's a combination of trying to connect at interval X and doing operational node discovery over mDNS.
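The combination described above can be sketched with asyncio as follows. This is only an illustration under stated assumptions, not the server's actual logic: `reconnect_loop`, `node_seen`, `connect`, and `demo` are hypothetical names, and `node_seen` stands in for whatever the mDNS browser callback would signal.

```python
import asyncio
from typing import Awaitable, Callable


async def reconnect_loop(
    node_seen: asyncio.Event,
    connect: Callable[[], Awaitable[bool]],
    poll_interval: float,
) -> None:
    """Retry `connect` on a timer, or right away when mDNS reports the node."""
    while True:
        if await connect():
            return
        try:
            # Wake up early if the mDNS browser saw the node come online.
            await asyncio.wait_for(node_seen.wait(), timeout=poll_interval)
        except asyncio.TimeoutError:
            pass  # No announcement: fall back to the periodic retry.
        node_seen.clear()


async def demo() -> int:
    """First attempt fails, a simulated mDNS announcement triggers the second."""
    attempts = 0
    node_seen = asyncio.Event()

    async def connect() -> bool:
        nonlocal attempts
        attempts += 1
        return attempts >= 2

    task = asyncio.create_task(reconnect_loop(node_seen, connect, poll_interval=60.0))
    await asyncio.sleep(0)  # let the first attempt run and fail
    node_seen.set()         # simulate the bulb announcing itself again
    await asyncio.wait_for(task, timeout=1.0)
    return attempts


print(asyncio.run(demo()))  # → 2
```

With a 60 second poll interval, the second attempt still happens immediately because the announcement wakes the loop instead of waiting for the timer.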

@marcelveldt
Collaborator

marcelveldt commented Apr 3, 2025

> While at it, remove the commissionable node browsing as it is not used currently.

It is being used by Home Assistant to auto-discover the presence of a Matter device. It should not be removed.

nvm, HA is not using the event from the Matter server but just listens to mDNS directly.

@agners
Collaborator Author

agners commented Apr 3, 2025

> Ideally you should try to connect to the last known address of commissioned nodes and not need to query anything at all. Only when the last known address does not work do you discover this device specifically. That's at least how I understood it to ideally work on the controller side.

Besides the address, some of the protocol parameters are part of the TXT record (the SII/SAI parameters). Of course we could cache those as well and just opportunistically connect to the last known address, but maybe it is just easier to check mDNS to make sure we get the latest values? 🤔 🤷
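To make the TXT parameters concrete, here is a minimal sketch of pulling SII and SAI out of a TXT key/value mapping. Assumptions: both values are decimal millisecond counts (the session idle and session active intervals, as I read the Matter spec), and the input is a dict with bytes keys and values, similar to what an mDNS library would typically hand back. The function name `parse_session_intervals` is hypothetical.

```python
from typing import Dict, Optional


def parse_session_intervals(
    txt: Dict[bytes, Optional[bytes]],
) -> Dict[str, Optional[int]]:
    """Extract SII/SAI (assumed decimal milliseconds) from TXT key/value pairs.

    Missing or malformed values are returned as None so the caller can
    fall back to its own defaults.
    """
    result: Dict[str, Optional[int]] = {}
    for key in (b"SII", b"SAI"):
        raw = txt.get(key)
        try:
            result[key.decode()] = int(raw.decode("ascii")) if raw is not None else None
        except ValueError:
            result[key.decode()] = None
    return result


# Made-up TXT contents, for demonstration only.
print(parse_session_intervals({b"SII": b"5000", b"SAI": b"300", b"T": b"1"}))
# → {'SII': 5000, 'SAI': 300}
```

Caching these next to the last known address would be possible, but a stale cache then has to be detected and refreshed somehow, which is what checking mDNS provides for free.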

TBH, I think asking the SDK to connect to an IP directly is actually not possible. I think we have to create a device controller with the fabric settings, and then ask it to establish a session to a particular node ID. The SDK then queries mDNS (with its native C++ minimal mDNS implementation) to get the IP and handles all that.

You might now ask why we use another mDNS implementation in the Python Matter Server. The main reason is that the SDK doesn't have a mechanism to listen continuously for devices coming online. In theory we only need that part to be implemented on the Python side. I am actually looking into using Python Zeroconf to only passively listen for devices coming back online and leave all the rest to the SDK.

@marcelveldt
Collaborator

> The main reason is that the SDK doesn't have a mechanism to listen continuously for devices coming online.

There have been several conversations about operational node discovery within the SDK. Did none of that get implemented in the meantime?

@3oris

3oris commented Apr 4, 2025

@marcelveldt the OpenThread people seem to be working on this, if I am not mistaken.

There seems to be a major overhaul of the mDNS/DNS-SD/SRP stuff ongoing, loosely tracked in openthread/openthread#9434, which ultimately should also yield a standalone mDNS implementation, because

> However, there is interest in allowing this mDNS implementation to be used and integrated into other projects without including the full Thread stack.

See here: openthread/openthread#11191 and here: https://github.com/openthread/openthread/blob/15553e6dd22368204342844ee101a4b7fc2c0b00/examples/platforms/posix_mdns/README.md

Not sure how much of this can actually be used in other contexts, though.

@marcelveldt
Collaborator

Yes, I'm aware of that, but we're also talking about general Matter operational node discovery, not only Thread.

Let's hope we can get to one general mDNS implementation, because the current situation is just hopeless, with every implementation having its own quirks.

@Apollon77
Contributor

@marcelveldt Re your answer above: Ok if that's mainly about listening and not active querying then all makes sense :-)

@agners
Collaborator Author

agners commented Apr 9, 2025

Part of this PR is in #1121, other parts are in #1127. The rest would need a new PR, but there is no urgency for this right now.

@agners agners closed this Apr 9, 2025