[PATCH] Fix reconnect storm and Future serialization errors #204

@troykelly

meshcore-py's auto-reconnect re-establishes the TCP socket but does not re-send the appstart command. Without appstart, the firmware never activates the companion session, so subsequent commands time out. After 5 unanswered sends, the library hits its disconnect threshold, drops the connection, reconnects 1 second later, and the cycle repeats — producing a reconnect storm of ~1 connection/second indefinitely.

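Fix 1 (utils.py): asyncio.Future objects appear in CONNECTED event payloads (from meshcore-py's TCPConnection.connect() return value). They are not JSON-serializable and cause recorder warnings. sanitize_event_data() now handles them explicitly: a completed, non-cancelled Future is replaced by its sanitized result, and any other Future (pending, cancelled, or failed) falls back to str().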
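For anyone who wants to see the Fix 1 behaviour in isolation, here is a minimal sketch. The `sanitize()` function is a trimmed-down, hypothetical stand-in for sanitize_event_data(), not the integration's real function, and the payload keys `connection_info` and `other` are invented for illustration; only the Future branch mirrors the patch.

```python
import asyncio
import json
from typing import Any


def sanitize(data: Any) -> Any:
    """Hypothetical, reduced stand-in for sanitize_event_data()."""
    if isinstance(data, dict):
        return {str(k): sanitize(v) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return [sanitize(v) for v in data]
    if isinstance(data, bytes):
        return data.hex()
    if isinstance(data, asyncio.Future):
        # Only a finished, non-cancelled Future has a result to unwrap.
        if data.done() and not data.cancelled():
            try:
                return sanitize(data.result())
            except Exception:
                # result() re-raises an exception stored on the Future.
                return str(data)
        return str(data)
    return data


async def demo() -> dict:
    loop = asyncio.get_running_loop()

    finished = loop.create_future()
    finished.set_result({"host": "10.0.0.5", "raw": b"\x01\x02"})
    pending = loop.create_future()

    # Assumed payload shape; the key names are illustrative only.
    payload = {"reconnected": True, "connection_info": finished, "other": pending}

    clean = sanitize(payload)
    json.dumps(clean)  # the unsanitized payload would raise TypeError here
    pending.cancel()   # tidy up the Future we never resolved
    return clean


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The completed Future is replaced by its (recursively sanitized) result, while the pending one degrades to a string, so the payload always reaches the JSON encoder in a serializable form.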

Fix 2 (__init__.py): Subscribe to CONNECTED events and act on those whose payload has reconnected=True. On reconnect, re-send appstart and set_time to properly initialise the companion session, with a 5-second throttle to avoid duplicate re-init from rapid reconnects.
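The throttle in Fix 2 follows a check, lock, re-check pattern. The sketch below shows that pattern on its own, under a few assumptions: `ThrottledReinit` and its `reinit` callback are hypothetical names (the callback stands in for the appstart + set_time sequence), and it uses `time.monotonic` with an injectable clock rather than the `time.time()` the patch uses, so wall-clock adjustments cannot distort the interval and the timing can be tested without sleeping.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class ThrottledReinit:
    """Hypothetical helper: run `reinit` at most once per `min_interval` seconds."""

    def __init__(
        self,
        reinit: Callable[[], Awaitable[None]],
        min_interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._reinit = reinit
        self._min_interval = min_interval
        self._clock = clock
        self._lock = asyncio.Lock()
        self._last_run: Optional[float] = None

    def _too_soon(self) -> bool:
        return (
            self._last_run is not None
            and self._clock() - self._last_run < self._min_interval
        )

    async def on_connected(self, payload: object) -> bool:
        """Return True if re-init ran, False if the event was ignored."""
        if not isinstance(payload, dict) or not payload.get("reconnected"):
            return False
        if self._too_soon():  # cheap check before touching the lock
            return False
        async with self._lock:
            if self._too_soon():  # re-check: another task may have just run
                return False
            self._last_run = self._clock()
            await self._reinit()
            return True


async def demo():
    calls = []
    fake_now = [100.0]  # fake clock so the demo needs no real waiting

    async def reinit() -> None:
        calls.append(fake_now[0])

    guard = ThrottledReinit(reinit, min_interval=5.0, clock=lambda: fake_now[0])
    results = [await guard.on_connected({"reconnected": True})]       # runs
    fake_now[0] += 1.0
    results.append(await guard.on_connected({"reconnected": True}))   # throttled
    results.append(await guard.on_connected({"reconnected": False}))  # initial connect, ignored
    fake_now[0] += 5.0
    results.append(await guard.on_connected({"reconnected": True}))   # runs again
    return results, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

As in the patch, the timestamp is recorded before the re-init is awaited, so a burst of CONNECTED events arriving while appstart is still in flight is dropped rather than queued behind the lock.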

Tested against meshcore-ha v2.5.0 with meshcore-py v2.3.6. Connection holds stable indefinitely after the fix; previously cycled every ~1s.

 custom_components/meshcore/__init__.py | 48 ++++++++++++++++++++++++++
 custom_components/meshcore/utils.py    |  8 +++++
 2 files changed, 56 insertions(+)

diff --git a/custom_components/meshcore/utils.py b/custom_components/meshcore/utils.py
index abcdef0..1234567 100644
--- a/custom_components/meshcore/utils.py
+++ b/custom_components/meshcore/utils.py
@@ -2,6 +2,7 @@

 from __future__ import annotations

+import asyncio
 import hashlib
 import hmac
 import logging
@@ -127,6 +128,13 @@ def sanitize_event_data(data: Any) -> Any:
         return tuple(sanitize_event_data(v) for v in data)
     elif isinstance(data, bytes):
         return data.hex()
+    elif isinstance(data, asyncio.Future):
+        # asyncio.Future from meshcore-py CONNECTED events
+        if data.done() and not data.cancelled():
+            try:
+                return sanitize_event_data(data.result())
+            except Exception:
+                return str(data)
+        return str(data)
     elif hasattr(data, "__dict__") and not isinstance(data, type):
         # For objects with __dict__, convert to a sanitized dict
         # Skip for class objects (they have __dict__ but we don't want to process them)
diff --git a/custom_components/meshcore/__init__.py b/custom_components/meshcore/__init__.py
index abcdef0..1234567 100644
--- a/custom_components/meshcore/__init__.py
+++ b/custom_components/meshcore/__init__.py
@@ -536,6 +536,54 @@ async def async_setup_entry(
         _LOGGER.info("MESSAGES_WAITING auto-fetch subscriber registered")

+        # Subscribe to CONNECTED events to re-initialize after auto-reconnect.
+        # meshcore-py reconnects TCP but does not re-send appstart, so the
+        # firmware doesn't activate the companion session. Re-send appstart
+        # and set_time on every reconnect to stabilize the connection.
+        _reconnect_lock = asyncio.Lock()
+        _last_reconnect_ts = [0.0]
+
+        async def handle_reconnect(event):
+            if not event or not isinstance(event.payload, dict):
+                return
+            if not event.payload.get("reconnected"):
+                return
+
+            # Throttle: skip if we re-inited less than 5 seconds ago
+            now = time.time()
+            if now - _last_reconnect_ts[0] < 5.0:
+                return
+
+            async with _reconnect_lock:
+                # Double-check after acquiring lock
+                if time.time() - _last_reconnect_ts[0] < 5.0:
+                    return
+                _last_reconnect_ts[0] = time.time()
+
+                _LOGGER.info("Auto-reconnect detected, re-sending appstart + set_time")
+                try:
+                    mc = coordinator.api.mesh_core
+                    if mc:
+                        appstart_result = await mc.commands.send_appstart()
+                        if appstart_result and appstart_result.type != EventType.ERROR:
+                            _LOGGER.info("Re-init appstart succeeded after reconnect")
+                            coordinator.api._cache_self_info_event(appstart_result)
+                            coordinator.api._connected = True
+                            try:
+                                await mc.commands.set_time(int(time.time()))
+                            except Exception:
+                                _LOGGER.debug("set_time after reconnect failed", exc_info=True)
+                        else:
+                            _LOGGER.warning("Re-init appstart failed: %s", appstart_result)
+                except Exception as ex:
+                    _LOGGER.error("Error during reconnect re-init: %s", ex)
+
+        coordinator.api.mesh_core.subscribe(
+            EventType.CONNECTED,
+            handle_reconnect
+        )
+        _LOGGER.info("CONNECTED re-init subscriber registered")
+
     # Fetch initial data immediately
     # await coordinator._async_update_data()
