If the server crashes or restarts while a query is executing, the corresponding query_runs record will be left in running status indefinitely since the update to succeeded/failed never completes.
We should add a periodic cleanup mechanism that:
- Identifies query_runs records stuck in running beyond a configurable timeout (e.g. 5 minutes)
- Marks them as failed with an appropriate error message (e.g. "Server interrupted before query completed")
- Runs on a timer (could piggyback on the existing cache refresh loop or be a standalone task)
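
The cleanup pass could look roughly like the sketch below. It is only an illustration under assumptions: it uses SQLite via Python's standard `sqlite3` module for self-containment, and the function name `fail_stale_query_runs` and the `started_at` (Unix seconds) and `error` columns are hypothetical, since the actual query_runs schema and database layer are not described here.

```python
import sqlite3
import time

# Error text taken from the example in the proposal above.
STALE_ERROR = "Server interrupted before query completed"


def fail_stale_query_runs(conn, timeout_seconds=300.0, now=None):
    """Mark runs stuck in 'running' for longer than timeout_seconds as 'failed'.

    Assumes a hypothetical schema: query_runs(id, status, started_at, error),
    where started_at is a Unix timestamp in seconds.
    Returns the number of rows that were updated.
    """
    if now is None:
        now = time.time()
    cutoff = now - timeout_seconds
    # The connection context manager commits on success and rolls back on error.
    with conn:
        cur = conn.execute(
            "UPDATE query_runs SET status = 'failed', error = ? "
            "WHERE status = 'running' AND started_at < ?",
            (STALE_ERROR, cutoff),
        )
    return cur.rowcount
```

A single `UPDATE` with the status check in the `WHERE` clause avoids a separate read-then-write step, so a run that finishes normally between passes is not overwritten. The function could be called from the existing cache refresh loop or from a standalone timer task, and also once at startup, since that is when interrupted runs are most likely to exist.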