-
-
Notifications
You must be signed in to change notification settings - Fork 224
Fix queue_jobs fetch performance: expression index for ORDER BY and drop unused GIN index #496
Description
- I have checked the existing issues to avoid duplicates
- I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue
Describe the bug
This is a follow-up to #390, which identified the slow queue_jobs fetch query but was closed because the proposed fix changed the ORDER BY semantics. This fix preserves the exact ordering while achieving the same performance improvement, plus addresses a second issue with an unused GIN index.
Analysis
1. Queue fetch scans all pending rows due to unindexable ORDER BY
The queue server's handleJobs query (server.go:181) orders by status = 'retry' DESC, priority, run_after, id. The computed boolean expression status = 'retry' prevents Postgres from using any standard B-tree index for ordered retrieval, forcing a full scan + sort of all pending/retry rows.
Before (using queue_jobs_queue_status_idx):
Limit (cost=9586.08..9586.09 rows=1 width=724) (actual time=422.197..422.199 rows=1 loops=1)
Buffers: shared hit=8968
-> LockRows (cost=9586.08..10349.32 rows=61059 width=724) (actual time=422.195..422.197 rows=1 loops=1)
-> Sort (cost=9586.08..9738.73 rows=61059 width=724) (actual time=422.131..422.132 rows=1 loops=1)
Sort Key: ((status = 'retry'::queue_job_status)) DESC, priority, run_after, id
Sort Method: quicksort Memory: 54644kB
-> Index Scan using queue_jobs_queue_status_idx on queue_jobs
(actual time=0.123..75.084 rows=60384 loops=1)
Execution Time: 424.739 ms
447ms, 8,970 buffer hits, 54MB in-memory sort of 60k rows. Multiplied by processor.concurrency, this creates significant contention.
The fix is an expression index that includes the computed boolean:
CREATE INDEX queue_jobs_fetch_order_idx
ON queue_jobs (queue, (status = 'retry') DESC, priority, run_after, id)
WHERE status IN ('pending', 'retry');After:
Limit (cost=0.42..0.59 rows=1 width=724) (actual time=0.154..0.155 rows=1 loops=1)
Buffers: shared hit=5
-> LockRows (cost=0.42..10458.37 rows=60802 width=724) (actual time=0.153..0.153 rows=1 loops=1)
-> Index Scan using queue_jobs_fetch_order_idx on queue_jobs
(actual time=0.114..0.115 rows=1 loops=1)
Index Cond: ((queue = 'process_torrent'::text) AND (run_after <= now()))
Execution Time: 0.233 ms
0.23ms, 5 buffer hits. ~2000x improvement. The ordering is preserved exactly.
2. Unused GIN index on (queue, payload)
pg_stat_user_indexes shows the GIN index created in migration 00012 has never been read:
indexname | size | scans
---------------------------------------------------+---------+-------
queue_jobs_queue_payload_idx | 678 MB | 0
queue_jobs_id_queue_status_priority_run_after_idx | 9480 kB | 12258
queue_jobs_pkey | 5896 kB | 29
queue_jobs_fingerprint_idx | 5648 kB | 29
queue_jobs_queue_status_idx | 728 kB | 29285
678MB, 0 scans. Every queue INSERT/UPDATE/DELETE maintains this index for no benefit.
Impact
After both changes, slow queries (30s+ logged by gorm) dropped to zero and queue processing throughput roughly doubled. I can submit a PR with a goose migration implementing both fixes.
Environment Information (Required)
- Bitmagnet version:
v0.10.0 - OS and version:
Debian Linux (Docker, PostgreSQL 16-alpine) - Config overrides:
processor.concurrency: 16, 10 DHT crawler nodes, ~6.5M torrents indexed, ~60k pending queue jobs