Redis memory spike from task throttling and queue buildup #4455

Open
quevon24 opened this issue Sep 13, 2024 · 1 comment · May be fixed by #4472
Comments

@quevon24
Member

quevon24 commented Sep 13, 2024

This morning, we encountered a Redis memory usage issue that was escalating rapidly. The problem arose because all the free PACER documents we had been collecting with the full sweep started to queue.

Some tasks began to throttle due to the @throttle_task decorator on the process_free_opinion_result function, leading to extremely long wait times (~300,000 seconds and still increasing) and causing tasks to pile up in the unacked queue.
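For context, here is a minimal sketch of how a rate-limiting decorator that reschedules over-limit tasks can produce countdowns this large. The names and the countdown formula are assumptions for illustration, not the actual @throttle_task implementation in our codebase:

```python
# Illustrative sketch only; not the real @throttle_task code. It assumes a
# bound Celery task and a simple Redis counter per (task, key) window.
import functools

import redis

r = redis.Redis()


def throttle_task(rate: str, key: str | None = None):
    """Allow n executions per m seconds for a given key; otherwise retry."""
    n, m = (int(x) for x in rate.split("/"))

    def decorator(task_func):
        @functools.wraps(task_func)
        def wrapper(self, *args, **kwargs):
            throttle_key = f"throttle:{task_func.__name__}:{kwargs.get(key, '')}"
            count = r.incr(throttle_key)
            if count <= n:
                r.expire(throttle_key, m)
                return task_func(self, *args, **kwargs)
            # Each over-limit task reschedules itself one window further out,
            # so a backlog of ~200,000 tasks on one key yields countdowns on
            # the order of the ~300,000 s observed here, and the rescheduled
            # messages sit in Redis the whole time (the unacked buildup above).
            raise self.retry(countdown=(count - n) * m)

        return wrapper

    return decorator
```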

This likely happened due to a court blockage that triggered multiple retry attempts as we tried to collect a large volume of documents.

For safety, we halted the PACER free documents full sweep command, assuming it was the cause of the issue. However, the sweep only generates the report—it does not download anything from PACER or create new database entries.

We later realized that the issue began yesterday when the command started sending a large number of Celery tasks to ingest the documents we had already collected from the PACER free documents report. Yesterday, it sent ~80,000 documents (tasks), and today, it sent ~200,000 documents (tasks).

The cause was an outdated cron job, which, for unknown reasons, started running the old command and using an outdated image. The cron job had been updated and correctly configured since August 30th, and no issues were reported until yesterday.

This cron job problem meant the start and end dates (today and today minus 10 days) were not passed, which affected the downloading of documents. It didn't impact generation of the document report, because in the absence of a date range the report runs based on the last successful sweep of each court, but it did affect the download step by queuing every document in the PACERFreeDocumentRow table.
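To make the difference concrete, here is a hypothetical sketch of the download step's row selection; the function, argument names, and the date_filed field are assumptions for illustration, not the actual command code:

```python
# Hypothetical sketch: with a date range, only recent rows are queued;
# without one, the whole table is. Field names are assumptions.
from datetime import date, timedelta


def rows_to_queue(PACERFreeDocumentRow, start=None, end=None):
    qs = PACERFreeDocumentRow.objects.all()
    if start and end:
        # Intended cron behavior: roughly the last 10 days of rows.
        return qs.filter(date_filed__range=(start, end))
    # With no dates passed, every pending row in the table gets queued.
    return qs


# The cron was meant to pass something like:
end = date.today()
start = end - timedelta(days=10)
```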

The issue with the long throttling times is similar to what is mentioned here: #4077 (comment).

TL;DR: throttle.maybe_wait() and @throttle_task don’t work well together.
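To make the TL;DR concrete, here is a rough sketch of the producer-side half of the pair; the class and its internals are assumptions, not the actual CeleryThrottle code (see #4077 for the real discussion):

```python
# Hypothetical producer-side throttle; illustrative only.
import time

import redis

r = redis.Redis()


class QueueThrottle:
    def __init__(self, queue_name: str, min_items: int = 100) -> None:
        self.queue_name = queue_name
        self.min_items = min_items

    def maybe_wait(self) -> None:
        # Block the enqueueing command until the broker queue drains a bit.
        while r.llen(self.queue_name) > self.min_items:
            time.sleep(1)


# One plausible conflict (an assumption): messages that @throttle_task
# reschedules with a countdown are no longer in the plain queue list this
# check inspects, so the producer sees a short queue and keeps enqueueing
# while the worker-side countdowns keep growing.
```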

It was likely a temporary blockage, and Ramiro has now reconfigured the cron job correctly. We are going to disable it for the moment while we fix the throttle issue, and later we can run a sweep to cover the days the cron was disabled.

Moving forward, we should consider removing @throttle_task to avoid these long wait times when retrying tasks. Instead, we could use the CycleChecker to detect when we have finished cycling through all the courts, and when we are down to a single court, increase the per-document delay from 1 second to 4 seconds (i.e., the equivalent of a 1/4 rate in the decorator, one task every four seconds). A rough sketch of that approach is below.
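A minimal sketch of the idea, assuming a simplified CycleChecker; the real class in our codebase may have a different API, and the enqueue wiring here is hypothetical:

```python
# Illustrative only; pace enqueueing in the command itself instead of
# relying on @throttle_task retries.
import time


class CycleChecker:
    """Track distinct court IDs and report when a new cycle starts."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.courts_in_last_cycle = 0

    def check_if_cycled(self, court_id: str) -> bool:
        if court_id in self.seen:
            # A repeat means we've been through every remaining court once;
            # remember how many distinct courts that cycle contained.
            self.courts_in_last_cycle = len(self.seen)
            self.seen = {court_id}
            return True
        self.seen.add(court_id)
        return False


def enqueue_rows(rows, enqueue) -> None:
    """Queue one task per row, pacing by court.

    `rows` are assumed ordered so courts interleave round-robin; `enqueue`
    would be something like process_free_opinion_result.delay in the real
    command (hypothetical wiring).
    """
    checker = CycleChecker()
    for row in rows:
        checker.check_if_cycled(row.court_id)
        enqueue(row.pk)
        # Once only one court is left in the cycle, slow to ~1 document per
        # 4 seconds (the 1/4 rate mentioned above) instead of 1 per second.
        time.sleep(4.0 if checker.courts_in_last_cycle == 1 else 1.0)
```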

@mlissner
Member

Thanks for the analysis and write up!

A couple things:

  1. I forget if we automatically update all our cron jobs in k8s when we deploy new images. I think we do, but it's worth double-checking the GitHub Action to make sure. That could be the cause of the misconfigured pod, I suppose.

  2. Let's give the CycleChecker a try and see if it can fix this. Thank you!
