Job stuck randomly on multi queue processors, timeout does not work #33165

1212087 · 2020-06-09T16:56:16Z

1212087
Jun 9, 2020

Laravel Version: 7.6.2
PHP Version: 7.4.5-fpm-alpine
Database Driver & Version: MYSQL 8.0
Queue driver: Redis:alpine3.11

Description:

I'm running a heavy queue system that runs about 300k jobs and insert/updating about 50 million rows per day, using docker-compose and scale workers by docker-compose scale command
Sometimes, a job gets stuck and does not release even if the timeout has been exceeded. I'm running 6 different queues, the first is a crawler and the others are updaters (update crawled data into the database). When I run multiple crawlers processors, the updaters can be stuck randomly at any time, and any job.

This is an example updater:

        $start = microtime(true);
        $this->tmpTable = $this->getTmpTableName();
        $this->swapTmpTable();
        $totalRows = DB::table("$this->tmpTable")->count();
        $this->logInfo("Total rows: $totalRows");

        $this->updateProducts();
        $this->updatePriceHistoryTmp();
        $this->updateSoldHistoryTmp();
        $this->deleteUpdatedRows();
        $this->insertProducts();
        $this->insertPriceHistoryTmp();
        $this->insertSoldHistoryTmp();
        $this->truncateTmpTable();

        $end = microtime(true);
        $processTime = $end - $start;
        $this->logInfo("Merge tmp table time: $processTime");

All of the functions in this job just send the raw query to the Mysql server, such as:

    protected function deleteUpdatedRows()
    {
        $sql = "DELETE $this->tmpTable
            FROM $this->tmpTable
            INNER JOIN products
            ON $this->tmpTable.id = products.id
        ";
        return DB::statement($sql);
    }

The workers are started up by docker-compose:

worker_merge_products_tmp:
        image: sanhangre:worker_dev
        volumes: 
            - ./webroot:/home/projects
        working_dir: /home/projects
        restart: always
        networks: 
            - default
        command: "/usr/local/bin/php /home/projects/artisan queue:work --queue=mergeProductsTmpTable --tries=3 --timeout=60"
        depends_on:
            - redis_db

As you can see, the timeout is set to 60s, but take a look at this picture:

The stuck job took 7949s to complete, far beyond the timeout, so weird. This job normally just take about 10~20s to complete, I have no idea about this weird thing.

Currently, I'm running 6 workers for the crawler, and one worker per updater. If I scale the crawler for more than 6 workers, the updaters start to be stuck, RANDOMLY at ANY JOB.

Steps To Reproduce:

Because the jobs are stuck randomly, so I have no idea how to reproduce this bug.

Note: Please note that the updaters can be stuck at any time and any job. The job stuck for about 2 hours, no more, no less. While a job is sticking, the rest are running normally without any error.

Thank you for reading my long post, please help!

driesvints · 2020-06-09T18:20:21Z

driesvints
Jun 9, 2020
Maintainer

Moving to discussions.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Job stuck randomly on multi queue processors, timeout does not work #33165

{{title}}

Replies: 1 comment

{{title}}

Select a reply

Job stuck randomly on multi queue processors, timeout does not work #33165

1212087 Jun 9, 2020

Description:

Steps To Reproduce:

Replies: 1 comment

driesvints Jun 9, 2020 Maintainer

1212087
Jun 9, 2020

driesvints
Jun 9, 2020
Maintainer