Bugfix Deadlock when getting the lock #31

mberacochea · 2024-11-14T16:11:54Z

This is to prevent a deadlock if the code can't get the lock file.

Example:
ERROR 2024/11/14 03:55:11 PM - lifetime has expired, breaking
ERROR 2024/11/14 03:55:11 PM - lockfile exists but isn't safe to break: /hps/nobackup/rdf/metagenomics/service-team/production/automated_jobs/ERP1432/ERP143252/download.lock

If this happens, the process hangs indefinitely. With the timeout, the code will at least fail and exit with a Timeout error.
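For illustration only, here is a minimal sketch of the "fail with a timeout instead of hanging" behaviour described above. It is not the code changed in this PR and it does not use the project's locking library: `acquire_lock`, `release_lock`, and their parameters are hypothetical names, and the sketch relies on plain exclusive file creation from the standard library, without any special handling for network filesystems.

```python
import os
import time


def acquire_lock(lock_path, timeout=5.0, poll_interval=0.1):
    """Create lock_path exclusively, or raise TimeoutError after `timeout` seconds.

    Hypothetical helper: a stand-in for a lock acquisition that gives up
    instead of waiting forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes the call fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                # Fail explicitly rather than blocking indefinitely.
                raise TimeoutError(
                    f"could not acquire {lock_path} within {timeout} seconds"
                )
            time.sleep(poll_interval)
        else:
            # Record the owner's PID to help debugging stale locks.
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return


def release_lock(lock_path):
    """Remove the lock file so that other processes can acquire it."""
    os.remove(lock_path)
```

With a bounded wait like this, a caller that meets a stale `download.lock` gets a `TimeoutError` it can log and exit on, rather than a job that never finishes.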

I think we need to review these lock files. We can probably remove them (or at least add unit tests to check that the code is doing what we expect).


fetchtool/abstract_fetch.py

SandyRogers

Thanks @mberacochea
Strange that this is happening on HPS as well as NFS. We could also try using portalocker instead (in theory it works on NFS, although in practice for EMG API concurrent log handler locking, we tell it to use a non-NFS directory for the lock files). Anyway, this looks good to make things fail explicitly.
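As a rough illustration of the alternative mentioned above, the sketch below uses the standard library's `fcntl.flock` directly rather than portalocker itself; as far as I know, this is the kind of POSIX advisory lock that portalocker wraps on Unix-like systems, but that is an assumption. `lock_with_timeout` and its parameters are hypothetical names, the example is POSIX-only, and it says nothing about how such locks behave on NFS.

```python
import fcntl
import time


def lock_with_timeout(handle, timeout=5.0, poll_interval=0.1):
    """Take an exclusive advisory lock on an open file, or raise TimeoutError.

    Hypothetical helper: retries a non-blocking flock until the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return
        except BlockingIOError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"lock not acquired within {timeout} seconds")
            time.sleep(poll_interval)
```

One difference from lock files that are created and deleted: the kernel drops an advisory lock when the holding process exits, so a crashed job does not leave a stale lock behind.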

mberacochea · 2024-11-18T09:26:19Z

> Thanks @mberacochea Strange that this is happening on HPS as well as NFS. We could also try using portalocker instead (in theory it works on NFS, although in practice for EMG API concurrent log handler locking, we tell it to use a non-NFS directory for the lock files). Anyway, this looks good to make things fail explicitly.

Yeah, it is weird. The library that we are using is supposed to handle NFS-like filesystems. We will have to investigate at some point.

mberacochea added 3 commits October 2, 2024 21:43

Merge pull request #30 from EBI-Metagenomics/bugfix/retries-when-file…

d6dd3fa

…-exists The system is retrying even if the file already exists.

Merge branch 'master' of github.com:EBI-Metagenomics/fetch_tool

31e11c2

mberacochea requested review from SandyRogers and MGS-sails November 14, 2024 16:11

mberacochea self-assigned this Nov 14, 2024

mberacochea commented Nov 14, 2024

fetchtool/abstract_fetch.py (outdated)

Typo fetchtool/abstract_fetch.py

49a89a6

SandyRogers approved these changes Nov 14, 2024

MGS-sails approved these changes Nov 15, 2024

mberacochea merged commit a7981fa into develop Nov 18, 2024
6 checks passed

mberacochea deleted the bugfix/deadlock-when-getting-the-lock branch November 18, 2024 12:35
