download of big files time out and don't resume #169

suhrig · 2023-06-11T19:31:15Z

Description of the bug

The process NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP is configured to cancel a download if it takes longer than 1200 seconds (see here). Long-running downloads are thus interrupted and the following error is raised:

curl: (28) Operation timed out after 1200000 milliseconds with 23601274614 out of 49920124494 bytes received
Warning: Problem : timeout. Will retry in 1 seconds. 5 retries left.
Throwing away 23601274614 bytes

Unfortunately, curl does not resume a download when it retries, even when the parameter --continue-at - is used. As stated in the error message, it "throws away" the downloaded data and starts over. This effectively means that the download never completes.
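
To make the "never completes" point concrete, here is a small back-of-the-envelope calculation using only the byte counts and the 1200-second limit from the error message above. It assumes the transfer rate stays roughly constant, which is a simplification.

```shell
# Figures copied from the curl error message above.
received=23601274614   # bytes downloaded before the timeout
total=49920124494      # full size of the file in bytes
limit=1200             # --max-time in seconds

# Average throughput achieved before curl gave up (integer division).
rate=$((received / limit))

# Time a full download would need at that average rate (integer division).
needed=$((total / rate))

echo "average rate: $rate bytes/s"
echo "time needed:  $needed s (limit: $limit s)"
```

At roughly 19.7 MB/s the full file would need about 2538 seconds, more than twice the limit, so every retry that starts from zero runs into the same timeout.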

What's the point of the parameter --max-time? Is it to catch stalling downloads? If so, wouldn't it be better to detect stalling downloads as such, instead of assuming that any download that takes longer than 1200 s must have stalled? For example, --speed-limit 1 --speed-time 60 instructs curl to treat a download as stalled if the speed stays below 1 byte/s for 60 seconds in a row.

This alone does not resolve the main issue that downloads are not resumed upon retries. For that, one has to wrap the curl command in a bash for-loop in conjunction with --continue-at -. Alternatively, why not use wget, which does not throw away downloaded data upon retry and also has stall detection (--read-timeout=60)?
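
A minimal sketch of the loop idea described above, not the pipeline's actual code: each attempt calls curl with --continue-at - so it picks up from the partial file, and --speed-limit/--speed-time replace the fixed --max-time. The function name download_with_resume, the default of 5 attempts, and the RETRY_DELAY variable are assumptions made for illustration.

```shell
# Hypothetical helper: retry a download, resuming from the partial file.
# Usage: download_with_resume URL OUTPUT [MAX_ATTEMPTS]
download_with_resume() {
    local url=$1 output=$2 max_attempts=${3:-5}
    local attempt
    for attempt in $(seq 1 "$max_attempts"); do
        # --continue-at -  : resume at the current size of the output file
        # --speed-limit 1 --speed-time 60 : abort only if the transfer stays
        #                    below 1 byte/s for 60 seconds in a row
        if curl --continue-at - \
                --speed-limit 1 --speed-time 60 \
                --output "$output" "$url"; then
            return 0
        fi
        echo "Attempt $attempt of $max_attempts failed; resuming." >&2
        # Pause between attempts (RETRY_DELAY is an assumed variable, default 1 s).
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}

# Example usage (URL and file name are placeholders):
#   download_with_resume "$url" reads.fastq.gz 10
```

Because the retry happens in the shell rather than through curl's own retry mechanism, the partial file stays on disk between attempts and the next curl call continues from it.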


Midnighter · 2023-09-02T15:56:47Z

Those are all very good suggestions. Would you be up for creating a PR that implements the most general ones?

suhrig · 2023-09-04T11:54:59Z

Yes, I can do that.

drpatelh · 2023-09-04T17:46:30Z

Great! Thank you 🙏🏽

samleenz · 2023-10-17T23:53:31Z

Any update on this, or thoughts on an alternative solution in the meantime?

Downloads from the ENA FTP servers are currently slow at our institute for some reason, so the curl command keeps hitting the time limit.

Thanks!

suhrig · 2023-10-18T07:47:35Z

I have already implemented a patch and am testing it locally, but the tests are taking longer than usual because ENA/SRA is slow at the moment. I will submit a pull request with the changes I have so far, so that you can test it in parallel and already benefit from the enhancements.

suhrig · 2023-10-18T09:20:44Z

Here is the PR: #229. Feel free to test and give feedback. Thanks.

Given the slowness of ENA/SRA at the moment, you probably want to bump up the maximum runtime of the download process from the default of 4h to 1d:

process {
   withName:SRA_FASTQ_FTP {
      time = '1d'
   }
}
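
One possible way to apply this override, sketched under assumptions: save the snippet to a custom config file and pass it to Nextflow with -c. The helper name write_time_override and the file name custom.config are hypothetical, and the remaining command-line options are deliberately left out.

```shell
# Hypothetical helper: write the time override shown above to the given file.
write_time_override() {
    cat > "$1" <<'EOF'
process {
   withName:SRA_FASTQ_FTP {
      time = '1d'
   }
}
EOF
}

# Example usage (other options elided):
#   write_time_override custom.config
#   nextflow run nf-core/fetchngs -c custom.config <your usual options>
```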

amizeranschi · 2023-11-21T20:58:09Z

Currently running into similar trouble with failed FTP downloads that don't get to finish, even after a couple of retries.

@suhrig thanks for submitting your PR, but it doesn't look like it's been merged yet due to some conflicts. Could you have another look?

suhrig · 2023-11-24T12:45:30Z

I hope to get around to doing this next week. Sorry for the delay.

drpatelh · 2024-01-03T11:47:49Z

x-ref #241 replacement for PR #229

drpatelh · 2024-01-04T14:51:43Z

Should be fixed in #241. Please feel free to re-open if any problems persist.

suhrig added the bug Something isn't working label Jun 11, 2023

suhrig mentioned this issue Oct 18, 2023

use wget instead of curl #229

Closed

5 tasks

drpatelh added this to the 1.12.0 milestone Jan 3, 2024

drpatelh mentioned this issue Jan 4, 2024

Merge fix: "use wget instead of curl #229" #241

Merged

drpatelh closed this as completed Jan 4, 2024
