Error in making organism database using makeOrgPackageFromNCBI() function #47

Open
Anisha-gupta04 opened this issue Mar 20, 2023 · 6 comments

Comments

@Anisha-gupta04

Hello, I am trying to make a database for Serratia marcescens, but I get the following error:
makeOrgPackageFromNCBI(version = "0.1",
                       author = "riddhi <riddhi23.sharma@gmail.com>",
                       maintainer = "riddhi <riddhi23.sharma@gmail.com>>",
                       outputDir = "C:/Users/riddh/OneDrive/documents",
                       tax_id = "615",
                       genus = "Serratia",
                       species = "marcescens")

If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
Error in .tryDL(url, tmp) : url access failed after
4
attempts; url:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
In addition: Warning messages:
1: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
downloaded length 1918836056 != reported length 2652931370
2: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached
3: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
downloaded length 2431699380 != reported length 2652931370
4: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached
5: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
downloaded length 1885582304 != reported length 2652931370
6: In download.file(url, tmp, quiet = TRUE, mode = "wb") :
URL 'ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz': Timeout of 1000 seconds was reached

@lshep
Contributor

lshep commented Mar 20, 2023

It looks like the downloads started but then timed out. R has a default download timeout; you can see it with getOption('timeout') and adjust it with options(timeout=10000). I would suggest increasing the timeout limit so the files can finish downloading.
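A minimal sketch of that suggestion, assuming the session-wide `timeout` option is what governs these downloads. The value 10000 is the one mentioned in this comment, not a tested recommendation.

```r
# Inspect the current download timeout, in seconds.
old_timeout <- getOption("timeout")
print(old_timeout)

# Raise the limit for this R session so that large files such as
# gene2accession.gz (reported length 2652931370 bytes in the log above)
# have more time to finish downloading.
options(timeout = 10000)
print(getOption("timeout"))
```

The option only lasts for the current session, so it needs to be set again before each new run of makeOrgPackageFromNCBI().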

Cheers,

@lshep lshep closed this as completed Mar 20, 2023
@Anisha-gupta04
Author

Hello lshep,
Thanks, I increased the timeout, but now it shows a different error: Error: no such table: gene2accession_date

@lshep
Contributor

lshep commented Mar 21, 2023

Can you show the complete output?

@Anisha-gupta04
Author

> can you show the complete output?

If files are not cached locally this may take awhile to assemble a 33 GB cache databse in the NCBIFilesDir directory. Subsequent calls to this function should be faster (seconds). The cache will try to rebuild once per day.Please also see AnnotationHub for some pre-builtOrgDb downloads
preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
Error: no such table: gene2accession_date

 

| >

@lshep lshep reopened this Mar 21, 2023
@jmacdon
Collaborator

jmacdon commented Mar 21, 2023

@Anisha-gupta04 You probably have a malformed NCBI.sqlite database in your working directory, most likely left over from the interrupted download. You should delete that first and try again.
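A minimal sketch of that cleanup, assuming the cache file is named NCBI.sqlite and was written to the current working directory. If a different NCBIFilesDir was used, the path would need to point there instead.

```r
# Assumption: the cache lives in the current working directory.
cache_file <- file.path(getwd(), "NCBI.sqlite")

if (file.exists(cache_file)) {
  # file.remove() returns TRUE when the deletion succeeds.
  removed <- file.remove(cache_file)
  message("Removed ", cache_file, ": ", removed)
} else {
  message("No NCBI.sqlite found in ", getwd())
}
```

After the file is gone, rerunning makeOrgPackageFromNCBI() should start rebuilding the cache from scratch, so the long download will happen again.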

@hoosier060

Hi, I was going to post this error separately, but since this issue already exists I am adding it here.
In our case the same error occurs not because of the timeout setting but because of our organization's firewall.
NCBI serves these files over both FTP and HTTP(S), and our organization, like many others, blocks ftp:// addresses.
Could the package either let users choose the protocol or switch the address to https://ftp.ncbi.nlm.nih.gov/gene/DATA/, please?

preparing data from NCBI ...
starting download for
[1] gene2pubmed.gz
[2] gene2accession.gz
[3] gene2refseq.gz
[4] gene_info.gz
[5] gene2go.gz
getting data for gene2pubmed.gz
extracting data for our organism from : gene2pubmed
getting data for gene2accession.gz
rebuilding the cache
Error in .tryDL(url, tmp) : url access failed after
4
attempts; url:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
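
The substitution requested in this comment can be sketched as below. This is not a change to AnnotationForge and does not make makeOrgPackageFromNCBI() use HTTPS. The helper `to_https` is hypothetical and only illustrates rewriting the URL so its reachability can be checked from behind a firewall.

```r
# Hypothetical helper: swap the ftp:// scheme for https:// on a URL.
to_https <- function(url) sub("^ftp://", "https://", url)

ftp_url <- "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz"
https_url <- to_https(ftp_url)
print(https_url)

# To test reachability without downloading the whole file, fetch only
# the response headers. Uncomment to run; this needs network access.
# headers <- curlGetHeaders(https_url)
# print(attr(headers, "status"))
```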
