DDBJ study ids (DRP123456) do not map correctly #314

Open
awalsh17 opened this issue Jul 1, 2024 · 0 comments
Labels
bug Something isn't working

awalsh17 commented Jul 1, 2024

Description of the bug

When the input list includes DRP****** or PRJDB***** IDs, the sra_ids_to_runinfo.py script either fails with an error or returns data for a different ID.
It appears that the efetch query in cls._id_to_srx(identifier) fails.

Take, for example, the ID mentioned in the script: DRP004793 returns incorrect results, namely the information for a different study, SRP000742. Here is the query and result:

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?id=004793&db=sra&rettype=runinfo&retmode=text"

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR016082,2009-06-12 17:55:38,2014-05-26 05:26:24,5832504,209970144,0,36,1149,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos5/sra-pub-zq-11/SRR000/016/SRR016082/SRR016082.sralite.1,SRX004537,Cryptosporidium_nref,FL-cDNA,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,Illumina Genome Analyzer,SRP000742,PRJDA33427,,33427,SRS002838,SAMN00004429,simple,5807,Cryptosporidium parvum,Cryptosporidium_nref,,,,,,,no,,,,,UT-MGS,SRA002056,,public,C39D154DF2031E73C9578A3317C42931,4166465432361F26049099FFC7742B15
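The returned record belongs to SRP000742 (BioProject PRJDA33427), which suggests that efetch reads 004793 as the internal SRA UID 4793 rather than as part of a DDBJ accession. A minimal sketch of how an identifier could be classified before choosing between efetch and esearch; the regex, `classify_accession` and `lookup_strategy` are hypothetical names for illustration and are not taken from sra_ids_to_runinfo.py:

```python
import re

# Hypothetical pattern (not from sra_ids_to_runinfo.py): SRA/ENA/DDBJ object accessions
# (SRP, ERP, DRP, SRX, ...) and BioProject accessions (PRJNA, PRJEB, PRJDB).
ACCESSION_PATTERN = re.compile(
    r"^(?P<prefix>[SED]R[APRSXZ]|PRJ(?:NA|EB|DB))(?P<digits>\d+)$"
)


def classify_accession(identifier: str) -> tuple[str, str]:
    """Split an accession into (prefix, digits), e.g. 'DRP004793' -> ('DRP', '004793')."""
    match = ACCESSION_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"Unrecognised accession: {identifier!r}")
    return match.group("prefix"), match.group("digits")


def lookup_strategy(identifier: str) -> str:
    """Return 'efetch' only for bare numeric Entrez UIDs.

    Accessions keep their prefix and are resolved with esearch first; passing only
    the digits (e.g. '004793') to efetch can silently select an unrelated record.
    """
    identifier = identifier.strip()
    if identifier.isdigit():
        return "efetch"
    classify_accession(identifier)  # raises ValueError for unknown formats
    return "esearch"
```

With this split, `lookup_strategy("DRP004793")` and `lookup_strategy("SRP102769")` both return `"esearch"`, while a bare UID such as `"4793"` returns `"efetch"`.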

I am not sure of the best solution, other than running an E-utilities esearch (for example, term=DRP004793) and using the returned IDs:

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=DRP004793"

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">
<eSearchResult><Count>645</Count><RetMax>20</RetMax><RetStart>0</RetStart><IdList>
<Id>7938219</Id>
<Id>7938218</Id>
<Id>7507857</Id>
<Id>7507856</Id>
<Id>7507855</Id>
<Id>7507854</Id>
<Id>7507853</Id>
<Id>7507852</Id>
<Id>7507851</Id>
<Id>7507850</Id>
<Id>7507849</Id>
<Id>7507848</Id>
<Id>7507847</Id>
<Id>7507846</Id>
<Id>7507845</Id>
<Id>7507844</Id>
<Id>7507843</Id>
<Id>7507842</Id>
<Id>7507841</Id>
<Id>7507840</Id>
</IdList><TranslationSet/><TranslationStack>   <TermSet>    <Term>DRP004793[All Fields]</Term>    <Field>All Fields</Field>    <Count>645</Count>    <Explode>N</Explode>   </TermSet>   <OP>GROUP</OP>  </TranslationStack><QueryTranslation>DRP004793[All Fields]</QueryTranslation></eSearchResult>
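Note that this esearch response reports a Count of 645 but only returns RetMax 20 IDs, so the UIDs would need to be paged through (retstart/retmax) before being passed to efetch. A minimal standard-library sketch of that flow; all helper names are hypothetical, and whether every one of the 645 "All Fields" hits actually belongs to DRP004793 is an assumption that should be checked:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retstart: int = 0, retmax: int = 20) -> str:
    """esearch URL for one page of SRA UIDs matching an accession such as DRP004793."""
    query = urlencode({"db": "sra", "term": term, "retstart": retstart, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def parse_esearch(xml_text: str) -> tuple[int, list[str]]:
    """Return (total Count, UIDs on this page) from an esearch XML response."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    uids = [elem.text for elem in root.iterfind("IdList/Id")]
    return count, uids


def page_starts(count: int, retmax: int) -> list[int]:
    """retstart offsets needed to cover all `count` hits, `retmax` at a time."""
    return list(range(0, count, retmax))


def build_efetch_runinfo_url(uids: list[str]) -> str:
    """efetch URL requesting runinfo CSV for resolved numeric UIDs (not accession digits)."""
    query = urlencode(
        {"db": "sra", "id": ",".join(uids), "rettype": "runinfo", "retmode": "text"}
    )
    return f"{EUTILS}/efetch.fcgi?{query}"
```

For this example, `page_starts(645, 20)` gives 33 offsets (0 through 640). For large result sets, E-utilities also support `usehistory=y` with `WebEnv`/`query_key`, which avoids passing long UID lists in the efetch URL.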

Command used and terminal output

nextflow run main.nf --input 'input.txt'

This fails to return results for DRP004793.

Running sra_ids_to_runinfo.py directly with input.txt gives this error:

Traceback (most recent call last):
  File "bin/sra_ids_to_runinfo.py", line 498, in <module>
    sys.exit(main())
             ^^^^^^
  File "/bin/sra_ids_to_runinfo.py", line 494, in main
    fetch_sra_runinfo(args.file_in, args.file_out, ena_metadata_fields)
  File "bin/sra_ids_to_runinfo.py", line 470, in fetch_sra_runinfo
    ids = DatabaseResolver.expand_identifier(db_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "bin/sra_ids_to_runinfo.py", line 223, in expand_identifier
    return cls._id_to_srx(identifier)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "bin/sra_ids_to_runinfo.py", line 241, in _id_to_srx
    cls._content_check(response, identifier)
  File "bin/sra_ids_to_runinfo.py", line 232, in _content_check
    if response.status == 204:
       ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'status'
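Separately from the ID mapping, the traceback shows _content_check being called with response set to None, so the response.status check itself raises. A minimal sketch of a guard that would turn this into a readable error; `content_check` and `IdentifierLookupError` are hypothetical names, and raising on HTTP 204 only mirrors the status check visible in the traceback, since the rest of the script's handling is not shown here:

```python
class IdentifierLookupError(RuntimeError):
    """Hypothetical error type for identifiers that cannot be resolved."""


def content_check(response, identifier: str) -> None:
    """Hypothetical stand-in for DatabaseResolver._content_check.

    Checks for a missing response first, so a failed request surfaces as a clear
    error naming the identifier instead of an AttributeError on None.status.
    """
    if response is None:
        raise IdentifierLookupError(
            f"No response received while resolving {identifier!r}; "
            "check that the accession type is supported and the query is well formed."
        )
    if response.status == 204:
        raise IdentifierLookupError(f"No content returned for {identifier!r}.")
```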

Relevant files

input.txt:

SRP102769
DRP004793

System information

This should happen on any system; I have tested it locally, on awsbatch, with Docker, and on Linux and macOS.
Version of nf-core/fetchngs: 1.12.0

awalsh17 added the bug label on Jul 1, 2024