Home
Anand Maurya edited this page Dec 21, 2022 · 36 revisions
Python 3.x
You can install the dependencies in the base environment or set up a new virtual environment.
Download or clone the repository and use the `requirements.txt` provided in the package to install all the dependencies.
pip3 install -r requirements.txt
sra-annotator [-q] [-o OUTPUT] [-m {quick,full}] [-f] [-d DICTIONARY] [-v] [-h] query
- User must specify a query term or use `-q` to input a list of SRA accessions. An input file or a search keyword is a required parameter.
- Arguments shown in `[]` are optional.
Examples:
sra-annotator -q PRJNA868738
sra-annotator -q PRJNA868738 -m full
sra-annotator -q PRJNA868738 -m full -d example/keyword-dict.json
sra-annotator -q example/sra-accn-list.txt -m full -f
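The last example reads accessions from a plain text file, which `-q` expects to contain one accession per line. A minimal Python sketch for producing such a file could look like the following. The helper name `write_accession_list` and the file name `my-accn-list.txt` are hypothetical and not part of sra-annotator.

```python
from pathlib import Path


def write_accession_list(accessions, path):
    """Write one accession per line, dropping blanks and duplicates.

    Hypothetical helper, not part of sra-annotator.
    """
    seen = set()
    cleaned = []
    for accession in accessions:
        accession = accession.strip()
        if accession and accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    if not cleaned:
        raise ValueError("no accessions to write")
    # One accession per line, with a trailing newline.
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Example usage (file name is an assumption):
#   write_accession_list(["SRR15736787", "PRJNA761299"], "my-accn-list.txt")
# followed by:
#   sra-annotator -q my-accn-list.txt -m full -f
```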
Short | Long | Default | Details |
---|---|---|---|
-q | --query | required | Query string to search the SRA. Please use quotes `''` if the query contains multiple words. Examples: `PRJNA868738`, `SRR15736787`, `'PRJNA761299 OR SRR15736787'`, `'trna[Text Word] AND "Danio rerio"[Organism]'`, `'(2008[Publication Date] : 2009[Publication Date]) AND "arabidopsis thaliana"[Organism]'`, `'petals[Text Word] AND Arabidopsis thaliana[Organism]'`. Please refer to https://www.ncbi.nlm.nih.gov/sra/docs/srasearch/ to learn more about basic and advanced search in NCBI SRA. This option also accepts a list of SRA accessions in a plain text file. Please make sure that the file contains one accession per line. An example file containing a list of accessions is provided in the `example/` directory. |
-o | --output | pwd | Output directory to store the results. |
-m | --mode | quick | `quick` mode dumps the run-level annotation, whereas `full` mode attempts to retrieve both the run-level and sample-level annotation. `full` mode also converts the annotation from JSON to CSV for each run accession. `full` mode might not work with all complex queries. |
-f | --fastq | optional | Locate the web address of the raw data and generate a script to download the fastq file(s). Depending on the number of fastq files to be searched, this can take some time. |
-d | --dictionary | optional | This option takes a JSON file as input and uses the designated keywords from the file to identify the samples. Example: `{ "tissue": ["flower", "petals"], "reagent": "trizol" }`. In this example, the tool searches the metadata of each sample for the keywords `flower`, `petals`, and `trizol`. It then produces a report with the columns `tissue` and `reagent` and lists all the accessions that match the keywords. When the dictionary contains several terms, `full` mode is recommended to increase the likelihood of matches. An example JSON file is provided in the `example/` directory. |
-h | --help | | Show the usage instructions. |
-v | --version | | Show the version. |
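A hand-written dictionary file for `-d` is easy to get wrong, for example by leaving keywords unquoted. The sketch below generates the JSON with Python's `json` module instead. It assumes, based only on the example in the table, that each key maps to either a single string or a list of strings; the helper `build_keyword_dict` is hypothetical and not part of sra-annotator.

```python
import json


def build_keyword_dict(mapping):
    """Return a JSON-ready dict whose values are strings or lists of strings.

    Hypothetical helper; the accepted value types are an assumption
    drawn from the example in the options table.
    """
    cleaned = {}
    for column, keywords in mapping.items():
        if isinstance(keywords, str):
            cleaned[column] = keywords
            continue
        keywords = list(keywords)
        if not all(isinstance(keyword, str) for keyword in keywords):
            raise TypeError(f"keywords for {column!r} must be strings")
        cleaned[column] = keywords
    return cleaned


keyword_dict = build_keyword_dict(
    {"tissue": ["flower", "petals"], "reagent": "trizol"}
)
# json.dumps quotes every key and keyword, so the result is valid JSON.
print(json.dumps(keyword_dict, indent=2))
```

Writing the printed text to a file gives something that can be passed to `-d`, in the same way as `example/keyword-dict.json` in the examples.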
The tool organizes the output into three folders and one summary file:
- `sra_data/fastq_source` contains the bash script to download the raw fastq files (paired-end or single-end). These files are generated when the `-f` argument is enabled.
- `sra_data/annotation_json` contains the run-level annotation of each SRA run accession in `.json` format. These files are generated when either `-m quick` or `-m full` is enabled.
- `sra_data/annotation_text` contains the run-level and sample-level annotation of each SRA run accession in `.csv` format. Generating these files takes longer than running in `-m quick` mode. These files are generated when `-m full` is enabled.
- `sra_data/keyword_hits.csv` is created when the option `-d` is used. SRA entries that match a specific keyword are listed beneath the corresponding header in the file. The keys provided in the JSON file are used as headers in this output file.
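To work with the keyword hits programmatically, the file can be read column by column. The following is a minimal sketch, assuming that `keyword_hits.csv` is comma-delimited with the dictionary keys in the first row and that shorter columns are padded with empty cells. The exact layout is not documented here, so check a real output file before relying on it. The function name `read_keyword_hits` and the sample accessions are made up for illustration.

```python
import csv
import io


def read_keyword_hits(handle):
    """Map each header to the non-empty accessions listed under it."""
    reader = csv.DictReader(handle)
    hits = {header: [] for header in (reader.fieldnames or [])}
    for row in reader:
        for header, value in row.items():
            # Skip padding cells and any cells beyond the header row.
            if header is not None and value and value.strip():
                hits[header].append(value.strip())
    return hits


# Made-up sample data in the assumed layout; not real tool output.
sample = io.StringIO("tissue,reagent\nSRR0000001,SRR0000002\nSRR0000003,\n")
print(read_keyword_hits(sample))
```

For a real run, open the file with `open(path, newline="")` and pass the handle to the function.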
- Check your internet connection.
- Try simplifying the search term.
- Find the `failed_to_parse.txt` file in the output directory. It contains a list of run accessions that most likely have incorrect or missing annotation in SRA.
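The list of failed accessions can be loaded for inspection with a few lines of Python. This is a sketch that assumes `failed_to_parse.txt` sits directly in the output directory and holds one run accession per line; the one-per-line layout is an assumption, and the function name `load_failed_accessions` is hypothetical.

```python
from pathlib import Path


def load_failed_accessions(output_dir):
    """Return the accessions listed in failed_to_parse.txt, or [] if absent.

    Assumes one accession per line; blank lines are ignored.
    """
    path = Path(output_dir) / "failed_to_parse.txt"
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Example usage, with "." standing in for the directory passed to -o:
#   for accession in load_failed_accessions("."):
#       print(accession)
```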