Home

Requirements:

Python 3.x

Installation:

The user is free to install the dependencies in the base environment or set up a new virtual environment.

Download or clone the repository and use the requirements.txt provided in the package to install all the dependencies.

pip3 install -r requirements.txt

Usage:

sra-annotator [-q] [-o OUTPUT] [-m {quick,full}] [-f] [-d DICTIONARY] [-v] [-h] query

The user must specify a query term, or use -q to input a list of SRA accessions. Either a search keyword or an input file is required.
Arguments shown in [] are optional.

Examples:

sra-annotator -q PRJNA868738

sra-annotator -q PRJNA868738 -m full

sra-annotator -q PRJNA868738 -m full -d example/keyword-dict.json

sra-annotator -q example/sra-accn-list.txt -m full -f
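The last example reads accessions from a plain text file with one accession per line. As a minimal sketch of how such a file could be prepared, the helper below strips whitespace, drops blank entries and duplicates, and writes one accession per line. The function name `write_accession_list` and any file name passed to it are hypothetical and not part of sra-annotator.

```python
from pathlib import Path


def write_accession_list(path, accessions):
    """Write accessions one per line, skipping blanks and duplicates.

    Hypothetical helper, not part of sra-annotator. It only enforces the
    "one accession per line" layout; it does not check that an accession
    exists in SRA.
    """
    seen = set()
    cleaned = []
    for accession in accessions:
        accession = accession.strip()
        if accession and accession not in seen:
            seen.add(accession)
            cleaned.append(accession)
    # End the file with a newline so every accession sits on its own line.
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

The resulting file can then be passed to -q in the same way as `example/sra-accn-list.txt`.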

Input options

Short	Long	Default	Details
-q	--query	required	Query string to search the SRA. Please use quotes '' if the query contains multiple words, e.g. `PRJNA868738` `SRR15736787` `'PRJNA761299 OR SRR15736787'` `'trna[Text Word] AND "Danio rerio"[Organism]'` `'(2008[Publication Date] : 2009[Publication Date]) AND "arabidopsis thaliana"[Organism]'` `'petals[Text Word] AND Arabidopsis thaliana[Organism]'` Please refer to https://www.ncbi.nlm.nih.gov/sra/docs/srasearch/ to learn more about basic and advanced search in NCBI SRA. This option also accepts a plain text file containing a list of SRA accessions, one accession per line. An example file containing a list of accessions is provided in the `example/` directory.
-o	--output	`pwd`	Output directory to store the results.
-m	--mode	`quick`	`quick` mode dumps the run level annotation, whereas `full` mode attempts to retrieve both the run level and sample level annotation. `full` mode also converts the annotation from JSON to CSV for each run accession. `full` mode might not work with all complex queries.
-f	--fastq	optional	Locate the web address to the raw data and generate a script to download the fastq file(s). Depending on the number of fastq files to be searched, this can take some time.
-d	--dictionary	optional	This option takes a JSON file as input and uses the keywords in the file to identify samples. Example: `{ "tissue": ["flower", "petals"], "reagent": "trizol" }`. In this example, the tool searches the metadata of each sample for the keywords `flower`, `petals`, and `trizol`. It then produces a report with the columns `tissue` and `reagent` and lists all the accessions that match the keywords. When the dictionary contains several terms, `full` mode is recommended to increase the likelihood of matches. An example JSON file is provided in the `example/` directory.
-h	--help		Show the usage instructions.
-v	--version		Show the version.
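The file given to -d has to be valid JSON, so it can be safer to generate it than to type it by hand. The sketch below writes a keyword dictionary with Python's standard `json` module. It assumes, based only on the example in the table, that each key maps to either a single string or a list of strings; the function names are hypothetical and not part of sra-annotator.

```python
import json
from pathlib import Path


def validate_keyword_dict(keywords):
    """Raise ValueError unless every value is a string or a list of strings.

    Assumption: these are the only value shapes shown in the -d example;
    the tool itself may accept more.
    """
    if not isinstance(keywords, dict):
        raise ValueError("top level must be a JSON object")
    for key, value in keywords.items():
        terms = [value] if isinstance(value, str) else value
        if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
            raise ValueError(f"{key!r} must map to a string or a list of strings")
    return keywords


def write_keyword_dict(path, keywords):
    """Validate the mapping, write it as JSON, and return the text written."""
    text = json.dumps(validate_keyword_dict(keywords), indent=2)
    Path(path).write_text(text + "\n", encoding="utf-8")
    return text
```

Calling `write_keyword_dict` with `{"tissue": ["flower", "petals"], "reagent": "trizol"}` produces a file equivalent to the example in the table.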

Output

The tool organizes the output in three folders, plus one CSV file when -d is used:

sra_data/fastq_source contains the bash script to download the raw fastq files (paired-end or single-end). These files are generated when -f argument is enabled.
sra_data/annotation_json contains the run-level annotation of each SRA run accession in .json format. These files are generated when either -m quick or -m full is enabled.
sra_data/annotation_text contains the run-level and sample-level annotation of each SRA run accession in .csv format. This takes longer than the -m quick mode. These files are generated when -m full is enabled.
sra_data/keyword_hits.csv is created when the option -d is used. SRA entries that match a specific keyword are listed beneath the corresponding header in the file. The keys provided in the JSON file are used as headers in this output file.
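To work with the keyword report programmatically, the sketch below loads `keyword_hits.csv` into a mapping from each header to its matching accessions. It assumes a layout inferred from the description above (a comma-separated file with one column per dictionary key, accessions listed beneath each header, and empty cells where one column is shorter than another); the actual file may differ. The function name is hypothetical.

```python
import csv


def read_keyword_hits(path):
    """Return {header: [accession, ...]} from a column-oriented CSV file.

    Assumption: one column per dictionary key, with matching accessions
    listed beneath the header and empty cells padding shorter columns.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Start every header with an empty list so keys without hits are kept.
        hits = {name: [] for name in (reader.fieldnames or [])}
        for row in reader:
            for name in hits:
                cell = row.get(name)
                if cell and cell.strip():
                    hits[name].append(cell.strip())
    return hits
```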

Troubleshooting

Check your internet connection.
Try simplifying the search term.
Find the failed_to_parse.txt file in the output directory. It contains a list of run accessions that most likely have incorrect or missing annotation in SRA.
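To see which requested runs were affected, the sketch below compares a list of requested accessions with the contents of `failed_to_parse.txt`. It assumes the file holds one run accession per line, which is not stated explicitly above; the function name is hypothetical and not part of sra-annotator.

```python
from pathlib import Path


def split_by_parse_status(requested, failed_path):
    """Return (parsed, failed) lists, preserving the order of `requested`.

    Assumption: `failed_path` points to failed_to_parse.txt and the file
    contains one run accession per line. A missing file means no failures.
    """
    failed_file = Path(failed_path)
    failed = set()
    if failed_file.exists():
        lines = failed_file.read_text(encoding="utf-8").splitlines()
        failed = {line.strip() for line in lines if line.strip()}
    parsed = [accession for accession in requested if accession not in failed]
    skipped = [accession for accession in requested if accession in failed]
    return parsed, skipped
```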
