Data preparation guide

To use covSampler to analyze your own data, you’ll need to prepare two files:

A FASTA file with viral genomic sequences.
A corresponding TSV file with metadata describing each sequence.

Format your sequence data

Prepare your nucleotide sequences in a FASTA format file named sequences.fasta.

You can see a formatted example sequence file here.
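As a quick sanity check before running anything, the sequence names in a FASTA file can be listed with a few lines of Python. This is only a sketch and not part of covSampler: the helper name fasta_names is hypothetical, and it assumes the sequence name is the whole header line after the ">" character.

```python
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Collect the name from every FASTA header line.

    Assumption: the name is everything after '>' on the header line,
    with surrounding whitespace removed.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            names.append(line[1:].strip())
    return names


# In-memory demo with two made-up records.
example = [">sample_1\n", "ACGTACGT\n", ">sample_2\n", "GGCCTTAA\n"]
print(fasta_names(example))
```

To check a real file, pass an open file handle instead of the list, for example `with open("sequences.fasta") as handle: names = fasta_names(handle)`.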

Format your metadata

Prepare your metadata in a TSV format file named metadata.tsv.

A metadata file must include the following fields:

Field	Description	Format
strain	Sequence name	Each strain value in the metadata file must match a sequence name in the FASTA file
date	Collection date	YYYY-MM-DD (ambiguous or incomplete dates are not accepted)
region_exposure	Continent	Africa / Asia / Europe / North America / Oceania / South America
country_exposure	Country	Country
division_exposure	Administrative division	Division
pango_lineage*	Viral lineage under the Pango nomenclature	See the latest Pango lineage list

* The covSampler workflow does not currently include Pango lineage assignment. You can assign Pango lineages with pangolin or nextclade.

You can see a formatted example metadata file here.
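The field requirements above can be checked before a run with a short script. The following is a sketch under stated assumptions, not covSampler's own validation: check_metadata is a hypothetical helper, it treats a FASTA sequence that has no metadata row as a problem (the guide only says that strain values must match), and it does not check pango_lineage values against the Pango lineage list.

```python
import csv
import io
import re
from datetime import datetime

REQUIRED_FIELDS = [
    "strain", "date", "region_exposure",
    "country_exposure", "division_exposure", "pango_lineage",
]
CONTINENTS = {
    "Africa", "Asia", "Europe", "North America", "Oceania", "South America",
}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_metadata(tsv_text, fasta_names):
    """Return a list of human-readable problems (empty if none found)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    seen = set()
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        strain = row["strain"] or ""
        date = row["date"] or ""
        seen.add(strain)
        if strain not in fasta_names:
            problems.append(f"row {row_number}: strain {strain!r} not in FASTA")
        # Require the exact YYYY-MM-DD shape, then confirm it is a real date.
        valid_date = bool(DATE_PATTERN.fullmatch(date))
        if valid_date:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                valid_date = False
        if not valid_date:
            problems.append(f"row {row_number}: invalid date {date!r}")
        if row["region_exposure"] not in CONTINENTS:
            problems.append(f"row {row_number}: unknown region_exposure")
    # Assumption: every sequence should also have a metadata row.
    for name in sorted(set(fasta_names) - seen):
        problems.append(f"sequence {name!r} has no metadata row")
    return problems
```

Calling `check_metadata(open("metadata.tsv").read(), names)` with the set of FASTA sequence names returns an empty list when none of these checks fail.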

Create your project data directory

All data are kept in the data/ directory. The raw and intermediate data of each project are stored in that project's own subdirectory.

For a new project (here named tutorial_project):

Create your project data folder, tutorial_project/, in data/.
Create a rawdata/ folder in data/tutorial_project/.
Move your sequence data and metadata into the data/tutorial_project/rawdata/ folder.

Now, the data/ directory structure should look like this:

data
├── README.md
├── example_project
│   └── rawdata
│       ├── metadata.tsv
│       └── sequences.fasta
└── tutorial_project
    └── rawdata
        ├── metadata.tsv
        └── sequences.fasta
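The three steps above can also be scripted. This is a minimal sketch rather than a tool shipped with covSampler: set_up_project is a hypothetical helper, and it assumes the input files may have arbitrary names and should be renamed to sequences.fasta and metadata.tsv as they are moved.

```python
import shutil
from pathlib import Path


def set_up_project(data_dir, project, sequences, metadata):
    """Create <data_dir>/<project>/rawdata and move both input files into it."""
    rawdata = Path(data_dir) / project / "rawdata"
    # parents=True also creates data/ and the project folder if needed.
    rawdata.mkdir(parents=True, exist_ok=True)
    # Move and rename to the file names the guide expects.
    shutil.move(str(sequences), str(rawdata / "sequences.fasta"))
    shutil.move(str(metadata), str(rawdata / "metadata.tsv"))
    return rawdata
```

For example, `set_up_project("data", "tutorial_project", "my_sequences.fasta", "my_metadata.tsv")`, run from the repository root with those two hypothetical input files present, would produce the tutorial_project layout shown in the tree.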

What's next?

Run covSampler with your data
