Derek W. Wright, MRC-University of Glasgow Centre for Virus Research Derek.Wright@glasgow.ac.uk
In this practical, we will be exploring the FASTA, multi-FASTA and FASTQ formats.
Commands that you need to enter into the terminal window (command line) are presented in a box with a fixed-width font, like this:
ls
A few tips to remember:
- Use the tab key to automatically complete filenames – especially long ones.
- Use the up arrow to scroll through your previous commands; this lets you easily re-run or adapt old commands.
- Case matters: the following file names are all different:
Myfile.txt
MyFile.txt
MYFILE.txt
myfile.txt
my file.txt
my_file.txt
Watch out for number 1 being confused with lowercase letter L, and capital letter O being confused with zero 0.
- l = lower case L
- 1 = number one
- O = capital letter O
- 0 = zero
Shorthand/wildcard symbols help to save typing:
- ~ is shorthand for your home directory
- . is shorthand for the current working directory
- .. is shorthand for the directory above
- * may be used as a wildcard to match file names
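The shorthand symbols above can be tried out safely in a scratch directory. The following is a minimal sketch; the directory and file names (project, sample1.fasta, sample2.fasta, notes.txt) are hypothetical and exist only for this demonstration.

```shell
# Hypothetical scratch directory and file names, used only for this demonstration
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir project
touch project/sample1.fasta project/sample2.fasta project/notes.txt

cd project
# * matches any run of characters, so this lists only the two FASTA files
fasta_files=$(ls *.fasta)
echo "$fasta_files"

# . is the current working directory and .. is the directory above it
here=$(cd . && pwd)
parent=$(cd .. && pwd)
echo "current: $here"
echo "parent:  $parent"

# ~ expands to the path of your home directory
echo ~
```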
- Log in to the Windows PC
- Log in to the alpha2 server via MobaXterm
ssh username@alpha2.cvr.gla.ac.uk
cd ~
~ is a shortcut for your home directory
cp -r /home3/dw73x/Formats .
. is a shortcut for the current working directory
cd Formats
less single_seq.fasta
- Press space or f to scroll down through the file page by page
- Press b to scroll back up a page
- Press q to quit
less protein.faa
- SARS-CoV-2 spike protein amino acid sequence from NCBI
less BabayanEtAl_sequences.fasta
- Press space or f to scroll down through the file page by page
- Press b to scroll back up
- Press q to quit
grep '>' BabayanEtAl_sequences.fasta
- Search (grep) for lines with the > symbol in the file
grep '>' BabayanEtAl_sequences.fasta | wc -l
- Search (grep) for lines with the > symbol in the file
- Pipe (|) the results into the next command
- Word count (wc) the number of lines (-l)
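As a possible shortcut, grep can count matching lines itself with the -c option, which should give the same number as piping into wc -l. The sketch below uses a small hypothetical three-sequence file created on the fly, not BabayanEtAl_sequences.fasta.

```shell
# Hypothetical three-sequence multi-FASTA file, created only for this demonstration
demo_fasta=$(mktemp)
printf '>seq1 first\nACGT\nACGT\n>seq2 second\nGGCC\n>seq3 third\nTTAA\n' > "$demo_fasta"

# Pipeline from the practical: print the header lines, then count them
n_pipe=$(grep '>' "$demo_fasta" | wc -l)

# grep -c counts the matching lines directly;
# ^ anchors the match to the start of the line, where FASTA headers begin
n_count=$(grep -c '^>' "$demo_fasta")

echo "grep | wc -l: $n_pipe"
echo "grep -c:      $n_count"
```

Always quote the '>' pattern: an unquoted > is interpreted by the shell as output redirection and can overwrite a file.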
less reads_R1.fastq
- Press space or f to scroll through the file page by page
- Press b to scroll back up a page
- Press q to quit
grep '@SRR1553467.279000' reads_R1.fastq
- Search (grep) for lines containing the string “@SRR1553467.279000” (i.e. search for the read named SRR1553467.279000)
grep '@SRR1553467.279000' -A 3 reads_R1.fastq
- As a FASTQ read consists of 4 lines, also return the 3 lines after (-A 3)
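In a grep pattern, a dot normally matches any single character, so a read name containing dots is matched slightly more loosely than it looks. One way to avoid this, sketched below, is the -F option, which treats the pattern as a fixed string. The read names (read.1, read.2, read.3) are hypothetical and the file is created only for this demonstration.

```shell
# Hypothetical three-read FASTQ file, created only for this demonstration
demo_fastq=$(mktemp)
printf '@read.1 length=4\nACGT\n+\nIIII\n@read.2 length=4\nGGCC\n+\nHHHH\n@read.3 length=4\nTTAA\n+\nGGGG\n' > "$demo_fastq"

# -F treats the pattern as a fixed string, so the dot matches only a literal dot
# -A 3 prints the 3 lines after the match, giving the complete 4-line record
record=$(grep -F -A 3 '@read.2' "$demo_fastq")
echo "$record"
```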
wc -l reads_R1.fastq
- Word count (wc) the number of lines (-l) in the file (lines, not reads)
- Divide by 4 to get the number of reads
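The division can also be done in the shell itself. This is a minimal sketch that assumes every read occupies exactly four lines (no wrapped sequence or quality lines); it uses a small hypothetical two-read file rather than reads_R1.fastq.

```shell
# Hypothetical two-read FASTQ file, created only for this demonstration
demo_fastq=$(mktemp)
printf '@read.1\nACGT\n+\nIIII\n@read.2\nGGCC\n+\nHHHH\n' > "$demo_fastq"

# Reading from standard input (<) makes wc print only the number, without the file name
n_lines=$(wc -l < "$demo_fastq")

# $(( )) performs integer arithmetic in the shell
n_reads=$(( n_lines / 4 ))
echo "$n_lines lines / 4 = $n_reads reads"
```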
grep '^@SRR1553467' reads_R1.fastq | wc -l
- Search (grep) for lines beginning (^) with “@SRR1553467” in the file reads_R1.fastq
- Pipe (|) the results on to the next command
- Word count (wc) the number of lines (-l)
- This gives the number of reads in the file
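The run prefix is included in the pattern because @ is also a valid quality character, so a quality line can begin with @ too. The sketch below illustrates the difference on a hypothetical two-read file in which the second quality line starts with @; the read prefix SRRDEMO is made up for this demonstration.

```shell
# Hypothetical two-read FASTQ file; the quality line of the second read begins with @
demo_fastq=$(mktemp)
printf '@SRRDEMO.1\nACGT\n+\nIIII\n@SRRDEMO.2\nGGCC\n+\n@III\n' > "$demo_fastq"

# Counting every line that starts with @ also counts the quality line
n_any_at=$(grep -c '^@' "$demo_fastq")

# Including the read name prefix counts only the header lines
n_prefix=$(grep -c '^@SRRDEMO' "$demo_fastq")

echo "lines starting with @:        $n_any_at"
echo "lines starting with @SRRDEMO: $n_prefix"
```

Dividing the line count by 4 does not depend on the read names at all, which makes it a useful cross-check.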
FASTQ files are often gzipped (compressed) and have the .fastq.gz extension. Use the commands zcat, zmore, zless and zgrep to access these compressed files.
zless 00013_OS_L_NA_S1_R1_001.fastq.gz
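The same read-counting approach should work on a compressed file by piping the output of zcat into wc. This is a minimal sketch assuming a Linux system where zcat reads gzip files; the file demo.fastq.gz is hypothetical and is created only for this demonstration.

```shell
# Hypothetical two-read FASTQ file, created and compressed only for this demonstration
demo_dir=$(mktemp -d)
printf '@read.1\nACGT\n+\nIIII\n@read.2\nGGCC\n+\nHHHH\n' > "$demo_dir/demo.fastq"

# gzip replaces demo.fastq with the compressed file demo.fastq.gz
gzip "$demo_dir/demo.fastq"

# zcat writes the uncompressed contents to standard output without changing the file
n_lines=$(zcat "$demo_dir/demo.fastq.gz" | wc -l)
n_reads=$(( n_lines / 4 ))
echo "$n_reads reads"
```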