Update 04_mysterySampleID.md
added explanation for how samples 2 & 6 are limited in read length because blast can't parse super long reads
rrbrown98 committed Sep 13, 2023
1 parent 9561ece commit 3d578ea
Showing 1 changed file with 3 additions and 1 deletion.
2023_edition/04_mysterySampleID.md
````diff
@@ -15,7 +15,9 @@ export me=$GROUP_SCRATCH/biochem_minicourse_2023/<your_dir>
 cd $me/data
 mkdir sampleX
 ```
-Copy the data for your sample from the straightlab foler. Change sampleX to your sameple number (ie. sample2). Repeat this for each sample number for your group
+Copy the data for your sample from the straightlab folder. Change sampleX to your sample number (ie. sample3). Repeat this for each sample number for your group.
+
+In the directory GROUP_SCRATCH/biochem_minicourse_2023/straightlab/data/samples, there are files for both sample2.fastq/sample6.fastq and sample2_full.fastq/sample6_full.fastq. Samples 2 and 6 have very long reads (on the order of 1000s of kbs), so we had to cut the readlength to 1000bp to make it run faster. The sample2.fastq/sample6.fastq files are actually capped at 1000bp reads, and the "full" files contain all the reads. We had to cut down these samples because Blast is a local alignment tool, and is not built for parsing super long reads. In general, if you're simply trying to identify the source of a sample, you probably don't need to blast super long reads to do so, as it would be a waste of time & computational resources.
 ```
 cp $GROUP_SCRATCH/biochem_minicourse_2023/straightlab/data/samples/sampleX.fastq $me/data/samples
 ```
````
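Instead of repeating the `cp` command once per sample number, a small shell loop can copy all of your group's samples at once. This is a minimal sketch, not part of the course materials: the helper name `copy_samples` and the sample numbers in the usage line are hypothetical, and it assumes `$me` is set as in the earlier `export me=...` step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy sample<N>.fastq for every sample number given.
# Arguments: source directory, destination directory, then one or more sample numbers.
copy_samples() {
  local src_dir=$1 dest_dir=$2
  shift 2
  mkdir -p "$dest_dir"              # create the destination if it does not exist yet
  local n
  for n in "$@"; do
    # Stop and return non-zero if a sample file is missing or cannot be copied.
    cp "$src_dir/sample${n}.fastq" "$dest_dir/" || return 1
  done
}

# Example usage (sample numbers 3 and 6 are placeholders; use your group's numbers):
# copy_samples "$GROUP_SCRATCH/biochem_minicourse_2023/straightlab/data/samples" "$me/data/samples" 3 6
```

Passing the sample numbers as arguments keeps the paths in one place, so only the list of numbers changes from group to group.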
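The capped sample2.fastq/sample6.fastq files are described as holding reads cut to 1000bp. One way to check this yourself, or to make a capped copy of a `_full` file, is with `awk`. This is a sketch under stated assumptions, not necessarily how the course files were generated: it assumes every FASTQ record is exactly four lines (sequence and quality strings not wrapped), and the function names `max_read_length` and `truncate_reads` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the length of the longest read in a FASTQ file.
# Assumes 4-line records, so the sequence is every line where NR % 4 == 2.
max_read_length() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Hypothetical helper: write a copy of a FASTQ file with every read cut to at most N bp.
# Arguments: input file, maximum length, output file.
# The quality line (NR % 4 == 0) is cut to the same length so each record stays valid.
truncate_reads() {
  local in=$1 max_len=$2 out=$3
  awk -v max="$max_len" 'NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, 1, max) } { print }' "$in" > "$out"
}
```

For example, `max_read_length "$me/data/samples/sample2.fastq"` should print a value no greater than 1000 if the file is capped as described, and `truncate_reads sample2_full.fastq 1000 sample2_capped.fastq` would produce a similarly capped copy from the full file. Keeping the first 1000bp of each read is just one simple choice for this sketch.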