Update 04_mysterySampleID.md

added explanation for how samples 2 & 6 are limited in read length because blast can't parse super long reads
straightlab · Sep 13, 2023 · 3d578ea
1 parent 9561ece
commit 3d578ea
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/2023_edition/04_mysterySampleID.md b/2023_edition/04_mysterySampleID.md
@@ -15,7 +15,9 @@ export me=$GROUP_SCRATCH/biochem_minicourse_2023/<your_dir>
 cd $me/data
 mkdir sampleX
 ```
-Copy the data for your sample from the straightlab foler. Change sampleX to your sameple number (ie. sample2). Repeat this for each sample number for your group
+Copy the data for your sample from the straightlab folder. Change sampleX to your sample number (e.g., sample3). Repeat this for each sample number for your group.
+
+In the directory $GROUP_SCRATCH/biochem_minicourse_2023/straightlab/data/samples, samples 2 and 6 each have two files: sample2.fastq/sample6.fastq and sample2_full.fastq/sample6_full.fastq. Samples 2 and 6 have very long reads (on the order of thousands of kb), so we cut the read length to 1000 bp to make BLAST run faster. The sample2.fastq/sample6.fastq files hold the reads capped at 1000 bp, and the "full" files contain all the reads. We had to cut down these samples because BLAST is a local alignment tool and is not built for parsing very long reads. In general, if you're simply trying to identify the source of a sample, you probably don't need to BLAST very long reads; doing so would waste time and computational resources.
 ```
 cp $GROUP_SCRATCH/biochem_minicourse_2023/straightlab/data/samples/sampleX.fastq $me/data/samples
 ```
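
For reference, here is a minimal sketch of how reads in a FASTQ file could be capped at a fixed length. It is not necessarily how the capped sample2.fastq/sample6.fastq files were produced. The file names, the temporary directory, and the `max_len` variable are hypothetical, and the sketch assumes standard 4-line FASTQ records (sequence and quality each on a single line).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical cap and file names, for illustration only.
max_len=1000
workdir=$(mktemp -d)
in_fq="$workdir/sample_full.fastq"
out_fq="$workdir/sample.fastq"

# Build a tiny two-read FASTQ: one 1500 bp read and one 10 bp read.
long_seq=$(printf 'A%.0s' $(seq 1 1500))
long_qual=$(printf 'I%.0s' $(seq 1 1500))
printf '@read1\n%s\n+\n%s\n@read2\nACGTACGTAC\n+\nIIIIIIIIII\n' \
    "$long_seq" "$long_qual" > "$in_fq"

# In a 4-line FASTQ record, line 2 is the sequence and line 4 is the quality string.
# Trim both to the same length so they stay in sync; headers and "+" lines pass through.
awk -v n="$max_len" \
    'NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next } { print }' \
    "$in_fq" > "$out_fq"

# Report the resulting read lengths, one per line.
awk 'NR % 4 == 2 { print length($0) }' "$out_fq"
```

Reads shorter than the cap are left unchanged, so only the long read in the toy input is trimmed.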