Releases: pinellolab/CRISPResso2
Nicking Hancock
This release makes bowtie2 alignment in CRISPRessoPooled more permissive (44dc9e7) and removes duplicate rows in Alleles_frequency_table.txt caused by reads being in the forward or reverse direction (0e08cd0).
When given a genome file, CRISPRessoPooled aligns reads to the genome using the Bowtie2 aligner. The legacy parameters were somewhat strict. The new parameters follow the 'default_min_aln_score' parameter and allow substantially more indels and mismatches than before.
The parameter --use_legacy_bowtie2_options_string has been added to use the legacy settings. Otherwise, the bowtie2 alignment settings are calculated as follows:
--end-to-end: no clipping; the match bonus --ma is set to 0
-N 0: number of mismatches allowed in seed alignment
--np 0: the penalty is 0 where the read or reference has an ambiguous character (N)
--mp 3,2: mismatch penalty; the max mismatch penalty is set to 3 (a score of -3) to coincide with the gap extension penalty (2 is the default min mismatch penalty)
--score-min L,-5,-3*(1-H): for a given homology score H, we allow up to (1-H)*L mismatches (-3 each) or gap extensions (-3 each) and one gap open (-5). The minimum score is therefore -5 + -3*(1-H)*L, where L is the sequence length
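As a rough illustration of the settings above, the sketch below builds the option string and evaluates the resulting score threshold. Bowtie2 reads a linear --score-min function L,a,b as a + b*L for read length L. The function names are hypothetical, and treating H as the homology percentage divided by 100 is an assumption rather than something stated in these notes.

```python
def bowtie2_options(homology_pct: float) -> str:
    """Build the non-legacy option string for a homology percentage (0-100).

    Assumption: H in the release notes is homology_pct / 100.
    """
    h = homology_pct / 100.0
    slope = -3 * (1 - h)  # per-base allowance: -3 per mismatch or gap extension
    return f"--end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,{slope:g}"


def min_alignment_score(read_len: int, homology_pct: float) -> float:
    """Evaluate the linear threshold -5 + -3*(1-H)*L for a read of length L."""
    h = homology_pct / 100.0
    return -5 + -3 * (1 - h) * read_len


if __name__ == "__main__":
    print(bowtie2_options(50))
    print(min_alignment_score(100, 50))
```

With a homology of 50 and a 100 bp read, the threshold works out to -155: one gap open (-5) plus 50 mismatches or gap extensions at -3 each.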
Knockout Lake
Starting in version 2.1.0, insertion quantification has been changed to only include insertions completely contained by the quantification window.
To use the legacy quantification method (i.e. include insertions directly adjacent to the quantification window) please use the parameter --use_legacy_insertion_quantification
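The difference between the two quantification modes can be sketched as follows. This is an illustration only: it assumes an insertion is described by the two reference positions that flank it and that the quantification window is a set of reference positions. The function name and arguments are hypothetical and are not taken from the CRISPResso2 code.

```python
def insertion_is_counted(left_pos: int, right_pos: int,
                         window: set, legacy: bool = False) -> bool:
    """Decide whether an insertion between left_pos and right_pos is quantified.

    New behaviour (assumed): both flanking positions must be in the window.
    Legacy behaviour (assumed): one flanking position in the window is enough,
    so insertions directly adjacent to the window are included.
    """
    left_in = left_pos in window
    right_in = right_pos in window
    if legacy:
        return left_in or right_in
    return left_in and right_in


if __name__ == "__main__":
    window = set(range(10, 20))  # reference positions 10..19
    # Insertion sitting on the window edge, between positions 9 and 10
    print(insertion_is_counted(9, 10, window))
    print(insertion_is_counted(9, 10, window, legacy=True))
```

An insertion on the window boundary is dropped under the new rule and kept under the legacy rule, which is the case --use_legacy_insertion_quantification restores.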
This release also includes several updates to:
- prime editing: pegRNA spacer sequences given in the incorrect orientation are no longer tolerated
- HDR: Ambiguous alignments don't contribute to plot 4g (except when --expand_ambiguous_alignments is provided)
- --fastq_output: now also writes alignment scores and alignments for every read
CRISPRessoAggregate Debut
CRISPRessoAggregate can be used to aggregate multiple completed CRISPResso runs.
v2.0.44: Fix plot window cloning from Ref1 to HDR
Improvements in inferring quantification windows across amplicons/alleles.
Axis ticks and other fixes
Add ticks to appropriate plots
Update the function of histograms
- By default 99% of data is shown in plots; 100% of data is now written to data files.
- New parameter --plot_histogram_outlier to plot 100% of data
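The notes do not say how the plotted 99% is chosen. The sketch below shows one plausible reading, trimming only the upper tail of a value-to-count histogram, while the full histogram remains available for writing to the data file. The function name and the trimming rule are assumptions for illustration.

```python
def values_to_plot(hist: dict, keep_fraction: float = 0.99) -> dict:
    """Return the leading part of the histogram that covers keep_fraction of reads.

    hist maps a value (e.g. an indel size) to a read count. Values are taken
    in ascending order until the running total reaches the requested fraction.
    """
    total = sum(hist.values())
    kept = {}
    running = 0
    for value in sorted(hist):
        if total and running >= keep_fraction * total:
            break
        kept[value] = hist[value]
        running += hist[value]
    return kept


if __name__ == "__main__":
    hist = {0: 980, 1: 15, 2: 3, 50: 2}
    print(values_to_plot(hist))        # trimmed view for the plot
    print(values_to_plot(hist, 1.0))   # everything, as with the outlier option
```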
v2.0.42
WGS and Plotting updates
WGS and Pooled summary figures scale height based on the number of entries so that they are legible in HTML reports.
WGS parallelization mode bug fixed
Added --fastq-out parameter to report the CRISPResso analysis separately for each read. Note that this should be used with caution. I'm still trying to figure out what information should be reported for each read, and what format it should be in. Open to feedback on this issue!
v2.0.40: Prime editing updates
Prime editing updates - scaffold parameter is now called --prime_editing_pegRNA_scaffold_seq.
Guide names with spaces produce file names with hyphens instead of spaces
v2.0.39
v2.0.38: Bam processing + Prime Editing updates
- Input can now be read from bam using the parameter --bam_input and (optionally) --bam_chr_loc to use the reads in the bam at this location as input. An output bam is produced with an additional space-separated field prefixed by c2 (e.g. c2:Z:ALN=Inferred CLASS=Inferred_MODIFIED MODS=D47;I0;S0 DEL=56(47) INS= SUB= ALN_REF=TTGGCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCACCTGGATCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCCGCCACCGTGCGCCGGGCCTTGCAGTGGGCGCGCTACCTGCGCCACATCCATCGGCGCTTTGGTCGGCATGGCCCCATTCGCACGGCTCT----------------------------------------------- ALN_SEQ=ACACCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGA-----------------------------------------------TCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCCGCCACCGTGCGCCGGGCCTTGCAGTGGGCGCGCTACCTGCGCCACATCCATCGGCGCTTTGGTCGGCATGGCCCCATTCGCACGGCTCTGGAGCGGCGGCTGCACAACCAGTGGAGGCAAGAGGGCGGCTTTGGGC). Note that the alignment details (location, cigar string, etc.) are not modified; this may be done in the future. Bam file input cannot be trimmed or pre-processed with quality filtering.
- Prime editing scaffold incorporation is now more accurate (looks for the scaffold sequence at the expected position directly after the extension sequence). A plot showing the number of bases matching the scaffold, as well as insertions after the extension sequence, and a data file with these numbers are produced. Added parameter --prime_editing_pegRNA_scaffold_min_match_length to define the minimum length required to classify a read as 'Scaffold-incorporated'
- Renamed split_paired_end parameter to --split_interleaved_input for interleaved input
- Auto mode now considers 5000 reads to detect amplicon sequences
- Add new parameter --annotate_wildtype_allele to annotate wildtype alleles on the allele plots
- Update output when reporting missing files -- only lists first 15 files in the current directory and directory of input parameter
--reference https instead of http
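For readers who want to use the c2 field written to the output bam in v2.0.38, a minimal parsing sketch follows. It relies only on the layout visible in the example from the notes (a c2:Z: prefix followed by space-separated KEY=VALUE pairs). The function names are hypothetical, and reading the letters in MODS as deletion, insertion and substitution counts is an assumption.

```python
def parse_c2_field(field: str) -> dict:
    """Split a 'c2:Z:KEY=VALUE KEY=VALUE ...' string into a dict of strings."""
    prefix = "c2:Z:"
    if not field.startswith(prefix):
        raise ValueError("expected a field starting with 'c2:Z:'")
    parsed = {}
    for token in field[len(prefix):].split(" "):
        if not token:
            continue
        key, _, value = token.partition("=")
        parsed[key] = value  # empty values such as 'INS=' become ''
    return parsed


def parse_mods(mods: str) -> dict:
    """Turn 'D47;I0;S0' into {'D': 47, 'I': 0, 'S': 0}.

    Assumption: each item is one letter followed by an integer count.
    """
    return {item[0]: int(item[1:]) for item in mods.split(";") if item}


if __name__ == "__main__":
    # Shortened version of the example field; sequences abbreviated for display
    example = ("c2:Z:ALN=Inferred CLASS=Inferred_MODIFIED MODS=D47;I0;S0 "
               "DEL=56(47) INS= SUB= ALN_REF=ACGT-- ALN_SEQ=AC--GT")
    fields = parse_c2_field(example)
    print(fields["CLASS"], parse_mods(fields["MODS"]))
```

Because SAM optional fields are tab-separated, the spaces inside the c2 value stay within a single field, so a plain split on spaces is enough here.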