CFBI

CFBI (Calculating Frequencies of Base-substitutions and Indels) is a set of custom Perl and Shell scripts for calculating the frequencies of base substitutions and indels.

Features

Calculates base-substitution and indel frequencies in a target gene after CRISPR genome editing.

BAM format file

The BAM file is generated by mapping reads with BWA-MEM.

Prerequisites

Software / Package

Perl v5.26.2
BioPerl v1.7.2
BWA v0.7.17
SAMtools v1.9
GCC
GNU coreutils
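Before running the scripts, it can help to confirm that the required commands are on $PATH. The function below is a hypothetical helper, not part of CFBI; it only checks that each named command can be found and does not verify version numbers.

```bash
# Hypothetical helper (not part of CFBI): report required commands that are missing from $PATH.
# Usage: check_prereqs perl bwa samtools gcc
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v returns a non-zero status when the command cannot be found.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs perl bwa samtools gcc` lists any missing tool on standard error and returns a non-zero status. BioPerl is a Perl module rather than a command, so it has to be checked separately, e.g. with `perl -MBio::SeqIO -e 1`, which fails if BioPerl cannot be loaded.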

Usage:

To calculate base-substitution and indel frequencies, first generate a BAM file.

Script names are shown in single quotes.

1. Add the CFBI directory and BioPerl to your $PATH.

# Use $HOME rather than ~, because a tilde inside double quotes is not expanded.
export PATH="$HOME/CFBI-master:$PATH"
export PATH="$HOME/Bio:$PATH"

1. Build the BWA index for the target sequences with '01_bwa_mem_index.sh'.

sh 01_bwa_mem_index.sh target.fa

1. Map the DNA-seq reads (FASTQ) to the target sequences with BWA-MEM using '02_bwa_mem_mapping.sh'. For paired-end sequencing, only the R1 reads are used.

sh 02_bwa_mem_mapping.sh target.fa sample_R1.fq sample_R1

1. Calculate base-substitution frequencies from the mapped reads (BAM) with '03_base_substitution.sh'.

sh 03_base_substitution.sh sample_R1.bam target.fa sample_R1

The output file sample_R1.xls is an example base-substitution result.

1. Calculate indel frequencies over a specified location range of the target gene with '04_indel_frequencies.sh'.

sh 04_indel_frequencies.sh sample_R1.bam EMX1 250 300 ascii.txt

The output file EMX1_indel_frequencies.txt is an example of the indel counts for the target EMX1.
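The indexing and mapping steps above follow the usual BWA index, BWA-MEM, and SAMtools sort/index workflow. The sketch below runs that workflow directly in bash with default settings; it is an assumption about what '01_bwa_mem_index.sh' and '02_bwa_mem_mapping.sh' do, not their actual contents, and the function name `map_r1` is hypothetical.

```bash
# Hypothetical sketch of the indexing and mapping steps (requires bash).
# NOT the actual contents of the CFBI scripts.
# Usage: map_r1 target.fa sample_R1.fq sample_R1
map_r1() {
  local ref=$1 fq=$2 prefix=$3
  # bwa index writes target.fa.amb/.ann/.bwt/.pac/.sa; build the index only if it is missing.
  if [ ! -e "${ref}.bwt" ]; then
    bwa index "$ref" || return 1
  fi
  # Align the single-end R1 reads and coordinate-sort the SAM stream into a BAM file.
  # The trailing '-' tells samtools sort to read from standard input.
  bwa mem "$ref" "$fq" | samtools sort -o "${prefix}.bam" -
  local rc=("${PIPESTATUS[@]}")   # exit codes of bwa mem and samtools sort
  if [ "${rc[0]}" -ne 0 ] || [ "${rc[1]}" -ne 0 ]; then
    return 1
  fi
  # Write the .bai index so that regions of the BAM can be queried.
  samtools index "${prefix}.bam"
}
```

Running `map_r1 target.fa sample_R1.fq sample_R1` would produce sample_R1.bam and sample_R1.bam.bai, the kind of BAM input the base-substitution and indel steps take.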

Input files

Target sequences in FASTA format (target.fa)
DNA-seq R1 reads in FASTQ format (sample_R1.fq)
Output prefix (name) of the BAM file (sample_R1)
Target gene symbol (EMX1)
Start position of the indel-frequency window in the target gene (250, i.e. the cut site -25 bp for EMX1)
End position of the indel-frequency window in the target gene (300, i.e. the cut site +25 bp for EMX1)
ASCII-code table for read base qualities (ascii.txt)
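Read base qualities in FASTQ files are stored as one ASCII character per base. Assuming the common Phred+33 encoding (Sanger / Illumina 1.8+), which may differ from whatever mapping ascii.txt defines, each quality score is the character's ASCII code minus 33. The hypothetical `phred33` function below decodes a quality string this way.

```bash
# Hypothetical helper (not part of CFBI); assumes Phred+33 quality encoding (requires bash).
# Usage: phred33 QUALITY_STRING
phred33() {
  local qual=$1 i code
  local -a scores=()
  for (( i = 0; i < ${#qual}; i++ )); do
    # A leading quote makes printf print the character code of the following character.
    printf -v code '%d' "'${qual:i:1}"
    scores+=( $(( code - 33 )) )   # Phred score = ASCII code - 33
  done
  echo "${scores[*]}"
}
```

For example, `phred33 'I#5'` prints `40 2 20`.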

Output files

Columns of sample_R1.xls, the example base-substitution output file:

Field	Description
Gene symbol	Name of the target gene
Mut location	Position of the mutant site
Ref base	Reference base at the mutant site
Flanking base	Reference sequence ±1 bp around the mutant site
# of mapped reads	Number of mapped reads for target gene
# of total mutant reads	Number of total mutant reads for target gene
% of total mutant reads	Ratio of total mutant reads for target gene
# of mutant A	Number of reads for mutant A
% of mutant A	Ratio of reads for mutant A
# of mutant C	Number of reads for mutant C
% of mutant C	Ratio of reads for mutant C
# of mutant G	Number of reads for mutant G
% of mutant G	Ratio of reads for mutant G
# of mutant N	Number of reads for mutant N
% of mutant N	Ratio of reads for mutant N
# of mutant T	Number of reads for mutant T
% of mutant T	Ratio of reads for mutant T
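The count and percentage columns can be derived from the bases that the mapped reads show at one position. The sketch below takes the reference base and a string of observed read bases (one character per read) and prints the columns from '# of mapped reads' to '% of mutant T'. It assumes percentages are 100 × count / mapped reads; whether sample_R1.xls uses this denominator is an assumption, and `sub_freq` is a hypothetical name, not a CFBI function.

```bash
# Hypothetical helper (not part of CFBI).
# Usage: sub_freq REF_BASE OBSERVED_BASES
sub_freq() {
  awk -v ref="$1" -v bases="$2" 'BEGIN {
    n = length(bases)                       # mapped reads covering the position
    ref = toupper(ref)
    for (i = 1; i <= n; i++) {
      b = toupper(substr(bases, i, 1))
      if (b != ref) { mut[b]++; total++ }   # any non-reference character is a mutant read
    }
    # Mapped reads, total mutant reads and their percentage...
    printf "%d\t%d\t%.2f", n, total, (n ? 100 * total / n : 0)
    # ...then count and percentage for each mutant base, in the column order A, C, G, N, T.
    split("A C G N T", order, " ")
    for (k = 1; k <= 5; k++) {
      printf "\t%d\t%.2f", mut[order[k]], (n ? 100 * mut[order[k]] / n : 0)
    }
    printf "\n"
  }'
}
```

For example, `sub_freq G GGGAGGTGGN` reports 10 mapped reads, 3 mutant reads (30.00%), and one read each (10.00%) for mutant A, N, and T. Characters other than A, C, G, N, and T count toward the total mutant reads but have no column of their own.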

Columns of EMX1_indel_frequencies.txt, the example indel-frequency output file:

Field	Description
Gene symbol	Name of gene
Type of reads	Category of reads: all reads or indel reads
# of reads	Number of reads in that category
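The two rows (all reads and indel reads) can be counted from SAM records by walking each read's CIGAR string over the indel window. The sketch below assumes that "all reads" means mapped reads whose alignment overlaps the window and that a read counts as an indel read when an insertion or deletion falls inside [start, end] (for example 250 to 300 for EMX1). These definitions and the function name `indel_counts` are assumptions rather than the behavior of '04_indel_frequencies.sh', and the sketch applies no base-quality filtering.

```bash
# Hypothetical helper (not part of CFBI). Reads SAM records on stdin.
# Usage: samtools view sample_R1.bam | indel_counts 250 300
indel_counts() {
  awk -F '\t' -v s="$1" -v e="$2" '
    /^@/ { next }                      # skip SAM header lines
    $6 == "*" { next }                 # skip unmapped records (no CIGAR)
    {
      pos = $4; cigar = $6; has_indel = 0; ref_end = pos - 1
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        cigar = substr(cigar, RLENGTH + 1)
        if (op == "I") {
          # An insertion sits after reference base ref_end; test that base against the window.
          if (ref_end >= s && ref_end <= e) has_indel = 1
        } else if (op == "D") {
          # Deleted bases are ref_end+1 .. ref_end+len; any overlap with the window counts.
          if (ref_end + 1 <= e && ref_end + len >= s) has_indel = 1
          ref_end += len
        } else if (op == "M" || op == "N" || op == "=" || op == "X") {
          ref_end += len                 # reference-consuming operations
        }
      }
      if (pos <= e && ref_end >= s) {    # alignment overlaps the window
        all++
        if (has_indel) indel++
      }
    }
    END { printf "all\t%d\nindel\t%d\n", all, indel }'
}
```

Soft clips (S), hard clips (H), and padding (P) do not consume reference bases, so they do not move the reference position.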

Citation

A paper describing CFBI has been submitted.
