Scripts to run several protocols to process and analyze Next-Generation Sequencing data

johnssproul/ngs-protocols

 
 

ngs-protocols

#### Scripts to run several protocols to process and analyze Next-Generation Sequencing data

  • FastA.split.pl: Split FASTA files into several subfiles.
  • FastQ.split.pl: Split FASTQ files into several subfiles.
  • alignment_copy_paste.py: Cut alignment from the left until a position and paste to the right.
  • annot_to_rexp.py: Annotate RepeatExplorer's contigs following a list.
  • bam_consensus.py: Get majority consensus sequences for BAM files.
  • bam_coverage_join.py: Generate a table of coverage along the contigs in several BAM files.
  • bam_var_join.py: Generate a variation table using several BAM files.
  • bg_count.py: Generate a table with nucleotide counts from BAM files.
  • blat_recursive.py: Parallelize a BLAT run in several threads.
  • blat_recursive_hard.py: Parallelize a BLAT run in several threads with hard options.
  • bowtie2_recursive.py: Map using Bowtie2 with several libraries consecutively.
  • bwa_protocol.py: Map using BWA in multiple libraries.
  • bwa_mem_protocol.py: Map using BWA-MEM in multiple libraries.
  • cd_hit_filter_size.py: Filter out sequences from small CD-HIT clusters.
  • count_acgtn.py: Count number of A, C, G, T and N in a multifasta file.
  • count_bases_fastq.py: Count number of nucleotides in one or several FASTQ(.GZ) files.
  • count_kmer.py: Count occurrences from a list of kmers using Jellyfish.
  • count_reads_bam.py: Generate a table with mapped reads counts in several BAM files.
  • coverage_graphics.py: Generate graphics using the output from bam_coverage_join.py and a samples file.
  • coverage_graphics_coord.py: A complex version of coverage_graphics.py.
  • coverage_seq_bed.py: Count number of mapped nucleotides per reference sequence in BED files.
  • coverage_window.py: Count number of mapped nucleotides in a sliding window of defined size.
  • cut_seq_unequal.py: Trim sequences from a FASTA file into subsequences of the defined size.
  • deconseq_run.py: Run DeconSeq automatically and with several threads.
  • dimerator.py: Convert a monomer FASTA file into a dimer FASTA file.
  • divnuc_bam.py: Calculate nucleotide diversity per site from BAM files.
  • divnuc_plot.py: Calculate nucleotide diversity per window from the output of divnuc_bam.py.
  • divsum_ab.py: Used with satminer quantification.
  • divsum_count.py: Count the number of nucleotides per element in a RepeatMasker's divsum file.
  • divsum_stats.py: Generates interesting stats from repeat landscapes from a list of divsum files.
  • divsum_to_rl.py: Generates satDNA repeat landscapes using satMiner's criteria.
  • dnapipete_createdb.py: Generate a database compatible with RepeatMasker from the dnaPipeTe
  • extract_member_reads_rexp.py: Extract reads in a specific cluster of RepeatExplorer.
  • extract_no_seq.py: Extract sequences from a FASTA file absent in a list.
  • extract_reads_blat.py: Extract matching reads in a PSL output from BLAT.
  • extract_reads_rm.py: Extract matching reads in an OUT output from RepeatMasker.
  • extract_regions_bam.py: Extract reads from a BAM only in the indicated regions.
  • extract_seq.py: Extract sequences from a FASTA file present in a list conserving the order.
  • extract_seq_regions.py: Extract specific regions of sequences from a FASTA file present in a list conserving the order.
  • fasta_filter_by_length.py: Filter out sequences from a FASTA file with a size lower than a threshold.
  • fasta_sequence_len.py: Generate a table with the length of each sequence in a FASTA file.
  • fastq-combine-pe.py: Extract paired reads by ID from two FASTQ files.
  • fastq-pe-random.py: Random selection of paired reads from two FASTQ files.
  • fastq_edit_ids.py: Edit the ID from FASTQ files to end with the format "@ID/1".
  • fastq_edit_ids_sra.py: Edit the ID from FASTQ files to end with the format "@ID/1" from SRA files.
  • fastq_paired_combine_id: Extract paired reads by their IDs.
  • find_exclusive_kmers.py: Extract kmers exclusive to one library compared with another using Jellyfish.
  • gatk_protocol.py: Run GATK on a list of FASTQ files with the same reference.
  • get_no_blat.py: Extract sequences from a FASTA file absent in a PSL output of BLAT.
  • gff_creator.py: Generate a GFF file for htseq-count from a FASTA file.
  • id_rmasker.py: Edit IDs from a FASTA file with a format compatible with RepeatMasker.
  • id_rmasker_rexp.py: Edit IDs from a FASTA file of RepeatExplorer contigs compatible with RepeatMasker.
  • join_multiple_lists.py: Join the results of two or more lists.
  • join_multiple_lists_var.py: Join the results of two or more lists for bam_var_join.py.
  • join_rm_list.py: Join two files with RepeatMasker nucleotide counts.
  • kimura_window.py: Calculate kimura divergence per window using the RepeatMasker's script.
  • kmer_to_fasta.py: Generate a FASTA file from a list of kmers.
  • longranger_prepare_reference.py: Prepare FASTA reference for longranger.
  • mapping_blat_gs.py: Extract matching reads with BLAT and optionally launch Newbler, RepeatMasker or SSAHA2.
  • mapping_blat_gs_hard.py: Extract matching reads with hard options of BLAT and optionally launch Newbler, RepeatMasker or SSAHA2.
  • mapping_blat_gs_saver.py: Version of mapping_blat_gs.py for big libraries.
  • mapping_blat_gs_single_end.py: Version of mapping_blat_gs.py for single-end libraries.
  • massive_phylogeny.py: Using a single FASTA file and a gene list, run RAxML for each gene.
  • massive_phylogenies_figure.py: Generate pdf phylogenies using a list of Newick files.
  • massive_phylogeny_raxml_support.py: Support script for massive_phylogeny.py.
  • mitobim_run.py: Run MITObim with several protocols.
  • mreps_extract.py: Generate a FASTA file with tandem sequences using a MREPS output.
  • peru_protocol.py: Protocol to estimate number of external repeat_units in satellite DNA sequences.
  • raxml_protocol.py: RAxML protocol.
  • reduce_bam.py: Filter out unmapped paired reads from a BAM file.
  • remove_ns.py: Remove reads with Ns after a masking.
  • replace_patterns: Replace elements in a file.
  • repeat_landscape_decimal.py: Generates a repeat landscape table with divergence values adjusted to one decimal (0.1%) from an ALIGN file.
  • repeat_landscape_decimal_050.py: Generates a repeat landscape table with divergence values adjusted to 0.5% from an ALIGN file.
  • repeat_masker_run.py: Run RepeatMasker alignment for small FASTA files.
  • repeat_masker_run_big.py: Run RepeatMasker alignment for several big FASTA files.
  • rexp_get_cluster.py: Get FASTA file concatenating all the contigs assembled with RepeatExplorer.
  • rexp_prepare.py: Generate a FASTA file ready for RepeatExplorer from two FASTQ files.
  • rexp_prepare_deconseq: Generate a FASTA file ready for RepeatExplorer from two FASTQ files filtered with DeconSeq.
  • rexp_prepare_normaltag: Generate a FASTA file ready for RepeatExplorer from two FASTQ files with normal tags (IDs ending in /1 or /2).
  • rexp_select_contigs: Select the contigs with the highest coverage in a RepeatExplorer's output.
  • rm_clas_seq.py: Classify reads aligning or not using a RepeatMasker's output.
  • rm_clas_seq_names: Classify reads coinciding with an annotation and aligning or not using a RepeatMasker's output.
  • rm_cluster_external.py: Select non-homologous reads, group them by the annotation of their read pair, and cluster them.
  • rm_getseq.py: Extract sequences of the matching regions in a RepeatMasker's output.
  • rm_getseq_annot.py: Extract sequences of the matching regions in a RepeatMasker's output and annotate the sequences of the FASTA.
  • rm_getseq_split.py: Extract sequences of the matching regions in a RepeatMasker's output, annotate them, and split the sequences into different FASTA files.
  • rm_join_out.py: Concatenate OUT files from several RepeatMasker runs.
  • rm_join_tbl.py: Join TBL files from several RepeatMasker runs.
  • rm_homology.py: Find homologies searching with RepeatMasker sequence by sequence.
  • run_abyss.py: Run ABySS assembler with a range of kmers.
  • sat_cross_libraries.py: Generate FASTA files to assemble satellites with RepeatExplorer.
  • sat_cutter.py: Cut satellites in a FASTA alignment to align homologous regions.
  • sat_subfam2fam.py: Edit ALIGN file from RepeatMasker to calculate Kimura divergence by family instead of subfamily.
  • satminer_quant.py: satminer quantification protocol.
  • search_issr_1nt.py: Count the number of occurrences of each nucleotide before an SSR region to design ISSR primers.
  • search_issr_2nt.py: Count the number of occurrences of each dinucleotide before an SSR region to design ISSR primers.
  • sequence_ref_alt.py: Get sequences with REF and ALT variants after a SNP calling.
  • snp_calling_bchr.py: SNP calling for B chromosomes.
  • snp_calling_bchr_z10.py: SNP calling for B chromosomes. Alt<10 in ZB.
  • snp_calling_dn_ds: Perform SNP calling to calculate dN/dS from a BAM file.
  • split_illumina.py: Split FASTQ files from Illumina sequencing in several files.
  • sra_download.py: Download SRA files using a list of SRA accession numbers.
  • ssaha2_run.py: Run SSAHA2 mapping in several libraries.
  • ssaha2_run_multi.py: Run SSAHA2 mapping for several big libraries, parallelized in different threads.
  • ssaha2_run_multi_pe_se.py: Run SSAHA2 mapping for several big libraries, parallelized in different threads, with paired and unpaired reads.
  • ssaha2_run_multi_se.py: Run SSAHA2 mapping for several big libraries, parallelized in different threads, using single-end libraries.
  • stampy_protocol.py: Run Stampy mapping.
  • subsampler.py: Subsample sequences from FASTA and FASTQ files.
  • taxonomy_retrieve.py: Retrieve taxonomy using a Species list.
  • trinity_extract_longest.py: Extract the longest contig for each gene in a Trinity assembly.
  • trinotate_auto.py: Run Trinotate.
  • unshuffle.py: Unshuffle a list of FASTQ files in _1.fastq and _2.fastq.
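
To illustrate the kind of task these scripts cover, here is a minimal sketch of the counting described for count_acgtn.py. It is not the repository's code: the function name `count_acgtn`, its input (an iterable of text lines), and the choice to fold lowercase bases into uppercase are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def count_acgtn(lines: Iterable[str]) -> Dict[str, int]:
    """Count A, C, G, T and N over every record of a multi-FASTA stream.

    Hypothetical helper for illustration; header lines (starting with '>')
    and blank lines are skipped, and lowercase bases are counted as uppercase.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip headers and empty lines
        counts.update(line.upper())
    # Report only the five symbols of interest, with 0 for absent ones.
    return {base: counts.get(base, 0) for base in "ACGTN"}


# Small in-memory example instead of a real file.
example = [">seq1", "ACGTN", "acgt", ">seq2", "NNAA"]
totals = count_acgtn(example)
print(totals)
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the list, for example `count_acgtn(open("reads.fasta"))`, where `reads.fasta` is a placeholder name.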

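The monomer-to-dimer conversion described for dimerator.py can be sketched in the same spirit. The sketch assumes that a "dimer" means each sequence concatenated with itself, which is a common way to represent a tandem repeat unit so that the junction between two copies is present. That interpretation, and the names `read_fasta` and `dimerize`, are assumptions and not taken from the repository.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dimerize(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return each record with its sequence repeated twice (assumed meaning of 'dimer')."""
    return [(header, seq + seq) for header, seq in read_fasta(lines)]


monomers = [">sat1", "ACG", "T", ">sat2", "GG"]
dimers = dimerize(monomers)
for header, seq in dimers:
    print(header)
    print(seq)
```

Keeping the parser separate from the transformation means the same `read_fasta` helper could feed other per-record operations, such as the length table described for fasta_sequence_len.py.
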
Languages

  • Python 99.1%
  • Perl 0.9%