
applicativesystem/pacbio-nanopore-repeat-coverage


pacbio-nanopore-repeat-coverage

  • A repeat coverage calculator for long reads. It takes a long-read file before assembly, either directly from the sequencing run or after cleaning.
  • It calculates the total amount of repeat stretches present in the sequencing reads. You can plot these before assembly to see how repetitive your sequencing data is and to help set the long-read assembly parameters.
  • I implemented a different approach, which parses millions of long reads in a few minutes to estimate the coverage.
  • For each sequencing read, it reports the stretches found, their positions, and the total coverage of that stretch type.
repeatlocatorLongRead("/Users/gauravsablok/Desktop/CodeCheck/test_sample_sample1.fasta",
                      polyATGCstretch_type="A")
	ids	sequences	repeat_locator	fraction_length	fraction_length_coverage
0	>seq1	ATATATATATAAGACGAGATAATAAAATTTTATATATATATAAAAA...	[AAAA, AAAAA]	[(41, 46), (23, 27)]	5.777778
1	>seq2	ATATATATATAAGACGAGATAATAAAATTTTATATATATATAAAAA...	[AAAA, AAAAA]	[(41, 46), (23, 27)]	5.777778
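
To make the idea behind the output above concrete, here is a minimal sketch of locating single-base stretches in each read and summarising how much of the read they cover. It is not the implementation behind `repeatlocatorLongRead`. The helper names (`read_fasta`, `locate_stretches`, `stretch_coverage`), the minimum stretch length of 4, and the coverage definition (percentage of read bases that fall inside a stretch) are all assumptions made for illustration, so the numbers will not necessarily match the `fraction_length_coverage` column.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def locate_stretches(sequence, base="A", min_len=4):
    """Return (start, end) spans of runs of `base` at least `min_len` long.

    Spans are 0-based and end-exclusive. min_len=4 is an assumed threshold.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def stretch_coverage(sequence, spans):
    """Percentage of the read covered by the given spans (assumed definition)."""
    if not sequence:
        return 0.0
    covered = sum(end - start for start, end in spans)
    return 100.0 * covered / len(sequence)


# Hypothetical two-line record, used only to exercise the helpers.
for name, seq in read_fasta([">seq1", "ATATAAAA", "GGAAAAAT"]):
    spans = locate_stretches(seq, base="A")
    print(name, spans, stretch_coverage(seq, spans))
```

Because the regular expression scans each read once, the cost grows linearly with the total number of bases, and the generator keeps only one read in memory at a time.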

Gaurav Sablok
University of Potsdam
Potsdam, Germany