Releases: PlantandFoodResearch/TEFingerprint

Alpha-0.3.2

31 May 02:04
3e81275

v0.3.2

  • alpha
  • changes
    • rename cluster classes and methods IDBCAN and SIDBCAN to DBICAN and SDBICAN respectively
    • minor changes for compatibility with python 3.4
    • added Makefile
  • bug fixes
    • simplify and fix travis CI
  • documentation
    • add MIT licence
    • improve method docs
    • add graphics to method docs #74
    • document requirements

Alpha-0.3.1

21 Sep 03:56
4ad7c8e

v0.3.1

  • alpha
  • new features
    • maximum count proportion #115
      • added 'max_count_proportion' field to output (was already used for determining colour)
      • renamed '--no-colour' switch to '--no-max-count-proportion'
  • changes
    • package version now stored in single location 'version.py'

Alpha-0.3.0

15 Jul 22:15
15b49b6
  • new features
    • informative reads
      • option to include soft-clipped tips in place of their mate informative read
      • option to exclude full-length informative reads completely
    • output formats
      • outputs with the .gz extension are compressed with block-gzip rather than regular gzip #104
      • option to output tab-delimited plain text files suitable for tabix indexing when block-gzipped #103
    • extract-informative temporary files #106
      • temporary files are now written to the same location as the output by default
      • option to not remove these files
      • option to write them to a temp location provided by the operating system
  • changes
    • added Biopython as a dependency to support block-gzip compression
    • refactor submodules to hide application specific code
    • refactor clustering classes/methods to use new names
      • non-hierarchical method named IDBCAN (Interval Density Based Clustering of Applications with Noise)
      • conservative-hierarchical method named SIDBCAN (Splitting-IDBCAN)
      • aggressive-hierarchical method named SIDBCAN-aggressive and deprecated
  • bug fixes
    • python 3 shebang lines #98
    • add MANIFEST.in to allow for github install with pip #89
  • documentation
    • named clustering algorithms loosely based on DBSCAN
      • non-hierarchical method named IDBCAN (Interval Density Based Clustering of Applications with Noise)
      • hierarchical method named SIDBCAN (Splitting-IDBCAN)
    • many corrections to documentation of clustering algorithms #99
    • specify tabix command to index block-gzipped output
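
These notes name the clustering methods without describing how they work. As a rough illustration of density-based clustering on one-dimensional positions, in the spirit of DBSCAN, the sketch below treats every window of `min_points` consecutive sorted positions spanning at most `epsilon` bases as dense, then merges overlapping windows into clusters. The function name, its parameters and the exact density rule are assumptions made for illustration; this is not TEFingerprint's implementation.

```python
def cluster_positions(positions, min_points, epsilon):
    """Return (start, stop) intervals covering dense runs of 1-D positions.

    Hypothetical helper: a window of `min_points` consecutive sorted
    positions is dense if it spans at most `epsilon`; overlapping dense
    windows are merged into a single cluster.
    """
    points = sorted(positions)
    clusters = []
    for i in range(len(points) - min_points + 1):
        start, stop = points[i], points[i + min_points - 1]
        if stop - start > epsilon:
            continue  # window is too sparse to count as dense
        if clusters and start <= clusters[-1][1]:
            # overlaps the previous cluster, so extend it
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], stop))
        else:
            clusters.append((start, stop))
    return clusters


# Made-up read tip positions; the isolated point at 30 is treated as noise.
read_tips = [1, 2, 4, 5, 30, 100, 102, 103, 104]
print(cluster_positions(read_tips, min_points=3, epsilon=5))
# [(1, 5), (100, 104)]
```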

Alpha-0.2.0

09 Feb 02:01
f412792
  • new features
    • new 'conservative' clustering method #91
  • changes
    • new 'conservative' clustering method used by default #91
    • simplified clustering/splitting method selection with flag '--splitting-method'
    • changed library imports to include loci, fingerprint, fingerprintio and cluster
    • tidied cluster.py submodule code
  • bug fixes
    • fixed miscalculation of initial cluster support #90
    • fixed miscalculation of parent vs child support #95
    • fixed name collision between sub module cluster and function loci.cluster
  • documentation
    • updated method.rst with new clustering method
    • updated usage.rst with new clustering method
    • updated CLI help text
    • updated cluster.py docstrings with new method
    • added metadata for PyPI

Alpha-0.1.4

09 Jan 03:07
f8373bc

Substantial changes to code base in order to accommodate new features.
Command-line-tool names and arguments have changed - see the documentation.
Core clustering algorithms are unchanged and will identify the same clusters/features as before.

  • Renamed tools
    • remove tef wrapper as its name is likely to collide with other software
      • tef fingerprint/compare are combined into single tool tefingerprint #63
      • tef preprocess is now tef-extract-informative
      • tef filter-gff is now tef-filter-gff (note: this tool's behavior has changed substantially to handle the new gff output)
  • Refactored modules
    • core sub-module loci.py re-written to be more flexible #82
      • single data structure for representing collections of loci
      • arbitrary string length limit removed (use of Python strings in place of numpy strings)
      • split tool logic into separate sub-module fingerprint.py
  • New Features
    • tefingerprint
      • trim buffered clusters to extent of read tips
      • count n most common elements per sample in each bin #63 #81
      • use gff annotation for tagging known elements #80
      • join paired clusters using gff annotation #78
      • output files (gff, csv) are optionally pipe-able
      • output files (gff, csv) contain more detailed data
      • can read (annotation gff) and write files compressed with gzip or bz2 #86
      • escape special characters in gff files with percent encodings
    • tef-filter-gff
      • changed to handle new gff output #84
      • use of --any and --all contexts for combining filters
      • unix style wild cards for matching multiple fields
      • read and write gz and bz2 compressed files #86
      • escape special characters with percent encodings
      • read gff from standard in
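
Two of the features above, percent encoding of special characters and Unix-style wildcards, can be sketched with the Python standard library alone. The helper names, the set of characters treated as reserved (those reserved in GFF3 attribute values) and the attribute keys in the example are assumptions for illustration, not the tools' actual code or output fields.

```python
from fnmatch import fnmatchcase
from urllib.parse import unquote

# Characters with special meaning in GFF3 attribute values, plus '%' itself.
RESERVED = set(';=&,%\t\n\r')


def encode_gff_value(value):
    """Percent-encode reserved characters in an attribute value."""
    return ''.join(
        '%{:02X}'.format(ord(char)) if char in RESERVED else char
        for char in value
    )


def decode_gff_value(value):
    """Reverse the percent encoding."""
    return unquote(value)


def matching_keys(attributes, pattern):
    """Return attribute keys matching a Unix-style wildcard pattern."""
    return sorted(key for key in attributes if fnmatchcase(key, pattern))


encoded = encode_gff_value('Gypsy;copy=1')
print(encoded)                    # Gypsy%3Bcopy%3D1
print(decode_gff_value(encoded))  # Gypsy;copy=1

# Hypothetical attribute names, used only to show the wildcard match.
attributes = {'sample_0_count': 4, 'sample_1_count': 9, 'ID': 'x'}
print(matching_keys(attributes, 'sample_?_count'))
# ['sample_0_count', 'sample_1_count']
```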

Alpha-0.1.3

03 Aug 01:22
  • Performance improvements:
    • Significantly reduced memory usage and improved speed for preprocess #66 #70
    • Reduced memory usage for fingerprint/compare io operations #68
    • Reduced memory usage for filter-gff #67
  • Fixes
    • Corrected feature-csv output on large arrays #58 #72
  • Documentation
    • Split readme into multiple documents
    • Switched to .rst for documentation
    • Added description of methods

Alpha-0.1.2

11 Jul 02:23
  • Faster output of CSV files (#58)
  • Tidied up CLI arguments (#55 #62)
    • Replace underscores with dash
    • Use of flags for Boolean arguments
    • Updated Readme with changes
  • Tests
    • Integration tests for fingerprint, compare and filter-gff
    • Further tests for preprocess
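
The CLI conventions adopted in this release (dashes in option names, flags for Boolean arguments) map directly onto Python's `argparse`. The sketch below shows the pattern; the program name and both options are hypothetical and are not taken from the tool's real interface.

```python
import argparse


def build_parser():
    """Build a parser with dash-separated options and a Boolean flag."""
    parser = argparse.ArgumentParser(prog='example-tool')  # hypothetical name
    # Dashes in the option name become underscores in the parsed namespace.
    parser.add_argument('--min-reads', type=int, default=5,
                        help='hypothetical integer option')
    # A flag takes no value: present means True, absent means False.
    parser.add_argument('--keep-temp-files', action='store_true',
                        help='hypothetical Boolean flag')
    return parser


args = build_parser().parse_args(['--min-reads', '10', '--keep-temp-files'])
print(args.min_reads, args.keep_temp_files)  # 10 True
```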

Alpha-0.1.1

27 Jun 21:54
  • Minor fixes for cluster support calculation #53
  • Default selection of starting epsilon value now matches Campello et al 2015
  • Update terminology to reflect that used in Campello et al 2015
  • Tests

Alpha-0.1.0

13 Jun 02:02
  • Pre-Processing
    • Corrections for reverse-complemented reads when extracting dangler reads
    • Inclusion of soft-clipped sections from the outer end of proper-mapped pairs
    • Separated mapping reads to repeats from pre-processing script
  • Fingerprinting/Compare
    • Quality score filtering options
    • New output format options
  • Other
    • Bump version to 0.1.0
    • Change status to alpha
    • Updated documentation
    • Additional tests

Tabular Data Structures

13 Jun 01:36
Pre-release

Improvements:

  • Nicer data structures for use as a library #42
  • Easy inter-op with NumPy, pandas and plotting libraries in Python #42
  • Improved buffering of comparative bins
  • Multi-process pipelines return results to parent process rather than printing directly to file
  • GFF output is sorted by chromosome, start position, stop position #32
  • Added better examples to readme #40
  • improved testing of fingerprint and compare data methods #6
  • Compare output is no longer nested, contains normalised read counts #4, and is coloured by read count proportions #43
  • filter_gff program is simplified (no longer deals with nested gff)
  • renamed module to tefingerprint and CLI to tef #3
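
The GFF sort order listed above (chromosome, start position, stop position) can be sketched as a sort key over tab-delimited GFF lines. The helper name and the records are made up for illustration. The sketch compares chromosome names as plain strings, so `chr10` would sort before `chr2`; whether the tool orders names this way is not stated here.

```python
def gff_sort_key(line):
    """Key for sorting GFF lines by sequence name, then start, then stop."""
    fields = line.rstrip('\n').split('\t')
    # GFF columns: seqid, source, type, start, end, score, strand, phase, attributes
    return fields[0], int(fields[3]), int(fields[4])


# Made-up records, deliberately out of order.
records = [
    'chr2\t.\tTE\t500\t900\t.\t+\t.\tID=c',
    'chr1\t.\tTE\t1200\t1500\t.\t-\t.\tID=b',
    'chr1\t.\tTE\t300\t800\t.\t+\t.\tID=a',
    'chr1\t.\tTE\t300\t450\t.\t+\t.\tID=d',
]

for line in sorted(records, key=gff_sort_key):
    print(line)
# Attribute column order after sorting: ID=d, ID=a, ID=b, ID=c
```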