Releases: BuysDB/SingleCellMultiOmics
Snowflake
Maple
Bamtagmultiome
- Now checks whether a BAM index is available and creates one if it is missing
- Continues without a reference when the reference is not indexed
- Fixed cluster jobs: the resulting read groups are now correct and all contigs, including alternative contigs, are processed.
- Added a script to estimate mappability for digest protocols; the resulting file can be supplied to bamtagmultiome to filter on mappability.
- Added the write_program_tag method to write provenance information to the BAM file; bamtagmultiome uses it to record the arguments used to generate the file.
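The index check described in the first item can be sketched as follows. This is a minimal illustration, not the code used by bamtagmultiome: the helper name `find_bam_index` is hypothetical, and it only looks for the usual index file names next to the BAM file. When no index is found, one could be created with a tool such as `samtools index`.

```python
import os


def find_bam_index(bam_path):
    """Return the path of an existing index for bam_path, or None.

    Hypothetical helper: it only checks the conventional index
    locations (sample.bam.bai, sample.bai, sample.bam.csi).
    """
    candidates = [
        bam_path + ".bai",
        os.path.splitext(bam_path)[0] + ".bai",
        bam_path + ".csi",
    ]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    # No index found: the caller should create one before random access.
    return None
```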
Additions
Added extensive API documentation, for example for the Fragment class.
Added the FourThiouridine (4sU) class and an analysis script for newly synthesized RNA.
Added the sorted_bam_file function, which writes directly to a sorted and indexed BAM file.
Updates
The QC fail bit of reads is set when the associated fragment is not valid.
MoleculeIterator is now much faster when reading through regions with very high coverage.
MoleculeIterator can now read from an iterable yielding single-end reads.
BAM sorting is now performed in the local directory with a uuid4 prefix to avoid flooding /tmp.
Added deprecation warning to universalBamTagger.py
demux.py no longer accepts paths containing a star (*).
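The uuid4-prefixed sorting location mentioned above can be sketched like this. It is an illustration under assumptions: the function name `local_sort_prefix` and the `sort_` prefix are hypothetical, and "local directory" is taken to mean the directory of the output file.

```python
import os
import uuid


def local_sort_prefix(output_path):
    """Build a unique temporary-file prefix next to the output file.

    Assumption: temporary sort chunks go in the output file's directory.
    The random uuid4 part keeps concurrent jobs from colliding.
    """
    directory = os.path.dirname(os.path.abspath(output_path))
    return os.path.join(directory, f"sort_{uuid.uuid4()}")
```

A prefix like this could then be passed to a sorter that accepts a temporary prefix, for example `samtools sort -T <prefix>`.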
Pumkin
Additions and improvements
scChiC
The requirements for a valid scChiC fragment are now less strict: the second mate is no longer required to map.
Methylation
Methylation calls are now stored in the molecule object
All Bismark tags are written
Molecule
Added methods to extract base-calling feature matrices per molecule and others to extract features for structural predictions
Added methods to create a consensus read from the molecule and methods to train a classifier for consensus calling
IVT duplicates are now tagged in the bam file
The molecule class writes tags indicating which SNPs were used as evidence in assigning the allele
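To illustrate the idea of a consensus read, here is a minimal majority-vote sketch. It is a simplified stand-in, not the classifier-based consensus calling described above: the function name `majority_consensus` and the input layout (one dict of reference position to base per read) are assumptions made for this example.

```python
from collections import Counter


def majority_consensus(reads, ambiguous="N"):
    """Majority-vote consensus over reads of one molecule.

    reads: list of dicts mapping reference position -> observed base
    (a hypothetical layout chosen for this sketch).
    Positions where the two most common bases tie get `ambiguous`.
    """
    positions = sorted({pos for read in reads for pos in read})
    consensus = {}
    for pos in positions:
        counts = Counter(read[pos] for read in reads if pos in read)
        (base, n), *rest = counts.most_common(2)
        if rest and rest[0][1] == n:
            base = ambiguous  # no clear majority at this position
        consensus[pos] = base
    return consensus
```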
Fragment
UMI hamming distance threshold is now configurable
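As a sketch of what such a threshold controls: two UMIs are treated as the same when their Hamming distance is at most the configured value. The names `hamming_distance`, `same_umi` and `max_distance` are hypothetical and not taken from the Fragment API.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def same_umi(umi_a, umi_b, max_distance=1):
    """True when the UMIs differ in at most max_distance positions."""
    return hamming_distance(umi_a, umi_b) <= max_distance
```

With `max_distance=0` only identical UMIs are merged; larger values tolerate sequencing errors at the cost of occasionally merging distinct molecules.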
Multiome tagger
bamtagmultiome.py replaces universalBamTagger.py and resolves many of its issues.
The multiome tagger now runs all chromosomes in parallel
The path to the reference fasta is auto-detected from the BAM file
Added a --consensus option which writes a single consensus read per molecule
Added allele-specific and SNP-aware methylation calling
Demultiplexer
For every rejected read the RR (Rejection reason) tag is set, explaining why a read was not demultiplexed
Added testing framework for the demultiplexer
The demultiplexer now trims off and stores the sequence of random primers.
Bugfixes
Sort ligation CSV rows
Fixed plate visualisation for Celseq2
Fixed race-condition when running demultiplexing on multiple cluster jobs
Misc
Uncountable (Methylomics)
New features in this version
- TAPS support in combination with other protocols: stable for NLAIII, experimental for scChiC
- Tabulation tool for TAPS data
- Tagger for TAPS data
- Automated separation of DNA and RNA reads from a single cell
- Undigested site counting for digestion sequencing
- Feature annotation for molecules (genes/introns/exons/..)
- Sparse scanpy compatible count table generation
- HTML representation of molecules and fragments
- Tensor representation of molecules for machine learning applications
- Custom QueryFlaggers can now be used, which makes it possible to process BAM files that carry custom information, for example a cell barcode or UMI encoded in the read name.
- Variant masking in reference Fasta file
- binning utils module
- methods to extract cell barcode from molecule
- len() for fragment, which returns the number of associated reads
- Use R1/R2 primer lengths from fragment in span calculation
- add_readgroups_to_header function and sort_and_index function
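The kind of parsing a custom QueryFlagger performs can be sketched as below. The read-name layout (`name;KEY:VALUE;...`) and the function `parse_read_name` are hypothetical examples, not the format or API used by the library.

```python
def parse_read_name(query_name, field_sep=";", key_sep=":"):
    """Split a read name into its base name and embedded key/value fields.

    Assumed (hypothetical) layout: 'read42;CB:ACGT;UMI:TTGA'.
    Fields without a key separator are ignored.
    """
    name, *fields = query_name.split(field_sep)
    tags = {}
    for field in fields:
        key, sep, value = field.partition(key_sep)
        if sep:
            tags[key] = value
    return name, tags
```

The extracted values could then be copied into BAM tags so that downstream tools find the cell barcode and UMI where they expect them.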
Library statistics:
- CSV summary files containing diagnostics information
- Summary plots for methylation data
Fixes and improvements
- Load allele information more efficiently and cache informative variants
- Read and write gzipped GTF files
- Added show_read1/2 arguments to fragment to only visualise a single read
- Use hash descriptor for fragments for faster pooling to molecules
- Added requirements.txt
- Count table: do not count outside chromosome bounds when using a sliding window
- Use the duplicate bam bit to decide if a fragment is a duplicate instead of our own tags
- Check if reads are mapped before counting deduplicated reads
- Generated a list of used tags
- Updated mapping quality histogram
- Fixed base calling phred score calculation
- Allele_resolver is now an optional argument for Molecule
- do not use non-proper paired reads in the size histogram
- count and show the number of informative variants for the requested alleles (allele resolver)
- demux.py now processes every input fastq file in a separate cluster job
- Added --nochunk flag which disables running a job per lane (demux.py)
- Don't raise an exception when trying to visualise a base aligned outside the view (HTML view)
- readcount plot width has been reduced
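The sliding-window fix for the count table can be illustrated with a small sketch that clips every window to the contig length. The function `sliding_windows` and its parameters are hypothetical; coordinates are assumed to be 0-based and half-open.

```python
def sliding_windows(contig_length, window_size, step):
    """Yield (start, end) windows that never extend past the contig end.

    Coordinates are 0-based, half-open. The last windows are shortened
    so that nothing is counted outside the chromosome bounds.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    start = 0
    while start < contig_length:
        yield start, min(start + window_size, contig_length)
        start += step
```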
Tests
- more tests for molecule pooling
- iterator_stability tests
- test_contig_selection test
- count table per chromosome test
Aux
First Brick, ready for PyPI
v0.0.3-alpha: pip installation issues have been fixed. The version is now 0.0.3.
First Brick
This is the first release of SingleCellMultiOmics