v0.7.0
Change Log
For Version 0.7.0
- Removed support for Python 3.7; as we use the walrus operator in some of the re-worked functions, Python 3.8+ is now required to use `glycowork`
- Added optional installs for specialized `glycowork` usage (‘all’, ‘ml’, and ‘draw’, for now), which install additional dependencies for these usages; more details in docs
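
For readers unfamiliar with the walrus operator (the assignment expression `:=`), the generic snippet below shows the kind of syntax that does not parse on Python 3.7. It is only an illustration: `first_linkage` is a hypothetical helper and is not taken from the `glycowork` code base.

```python
import re

def first_linkage(glycan):
    """Return the first linkage (e.g. 'b1-4') in an IUPAC-condensed string, or None."""
    # The walrus operator binds the match object and tests it in one expression.
    # This line is a SyntaxError on Python 3.7 and earlier.
    if (match := re.search(r"\(([ab?]\d-[\d?])\)", glycan)) is not None:
        return match.group(1)
    return None
```
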
glycan_data
- Updated datasets, models, and lib to be bigger and better; removed many sequence duplicates that differed only in how their branch ordering was written
loader
- Added `multireplace` helper function, to map a dictionary of changes to a string
- Made `build_custom_df` faster
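
The idea of mapping a dictionary of changes onto a string can be sketched as follows. This is a minimal stand-in under the assumption that replacements are applied one after another; the name `multireplace_sketch` and its signature are hypothetical and may differ from the actual `multireplace` helper.

```python
def multireplace_sketch(string, changes):
    """Apply every old -> new pair in `changes` to `string`, in dictionary order."""
    for old, new in changes.items():
        # Replacements are sequential, so an earlier change can affect a later one.
        string = string.replace(old, new)
    return string

# Example: normalize Greek anomer symbols to their ASCII spelling.
cleaned = multireplace_sketch("Gal(β1-4)GlcNAc", {"β": "b", "α": "a"})
```
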
motif
draw
- Added `draw` as a new submodule of `.motif`
- Added `GlycoDraw` to draw glycans in SNFG style and save them as .svg/.pdf
- Added `annotate_figure` to replace glycan text with glycan images in .svg figures (heatmaps, volcano plots, etc.)
- Added `text_to_glycan`, which replaces glycan strings in figures with glycan images
- Added `scale_in_range` to normalize a list of numbers within a range
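
Normalizing a list of numbers within a range is commonly done with min-max scaling. The sketch below shows that technique under the assumption that this is the intended behavior; the function name, argument order, and the handling of constant input are hypothetical and not confirmed by the changelog.

```python
def scale_in_range_sketch(numbers, low, high):
    """Linearly rescale `numbers` so the minimum maps to `low` and the maximum to `high`."""
    smallest, largest = min(numbers), max(numbers)
    if largest == smallest:
        # Assumption: constant input is mapped to the lower bound to avoid dividing by zero.
        return [float(low) for _ in numbers]
    span = largest - smallest
    return [(x - smallest) / span * (high - low) + low for x in numbers]
```
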
tokenization
- Sped up `glycan_to_composition` by 1000x (avoiding explicit stemification and just doing stemification of the building blocks); also speeds up all functions using `glycan_to_composition`
- Sped up `composition_to_mass` (independent of the above)
- `glycan_to_composition` (and downstream functions) now can handle more post-biosynthetic modifications: Ac, PCho, PEtN
- Renamed `calculate_theoretical_mass` to `glycan_to_mass`
- Sped up `mz_to_composition2` by (i) filtering out duplicate compositions and (ii) selecting compositions from a chosen taxonomic kingdom
- Reprioritized `mz_to_composition2` by first searching for native compositions, only then looking for compositions + adducts, and only then searching for doubly-charged compositions
- `canonicalize_iupac` now also handles floating substituents and can handle many more typos / inconsistencies / IUPAC dialects (such as CFG-coded glycans), including improvements made by Kathryn Klarich
- Moved `canonicalize_iupac` into `motif.processing`
- Expanded `get_core` (and downstream functions) with HexA, HexNAc, dHex
- Expanded `map_to_basic` to (some) post-biosynthetic modifications
- `mz_to_structures` no longer outright fails if no m/z value can be matched
- Deprecated `structures_to_motifs`; `annotate_dataset` can do the same
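
The search order described for `mz_to_composition2` (native compositions first, then adducts, then doubly-charged species) can be pictured as a staged lookup that stops at the first stage producing a hit. The sketch below is a toy model, not the library's implementation: the function name, the mass table, the adduct mass, and the tolerance are all made-up placeholders, and the doubly-charged stage simply halves the mass without accounting for the charge carrier.

```python
def staged_match(mz, masses, adduct_mass, tol=0.5):
    """Return (stage, hits) for the first stage that matches `mz` within `tol`."""
    stages = [
        ("native", lambda m: m),                  # stage 1: unmodified mass
        ("adduct", lambda m: m + adduct_mass),    # stage 2: mass plus one adduct
        ("doubly_charged", lambda m: m / 2),      # stage 3: simplified z = 2 case
    ]
    for stage, transform in stages:
        hits = [name for name, m in masses.items() if abs(transform(m) - mz) <= tol]
        if hits:
            return stage, hits  # later stages are never evaluated
    return None, []

# Hypothetical toy masses, chosen only to exercise each stage.
toy_masses = {"A": 1000.0, "B": 1200.0}
```
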
processing
- Fixed bug in processing glycans with floating substituents in `small_motif_find`
- Deprecated `seed_wildcard`
- `choose_correct_isoform` has been updated to keep up with the improved `find_isomorphs`
- Added more informative error message to `IUPAC_to_SMILES`
- `get_lib` is now slightly faster
graph
- Sped up `compare_glycans` with string inputs, by avoiding graph operations when the two glycans do not have the same composition
- Added support for enabling modification wildcards in `compare_glycans` and `subgraph_isomorphism` (for instance matching GalOS and Gal6S) by setting `wildcards_ptm = True`
- Sped up `glycan_to_nxGraph_int` by optimizing node label/attribute assignments
- Refactored `graph_to_string` to be a lot more robust, streamlined, and faster. Its new integration with `canonicalize_iupac` may also result in string improvement upon back-translation (e.g., branch order canonicalization)
- `ensure_graph` now has `**kwargs` that get passed to `glycan_to_nxGraph`
- `get_possible_topologies` now supports internal additions as well, with the keyword argument ‘exhaustive’
- `possible_topology_check` now supports wildcard matching via `**kwargs` passed on to `compare_glycans`
- Made changes to make `glycowork` compatible with NetworkX 3.0
- Moved `bracket_removal` to `motif.processing`
- Fixed a small inconsistency in handling floating substituents in `glycan_to_nxGraph_int` that could have caused issues with custom libs
- `override_reducing_end` is no longer needed in `glycan_to_nxGraph` to delineate linkage-ending glycans (e.g., Fuc(a1-2)); this is now auto-inferred within `glycan_to_nxGraph`
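
The `compare_glycans` speed-up rests on a cheap pre-check: two glycans with different compositions cannot be identical, so the graph comparison can be skipped. A minimal sketch of that idea follows, assuming IUPAC-condensed strings; `composition`, `compare_with_precheck`, and the `expensive_compare` callback are hypothetical names, and the regex tokenization is a simplification of real glycan parsing.

```python
import re
from collections import Counter

def composition(glycan):
    """Count monosaccharide tokens in an IUPAC-condensed string (simplified)."""
    # Drop branch brackets, then split on linkages such as "(b1-4)".
    linear = glycan.replace("[", "").replace("]", "")
    return Counter(tok for tok in re.split(r"\([^)]*\)", linear) if tok)

def compare_with_precheck(glycan_a, glycan_b, expensive_compare):
    """Run `expensive_compare` only if both glycans share the same composition."""
    if composition(glycan_a) != composition(glycan_b):
        return False  # cheap exit: no graph construction needed
    return expensive_compare(glycan_a, glycan_b)
```

Here `expensive_compare` stands in for whatever graph-based comparison would otherwise run on every pair.
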
annotate
- Deprecated `convert_to_counts_glycoletter` and `glycoletter_count_matrix`; `motif_matrix` can do both
- Refactored `motif_matrix` to be substantially faster and more condensed in its output (also speeds up `annotate_dataset` with the ‘exhaustive’ option in the feature_set argument)
- Expanded `motif_matrix` to implicitly test for subsumption enrichment (e.g., previously we only explicitly looked for “Gal(b1-?)GlcNAc”; now we also count “Gal(b1-4)GlcNAc” toward the former)
- `annotate_glycan` is now dual-compatible with string and networkx graph input
- Expanded feature_set in `annotate_dataset` by the option ‘terminal’, which calls `get_terminal_structures`
- This usage of `get_terminal_structures` in `annotate_dataset` now also does the same implicit test for subsumption enrichment as described for `motif_matrix` above
- `annotate_dataset` now creates its own lib, based on the motif list and the provided glycans
- Expanded `find_isomorphs` to also be able to re-shuffle (some) branched branches
- Moved `find_isomorphs` into `motif.processing`
- Linkages alone are no longer considered by `motif_matrix` / `annotate_dataset`
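
The subsumption idea for `motif_matrix` (a specific linkage such as b1-4 also counts toward the wildcard motif b1-?) can be illustrated with plain string matching. This is a deliberately simplified sketch: the library works on glycan graphs, whereas the hypothetical helpers below only treat `?` as "any position digit or an unknown position" in a linear substring search.

```python
import re

def motif_to_regex(motif):
    """Compile a motif into a regex in which '?' matches any digit or a literal '?'."""
    escaped = re.escape(motif)
    return re.compile(escaped.replace(r"\?", r"[\d?]"))

def count_with_subsumption(motif, glycans):
    """Count motif occurrences across `glycans`, letting wildcards absorb specific linkages."""
    pattern = motif_to_regex(motif)
    return sum(len(pattern.findall(glycan)) for glycan in glycans)

example_glycans = [
    "Gal(b1-4)GlcNAc(b1-2)Man",
    "Gal(b1-3)GlcNAc",
    "Fuc(a1-2)Gal(b1-?)GlcNAc",
    "Man(a1-3)Man",
]
```
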
analysis
- All functions with the feature_set keyword argument now can also use the ‘terminal’ keyword for analyzing non-reducing end motifs exclusively
- Added `get_differential_expression` to compare glycomics data, including data cleaning and imputation
- `get_pvals_motifs` and `make_heatmap` no longer have the lib keyword argument, as `annotate_dataset` will generate a suitable lib internally
- Fixed relative abundance summation in motif-mode for `make_heatmap`
- Added the `clean_up_heatmap` helper function to remove redundant (i.e., identical) rows in heatmaps, with a prioritization of named motifs and longer motifs containing redundant shorter motifs
- Added `make_volcano`, to generate a volcano plot from internally calculated differential expression using the `get_differential_expression` function
- Moved `cohen_d` into `motif.processing`
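
The row de-duplication described for `clean_up_heatmap` can be sketched on a plain dictionary of label to values. The version below is an assumption-laden illustration, not the library's code: `drop_redundant_rows` and the `named` argument are hypothetical, and "longer motif" is approximated by string length rather than by checking that one motif contains the other.

```python
def drop_redundant_rows(rows, named=()):
    """Keep one label per group of identical rows, preferring named, then longer labels."""
    best = {}
    for label, values in rows.items():
        key = tuple(values)
        rank = (label in named, len(label))  # named motifs win, then longer labels
        if key not in best or rank > best[key][0]:
            best[key] = (rank, label)
    keep = {label for _, label in best.values()}
    # Preserve the original row order for the surviving labels.
    return {label: values for label, values in rows.items() if label in keep}

toy_rows = {
    "Gal(b1-4)GlcNAc": [1, 2],
    "LacNAc": [1, 2],
    "Fuc": [0, 3],
    "Fuc(a1-2)Gal": [0, 3],
}
```
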
ml
model_training
- `train_ml_model` no longer has the lib keyword argument, as `annotate_dataset` will generate a suitable lib internally
network
biosynthesis
- Refactored `construct_network` pipeline to be faster and more memory-efficient
- `reducing_end` has been deprecated and is being handled internally
- Added `infer_roots` to auto-infer `permitted_roots` (also does not need to be specified any longer in `construct_network`)
- Implemented distance limit, to prevent combinatorial explosion when outlier glycans are present
- Deprecated `subgraph_to_string` and `make_network_from_edges`
- Deprecated `fill_with_virtuals` and `make_network_directed`
- Minor speed-up of `process_ptm`, by pre-calculating stem_lib once instead of for every glycan in network
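
The `process_ptm` change is an instance of hoisting a loop-invariant computation: a lookup that depends only on the library is built once and reused for every glycan. The sketch below shows that pattern with hypothetical names (`build_stem_lookup`, `stems_per_glycan`) and a made-up stemming rule that strips a trailing sulfate or phosphate; it does not reproduce how `glycowork` builds its stem_lib.

```python
import re

def build_stem_lookup(lib):
    """Map each token to a 'stem' by stripping a trailing modification such as 6S or 6P."""
    # Hypothetical stemming rule, for illustration only.
    return {token: re.sub(r"\d?[SP]$", "", token) for token in lib}

def stems_per_glycan(glycans, lib):
    """Translate every token of every glycan into its stem."""
    stem_lookup = build_stem_lookup(lib)  # built once, outside the per-glycan loop
    return [[stem_lookup[token] for token in glycan] for glycan in glycans]
```
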