Release v0.7.0 · BojarLab/glycowork

Removed support for Python 3.7; as we use the walrus operator in some of the re-worked functions, Python 3.8+ is now required to use glycowork
Added optional installs for specialized glycowork usage (currently ‘all’, ‘ml’, and ‘draw’), which install the additional dependencies needed for these use cases; more details in the docs
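
For readers unfamiliar with the syntax behind the Python 3.8+ requirement: the walrus operator (`:=`) assigns a value inside an expression. The snippet below is a generic illustration, not code taken from glycowork; the glycan string and the regular expression are only examples.

```python
import re

glycan = "Neu5Ac(a2-3)Gal(b1-4)GlcNAc"

# Assign and test in a single expression; this line is a SyntaxError on Python 3.7
if (match := re.search(r"\(([ab]\d-\d)\)", glycan)) is not None:
    first_linkage = match.group(1)
else:
    first_linkage = None

print(first_linkage)
```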

Updated datasets, models, and lib to be larger and more accurate; removed many duplicate sequences that differed only in how their branch ordering was written

loader

Added multireplace helper function, to map a dictionary of changes to a string
Made build_custom_df faster
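
As a rough idea of what mapping a dictionary of changes to a string can look like, here is a minimal sketch. The name `multireplace_sketch` and its signature are hypothetical; the actual `multireplace` helper in glycowork may take different arguments or behave differently.

```python
def multireplace_sketch(string, replacements):
    """Apply every old -> new pair in `replacements` to `string`, in dict order."""
    for old, new in replacements.items():
        string = string.replace(old, new)
    return string


# Example: normalize Greek anomer symbols and drop stray spaces
cleaned = multireplace_sketch(
    "Fuc(α1-2) Gal(β1-4)GlcNAc",
    {"α": "a", "β": "b", " ": ""},
)
print(cleaned)
```

Because the replacements are applied one after another, a later pair can act on the output of an earlier one, so the order of the dictionary matters in this sketch.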

draw

Added draw as a new submodule of .motif
Added GlycoDraw to draw glycans in SNFG style and save them as .svg/.pdf
Added annotate_figure to replace glycan text with glycan images in .svg figures (heatmaps, volcano plots, etc.)
Added text_to_glycan, which replaces glycan strings in figures with glycan images
Added scale_in_range to normalize a list of numbers within a range
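
Normalizing a list of numbers within a range is commonly done by min-max scaling. The sketch below assumes that approach; `scale_in_range_sketch`, its parameter names, and the handling of constant input are assumptions and not necessarily how `scale_in_range` is implemented.

```python
def scale_in_range_sketch(values, lower, upper):
    """Linearly rescale `values` so the smallest maps to `lower` and the largest to `upper`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: constant input is mapped to the middle of the target range
        return [(lower + upper) / 2 for _ in values]
    span = v_max - v_min
    return [lower + (v - v_min) / span * (upper - lower) for v in values]


print(scale_in_range_sketch([0, 5, 10], 0, 1))
print(scale_in_range_sketch([2, 4, 6], 1, 3))
```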

tokenization

Sped up glycan_to_composition by 1000x (avoiding explicit stemification and just doing stemification of the building blocks); also speeds up all functions using glycan_to_composition
Sped up composition_to_mass (independent of the above)
glycan_to_composition (and downstream functions) now can handle more post-biosynthetic modifications: Ac, PCho, PEtN
Renamed calculate_theoretical_mass to glycan_to_mass
Sped up mz_to_composition2 by (i) filtering out duplicate compositions and (ii) selecting compositions from a chosen taxonomic kingdom
Reprioritized the search in mz_to_composition2: native compositions are searched first, then compositions + adducts, and only then doubly-charged compositions
canonicalize_iupac now also handles floating substituents and can handle many more typos / inconsistencies / IUPAC dialects (such as CFG-coded glycans), including improvements made by Kathryn Klarich
Moved canonicalize_iupac into motif.processing
Expanded get_core (and downstream functions) with HexA, HexNAc, dHex
Expanded map_to_basic to (some) post-biosynthetic modifications
mz_to_structures no longer outright fails if no m/z value can be matched
Deprecated structures_to_motifs; annotate_dataset can do the same
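
The tiered search order described for mz_to_composition2 above (native, then adducts, then doubly-charged) can be sketched as follows. This is an illustration of the prioritization idea only: `tiered_match`, the candidate table, the sodium adduct, the tolerance, and the positive-mode assumption for doubly-charged ions are all assumptions, not glycowork's implementation.

```python
PROTON = 1.00728   # proton mass in Da
SODIUM = 22.98922  # Na+ adduct mass in Da (assumed adduct for this example)

# Monoisotopic masses (Da) of two free N-glycan compositions
CANDIDATES = {"Hex3HexNAc2": 910.3278, "Hex5HexNAc2": 1234.4334}


def tiered_match(mz, candidates=CANDIDATES, adduct=SODIUM, tol=0.05):
    """Return (tier, matching names) for the first tier that yields any hit."""
    tiers = [
        ("native", lambda mass: mass),
        ("adduct", lambda mass: mass + adduct),
        # Assumption: positive mode, doubly protonated ion
        ("doubly_charged", lambda mass: (mass + 2 * PROTON) / 2),
    ]
    for label, expected_mz in tiers:
        hits = [name for name, mass in candidates.items()
                if abs(expected_mz(mass) - mz) <= tol]
        if hits:
            return label, hits  # later tiers are never evaluated
    return None, []


print(tiered_match(910.33))
print(tiered_match(933.32))
print(tiered_match(618.22))
```

Returning at the first tier with a hit is what keeps simpler explanations of a peak from being crowded out by adduct or multiply-charged candidates.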

processing

Fixed bug in processing glycans with floating substituents in small_motif_find
Deprecated seed_wildcard
choose_correct_isoform has been updated to keep up with the improved find_isomorphs
Added more informative error message to IUPAC_to_SMILES
get_lib is now slightly faster

graph

Sped up compare_glycans with string inputs, by avoiding graph operations when the two glycans do not have the same composition
Added support for enabling modification wildcards in compare_glycans and subgraph_isomorphism (for instance matching GalOS and Gal6S) by setting wildcards_ptm = True
Sped up glycan_to_nxGraph_int by optimizing node label/attribute assignments
Refactored graph_to_string to be much more robust, streamlined, and faster. Its new integration with canonicalize_iupac may also improve strings upon back-translation (e.g., branch order canonicalization)
ensure_graph now has **kwargs that get passed to glycan_to_nxGraph
get_possible_topologies now supports internal additions as well, with the keyword argument ‘exhaustive’
possible_topology_check now supports wildcard matching via **kwargs passed on to compare_glycans
Made changes to make glycowork compatible with NetworkX 3.0
Moved bracket_removal to motif.processing
Fixed a small inconsistency in handling floating substituents in glycan_to_nxGraph_int that could have caused issues with custom libs
override_reducing_end is no longer needed in glycan_to_nxGraph to delineate linkage-ending glycans (e.g., Fuc(a1-2)); this is now auto-inferred within glycan_to_nxGraph
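
The compare_glycans speed-up for string inputs rests on a cheap pre-check: two glycans with different compositions cannot be identical, so the graph comparison can be skipped. A minimal sketch of that idea follows; `rough_composition`, `compare_with_shortcut`, and the `expensive_compare` callback are hypothetical names, and the token counting here is much cruder than glycowork's own composition handling.

```python
import re
from collections import Counter


def rough_composition(glycan):
    """Count monosaccharide tokens after stripping linkages and branch brackets."""
    no_linkages = re.sub(r"\([^)]*\)", " ", glycan)
    tokens = no_linkages.replace("[", " ").replace("]", " ").split()
    return Counter(tokens)


def compare_with_shortcut(glycan_a, glycan_b, expensive_compare):
    """Run the costly comparison only if the cheap composition check passes."""
    if rough_composition(glycan_a) != rough_composition(glycan_b):
        return False
    return expensive_compare(glycan_a, glycan_b)


calls = []

def fake_graph_compare(a, b):
    # Stand-in for a graph isomorphism test; records that it was invoked
    calls.append((a, b))
    return True


print(compare_with_shortcut("Gal(b1-4)GlcNAc", "Man(a1-3)Man", fake_graph_compare))
print(len(calls))
```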

annotate

Deprecated convert_to_counts_glycoletter and glycoletter_count_matrix; motif_matrix can do both
Refactored motif_matrix to be substantially faster and more condensed in its output (also speeds up annotate_dataset with the ‘exhaustive’ option in the feature_set argument)
Expanded motif_matrix to implicitly test for subsumption enrichment (e.g., previously we only explicitly looked for “Gal(b1-?)GlcNAc”; now we also count “Gal(b1-4)GlcNAc” toward the former)
annotate_glycan is now dual-compatible with string and networkx graph input
Expanded feature_set in annotate_dataset with the option ‘terminal’, which calls get_terminal_structures
This usage of get_terminal_structures in annotate_dataset now also does the same implicit test for subsumption enrichment as described for motif_matrix above
annotate_dataset now creates its own lib, based on the motif list and the provided glycans
Expanded find_isomorphs to also be able to re-shuffle (some) branched branches
Moved find_isomorphs into motif.processing
Linkages on their own are no longer considered by motif_matrix / annotate_dataset
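
The subsumption counting described for motif_matrix above means that a motif with an uncertain linkage position, such as “Gal(b1-?)GlcNAc”, also collects the counts of its fully specified variants. The sketch below shows the idea with plain string matching; `motif_to_regex` and `count_motif` are hypothetical helpers, and unlike glycowork's matching they ignore branching.

```python
import re


def motif_to_regex(motif):
    """Turn a motif string into a regex in which '?' matches any linkage position."""
    escaped = re.escape(motif)
    # re.escape renders '?' as '\?'; replace it with "one digit or a literal '?'"
    return re.compile(escaped.replace(r"\?", r"[\d?]"))


def count_motif(motif, glycan):
    """Count non-overlapping occurrences of `motif` in a linear glycan string."""
    return len(motif_to_regex(motif).findall(glycan))


glycan = "Gal(b1-3)GlcNAc(b1-3)Gal(b1-4)GlcNAc"
print(count_motif("Gal(b1-?)GlcNAc", glycan))  # wildcard motif
print(count_motif("Gal(b1-4)GlcNAc", glycan))  # fully specified motif
```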

analysis

All functions with the feature_set keyword argument now can also use the ‘terminal’ keyword for analyzing non-reducing end motifs exclusively
Added get_differential_expression to compare glycomics data, including data cleaning and imputation
get_pvals_motifs and make_heatmap no longer have the lib keyword argument, as annotate_dataset will generate a suitable lib internally
Fixed relative abundance summation in motif-mode for make_heatmap
Added the clean_up_heatmap helper function to remove redundant (i.e., identical) rows in heatmaps, with a prioritization of named motifs and longer motifs containing redundant shorter motifs
Added make_volcano, to generate a volcano plot from internally calculated differential expression using the get_differential_expression function
Moved cohen_d into motif.processing
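
The row de-duplication performed by clean_up_heatmap can be pictured roughly as below: rows with identical values are grouped, and one representative per group is kept, preferring named motifs and then longer motifs. `drop_redundant_rows`, its arguments, and the use of string length as a proxy for "longer motif containing the shorter one" are assumptions made for this sketch, not the actual implementation.

```python
def drop_redundant_rows(rows, named_motifs=frozenset()):
    """Keep one motif per group of identical rows: named motifs first, then the longest."""
    groups = {}
    for motif, values in rows.items():
        groups.setdefault(tuple(values), []).append(motif)
    kept = {}
    for values, motifs in groups.items():
        best = max(motifs, key=lambda m: (m in named_motifs, len(m)))
        kept[best] = list(values)
    return kept


rows = {
    "Gal(b1-4)GlcNAc": [1, 2],
    "Fuc(a1-2)Gal(b1-4)GlcNAc": [1, 2],  # identical to the row above
    "Man(a1-3)Man": [0, 5],
}
print(drop_redundant_rows(rows))
```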

model_training

train_ml_model no longer has the lib keyword argument, as annotate_dataset will generate a suitable lib internally

biosynthesis

Refactored construct_network pipeline to be faster and more memory-efficient
reducing_end has been deprecated and is now handled internally
Added infer_roots to auto-infer permitted_roots (which no longer needs to be specified in construct_network)
Implemented a distance limit to prevent combinatorial explosion when outlier glycans are present
Deprecated subgraph_to_string and make_network_from_edges
Deprecated fill_with_virtuals and make_network_directed
Minor speed-up of process_ptm, by pre-calculating stem_lib once instead of for every glycan in network
