v0.7.0

@Bribak Bribak released this 20 May 05:05
· 398 commits to master since this release
14a52ad

Change Log

For Version 0.7.0

  • Removed support for Python 3.7; as we use the walrus operator in some of the re-worked functions, Python 3.8+ is now required to use glycowork
  • Added optional installs for specialized glycowork usage (currently ‘all’, ‘ml’, and ‘draw’), which pull in the additional dependencies those use cases need; more details in the docs
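
For readers unfamiliar with it, the walrus operator (`:=`) binds a value inside an expression and is a syntax error before Python 3.8. The snippet below is a generic illustration only, not code taken from glycowork; `first_long_token` is a hypothetical helper.

```python
def first_long_token(tokens, min_len=4):
    """Return (token, length) for the first token of at least min_len characters, else None."""
    for token in tokens:
        # ':=' assigns len(token) to n and tests it in the same expression;
        # this line fails to parse on Python 3.7 and earlier.
        if (n := len(token)) >= min_len:
            return token, n
    return None


print(first_long_token(["Gal", "GlcNAc", "Fuc"]))
```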

glycan_data

  • Updated and expanded datasets, models, and lib; removed many duplicate sequences that differed only in how their branch ordering was written

loader

  • Added multireplace helper function, to map a dictionary of changes to a string
  • Made build_custom_df faster
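
The idea of mapping a dictionary of changes onto a string can be sketched as follows. This is an assumption about the general technique, not the actual implementation or signature of glycowork's helper; `multireplace_sketch` is a hypothetical name.

```python
def multireplace_sketch(string, replacements):
    """Apply every old -> new pair in `replacements` to `string`, in dict order."""
    for old, new in replacements.items():
        # Replacements are applied sequentially, so earlier pairs can affect later ones.
        string = string.replace(old, new)
    return string


print(multireplace_sketch("Gal(beta1-4)GlcNac", {"beta": "b", "Nac": "NAc"}))
```

Because the pairs are applied one after another, the order of the dictionary matters whenever one replacement produces text that another would match.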

motif

draw

  • Added draw as a new submodule of .motif
  • Added GlycoDraw to draw glycans in SNFG style and save them as .svg/.pdf
  • Added annotate_figure to replace glycan text with glycan images in .svg figures (heatmaps, volcano plots, etc.)
  • Added text_to_glycan, which replaces glycan strings in figures with glycan images
  • Added scale_in_range to normalize a list of numbers within a range
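
Normalizing a list of numbers within a range is commonly done by min-max scaling. The sketch below shows that technique under the assumption that this is what is meant here; `scale_to_range` and its parameters are hypothetical and may differ from the library's scale_in_range.

```python
def scale_to_range(values, lower, upper):
    """Linearly rescale `values` so the minimum maps to `lower` and the maximum to `upper`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All inputs identical: avoid dividing by zero and pin everything to `lower`.
        return [lower for _ in values]
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


print(scale_to_range([0, 5, 10], 1, 3))
```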

tokenization

  • Sped up glycan_to_composition by 1000x (avoiding explicit stemification of the glycan and only stemifying its building blocks); this also speeds up all functions that use glycan_to_composition
  • Sped up composition_to_mass (independent of the above)
  • glycan_to_composition (and downstream functions) can now handle more post-biosynthetic modifications: Ac, PCho, PEtN
  • Renamed calculate_theoretical_mass to glycan_to_mass
  • Sped up mz_to_composition2 by (i) filtering out duplicate compositions and (ii) selecting compositions from a chosen taxonomic kingdom
  • Reprioritized mz_to_composition2 to search in three stages: native compositions first, then compositions + adducts, and finally doubly-charged compositions
  • canonicalize_iupac now also handles floating substituents and can handle many more typos / inconsistencies / IUPAC dialects (such as CFG-coded glycans), including improvements made by Kathryn Klarich
  • Moved canonicalize_iupac into motif.processing
  • Expanded get_core (and downstream functions) with HexA, HexNAc, dHex
  • Expanded map_to_basic to (some) post-biosynthetic modifications
  • mz_to_structures no longer outright fails if no m/z value can be matched
  • Deprecated structures_to_motifs; annotate_dataset can do the same
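
To make the "stemify only the building blocks" idea concrete, here is a toy sketch that splits an IUPAC-condensed string into monosaccharides and maps each one to a coarse class. It is not glycowork's glycan_to_composition: the function name, the regex, and the small `STEM_CLASS` table are assumptions chosen for illustration, and modifications are not handled.

```python
import re
from collections import Counter

# Illustrative subset only; unknown building blocks are kept under their own name.
STEM_CLASS = {
    "Gal": "Hex", "Glc": "Hex", "Man": "Hex",
    "GlcNAc": "HexNAc", "GalNAc": "HexNAc",
    "Fuc": "dHex",
}


def composition_sketch(glycan):
    """Count building blocks of an IUPAC-condensed glycan by coarse class."""
    # Drop branch brackets, then split on linkage descriptors such as "(b1-4)".
    linear = glycan.replace("[", "").replace("]", "")
    units = [u for u in re.split(r"\([^)]*\)", linear) if u]
    # Map each building block individually instead of rewriting the whole string.
    return dict(Counter(STEM_CLASS.get(u, u) for u in units))


print(composition_sketch("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc"))
```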

processing

  • Fixed bug in processing glycans with floating substituents in small_motif_find
  • Deprecated seed_wildcard
  • choose_correct_isoform has been updated to keep up with the improved find_isomorphs
  • Added more informative error message to IUPAC_to_SMILES
  • get_lib is now slightly faster

graph

  • Sped up compare_glycans with string inputs, by avoiding graph operations when the two glycans do not have the same composition
  • Added support for enabling modification wildcards in compare_glycans and subgraph_isomorphism (for instance matching GalOS and Gal6S) by setting wildcards_ptm = True
  • Sped up glycan_to_nxGraph_int by optimizing node label/attribute assignments
  • Refactored graph_to_string to be much more robust, streamlined, and faster. Its new integration with canonicalize_iupac may also improve strings upon back-translation (e.g., branch order canonicalization)
  • ensure_graph now has **kwargs that get passed to glycan_to_nxGraph
  • get_possible_topologies now supports internal additions as well, with the keyword argument ‘exhaustive’
  • possible_topology_check now supports wildcard matching via **kwargs passed on to compare_glycans
  • Made changes to make glycowork compatible with NetworkX 3.0
  • Moved bracket_removal to motif.processing
  • Fixed a small inconsistency in handling floating substituents in glycan_to_nxGraph_int that could have caused issues with custom libs
  • override_reducing_end is no longer needed in glycan_to_nxGraph to delineate linkage-ending glycans (e.g., Fuc(a1-2)); this is now auto-inferred within glycan_to_nxGraph
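
The modification-wildcard behaviour mentioned above (GalOS matching Gal6S) can be pictured with a small token-level sketch. This is not how compare_glycans or subgraph_isomorphism work internally; `ptm_wildcard_match` and the `TOKEN` pattern are hypothetical and only cover a single trailing S or P modification.

```python
import re

# Toy pattern: <stem><position digit or 'O'><modification>, e.g. "Gal6S" or "GalOS".
TOKEN = re.compile(r"^(?P<stem>[A-Z][A-Za-z]*?)(?P<pos>\d|O)(?P<mod>S|P)$")


def ptm_wildcard_match(a, b):
    """Return True if two monosaccharide tokens match, treating position 'O' as a wildcard."""
    ma, mb = TOKEN.match(a), TOKEN.match(b)
    if not (ma and mb):
        # At least one token carries no recognized modification: require exact equality.
        return a == b
    same_core = ma["stem"] == mb["stem"] and ma["mod"] == mb["mod"]
    wildcard = "O" in (ma["pos"], mb["pos"])
    return same_core and (wildcard or ma["pos"] == mb["pos"])


print(ptm_wildcard_match("GalOS", "Gal6S"))
print(ptm_wildcard_match("Gal3S", "Gal6S"))
```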

annotate

  • Deprecated convert_to_counts_glycoletter and glycoletter_count_matrix; motif_matrix can do both
  • Refactored motif_matrix to be substantially faster and more condensed in its output (also speeds up annotate_dataset with the ‘exhaustive’ option in the feature_set argument)
  • Expanded motif_matrix to implicitly test for subsumption enrichment (e.g., previously we only explicitly looked for “Gal(b1-?)GlcNAc”; now we also count “Gal(b1-4)GlcNAc” toward the former)
  • annotate_glycan is now dual-compatible with string and networkx graph input
  • Expanded feature_set in annotate_dataset with the option ‘terminal’, which calls get_terminal_structures
  • This usage of get_terminal_structures in annotate_dataset now also does the same implicit test for subsumption enrichment as described for motif_matrix above
  • annotate_dataset now creates its own lib, based on the motif list and the provided glycans
  • Expanded find_isomorphs to also be able to re-shuffle (some) branched branches
  • Moved find_isomorphs into motif.processing
  • Linkages on their own are no longer considered by motif_matrix / annotate_dataset
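
The subsumption counting described for motif_matrix can be illustrated with plain string matching: a motif with an uncertain linkage position ("?") also counts glycans in which that position is specified. The sketch below is a deliberate simplification (substring matching, no graph or branch awareness) and not the library's implementation; both function names are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Turn a motif with '?' linkage wildcards into a regex that also accepts digits there."""
    escaped = re.escape(motif)
    # An escaped '?' becomes a character class matching any digit or a literal '?'.
    return re.compile(escaped.replace(r"\?", r"[\d?]"))


def count_with_subsumption(motif, glycans):
    """Count occurrences of `motif` in each glycan, letting '?' subsume specific positions."""
    pattern = motif_to_regex(motif)
    return [len(pattern.findall(g)) for g in glycans]


glycans = ["Gal(b1-4)GlcNAc(b1-2)Man", "Gal(b1-3)GlcNAc", "Gal(a1-3)Gal"]
print(count_with_subsumption("Gal(b1-?)GlcNAc", glycans))
```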

analysis

  • All functions with the feature_set keyword argument can now also use the ‘terminal’ keyword for analyzing non-reducing end motifs exclusively
  • Added get_differential_expression to compare glycomics data, including data cleaning and imputation
  • get_pvals_motifs and make_heatmap no longer have the lib keyword argument, as annotate_dataset will generate a suitable lib internally
  • Fixed relative abundance summation in motif-mode for make_heatmap
  • Added the clean_up_heatmap helper function to remove redundant (i.e., identical) rows in heatmaps, with a prioritization of named motifs and longer motifs containing redundant shorter motifs
  • Added make_volcano, to generate a volcano plot from internally calculated differential expression using the get_differential_expression function
  • Moved cohen_d into motif.processing
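
As background for the differential expression and effect size functions listed above, the following sketch computes two quantities that such comparisons typically rest on: a log2 fold change and Cohen's d with a pooled standard deviation. It is an assumption about the general statistics involved, not a reproduction of get_differential_expression or cohen_d, and it omits data cleaning, imputation, and significance testing; the function names and sample values are made up.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (group_a relative to group_b)."""
    return math.log2(mean(group_a) / mean(group_b))


def cohen_d_sketch(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation of both groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up relative abundances of one glycan in two conditions.
treated, control = [2, 4, 6], [1, 2, 3]
print(log2_fold_change(treated, control), cohen_d_sketch(treated, control))
```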

ml

model_training

  • train_ml_model no longer has the lib keyword argument, as annotate_dataset will generate a suitable lib internally

network

biosynthesis

  • Refactored construct_network pipeline to be faster and more memory-efficient
  • reducing_end has been deprecated and is now handled internally
  • Added infer_roots to auto-infer permitted_roots (also does not need to be specified any longer in construct_network)
  • Implemented distance limit, to prevent combinatorial explosion when outlier glycans are present
  • Deprecated subgraph_to_string and make_network_from_edges
  • Deprecated fill_with_virtuals and make_network_directed
  • Minor speed-up of process_ptm, by pre-calculating stem_lib once instead of for every glycan in network