Dump the glypy.Glycan to linear code #20

bobaoai · 2019-10-30T00:38:32Z

Hey Joshua,

I am afraid that the linearcode.dumps() function might have inconsistency with the linear code rule.
For example, if we have a glycan as below:

RES
1b:x-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
5b:b-dman-HEX-1:5
6b:a-dman-HEX-1:5
7b:a-dman-HEX-1:5
8b:a-dman-HEX-1:5
9b:a-dman-HEX-1:5
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n
4:3o(4+1)5d
5:5o(3+1)6d
6:5o(6+1)7d
7:7o(3+1)8d
8:7o(6+1)9d

We should get 'Ma3(Ma3(Ma6)Ma6)Mb4GNb4GN?'
But the dumps() returns 'Ma6(Ma3)Ma6(Ma3)Mb4GNb4GN?'.
I believe it only needs a slight modification. When the code traverses the glycan, it need sort the glycan.index descendingly to make the monosaccharide with the highest linkage-index go first.

The text was updated successfully, but these errors were encountered:

mobiusklein · 2019-10-30T01:56:11Z

This looks like a breakdown in how linear_code.glycan_to_linear_code is calling linear_code.priority. Right now, branches are selected in "priority" order, as defined in Table 1 of [1]. At some point a breaking change caused priority to return -1 for any "real" monosaccharide. Returning the correct priority should fix this, but I have to be a bit more careful about how I do that, because of the degenerate way in which LinearCode encodes monosaccharides.

Note that LinearCode derived structures are necessarily not canonicalized w.r.t. to the same structure as parsed from GlycoCT or WURCS, and if you intend to mix the two formats, you should explicitly call glycan.canonicalize().

[1] Banin, E., Neuberger, Y., Altshuler, Y., Halevi, A., Inbar, O., Nir, D., & Dukler, A. (2002). A Novel Linear Code Nomenclature for Complex Carbohydrates. Trends in Glycoscience and Glycotechnology, 14(77), 127–137. https://doi.org/10.4052/tigg.14.127

bobaoai · 2019-11-03T00:21:07Z

Thanks!

I totally agreed that LinearCode derived structures might not be canonicalized and we should be careful when using it.

Since my data analysis is only dealing with the glycan with common monosaccharides, the extreme case doesn't both me. The reason the LinearCode is used is that in my case if two glycans are the same, their linearcodes are the same. In this case, the str1==str2 will be faster to check the similarity among a set of glycans. Do you have a faster way to compare if two glycans have same structure? Thanks!

mobiusklein · 2019-11-03T01:51:08Z

The same uniqueness is applied to any comparison of canonicalized structures and formats. The GlycoCT serializer is 5 times faster and enforces the GlycoCT canonicalization sorting on the structure as it is converted to a string, so it should be better all around.

I'm not sure I'll have time to fix the LinearCode serialization issue this week.

bobaoai · 2019-11-03T02:03:30Z

Do you mean when we get the GlycoCT from the str(Glycan), it is already canonicalized? So do you imply the glypy guarantees that every Glycan object with the same structure will have the same str(Glycan)? I mean currently, I only deal with the glycans with clearly specified topology and linkages.

It's totally okay. There is no push to fix the LinearCode serialization. It all depends on your schedule.

mobiusklein · 2019-11-03T16:00:57Z

Yes, GlycoCTWriter traverses the glycan by traveling edges in the order specified by the publication [1] (impl). Of course, as we've discussed before, glypy doesn't write out UND sections at the moment, but will canonicalize the current configuration of an under-determined structure.

The glycoct module is well over 2k LOC, so it needs some serious refactoring before anyone else will be able to read it and retain anything about its organization.

[1] Herget, S., Ranzinger, R., Maass, K., & Lieth, C.-W. V. D. (2008). GlycoCT-a unifying sequence format for carbohydrates. Carbohydrate Research, 343(12), 2162–2171. https://doi.org/10.1016/j.carres.2008.03.011

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dump the glypy.Glycan to linear code #20

Dump the glypy.Glycan to linear code #20

bobaoai commented Oct 30, 2019 •

edited

Loading

mobiusklein commented Oct 30, 2019

bobaoai commented Nov 3, 2019

mobiusklein commented Nov 3, 2019

bobaoai commented Nov 3, 2019 •

edited

Loading

mobiusklein commented Nov 3, 2019

Dump the glypy.Glycan to linear code #20

Dump the glypy.Glycan to linear code #20

Comments

bobaoai commented Oct 30, 2019 • edited Loading

mobiusklein commented Oct 30, 2019

bobaoai commented Nov 3, 2019

mobiusklein commented Nov 3, 2019

bobaoai commented Nov 3, 2019 • edited Loading

mobiusklein commented Nov 3, 2019

bobaoai commented Oct 30, 2019 •

edited

Loading

bobaoai commented Nov 3, 2019 •

edited

Loading