-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimum separator character #28
Comments
What you want is called a generalized suffix array in literature. Libdivsufsort is by now fairly old, unmaintained and not state of the art. Pretty sure you will find software better suited for your job. That said libdivsufsort is machine independent and will simply sort an array of unsigned 8 bit integers. So can simply use the 0 value as a seperator if it does not appear anywhere in your strings. (Haven't looked at the code in quite some years. Might be the case that you have to use 1). |
I assume you mean ascii 0 (NUL) rather than ascii 48 (0)... That was the first thing I tried, but got some strange results. I'll try next with ascii 1. Any pointers to a library for a generalized suffix array would be great too. Thanks. |
Hi Daniel. You could try gsufsort: https://github.com/felipelouza/gsufsort
Best regards.
…On Thu, Jul 25, 2024 at 11:36 AM Daniel Howe ***@***.***> wrote:
I assume you mean ascii 0 (NUL) rather than ascii 48 (0)... That was the
first thing I tried, but got some strange results. I'll try next with ascii
1. Any pointers to a library for a generalized suffix array would be great
too. Thanks.
—
Reply to this email directly, view it on GitHub
<#28 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAPS7ZGSKMZJE5YYTQK6BXDZOEEO7AVCNFSM6AAAAABLNZV2YSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJQGQ4TAMBXGI>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
Daniel Saad Nogueira Nunes
|
libdivsufsort does't know about ascii it sorts bytes independent of encoding that's why I wrote "value 0". If you had issues with using zero then it might be that libdivsufsort treats them as string terminator. I haven't looked into the code since 2016 so ... no idea. |
Not sure if this question makes sense and it may also be machine-dependent, but I'm wondering if there is a delimiter char I can use to separate words in the input to libdivsufsort so that this char will sort less than any of the characters in my text (the usual ascii set) and thus not otherwise affect the order of the suffix array ?
The text was updated successfully, but these errors were encountered: