Minimum separator character #28

Open
dhowe opened this issue Jul 25, 2024 · 4 comments

Comments

dhowe commented Jul 25, 2024

Not sure if this question makes sense, and it may also be machine-dependent, but I'm wondering if there is a delimiter char I can use to separate words in the input to libdivsufsort, such that this char sorts lower than every character in my text (the usual ASCII set) and thus does not otherwise affect the order of the suffix array?

akamiru commented Jul 25, 2024

What you want is called a generalized suffix array in the literature. libdivsufsort is by now fairly old, unmaintained, and not state of the art. I'm pretty sure you will find software better suited to your job.

That said, libdivsufsort is machine-independent and will simply sort an array of unsigned 8-bit integers. So you can simply use the value 0 as a separator if it does not appear anywhere in your strings. (I haven't looked at the code in quite some years; it might be the case that you have to use 1.)
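
To illustrate the idea of a byte-value-0 separator, here is a minimal Python sketch. It does not call libdivsufsort; a naive sort of suffixes by unsigned byte order stands in for it, and the function name `naive_suffix_array` and the sample words are assumptions made up for the example.

```python
def naive_suffix_array(data: bytes) -> list:
    """Suffix start positions sorted by unsigned byte order.

    Naive O(n^2 log n) stand-in for a real suffix sorter; illustration only.
    """
    return sorted(range(len(data)), key=lambda i: data[i:])


SEP = b"\x00"  # byte value 0, assumed absent from every word
words = [b"banana", b"band", b"ant"]  # hypothetical sample input
text = SEP.join(words) + SEP  # b"banana\x00band\x00ant\x00"

sa = naive_suffix_array(text)
sep_positions = [i for i, b in enumerate(text) if b == 0]

# Every suffix that starts at a separator sorts before every other suffix.
print(sa[:len(sep_positions)], sep_positions)
```

Note that comparisons continue past the separator, so suffixes that agree up to a separator are ordered by the text of the following word. That is one reason a dedicated generalized suffix array tool may be a better fit.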

dhowe commented Jul 25, 2024

I assume you mean ASCII 0 (NUL) rather than ASCII 48 (the digit 0)... That was the first thing I tried, but I got some strange results. I'll try ASCII 1 next. Any pointers to a library for a generalized suffix array would be great too. Thanks.
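
The difference between NUL and the digit 0 can be checked with a short Python sketch. The sample strings are made up; the point is only that byte 48 sits above the space and several punctuation characters, so it cannot serve as a minimum separator for ordinary text, while byte 0 sorts below all of them.

```python
NUL = b"\x00"      # byte value 0
DIGIT_ZERO = b"0"  # byte value 48

print(NUL[0], DIGIT_ZERO[0])

# Printable ASCII characters that sort below the digit '0' (values 32..47).
below_digit_zero = bytes(range(32, 48)).decode("ascii")
print(repr(below_digit_zero))

# Byte-order sorting puts NUL first, then space (32), then '0' (48).
samples = sorted([b"a b", b"a0b", b"a\x00b"])
print(samples)
```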

danielsaad commented Jul 25, 2024 via email

akamiru commented Jul 25, 2024

libdivsufsort doesn't know about ASCII; it sorts bytes independent of encoding. That's why I wrote "value 0".

If you had issues with using zero, then it might be that libdivsufsort treats it as a string terminator. I haven't looked into the code since 2016, so ... no idea.
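
Whether libdivsufsort really treats 0 specially is left open in this thread. Under that uncertainty, a small pre-check like the following Python sketch can pick a separator byte and confirm it is both unused and smaller than every byte in the input. The helper `pick_separator` and its `avoid_zero` flag are hypothetical names for this example, not part of any library.

```python
def pick_separator(words, avoid_zero=False):
    """Hypothetical helper: return a one-byte separator that is absent from
    every word and sorts below every byte value used in them."""
    used = set()
    for w in words:
        used.update(w)  # iterating bytes yields integer byte values
    lowest_used = min(used) if used else 256
    # Start at 1 when byte 0 is suspected of being handled specially.
    candidate = 1 if avoid_zero else 0
    if candidate >= lowest_used:
        raise ValueError("no byte value below the text's smallest byte is available")
    return bytes([candidate])


words = [b"banana", b"band", b"ant"]  # hypothetical sample input
print(pick_separator(words))                   # byte value 0
print(pick_separator(words, avoid_zero=True))  # byte value 1
```

The check raises instead of silently choosing a larger byte, because a separator that is not strictly below the text's smallest byte would change the suffix order.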
