divsufsort64 is slower than divsufsort #21
Comments
For the long strings, that might be a simple case of caching effects. Indices that are twice as wide require twice the memory, so they fill up the CPU caches faster. However, that should not affect strings of length 10. |
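To put rough numbers on the memory argument, here is a minimal sketch that computes the size of the suffix array for both index widths. The helper name `sa_bytes` is made up for illustration and is not part of libdivsufsort; it only counts the output array, not the input string or any working memory the library allocates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes occupied by a suffix array of n entries
   when each entry is index_width bytes wide (4 for int32_t, 8 for int64_t). */
static uint64_t sa_bytes(uint64_t n, uint64_t index_width) {
    assert(index_width == sizeof(int32_t) || index_width == sizeof(int64_t));
    return n * index_width;
}
```

Calling `sa_bytes(10000000, sizeof(int32_t))` gives 40,000,000 bytes and `sa_bytes(10000000, sizeof(int64_t))` gives 80,000,000 bytes, so the 64-bit array really is twice as large at length 1e7. At length 10 the arrays are 40 and 80 bytes, which is small either way and fits the remark that caching should not matter there.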
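Hi, my original observations were made in Python (using ctypes). I can confirm with this code in C:
#include "divsufsort.h"
#include "divsufsort64.h"
#include <stdint.h>
#include <stdio.h>
#include <time.h>
int main(void){
    /* exactly 6 bytes, no terminating NUL; the length is passed explicitly */
    const unsigned char s[6] = "banana";
    int64_t suf64[6];
    int32_t suf[6];
    int n = 100;
    clock_t beg, t1=0, t2=0;
    for(int rep=0;rep<10; rep++){
        /* time n calls of the 32-bit variant */
        beg = clock();
        for(int _=0;_<n;_++)
            divsufsort(s, suf, 6);
        t1 += clock() - beg;
        /* time n calls of the 64-bit variant */
        beg = clock();
        for(int _=0;_<n;_++)
            divsufsort64(s, suf64, 6);
        t2 += clock() - beg;
    }
    /* clock_t is not necessarily int, so cast before printing */
    printf("divsufsort: %ld\ndivsufsort64: %ld\n", (long)t1, (long)t2);
    return 0;
}
An example run gives: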
I'm using a very slow machine, you might want to change |
“a very slow machine” “difference in time” |
No, no, I know a bit how computers work :) I have a 64 bit ARM processor. |
I tested your code on macOS:
After swapping to sais.hxx:
|
These numbers smell fishy. My guess is that the sais function is only in the header, but after being called its result is not used. Thus those numbers represent the overhead of |
I don't understand
Why would it behave differently from divsufsort? |
The divsufsort function is in a different compilation unit, actually in a dynamic library. Thus, at the time of compilation, the compiler does not know what it does. It could print the whole string, for all it knows, so it cannot remove the call without changing the observable behavior of the program. |
I see, you mean that since sais is a header only library, the compiler is able to optimize the calls. |
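A common way to guard a benchmark against this is to make each result observable, for example by writing part of it to a `volatile` variable. The sketch below illustrates the idea under loud assumptions: `naive_suffix_sort` is a hypothetical stand-in for a header-only sorter (a plain insertion sort on suffixes, not sais or divsufsort), and `run_and_consume`, `sink` and `MAX_N` are names invented for this example. It is an idiom that usually works in practice, not a guarantee about what any particular compiler will do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_N 64  /* hypothetical upper bound for this sketch */

/* Returns 1 if the suffix starting at a sorts before the suffix starting at b. */
static int suffix_less(const unsigned char *s, int32_t n, int32_t a, int32_t b) {
    int32_t la = n - a, lb = n - b;
    int32_t m = la < lb ? la : lb;
    int c = memcmp(s + a, s + b, (size_t)m);
    if (c != 0) return c < 0;
    return la < lb;  /* a suffix that is a proper prefix of the other sorts first */
}

/* Hypothetical stand-in for a header-only suffix sorter: insertion sort on suffixes.
   The compiler sees the whole body, so an unused result may be optimized away. */
static inline void naive_suffix_sort(const unsigned char *s, int32_t *sa, int32_t n) {
    for (int32_t i = 0; i < n; i++) {
        int32_t j = i;
        /* shift larger suffixes right to make room for suffix i */
        while (j > 0 && suffix_less(s, n, i, sa[j - 1])) {
            sa[j] = sa[j - 1];
            j--;
        }
        sa[j] = i;
    }
}

/* Writes to a volatile object are observable behavior, so they must be kept. */
static volatile int32_t sink;

/* Runs the sorter reps times and consumes one entry of each result. */
static int64_t run_and_consume(const unsigned char *s, int32_t n, int reps) {
    int32_t sa[MAX_N];
    int64_t total = 0;
    assert(n > 0 && n <= MAX_N);
    for (int r = 0; r < reps; r++) {
        naive_suffix_sort(s, sa, n);
        sink = sa[0];    /* side effect that depends on the sort result */
        total += sa[0];
    }
    return total;
}
```

In a timing loop you would call `run_and_consume` between the two `clock()` readings instead of calling the sorter directly. If the timings change noticeably once the result is consumed, that supports the suspicion that the earlier numbers measured an optimized-away call.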
For what it's worth, I just reproduced this benchmark on my M1 machine.
The compiler command is
The gap disappears for at
|
For very short strings (length 10), divsufsort64 is about 2x slower than divsufsort. For strings of moderate size (length 1e7), the difference in time is about 10%.
I have no idea why, and the difference is significant.