-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How did you calculate the NextState? #1
Comments
The
The issue I was encountering was that to make larger matrices, I needed to generalize the derivation somehow that kept things as powers-of-two so that I could use bit-shifts again rather than a regular multiplication. Another issue is that 32-bit and 64-bit values get exhausted very quickly since Fibonacci numbers increase so fast, so it provided little benefit over just using a LUT. I got some more insight here on this reddit thread that might be of value to you too though. |
I found one nice matrix to calculate the next n+2 till n+5 Only thing is You need one more simd vector. But it should be a bit faster. The matrix:
left shiftable:
Rust implementation #[target_feature(enable = "avx2")]
#[cfg(any(target_arch = "x86_64", target_feature = "avx2"))]
pub unsafe fn fib_parallel_qfib_stride_6_avx2(n: u32) -> u64 {
use std::arch::x86_64::*;
let mut n = n;
// for the inbetween cases where n%6 > 3 : i.e. 4 and 5
// shift the fibstate for these cases
let mut fibstate = if n % 6 > 3 {
n -= 2;
_mm_set_epi32(5, 3, 2, 1)
} else {
_mm_set_epi32(2, 1, 1, 0)
// _mm_set_epi32(3, 2, 1, 1)
};
let nextstates: [__m128i; 4] = [
_mm_set_epi32(2, 0, !0, !0),
_mm_set_epi32(4, 0, 0, 1), // !0 == -1
_mm_set_epi32(4, 2, 2, 2),
_mm_set_epi32(0, 3, 2, 0),
];
for _ in (0..(n / 6 * 6)).step_by(6) {
let mut result = _mm_setzero_si128();
// fibstate = _mm_alignr_epi8(fibstate, fibstate, 4);
for nextstate in nextstates {
let product = _mm_sllv_epi32(_mm_broadcastd_epi32(fibstate), nextstate);
fibstate = _mm_alignr_epi8(_mm_setzero_si128(), fibstate, 4);
result = _mm_add_epi32(result, product);
}
fibstate = result;
}
u32x4::from(fibstate).to_array()[(n % 6) as usize]
.try_into()
.unwrap()
} edit: |
I found a solution to find bigger matrices with all power-of-two's. It's not really a math solution but more of an brute force approach. But what I found does work and I got up to n+10 already. I will leave this notebook here if you want to take a look: The basis of this is that when you try to move all terms to the left side, where n-1 is at, a matrix will always have the following structure:
For even values, the formula yields Using this trick, one can make matrices where every item is a power of 2, than sift to the left and see if it is a fibonacci matrix or not. Edit: matrixes -> matrices |
Thanks for the valuable insight! Def might revisit this to see if there's some viability in making higher-precision fibonacci calculations with SIMD. Keeping things as a pow-2 bit-shifts means something like 128/256/512-bit fib-matrices can possibly be much easier to calculate within big SIMD registers such as AVX512 or SVE and possibly beat any LUT-based implementation! |
Hey @Wunkolo I found one more thing that can really speed up the fib calculations! That thing is negative fibonacci numbers. All you have to do to the normal fibonacci algorithm is: check if n is negative and dividable by 2. If so, make these numbers negative. def fib(n):
a, b, c = 1, 0, 0;
for _ in range(0,abs(n)):
c = a + b
a = b
b = c
if n < 0 and n % 2 == 0:
c = -c
return c How will this speed up calculations? Take for example the sifted n+6 matrix.
Without negative fibonacci numbers, you would have to know the previous fibonacci numbers upto where the fibonacci seq
New solution can go from where-ever.
So, a new calculation comes for fast fibonacci calculations can be formulated. Which is also exponential.
|
Awesome work!
This looks somewhat similar to the derivation to Chun-Min Chang's fast-doubling method that I have evaluating in the current benchmarks here: Lines 156 to 165 in ebfafce
Particularly this bit looked kinda similar if (n % 2) { // n is odd: F(n) = F(((n-1) / 2) + 1)^2 + F((n-1) / 2)^2
unsigned int k = (n - 1) / 2;
return fib(k) * fib(k) + fib(k + 1) * fib(k + 1);
} else { // n is even: F(n) = F(n/2) * [ 2 * F(n/2 + 1) - F(n/2) ]
unsigned int k = n / 2;
return fib(k) * [ 2 * fib(k + 1) - fib(k) ];
} and identified some key relations here: if (n % 2) {
k = (n - 1) / 2;
fib_helper(k, f);
uint64_t a = f[0]; // F(k) = F((n-1)/2)
uint64_t b = f[1]; // F(k + 1) = F((n- )/2 + 1)
uint64_t c = a * (2 * b - a); // F(n-1) = F(2k) = F(k) * [2 * F(k + 1) - F(k)]
uint64_t d = a * a + b * b; // F(n) = F(2k + 1) = F(k)^2 + F(k+1)^2
f[0] = d; // F(n)
f[1] = c + d; // F(n+1) = F(n-1) + F(n)
} else {
k = n / 2;
fib_helper(k, f);
uint64_t a = f[0]; // F(k) = F(n/2)
uint64_t b = f[1]; // F(k + 1) = F(n/2 + 1)
f[0] = a * (2 * b - a); // F(n) = F(2k) = F(k) * [2 * F(k + 1) - F(k)]
f[1] = a * a + b * b; // F(n + 1) = F(2k + 1) = F(k)^2 + F(k+1)^2
} Though rather than doubling to get
and has similar versions for 64-bit elements. Def check out the blog post! |
Hi, great work. I was looking through this and did not see an explanation of how you went from the matrix version which uses
[4, 1, 2, 1, 4, 4, 1, 1, 1, 2, 0, 0, 0, 0, 0, 0]
to[[0, 1, 0, 0], [2, 2, 0, 0], [2, 0, 1, 0]]
in the SIMD version. I would like to write this for 256 and 512 bit-SIMD. So I would like to know how I could make the matrix larger.The text was updated successfully, but these errors were encountered: