@andreselcientifico

Removing unnecessary variables increases speed 10x from 1000 ns to 100 ns.

@opkna

opkna commented Aug 21, 2024

Have you benchmarked this with optimizations turned on? I doubt it will make a difference in the end.

@andreselcientifico
Author

> Have you benchmarked this with optimizations turned on? Doubtful it will make a difference in the end.

Yes, I ran a performance test. The speedup comes from the smaller assembly: the code stores intermediate values to memory and reloads them fewer times before the remaining calculations, and that makes it 10 times faster.

@LegendaryGuard

LegendaryGuard commented Aug 28, 2024

I ran tests in online tools, comparing the original against the new version.
Original:
Compiler explorer (x86-64 gcc 14.2):

OnlineGDB:

New:
Compiler explorer (x86-64 gcc 14.2):

OnlineGDB:

I don't see much difference between the two with -std=c89 -O3. Both give the same results.
Without -O3, though, the new one seems to take 0.002 seconds less than the original.
In Compiler Explorer, the new one has 34 assembly lines, 9 fewer than the original (43 lines).
Compilers have optimization flags, though some projects don't set them. I don't know how much difference that would make.
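
For anyone who wants to repeat the comparison locally, here is a minimal timing sketch. It is not the code behind the links above. The names `rsqrt_candidate` and `time_calls` are hypothetical, and the candidate uses `memcpy` with a 32-bit integer (so it needs C99 rather than -std=c89) to stay well defined where `long` is 64 bits. Build it once with -O3 and once without to see how much of the gap the optimizer closes.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the function under test: the same bit trick,
   written with memcpy and a 32-bit integer. */
float rsqrt_candidate(float number)
{
    float x2 = number * 0.5f;
    uint32_t i;
    float y;

    memcpy(&i, &number, sizeof i);   /* reinterpret the float bits as an integer */
    i = 0x5f3759dfu - (i >> 1);      /* initial guess from the magic constant */
    memcpy(&y, &i, sizeof y);        /* reinterpret back as a float */
    return y * (1.5f - x2 * y * y);  /* one Newton-Raphson step */
}

/* Call fn `iterations` times and return the elapsed CPU time in seconds. */
double time_calls(float (*fn)(float), long iterations)
{
    volatile float sink = 0.0f;  /* volatile store keeps the calls from being removed */
    clock_t start = clock();
    long k;

    for (k = 1; k <= iterations; ++k) {
        sink = fn((float)k);
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_calls(rsqrt_candidate, 100000000L)` for each variant under the same flags gives numbers that can be compared directly. Short runs are dominated by timer resolution, so a large iteration count helps.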

@andreselcientifico
Author

> I did the tests in the online tools. I compared original vs new.
> Original:
> Compiler explorer (x86-64 gcc 14.2):
>
> OnlineGDB:
>
> New:
> Compiler explorer (x86-64 gcc 14.2):
>
> OnlineGDB:
>
> I don't see much difference using -std=c89 -O3 from both.
> But without -O3, the new one seems to have 2 seconds less than the original.
> Compilers have optimization flags though, some projects don't have these flags set. I dunno what it could change that.

In theory, you can skip storing 'y' in memory and return the final expression directly, which saves a step. And yes, it is true that some compilers already perform this optimization.

@LegendaryGuard

LegendaryGuard commented Aug 28, 2024

I see, but with -std=c89 -O3 this version changes nothing on Compiler Explorer or OnlineGDB:

float Q_rsqrt( float number )
{
    long i;
    float x2;

    x2 = number * 0.5F;
    i  = 0x5f3759df - ( * ( long * ) &number >> 1 );
    
    return * ( float * ) &i * ( 1.5F - ( x2 * (* ( float * ) &i) * (* ( float * ) &i) ) );
}

Even with the very reduced version:

float Q_rsqrt( float number )
{
    float x2 = number * 0.5f;
    long i  = 0x5f3759df - ( *( long* )&number >> 1 );
    number  = *( float* )&i;
    return ( number * ( 1.5f - ( x2 * number * number ) ) );
}

Nothing changes. The result is the same.
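
One way to check the "same result" claim numerically, rather than by reading assembly, is to run both shapes of the function side by side. The sketch below is an illustration and not the code from the links. All names (`float_bits`, `bits_float`, `rsqrt_with_temp`, `rsqrt_direct`, `max_variant_diff`) are hypothetical, and the bit reinterpretation uses `memcpy` with `uint32_t` (C99) instead of the pointer casts shown above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: copy the bits instead of casting pointers. */
uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Variant A: keeps the intermediate y before returning it. */
float rsqrt_with_temp(float number)
{
    float x2 = number * 0.5f;
    float y = bits_float(0x5f3759dfu - (float_bits(number) >> 1));

    y = y * (1.5f - (x2 * y * y));
    return y;
}

/* Variant B: returns the expression directly, without the temporary. */
float rsqrt_direct(float number)
{
    float x2 = number * 0.5f;
    uint32_t i = 0x5f3759dfu - (float_bits(number) >> 1);

    return bits_float(i) * (1.5f - (x2 * bits_float(i) * bits_float(i)));
}

/* Largest absolute difference between the two variants over inputs 1..n. */
float max_variant_diff(int n)
{
    float worst = 0.0f;
    int k;

    for (k = 1; k <= n; ++k) {
        float d = rsqrt_with_temp((float)k) - rsqrt_direct((float)k);
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

If `max_variant_diff` stays at or near zero over the input range of interest, the two variants differ only in how the source is written, which matches the identical -O3 output reported above.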

@cultleader-f

@andreselcientifico You're better off looking at _mm_rsqrt_ss (x86/x64) or vrsqrte_f32 (ARM). Those provide a serious speed improvement and are more precise.
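
A minimal sketch of how those intrinsics could be wrapped, assuming a compiler that defines `__SSE__` (or `_M_X64`) on x86 and `__ARM_NEON` on ARM. The wrapper name `hw_rsqrt` is hypothetical, and the last branch falls back to the bit trick only so the sketch still builds on other targets. Both instructions return an estimate, so the result is approximate.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#else
#include <stdint.h>
#include <string.h>
#endif

/* Hypothetical wrapper: approximate 1/sqrt(x) using the hardware estimate. */
float hw_rsqrt(float x)
{
#if defined(__SSE__) || defined(_M_X64)
    /* Put x in the low lane, take the scalar estimate, read the low lane back. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#elif defined(__ARM_NEON)
    /* Broadcast x to both lanes, take the estimate, read lane 0. */
    return vget_lane_f32(vrsqrte_f32(vdup_n_f32(x)), 0);
#else
    /* Fallback for other targets: bit trick plus one Newton-Raphson step. */
    float x2 = x * 0.5f;
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - x2 * y * y);
#endif
}
```

If more accuracy is needed, a Newton-Raphson step can be applied to the returned estimate, at the cost of a few extra multiplications.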

@hydrophobis

hydrophobis commented Jul 11, 2025

Using an inline union seems to be much faster than the normal version, although it could just be the way that I'm timing them. These run about 9x faster than both the source and the changes proposed in this PR when I run them:

https://godbolt.org/z/TTPK8Yvc6
https://onlinegdb.com/OQt6Qwekn
And it runs about 3x faster on my local Win11 24H2 machine (0.018 vs 0.007).
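
For readers who cannot open the links, the following is a sketch of what a union-based version might look like. It is an assumption about the approach, not a copy of the linked code. The name `rsqrt_union` is hypothetical, and the integer member is a `uint32_t` (C99) so that it matches the size of `float`.

```c
#include <stdint.h>

/* Hypothetical union-based variant: the float and the integer share storage,
   so no pointer casts are needed to get at the bit pattern. */
float rsqrt_union(float number)
{
    union { float f; uint32_t u; } conv;  /* union declared inline, in the function */
    float x2 = number * 0.5f;

    conv.f = number;                        /* store as float */
    conv.u = 0x5f3759dfu - (conv.u >> 1);   /* operate on the bits as an integer */
    return conv.f * (1.5f - (x2 * conv.f * conv.f));  /* one Newton-Raphson step */
}
```

Reading a union member other than the one last written is allowed in C99 and later, which avoids the aliasing problem of the `* ( long * ) &number` cast. Whether it is actually faster depends on the compiler and flags, so it is worth timing under the same conditions as the other variants.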


@andreselcientifico
Author

andreselcientifico commented Jul 11, 2025 via email
