This repository was archived by the owner on Feb 21, 2026. It is now read-only.

Make most (if not all) float operations single precision #4

Merged
Line-fr merged 7 commits into Line-fr:main from SwareJonge:fp-fix
Apr 2, 2025

Conversation

@SwareJonge
Contributor

Considering the final results and the operations on the GPU are meant to be single precision, it doesn't make much sense to do some of them in double precision (that extra information gets lost anyway). Making everything single precision is also slightly faster.
I also noticed that abs was sometimes used instead of fabs or fabsf, so I replaced those calls with the float versions (unless this was intentional, or the call gets overloaded?).

If there's anything wrong with this, let me know!

@Line-fr
Owner

Line-fr commented Feb 26, 2025

Thank you for the useful pull request! I am sure it will help lower-end NVIDIA GPUs that have limited fp64 compute capability.
I was able to observe in Nsight Compute that the fp64 operations contributed to the compute bottleneck, so it is nice to have them removed.

On the other hand, I think I will remove the CPU fp64 translations, since they don't really bring any benefit: without AVX, fp64 and fp32 have the same performance when not using too much bandwidth.
Also, you changed things like float a = 300 -> float a = 300.0f everywhere (and some places where it isn't written like that but is the same for the compiler), which isn't necessary since the compiler does the conversion automatically. But I will leave those changes in, since they do no harm.

I will work on the commit tonight (for my time zone so in about 4 hours)

@SwareJonge
Contributor Author

SwareJonge commented Feb 26, 2025

It did generate different code in my tests: without an f suffix, a float literal is treated as double precision (and adding the suffix is also a habit of mine, since I usually work with ancient compilers where it does matter).

Edit: I'm not sure whether this matters for exactly representable numbers but, like I said, it's just a habit of mine.

@Line-fr
Owner

Line-fr commented Feb 26, 2025

Also, I think powf is Windows-only, so I cannot allow that.
But std::pow will use float operations if its inputs are float.

@SwareJonge
Contributor Author

I don't think powf is Windows-only? It should be part of the standard library.
Also, I just noticed that CUDA's fabs is meant for double precision and abs for single precision (confusing).

@Line-fr
Owner

Line-fr commented Feb 26, 2025

I reviewed it, and it seems alright to me:
the precision impact is negligible,
the modified code is fine,
it compiles on AMD Windows, WSL AMD Ubuntu, and NVIDIA Windows,
and the speed boost is noticeable.

when you are ready I will be able to merge

@SwareJonge
Contributor Author

Finally done with this. Both ssimulacra2 and butter should output the same scores as your code, with a ~5% speed increase (on lower-end hardware).

@Line-fr
Owner

Line-fr commented Mar 7, 2025

Something weird might be happening. I agree that the scores did not change, but on my RTX 4050 mobile I get this:

your branch:
[benchmark graph]

main:
[benchmark graph]

So on my RX 7900 XTX I get no speed difference, but your branch is slower on my laptop.

@SwareJonge
Contributor Author

It really sucks that I basically have only one system I can test this on, but across all the runs I did on an RTX 4060 mobile there were some improvements. What exactly did you use to output these graphs?

@Line-fr
Owner

Line-fr commented Mar 7, 2025

These graphs should be about the same as ReadMeGraph in the usageScript folder,
but it needs your own videos, and you need to input the number of frames everywhere if you want to get real fps values (if you only want to measure the % increase, it doesn't really matter anyway).

@Line-fr Line-fr merged commit 31e1de1 into Line-fr:main Apr 2, 2025
@Line-fr
Owner

Line-fr commented Apr 2, 2025

Phew, I am sorry to have taken so much time to do it!
Now it's integrated into main. Thank you for your contribution ^^

