This repository was archived by the owner on Feb 21, 2026. It is now read-only.

Make most (if not all) float operations single precision #4

Merged
Line-fr merged 7 commits into Line-fr:main from SwareJonge:fp-fix
Apr 2, 2025

Conversation

@SwareJonge
Contributor

Considering the final results and the operations on the GPU are meant to be single precision, it doesn't make much sense to do some of them in double precision (that extra information gets lost anyway). Making everything single precision is also slightly faster.
I also noticed that abs was sometimes used instead of fabs or fabsf, so I replaced those calls with the float versions (unless this was intentional, or the call gets overloaded?).

If there's anything wrong with this, let me know!

@Line-fr
Owner

Line-fr commented Feb 26, 2025

Thank you for the useful pull request! I am sure it will help lower-end NVIDIA GPUs that have limited fp64 compute capability.
I was able to observe in Nsight Compute that the fp64 operations contributed to the compute bottleneck, so it is nice to have them removed.

On the other hand, I think I will remove the CPU fp64 translations, since they don't really bring any benefit: without AVX, fp64 and fp32 have the same performance when not using too much bandwidth.
Also, you changed things like float a = 300 -> float a = 300.0f everywhere (and some places where it isn't written like that but is the same for the compiler), which isn't necessary since the compiler does the conversion automatically. But I will leave those changes in, since they do no harm.

I will work on the commit tonight (for my time zone so in about 4 hours)

@SwareJonge
Contributor Author

SwareJonge commented Feb 26, 2025

It did generate different code in my tests: without an f suffix, a float literal is treated as double precision (and adding the suffix is also a habit of mine, since I usually work with ancient compilers where it does matter).

Edit: I'm not sure whether this matters for exactly representable numbers but, like I said, it's just a habit of mine.

@Line-fr
Owner

Line-fr commented Feb 26, 2025

Also, I think powf is Windows-only, so I cannot allow that.
But std::pow will use float operations if its inputs are float.

@SwareJonge
Contributor Author

I don't think powf is Windows-only? It should be part of the standard library.
Also, I just noticed that CUDA's fabs is meant for double precision and abs for single precision (confusing).

@Line-fr
Owner

Line-fr commented Feb 26, 2025

I reviewed it, and it seems alright to me:
the precision impact is negligible,
the modified code is fine,
it compiles on AMD Windows, WSL AMD Ubuntu, and NVIDIA Windows,
and the speed boost is noticeable.

when you are ready I will be able to merge

@SwareJonge
Contributor Author

Finally done with this. Both ssimulacra2 and butter should output the same scores as your code, with a ~5% speed increase (on lower-end hardware).

@Line-fr
Owner

Line-fr commented Mar 7, 2025

Something weird might be happening. I agree that the scores did not change, but on my RTX 4050 mobile I get this:

your branch:
[benchmark graph]

main:
[benchmark graph]

So on my RX 7900 XTX I get no speed difference, but your branch is slower on my laptop.

@SwareJonge
Contributor Author

It really sucks that I basically have only one system I can test this on, but across all the runs I did on an RTX 4060 mobile there were some improvements. What exactly did you use to output these graphs?

@Line-fr
Owner

Line-fr commented Mar 7, 2025

These graphs should be about the same as ReadMeGraph in the usageScript folder,
but it needs your own videos, and you need to input the number of frames everywhere if you want to get real fps values (if you only want to measure the % increase, it doesn't really matter anyway).

@Line-fr Line-fr merged commit 31e1de1 into Line-fr:main Apr 2, 2025
@Line-fr
Owner

Line-fr commented Apr 2, 2025

Phew, I am sorry to have taken so much time to do it!
Now it's integrated into main. Thank you for your contribution ^^

