Does GameStream have any sort of innate advantage over Sunshine due to lower level access? #111
-
Hello! With the announcement that GameStream is going away in the new year, there is a lot of speculation as to whether Sunshine is a perfect replacement for GameStream, or whether Nvidia is doing something special with their low-level access to the GPU that makes GameStream innately superior to other alternatives. Does anyone here know if this is the case? In my own testing, I have found GameStream and Sunshine to be within the margin of error of each other (Ethernet-connected RTX 3080 host to a Wi-Fi 6 MacBook Air M1). The way I see it, there are three possibilities:
Thanks for any help you can provide, and thanks for making this excellent piece of software that will keep local streaming alive as Nvidia gets rid of the official solution.
-
GameStream likely does capture + encode without anything ever leaving the GPU. Previous versions of the nvEncodeAPI used to allow this; it was gradually deprecated, but this is probably the special sauce they are still using. Sunshine does capture + encode via a number of different paths (one of which encodes via NVENC), but with some small steps in between, and it does things in a more generic way in order to support other GPUs and other OSes. So it can never be absolutely on par with GameStream, but the latest nightly build is pretty close. That extra work costs maybe a frame or two at 120Hz, and sometimes not even that, so in practical terms there is hardly any difference. However, it takes tinkering to get it working perfectly. Some people observe massive differences, which indicates that something is not quite right with their setup and more tinkering is needed. A rough sketch of the generic pipeline shape is below.
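To illustrate what I mean by "small steps in between", here is a minimal C++ sketch of a capture / convert / encode / broadcast loop paced to a 120Hz frame budget. This is not Sunshine's actual code; all names are hypothetical stand-ins, and the stubs just mark where the real work would happen.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-ins, not Sunshine's real API. The stubs mark where the
// real work (NvFBC/DXGI grab, colorspace conversion, NVENC encode, FEC +
// network send) would happen in a generic pipeline.
struct Frame  {};
struct Packet {};

Frame  capture_frame()             { return {}; } // NvFBC/DXGI grab
Frame  convert_colorspace(Frame f) { return f;  } // RGB -> NV12, the extra hop
Packet encode(const Frame&)        { return {}; } // NVENC/AMF/QSV/x264
void   broadcast(const Packet&)    {}             // FEC encode + network send

int main() {
    using namespace std::chrono;
    constexpr auto frame_budget = microseconds(1'000'000 / 120); // ~8333us at 120Hz

    for (int i = 0; i < 1200; ++i) { // ~10 seconds of frames
        const auto start = steady_clock::now();

        Frame  raw     = capture_frame();
        Frame  yuv     = convert_colorspace(raw); // generic pipelines pay this hop
        Packet encoded = encode(yuv);
        broadcast(encoded);

        // Whatever is left of the frame interval is spent sleeping.
        std::this_thread::sleep_until(start + frame_budget);
    }
}
```

The convert hop is the kind of intermediate step a generic pipeline pays for portability; a vendor-specific path like GameStream's can presumably fuse capture and encode entirely on the GPU and skip it.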
-
When having friends over, we take turns playing DOTA 2 on my PC while the others watch the game on my TV, relaxing on the couch with some beers. I always had insane lag (app latency from 8 to 20 ms) and the mouse DPI changing while duplicating the monitor and TV (HDMI + DisplayPort). Moonlight + GameStream without cloning displays was slightly better, but still far from perfect. With Sunshine + Moonlight, the mouse lag is completely gone and the stream runs amazingly (1080p 120fps). For me, Sunshine has been a huge upgrade and much better than GameStream because of the mouse lag issues!
-
Use the nightly build or wait for v0.17; performance is right up there with GFE.
-
To add to my answer above: I did some performance profiling on the nightly. Linux + NvFBC + NVENC, 1440p/120Hz, no vsync.
I measured the time taken to snapshot, convert, encode and broadcast using std::chrono::steady_clock at different points in the code. Granted, my clock accuracy probably isn't great, but these were the results:
snapshot: negligible (steady_clock says 0-1ns, system_clock says 20-30μs)
convert: negligible (says 18-30μs)
encode: 3-4ms
broadcast: negligible
In the above measurements, the micro/nanosecond results are too small to measure reliably and can be treated as negligible. The time taken from before the snapshot request (just after coming out of sleep from the previous cycle) to after FEC encode was 3-4ms, with most of that spent in the encode. At 120fps (8.33ms per frame), more time is spent sleeping than processing. Since Sunshine can capture and send a frame in less time than the screen updates, it is absolutely on par with GameStream for my setup. I haven't done the same with Sunshine on Windows, but if I do run tests I'll post an update.
EDIT: In my 4K/120Hz tests, the encode was ~1ms slower, so still within a frame, and the encoder runs at only ~30% utilization to achieve this. This is quite an improvement from my RTX 3070.
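For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the pattern: bracket each stage with steady_clock::now() and compare the total against the 8.33ms frame budget at 120Hz. The stage functions are hypothetical stubs (the encode stub sleeps 3ms to mimic the numbers above), not Sunshine's actual code.

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stubs standing in for the real snapshot/convert/encode/
// broadcast code; swap in the actual calls when profiling for real.
void snapshot()  {}
void convert()   {}
void encode()    { std::this_thread::sleep_for(std::chrono::milliseconds(3)); }
void broadcast() {}

// Time a single stage and return its duration in microseconds.
template <typename F>
long long time_stage(F&& stage) {
    const auto t0 = Clock::now();
    stage();
    const auto t1 = Clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

int main() {
    constexpr double frame_budget_us = 1'000'000.0 / 120.0; // ~8333us per frame at 120Hz

    const auto cycle_start = Clock::now();
    const long long snap_us  = time_stage(snapshot);
    const long long conv_us  = time_stage(convert);
    const long long enc_us   = time_stage(encode);
    const long long bcast_us = time_stage(broadcast);
    const long long total_us = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(
            Clock::now() - cycle_start).count());

    std::printf("snapshot %lldus convert %lldus encode %lldus broadcast %lldus\n",
                snap_us, conv_us, enc_us, bcast_us);
    std::printf("total %lldus of a %.0fus budget -> %.0fus left to sleep\n",
                total_us, frame_budget_us, frame_budget_us - total_us);
}
```

If the total stays well under the budget, the loop spends most of each cycle sleeping, which is what "on par with GameStream" means in practice here.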