
TESTING NEEDED: JIT Sparse Function Table, by riperiperi #83

Closed
wants to merge 3 commits into from


GreemDev
Member

When testing this PR, please clear your PTC cache for a game.

Makes CPU-heavy games faster at the cost of more memory mappings.
Testing needed.
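The thread never describes how the sparse function table is implemented, so what follows is only a conceptual sketch, not ARMeilleure's code. It assumes the JIT keeps a table that maps guest code addresses to compiled host functions. In the "sparse" version, every page of that table that has never been written is aliased to one shared page of default entries, for example a pointer to a fallback dispatch stub. A lookup is then a single indexed load with no "is this level allocated?" check. Each page that receives a real entry needs its own backing, which is one plausible source of the "more memory mappings" trade-off. `SparseFunctionTable`, `PAGE_BITS`, and `dispatch_stub` are hypothetical names.

```python
# Conceptual model only: NOT Ryujinx/ARMeilleure code. All names are hypothetical.
PAGE_BITS = 10               # illustrative page size: 1024 entries per page
PAGE_SIZE = 1 << PAGE_BITS


class SparseFunctionTable:
    """Maps guest code addresses to host functions with an unconditional lookup."""

    def __init__(self, address_bits: int, default_entry: str) -> None:
        # One shared page of default entries (e.g. a fallback dispatch stub).
        self.default_page = [default_entry] * PAGE_SIZE
        # Every page slot initially aliases the SAME default page object,
        # standing in for many virtual pages mapped onto one physical page.
        self.pages = [self.default_page] * (1 << (address_bits - PAGE_BITS))
        self.private_pages = 0  # pages that needed their own backing

    def lookup(self, address: int) -> str:
        # A single indexed load: no check for whether a level is allocated.
        return self.pages[address >> PAGE_BITS][address & (PAGE_SIZE - 1)]

    def register(self, address: int, host_function: str) -> None:
        index = address >> PAGE_BITS
        if self.pages[index] is self.default_page:
            # First real entry in this page: give it private backing,
            # the analogue of creating an extra memory mapping.
            self.pages[index] = list(self.default_page)
            self.private_pages += 1
        self.pages[index][address & (PAGE_SIZE - 1)] = host_function


table = SparseFunctionTable(address_bits=20, default_entry="dispatch_stub")
table.register(0x1234, "host_fn_a")
table.register(0x1300, "host_fn_b")   # same page as 0x1234, no new backing
print(table.lookup(0x1234))           # host_fn_a
print(table.lookup(0x80000))          # dispatch_stub
print(table.private_pages)            # 1
```

In a native implementation the aliasing would presumably rely on virtual memory (many page-table entries pointing at one physical page) rather than on shared Python list references; that detail is likewise an assumption.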

@github-actions github-actions bot added the cpu An issue with ARMeilleure, the JIT, or Hypervisor label Oct 27, 2024
@GreemDev GreemDev self-assigned this Oct 27, 2024
@Digote
Contributor

Digote commented Oct 27, 2024

Setup

Ryzen 5 5600
32GB - 3200MHz
RTX 4060ti - Driver 566.03


Some comparisons:

Pokémon Legends Arceus:

Before (1.2.57)

pkm_main

After

pkm_pr


The Legend of Zelda™: Breath of the Wild

Before (1.2.57)

zelda_main

After

zelda_pr


We have significant FPS gains.

@Vbuck-gang

Setup
RX 7900 XTX, driver 24.9.1
32GB 3200MHz
Ryzen 7 5800X3D
Fire Emblem Engage

Before:

Screenshot (150)

After:

Screenshot (149)

@Digote
Contributor

Digote commented Oct 28, 2024

Setup

Ryzen 5 5600
32GB - 3200MHz
RTX 4060ti - Driver 566.03


@Ryan2603

We need better evidence to try to understand what the issue might be; so far, I've only seen benefits in this PR. Claiming a 40fps loss without information on your setup, game version, log and a video showing it doesn't help at all.

Here's a sample of this game; where the two runs differ, the PR is, if anything, even better.

Before (1.2.59):

Main.mp4

After:

Pr.83.mp4

@Digote
Contributor

Digote commented Oct 28, 2024

Replying to @Ryan2603, who responded to the comparison above:

Thanks for sharing some proof here. The "after" video obviously shows unstable stuttering; it does run faster, but it eventually crashes somehow. I know this PR gives a huge FPS boost to some titles, but like I said, leaving us an option to enable/disable it is always best for adjustment.

So it wouldn't be an issue with the PR, but rather something related to the game being built on Unreal, possibly its own stuttering.
I didn't experience the stutters you mentioned, and I don't think it's a good idea to add an option to enable or disable this specific pull request.

@extherian
Contributor

extherian commented Oct 28, 2024

Replying to @Ryan2603:

Did you remember to delete the PPTC cache for this game? I myself had some crashes before I thought of deleting the cache, and you didn't mention whether you remembered to perform this vital step. Let us know if you have trouble figuring out how to do this.

@sokennethwasall

Tried this build out on my M2 Max MacBook Pro last night and experienced random stuttering and more frequent crashes, despite rebuilding PPTC cache for all games.

FPS was definitely higher, but it was far less stable.

@Otozinclus
Contributor

Otozinclus commented Oct 28, 2024

I tested this with an M3 Air (24GB), and at least in BOTW, FPS is lower. I used the same portable folder and just loaded the save in Kakariko without doing anything else, with a rebuilt PPTC cache, of course, and VSync turned off.

Unrestricted FPS is usually 42-44fps, and 15-16fps when energy saver is on (restricting power draw to 7.5W). With the PR, it is 40-42fps, and 13-14fps with battery saver on.

EDIT: A video for reference: https://youtu.be/TfpgtIKdKJo

@GreemDev
Member Author

Replying to @sokennethwasall:

What game(s)?

@Edgecrusherr160

Setup
Mac Studio M1 Max, 10 cores (8 performance and 2 efficiency)
64GB 6400 MT/s LPDDR5 (400GB/s) unified memory
24-core M1 Max GPU (G13X), sharing the 64GB of unified memory
Tears of the Kingdom

I'm seeing identical system performance compared to Ryujinx r.6253fe1 ("Mirror Build"). Before each launch, I purged the shader cache, deleted the contents of the PPTC folder, and told it to Queue PPTC Rebuild. Maybe I'm missing something?

Ryujinx is using:

  • Approx. 300% CPU load*
    *I'm pretty sure 1,000% is considered max load, due to having 10 cores, so this would mean it's using 30% of my available CPU.
  • Approx. 11GB of memory used
  • Approx. 85-90% GPU load

Before (r.6253fe1, 1080):
Ryujinx-r 6253fe1 (Mirror Build) -  1080 55fps

Before (r.6253fe1, 4K):
Ryujinx-r 6253fe1 (Mirror Build) -  4k 42fps

After (1.2.0+0833a59, 1080):
Ryujinx 1 2 0+0833a59 - 1080 55fps

After (1.2.0+0833a59, 4k):
Ryujinx 1 2 0+0833a59 - 4K 42fps

@extherian
Contributor

extherian commented Oct 30, 2024

For anyone who isn't seeing any performance improvement from the builds linked above, could you try the older builds from which this sparse JIT change was merged?

Here is the older sparse JIT build for Windows.

Here is the older sparse JIT build for Linux.

It is possible that we're not seeing the expected boost in performance from the GreemDev version because of accuracy improvements made since then that slowed down the emulator, and which may be the real bottleneck.

EDIT: also, don't forget to test at 1x resolution to minimise the chances of running into a GPU bottleneck rather than a CPU bottleneck, which is the one we are testing.

@Edgecrusherr160

Replying to @extherian:

Do you happen to have a link for the Mac version?

@extherian
Contributor

Unfortunately I don't own a Mac and therefore have no way of compiling the old version for macOS. However, even if such a build existed, it would only show performance improvements on x86 Macs and not on Apple Silicon ones like the M1. As far as we can tell, riperiperi didn't update the ARM64 JIT with the same sparse JIT changes as he did the x86 one.

@Edgecrusherr160

Replying to @extherian:

OK, thanks for the heads-up!

@RafatarM

R3600, RX 470 4GB, 16GB 3600MHz

I did a quick test and got basically the same performance.

Yes, I deleted the PTC cache.

Before (1.2.64):
Captura de Tela (1)

After:
Captura de Tela (3)

@Digote
Contributor

Digote commented Oct 31, 2024

Replying to @RafatarM:

Conversation here is in English only.

You need to test without a filter, as it seems that FSR is active.
Another thing, remove the FPS unlock mod; just disable VSync to check if the performance has improved.

@RafatarM

Replying to @Digote:

I tested very little today, I’ll do another test tomorrow with more time.

@rootopt

rootopt commented Oct 31, 2024

Arch Linux user here! All 16 games in my library saw performance gains from the update. Keep up the good work! The only game to stutter was TOTK.

  • OS: Arch Linux
  • CPU: Ryzen 9 7900X3D
  • RAM: 32GB DDR5 @ 6000MHz
  • GPU: RTX 3060 (12GB version)

@Ryan2603

Ryan2603 commented Nov 1, 2024

Replying to @extherian:

After a virus cleanup, this PR has been 100% stable with no crashes for me. A vote to merge.

@nhidog

This comment was marked as off-topic.

@miguemely

Replying to @Ryan2603:

Can you explain what you mean by "virus clean up"?

@sokennethwasall

Setup:
M2 Max MacBook Pro
64GB DDR5 / 12 Core CPU / 38 Core GPU

TOTK:
Ryujinx:
https://youtu.be/q4flsLKGCDI

JIT Sparse:
https://youtu.be/23owNq3r2Mk

Some FPS improvements, but random stuttering and flashing.

@Otozinclus
Contributor

Otozinclus commented Nov 2, 2024

Replying to @sokennethwasall:

What are your settings you tested this with?

@sokennethwasall

Replying to @Otozinclus:

Base settings, no filtering

@Otozinclus
Contributor

Replying to @sokennethwasall:

Weird, I am unable to reproduce it on Mac.

With base settings you still mean Hypervisor turned off, right?

@sokennethwasall

sokennethwasall commented Nov 2, 2024

Replying to @Otozinclus:

VSync > Disabled
FS Integrity Checks > Enabled
DRAM Size > 4GiB
Ignore Missing Services > Disabled
Ignore Applet > Disabled
PPTC > Enabled
Low-Power PPTC cache > Disabled
Memory Manager Mode > host(fast)
Use Hypervisor > Disabled
Graphics Backend > Vulkan
Preferred GPU > Apple M2 Max
Enable Shader Cache > Enabled
Enable Shader Recompression > Enabled
Enable Macro HLE > Enabled
Color Space Passthrough > Disabled
Resolution Scale > Native
Anti-Aliasing > None
Scaling Filter > Bilinear
Anisotropic Filtering > Auto
Aspect Ratio > 16:9
Graphics Backend Multithreading > Auto
Audio Backend > SDL2
Multiplayer Mode > ldn_mitm

@Ryan2603

Ryan2603 commented Nov 2, 2024

Replying to @miguemely:

Don't worry. I had reported here that this PR was leaking memory (15GB held, then a crash), but it turned out my Windows 11 had somehow been infected by Nvidia's telemetry bot; denying Nvidia's caches once fixed it.
Now memory usage is a stable 9GB, with no crashes over 8 hours of gaming.

@usr20210909

usr20210909 commented Nov 2, 2024

I've tested it with "The Legend of Zelda: Tears of the Kingdom" using UltraCam mod. This PR had the best performance:

  • Ryujinx v1.1.403: 74 fps
  • Ryujinx v1.2.69: 75 fps (no UltraCam, V-SYNC off)
  • Ryujinx v1.2.69: 76 fps
  • Ryujinx v1.2.0+0833a59: 78 fps

Scene:

For each version I loaded the same save, teleported to Lookout Landing, removed all armor and weapons. In-game time is 5:40-5:55 PM. Game was running in fullscreen mode.

Execution:

In the scene the frame rate was varying by around 5 fps over time. Before taking the screenshots I was visually monitoring the fps counter and took each screenshot when the frame rate was at its peak. It probably would have been better to use the UltraCam internal benchmark functionality + CapFrameX instead of relying on screenshots...

Hardware:

  • CPU: AMD Ryzen 7 7800X3D
  • RAM: 64 GB, DDR5-6000, CL30
  • GPU: Nvidia GeForce RTX 4070 Super

Software:

  • Windows 11 23H2 (10.0.22631), with KB5041587
  • GeForce driver v566.03
  • RTSS v7.3.6 (hardware monitoring overlay)
  • Switch Firmware v18.1.0
  • Vulkan, SMAA Ultra, Anisotropic Filtering Auto (but 16x for v1.2.0+0833a59), Docked mode
  • TotK update v1.2.1
  • TotK mod: UltraCam v2.5 (2560x1440, 120 FPS limit, 2K shadows, 25000 RenderDistance, DisableFog, DisableFXAA, RemoveLensflare). Performance without the mod is almost identical on my hardware - see screenshot below. GPU load with no mod is at 40%, with the mod it's at 48%. So even with the mod the GPU is not the bottleneck.

Screenshots:

Ryujinx v1.1.403: 74 fps
1440p 8GB 74fps - Ultracam, 2K shadows - v1 2 1 - Ryujinx v1 1 403 - fullscreen

Ryujinx v1.2.69: 76 fps
1440p 8GB 76fps - Ultracam, 2K shadows - v1 2 1 - Ryujinx v1 2 69 - fullscreen

Ryujinx v1.2.69: 75 fps (no UltraCam, V-SYNC off)
1080p 8GB 75fps - No mods, V-SYNC off - v1 2 1 - Ryujinx v1 2 69 - fullscreen

Ryujinx v1.2.0+0833a59: 78 fps
1440p 8GB 78fps - Ultracam, 2K shadows - v1 2 1 - Ryujinx v1 2 0+0833a59 - fullscreen


I have now used the UltraCam built in benchmark for Lookout Landing.
The results of the first two runs of each series were discarded to ensure that all shaders had been built.
Results were taken from the benchmark's TOTKBenchmark.txt file.

Improvements to average FPS: 4.8%
Improvements to 1% and 0.1% FPS: 2.5%

Averaged results from multiple runs:

Raw data for v1.2.69...

Discarded results of first 2 runs:

Total frames: 3417, Average FPS: 65, 1% FPS: 31, 0.1% FPS: 22
Total frames: 3691, Average FPS: 70, 1% FPS: 29, 0.1% FPS: 6

Results of 16 runs were used:

Total frames: 3714, Average FPS: 70, 1% FPS: 45, 0.1% FPS: 35
Total frames: 3694, Average FPS: 70, 1% FPS: 45, 0.1% FPS: 35
Total frames: 3810, Average FPS: 72, 1% FPS: 45, 0.1% FPS: 37
Total frames: 3859, Average FPS: 73, 1% FPS: 45, 0.1% FPS: 38
Total frames: 3747, Average FPS: 71, 1% FPS: 43, 0.1% FPS: 35
Total frames: 3791, Average FPS: 72, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3721, Average FPS: 70, 1% FPS: 43, 0.1% FPS: 35
Total frames: 3830, Average FPS: 72, 1% FPS: 45, 0.1% FPS: 38
Total frames: 3806, Average FPS: 72, 1% FPS: 45, 0.1% FPS: 37
Total frames: 3756, Average FPS: 71, 1% FPS: 47, 0.1% FPS: 40
Total frames: 3633, Average FPS: 69, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3706, Average FPS: 70, 1% FPS: 45, 0.1% FPS: 37
Total frames: 3686, Average FPS: 70, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3740, Average FPS: 71, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3660, Average FPS: 69, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3762, Average FPS: 71, 1% FPS: 47, 0.1% FPS: 40

Raw data for v1.2.0+0833a59...

Discarded results of first 2 runs:

Total frames: 3778, Average FPS: 71, 1% FPS: 38, 0.1% FPS: 21
Total frames: 3845, Average FPS: 73, 1% FPS: 45, 0.1% FPS: 35

Results of 13 runs were used:

Total frames: 3949, Average FPS: 75, 1% FPS: 50, 0.1% FPS: 38
Total frames: 3878, Average FPS: 73, 1% FPS: 47, 0.1% FPS: 40
Total frames: 3925, Average FPS: 74, 1% FPS: 47, 0.1% FPS: 40
Total frames: 3942, Average FPS: 75, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3872, Average FPS: 73, 1% FPS: 45, 0.1% FPS: 38
Total frames: 3923, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 40
Total frames: 3869, Average FPS: 73, 1% FPS: 47, 0.1% FPS: 37
Total frames: 3944, Average FPS: 75, 1% FPS: 47, 0.1% FPS: 40
Total frames: 3940, Average FPS: 75, 1% FPS: 45, 0.1% FPS: 37
Total frames: 3927, Average FPS: 74, 1% FPS: 47, 0.1% FPS: 37
Total frames: 3900, Average FPS: 74, 1% FPS: 45, 0.1% FPS: 35
Total frames: 3899, Average FPS: 74, 1% FPS: 45, 0.1% FPS: 37
Total frames: 3987, Average FPS: 75, 1% FPS: 47, 0.1% FPS: 40
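The averaged percentages above can be re-derived from these raw runs. The following Python sketch uses hypothetical helpers (not part of UltraCam or CapFrameX) to parse lines in the `Total frames: ..., Average FPS: ..., 1% FPS: ..., 0.1% FPS: ...` format shown above and to compare the means of the kept runs. Computed this way from the per-run integers, the gains come out at about 4.7% (average FPS) and 2.7% (1% FPS). The small gap to the 4.8% and 2.5% quoted above may come from a different averaging method or from rounding; that explanation is an assumption.

```python
import re
from statistics import mean

# Matches one result line of UltraCam's TOTKBenchmark.txt, as quoted in this thread.
RUN_PATTERN = re.compile(
    r"Total frames: (\d+), Average FPS: (\d+), 1% FPS: (\d+), 0\.1% FPS: (\d+)"
)


def parse_run(line: str) -> dict:
    """Turn one benchmark result line into a dict of integers."""
    match = RUN_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a benchmark result line: {line!r}")
    frames, avg, low1, low01 = map(int, match.groups())
    return {"frames": frames, "avg": avg, "low1": low1, "low01": low01}


def percent_change(before: list, after: list) -> float:
    """Relative change between the means of two series, in percent."""
    return (mean(after) / mean(before) - 1.0) * 100.0


# Kept runs copied from the raw data above (first two runs of each series discarded).
v1_2_69_avg = [70, 70, 72, 73, 71, 72, 70, 72, 72, 71, 69, 70, 70, 71, 69, 71]
pr_avg = [75, 73, 74, 75, 73, 74, 73, 75, 75, 74, 74, 74, 75]
v1_2_69_low1 = [45, 45, 45, 45, 43, 47, 43, 45, 45, 47, 47, 45, 47, 47, 47, 47]
pr_low1 = [50, 47, 47, 47, 45, 50, 47, 47, 45, 47, 45, 45, 47]

print(parse_run("Total frames: 3714, Average FPS: 70, 1% FPS: 45, 0.1% FPS: 35"))
print(f"average FPS: {percent_change(v1_2_69_avg, pr_avg):+.1f}%")   # +4.7%
print(f"1% FPS:      {percent_change(v1_2_69_low1, pr_low1):+.1f}%")  # +2.7%
```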

v1.2.69 Log...

00:00:00.193 |I| Configuration LogValueChange: ResScale set to: 1
00:00:00.199 |I| Configuration LogValueChange: ResScaleCustom set to: 1
00:00:00.199 |I| Configuration LogValueChange: MaxAnisotropy set to: 16
00:00:00.200 |I| Configuration LogValueChange: AspectRatio set to: Fixed16x9
00:00:00.200 |I| Configuration LogValueChange: BackendThreading set to: Auto
00:00:00.201 |I| Configuration LogValueChange: GraphicsBackend set to: Vulkan
00:00:00.201 |I| Configuration LogValueChange: PreferredGpu set to: 0x10DE_0x2783
00:00:00.202 |I| Configuration LogValueChange: AntiAliasing set to: SmaaUltra
00:00:00.202 |I| Configuration LogValueChange: ScalingFilter set to: Bilinear
00:00:00.202 |I| Configuration LogValueChange: ScalingFilterLevel set to: 50
00:00:00.203 |I| Configuration LogValueChange: EnableDockedMode set to: True
00:00:00.203 |I| Configuration LogValueChange: EnableVsync set to: True
00:00:00.203 |I| Configuration LogValueChange: EnableShaderCache set to: True
00:00:00.203 |I| Configuration LogValueChange: EnableTextureRecompression set to: False
00:00:00.203 |I| Configuration LogValueChange: EnableMacroHLE set to: True
00:00:00.203 |I| Configuration LogValueChange: EnableColorSpacePassthrough set to: False
00:00:00.204 |I| Configuration LogValueChange: EnablePtc set to: True
00:00:00.204 |I| Configuration LogValueChange: EnableLowPowerPtc set to: False
00:00:00.204 |I| Configuration LogValueChange: EnableInternetAccess set to: False
00:00:00.204 |I| Configuration LogValueChange: EnableFsIntegrityChecks set to: True
00:00:00.204 |I| Configuration LogValueChange: FsGlobalAccessLogMode set to: 0
00:00:00.205 |I| Configuration LogValueChange: AudioBackend set to: SDL2
00:00:00.205 |I| Configuration LogValueChange: AudioVolume set to: 1
00:00:00.206 |I| Configuration LogValueChange: MemoryManagerMode set to: HostMappedUnsafe
00:00:00.206 |I| Configuration LogValueChange: DramSize set to: MemoryConfiguration8GiB
00:00:00.206 |I| Configuration LogValueChange: IgnoreMissingServices set to: True
00:00:00.206 |I| Configuration LogValueChange: UseHypervisor set to: True
00:00:00.208 |I| Configuration LogValueChange: MultiplayerMode set to: Disabled
00:00:00.210 |N| Application PrintSystemInfo: Ryujinx Version: 1.2.69
00:00:00.214 |N| Application Print: Operating System: Microsoft Windows 10.0.22631 (X64)
00:00:00.214 |N| Application Print: CPU: AMD Ryzen 7 7800X3D 8-Core Processor ; 16 logical
00:00:00.216 |N| Application Print: RAM: Total 63.1 GiB ; Available 59.4 GiB
00:00:00.217 |N| Application PrintSystemInfo: Logs Enabled: Info, Warning, Error, Guest, Stub
00:00:00.218 |N| Application PrintSystemInfo: Launch Mode: Portable
00:00:00.050 |I| Gpu : Backend Threading (Auto): True
00:00:00.211 |N| Application LoadGuestApplication: Using Firmware Version: 18.1.0
00:00:00.211 |I| Application LoadGuestApplication: Loading as XCI.

@LotP1
Contributor

LotP1 commented Nov 2, 2024

@usr20210909 you can get more consistent results by running a few of the benchmarks built into UltraCam. Just run three and average them.

@usr20210909

Replying to @LotP1:

Done - added more data to my comment.

@BiNh0X

BiNh0X commented Nov 3, 2024

Great job, guys!

I tested Ryujinx v1.2.69 yesterday on this configuration:

Ryujinx v1.2.69 (Vsync OFF, 1x, no mods)

i3-1115G4

Intel UHD Xe G4

16 GB (8+8)

SATA SSD

I tested some games like Mario Party Jamboree, the new ML:B, EoW and other older games, and all of them performed BETTER than the latest released build of Ryujinx. One game that I couldn't run stably and that had a lot of frame drops, even with the 60 fps mod, was Disney Epic Mickey Rebrushed. Now it runs without frame drops and always above 30 fps. It's much more stable! I was really impressed; you guys are on the right track. Ryujinx 1.1.403 is known for running with a lot of stutters here, and your fork ran wonderfully well, even achieving a constant 60 fps in some older games (SM3DW+BF). I didn't use any mods in the games.

Some examples:

Mario Party: Jamboree: Even without mods, it ran better than the last Sudachi build + performance mods. About 20% overall and even better in some areas.

Zelda EoW: Amazing! It ran so much better than the last Ryujinx v1.1.403 build. It had higher frame rates and was more stable even running without any mods, versus Ryujinx v1.1.403 with mods (no AA, DOF off, 60 fps potato).

@GreemDev GreemDev changed the title JIT Sparse Function Table, by riperiperi TESTING NEEDED: JIT Sparse Function Table, by riperiperi Nov 7, 2024
@usr20210909

usr20210909 commented Nov 9, 2024

More testing:

I have updated my Windows 11 from 23H2 to 24H2, which is supposed to improve the performance of AMD Zen 3/4/5 CPUs.

Also instead of using the older Ryujinx version 1.2.69 I ran the new test with the latest 1.2.72. In 1.2.69...1.2.72 I see no changes that would affect the performance, so the results between those two versions should be comparable.

See here for used HW and SW: #83 (comment)

1st conclusion: 24H2 performs better than 23H2

  • With 24H2 for this PR the 1% and 0.1% FPS have improved by 4%-5% while the average FPS are almost the same
  • With 24H2 in version 1.2.72 all FPS are higher by 4%-6% than 1.2.69 on 23H2

2nd conclusion: This PR is only slightly faster than 1.2.72 when running under 24H2

  • With 24H2 this PR has 1.5% higher FPS over version 1.2.72

Here is the updated graph with the added new test results:

Raw data for v1.2.72 (24H2)...

Total frames: 3818, Average FPS: 72, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3907, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 38
Total frames: 3932, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 41
Total frames: 3855, Average FPS: 73, 1% FPS: 47, 0.1% FPS: 40
Total frames: 3861, Average FPS: 73, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3832, Average FPS: 72, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3924, Average FPS: 74, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3937, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 40
Total frames: 3901, Average FPS: 74, 1% FPS: 47, 0.1% FPS: 37
Total frames: 3901, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 40
Total frames: 3850, Average FPS: 73, 1% FPS: 50, 0.1% FPS: 40
Total frames: 3947, Average FPS: 75, 1% FPS: 50, 0.1% FPS: 40

Raw data for v1.2.0+0833a59 (24H2)...

Total frames: 3904, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 38
Total frames: 3964, Average FPS: 75, 1% FPS: 50, 0.1% FPS: 40
Total frames: 3995, Average FPS: 76, 1% FPS: 52, 0.1% FPS: 43
Total frames: 3972, Average FPS: 75, 1% FPS: 50, 0.1% FPS: 40
Total frames: 3932, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 37
Total frames: 3929, Average FPS: 74, 1% FPS: 47, 0.1% FPS: 40
Total frames: 3933, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 41
Total frames: 3746, Average FPS: 71, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3956, Average FPS: 75, 1% FPS: 47, 0.1% FPS: 34
Total frames: 3866, Average FPS: 73, 1% FPS: 47, 0.1% FPS: 38
Total frames: 3931, Average FPS: 74, 1% FPS: 50, 0.1% FPS: 40
Total frames: 4024, Average FPS: 76, 1% FPS: 50, 0.1% FPS: 43
Total frames: 4077, Average FPS: 77, 1% FPS: 50, 0.1% FPS: 41
Total frames: 4022, Average FPS: 76, 1% FPS: 50, 0.1% FPS: 41
Total frames: 3980, Average FPS: 75, 1% FPS: 50, 0.1% FPS: 40

@Caneduro

Caneduro commented Nov 10, 2024

PC specs
R5 3600
RTX 2060
16GB DDR4 3600MHz
W11 23H2

Pikmin 4
60fps mod / No Vsync / Vulkan / 2x RES / FSR / ULTRA SMAA / 16X Anisotropic / auto multithreading

Ryujinx 1.2.72
Screenshot (634)

Ryujinx Release-1.2.0+0833a59-win_x64
Screenshot (635)

@NhProGamer

NhProGamer commented Nov 11, 2024

My specs: Intel Core i5-9600KF overclocked to 4.8GHz, Gigabyte GeForce RTX 3060 with factory OC, OS: Windows
Procedure for all my tests: clear the PPTC and shader caches, run a first throwaway benchmark to build the shader cache and PPTC, then walk around the benchmark zone to load more shaders and instructions into PPTC. I'm testing The Legend of Zelda: Tears of the Kingdom at Lookout Landing with default settings, using the UltraCam mod with FPS unlocked to 60 (the benchmark feature comes from that mod), Vulkan, VRAM at 12G, FSR at 80% of the default resolution, and no anti-aliasing.

Results:
Without JITSparse
Total frames: 1424, Average FPS: 27, 1% FPS: 14, 0.1% FPS: 9

With JITSparse
Total frames: 1557, Average FPS: 29, 1% FPS: 22, 0.1% FPS: 21

This patch is really good in all cases (better FPS, fewer lag spikes).

@Ambush-catlover

This comment was marked as off-topic.

@usr20210909

Continuation of #83 (comment):

Benchmark results recorded with CapFrameX instead of relying on UltraCam's internal benchmark results, so that the frame times can be visualized in graphs.

The first two runs were discarded, keeping only the results of the third run. There's a lot of FPS variance in each run (probably due to the many NPCs in the area doing their thing), so this comparison is not very robust. Running the benchmark in a different area of the map with fewer NPCs might provide a more stable result.

Conclusion:

  • 1% and 0.1% FPS are better by 5%
  • Average FPS are better by 1.5%
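The 1% and 0.1% figures used throughout this thread are "low" frame-rate metrics, but the thread does not say exactly how UltraCam or CapFrameX derives them, and tools use different definitions. As a hypothetical illustration, the sketch below contrasts two common ones computed from per-frame times in milliseconds: the FPS at the 99th-percentile frame time (nearest-rank method), and the FPS implied by the mean of the slowest 1% of frames. The function names are illustrative, not any tool's API. A single hitch moves the second metric far more than the first, which is one reason "1% FPS" values from different tools may disagree.

```python
import math


def percentile_low_fps(frametimes_ms: list, percent: float) -> float:
    """FPS at the frame time that only `percent`% of frames exceed (nearest rank)."""
    ordered = sorted(frametimes_ms)  # fastest frame first
    rank = math.ceil(len(ordered) * (100 - percent) / 100)  # 1-based nearest rank
    return 1000.0 / ordered[max(rank, 1) - 1]


def average_low_fps(frametimes_ms: list, percent: float) -> float:
    """FPS computed from the mean frame time of the slowest `percent`% of frames."""
    ordered = sorted(frametimes_ms, reverse=True)  # slowest frame first
    count = max(1, math.floor(len(ordered) * percent / 100))
    return 1000.0 / (sum(ordered[:count]) / count)


# 99 smooth 10 ms frames plus a single 50 ms hitch.
frametimes = [10.0] * 99 + [50.0]
print(percentile_low_fps(frametimes, 1))  # 100.0: the hitch lies beyond the 99th percentile
print(average_low_fps(frametimes, 1))     # 20.0: the hitch is the whole "slowest 1%"
```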

Bar charts:

FPS (big variance for every test run):

Frame times:

GPU-Busy times:

Variances:

@usr20210909

And here we go again with more TotK benchmarking!

Since there's a lot of FPS variance in the Lookout Landing area for each run (probably due to several free roaming NPCs), which leads to inconsistent results, I now ran the other four UltraCam benchmarks and recorded the data with CapFrameX.

I collected the data of three runs for each area and discarded the first run since it has the worst performance. So each data set contains the result of four runs - two for each Ryujinx version. Adding any more results makes the data hard to read.

Kakariko

There's lower FPS in the first seconds and a peak in the frametimes towards the end in one of the runs recorded with 1.2.72. Overall this PR has a slightly better performance.


Great Sky Island

Two big frametime spikes in a run recorded with 1.2.72. This PR also caused a frametime spike in the middle of a run. No clear winner here.


Goron City

A clearly better performance in this PR - even with a single huge frametime spike in the middle of one of its runs.


Korok Forest

No huge spikes in any of those runs. This PR shows better average and 1% FPS results.


@GreemDev
Member Author

An improved variant of this PR has been merged. Closing.

@GreemDev GreemDev closed this Nov 23, 2024
@GreemDev GreemDev deleted the refactor/arm/jit-sparse branch November 23, 2024 00:46
@Ryubing Ryubing locked as resolved and limited conversation to collaborators Nov 23, 2024