The emulator is not as fast as it's advertised. :-P #16
Comments
OK, that sounds like a challenge! :-) To proceed with this we would need to be a bit more specific. So if you can share more sources and figures, that would help. As to Another piece of evidence of reasonably good performance (even if rather indirect and relative) is a comparison with https://github.com/begoon/i8080-core. So here are the numbers I see on my machine when feeding both emulators with
Then enabling lazy flags (#6, currently an early and not very polished implementation) adds about 4 more percent to that difference, but that's a different story (and it's not implemented for z80 yet). I will take a closer look at the implementation you mentioned and try to get some performance numbers for it. Not tracking ticks is not a problem with this emulator, but then how do you know when to fire interrupts? Re Overall, I still feel confident that if you are after something very fast, this implementation may fit. Let's troubleshoot. :-)
Here's what I got on my machine. For floooh/chips:
For z80:
Are you sure you compile z80 with optimisations enabled? EDIT: It's also interesting to compare code size. So after stripping the binaries it's 84,856 bytes for |
I've just retested it (this time I used both emulators to walk through the Manic Miner game), and it indeed turned out that your emulator is 30% faster. I'm really sorry for the noise! I tried both clang and g++, both -O3 and -O2, and it's the same everywhere (with -O2 the difference is actually even larger). So I should have kept using your emulator for my project rather than switching to another one.
Interestingly, I have commits before and after switching [to floooh's emulator] in my GitHub repository, and after switching it does work faster, which is why I was sure it really was more performant. I'm investigating why that happens, but I haven't been able to reduce the example yet. In my project I save/restore the machine state ~3000 times per second, but that should not make any difference, as the memory class is the same for both emulators and the only difference is saving/restoring registers. Doing that 3000 times per second can hardly be the reason for the slowdown.
For context, the project I used it for finds the fastest possible playthrough of some ZX Spectrum games, Manic Miner and Jet Set Willy, by doing a breadth-first search, which requires saving and restoring the state many times per second and running the VM until a breakpoint. My next project is intended to be a game that involves emulating a retro-futuristic "Z80 data center" on a single server. I hope to emulate at least 300-500 Z80 CPUs in parallel in "Z80 realtime", and I'm currently in search of an emulator library (before today I was pretty sure I'd take floooh's library, but now it seems it's going to be this one).
As I have the old code running, here is the profile. Probably doesn't help, but why not. |
mooskagh wrote:
Coincidentally, my TileMap project runs lots of Z80 cores in parallel to give a playable game map, currently just for ZX Spectrum titles. I've only pushed that as far as 512 screens for Starquake, which means 512 Z80 cores running in parallel. Like you I didn't care so much for timing accuracy or contention, just that it ran fast enough to maintain normal Spectrum speed. I used a different Z80 core at the time, but I'd be interested in trying this Z80 core in the same project to see how the performance compares. Are you sure that CPU performance is going to be the bottleneck for you? I think I might have run out of GPU power before CPU, even on my 10-year old quad-core i7 system. Though I did limit myself to converting the display with a pixel shader to improve system compatibility, and I'm sure a modern compute shader could do a much better job if I was willing to lift the system requirements. Also, Manic Miner and Jet Set Willy are both very LDIR heavy, so a disproportionate amount of the frame time is spent copying 2/3 of the display from back buffer to screen. That might change with other titles? It does explain why you'd like to accelerate that if possible -- was it Gerton Lunter's Spectrum emulation that had an option to do that maybe? :) |
Wow, that's brilliant, guys. My own motivation for better performance is implementing a time machine for https://github.com/kosarev/zx so there's an efficient way to move backward and forward in time within an execution session by means of API calls. @mooskagh, I wonder what zx would need to have to be suitable for a project like yours.
Sorry for the provocative issue title, and not really a bug but just a piece of feedback. :-)
I've checked ~10 z80 emulation libraries, and most of them claim to be "fast", but it doesn't look like any performance comparison was made for any of them. Possibly anything faster than original z80 is considered "fast", but I believe that bar would be too low.
https://github.com/floooh/chips/blob/master/chips/z80.h is an example of something faster than this library (in my benchmarks it's 2.5x faster). But even that, on a modern CPU, is only ~600 times faster than the real Z80. If you count CPU cycles, that is impressive ("works as if the Z80 clock was 2.0 GHz"), but given that Z80 instructions took many more cycles than instructions on modern CPUs, I think there may be room to explore ways to make it faster.
I did run a profiler for this library in my experiments (with clang -O3 and with g++ -O3), and as far as I remember and understood the results, the main slowdown seemed to be due to lots of nested function calls, including calling `self()` just to get `this` of the correct type, during every instruction decode. One might think that, as all function calls are static, the compiler would be clever enough to inline them or optimize them out, but that happened with neither clang nor g++ (both with -O3).
Unfortunately, I didn't keep the profiler stats, but I can try to recreate them if needed.
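For readers unfamiliar with the pattern being discussed, here is a minimal sketch of a `self()` helper in a CRTP-style base class. All names (`cpu_base`, `my_machine`, `read`, `step`) are hypothetical and made up for illustration; this is not the library's actual code. The downcast in `self()` is resolved at compile time, so whether it costs anything at run time depends on whether the compiler inlines the call chain.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; an illustration of the pattern only.
template <typename Derived>
class cpu_base {
public:
    // Executes one toy "instruction": fetch a byte and add it to the
    // accumulator. The fetch goes through the derived class's read().
    void step() {
        std::uint8_t op = self().read(pc_++);
        acc_ = static_cast<std::uint8_t>(acc_ + op);
    }

    std::uint8_t acc() const { return acc_; }

protected:
    // Static downcast: resolved at compile time, no virtual dispatch.
    Derived &self() { return static_cast<Derived &>(*this); }

private:
    std::uint16_t pc_ = 0;
    std::uint8_t acc_ = 0;
};

// The derived class supplies the memory access handler.
class my_machine : public cpu_base<my_machine> {
public:
    std::uint8_t read(std::uint16_t addr) { return image_[addr & 3]; }

private:
    std::uint8_t image_[4] = {1, 2, 3, 4};
};
```

Stepping a `my_machine` three times fetches the bytes 1, 2 and 3 through `self().read(...)`. Inspecting the generated assembly for `step()` is one way to check whether a given compiler and flag combination actually inlines these calls.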
As a side note not related to this project, I am personally in search of a really fast emulator, which doesn't need precise timings. Even going as far as using memcpy() when decoding LDIR (and checking the time until the next interrupt, whether the BC or HL ranges cross address 0, or whether they overlap the instruction itself) would be great.
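The LDIR idea above could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not code from any of the emulators mentioned: `regs`, `ldir_slow` and `ldir_fast` are hypothetical names, memory is assumed to be a flat 64 KiB array, `pc` is assumed to be the address of the first byte of the two-byte LDIR instruction, and flags, the R register and interrupt timing are ignored entirely. The fast path is taken only when neither range wraps past 0xFFFF, the source and destination do not overlap (a forward byte-by-byte copy can replicate bytes, which memcpy() does not reproduce), and the destination does not cover the instruction itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical register subset; mem must point to 65536 bytes.
struct regs { std::uint16_t hl, de, bc; };

// Reference behaviour: one byte per iteration, addresses wrap at 64 KiB.
// BC == 0 on entry means 65536 iterations.
void ldir_slow(std::uint8_t *mem, regs &r) {
    do {
        mem[r.de] = mem[r.hl];
        ++r.hl; ++r.de; --r.bc;
    } while (r.bc != 0);
}

// Returns true if the memcpy() fast path was taken. On false, nothing is
// modified and the caller should fall back to the byte-by-byte version.
bool ldir_fast(std::uint8_t *mem, regs &r, std::uint16_t pc) {
    const std::uint32_t n = r.bc ? r.bc : 0x10000u;
    const std::uint32_t src = r.hl, dst = r.de;

    // Either range wraps past the end of the address space.
    if (src + n > 0x10000u || dst + n > 0x10000u) return false;

    // Source and destination overlap.
    if (src < dst + n && dst < src + n) return false;

    // The copy would overwrite one of the two instruction bytes.
    auto hits = [&](std::uint32_t a) { return a >= dst && a < dst + n; };
    if (hits(pc) || hits((pc + 1u) & 0xFFFFu)) return false;

    std::memcpy(mem + dst, mem + src, n);
    r.hl = static_cast<std::uint16_t>(src + n);
    r.de = static_cast<std::uint16_t>(dst + n);
    r.bc = 0;
    return true;
}
```

A real implementation would also need to cap the copy length by the number of ticks left until the next interrupt, which this sketch leaves out.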