Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide double width 4bpp output to lcd_draw_line #107

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ccawley2011
Copy link
Contributor

By packing the palette into a single nibble and expanding it when the palette register as is, it's possible to output double-width 4bpp graphics data without much overhead. This is useful on RISC OS for rendering in modes 12 or 27, where the resulting buffer can be copied directly to the screen as-is.

This overlaps with the changes in the glfw_gles2 and sdl2-swrenderer branches, so it should probably be combined with one of them before merging.

@ccawley2011 ccawley2011 marked this pull request as ready for review May 30, 2024 14:50
@ccawley2011
Copy link
Contributor Author

This has been made optional, so it should be ready for review.

@deltabeard
Copy link
Owner

Thank you for this! Have you run any benchmarks before and after these changes on RISC OS using peanut-benchmark to verify the speed improvement? I'll have a thorough look at this pull request when I can. 👍

peanut_gb.h Outdated Show resolved Hide resolved
@ccawley2011
Copy link
Contributor Author

Thank you for this! Have you run any benchmarks before and after these changes on RISC OS using peanut-benchmark to verify the speed improvement? I'll have a thorough look at this pull request when I can. 👍

I did some tests with this, and enabling PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE plus using memcpy is noticeably faster on a StrongARM RiscPC compared to expanding each byte in lcd_draw_line.

@deltabeard
Copy link
Owner

enabling PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE plus using memcpy is noticeably faster on a StrongARM RiscPC

I'm trying to understand the usefulness of PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE. If each byte given to lcd_draw_line consists of same two nibbles for a pixel, how are you using memcpy to increase the performance of lcd_draw_line? I think that PEANUT_GB_USE_NIBBLE_FOR_PALETTE is only required for an 8-bit paletted bitmap that only uses 12 colours which can be copied using memcpy as was done in the glfw_gles2 branch. Is PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE supposed to allow for pushing two pixels at a time to a 4-bit paletted texture?

Would you kindly be able to provide example code as to how PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE is supposed to be used please?

@ccawley2011
Copy link
Contributor Author

Is PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE supposed to allow for pushing two pixels at a time to a 4-bit paletted texture?

That's essentially it - the only difference is that since the RiscPC only has a basic display controller rather than a full GPU, there are limited options for scaling the final image.

I've put my current code on GitHub if you want to take a look. It's a bit primitive at the moment, but it can run games in mode 12 (640x256x4, with aspect ratio correction using older monitors) and mode 27 (640x480x4, with lines doubled in software). I may add support for mode 28 as well (640x480x8) for compatibility with newer devices like the Raspberry Pi.

https://github.com/ccawley2011/Peanut-GB-riscos

@deltabeard
Copy link
Owner

deltabeard commented Jun 3, 2024

That's essentially it - the only difference is that since the RiscPC only has a basic display controller rather than a full GPU, there are limited options for scaling the final image.

I see now. So the double width allows for easy scaling the output image to 320x288 by making taking advantage of the fact that the output is a 4-bit palette. It might also then be possible to have the output as a 2-bit palette for scaling to 640x576 (albeit with less colours), but that can be added later if anyone is interested. Without PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE defined, the output is technically an 8-bit pixel, even though only 12 colours are ever used at the moment (this may change when CGB support is added).

I'll do some light testing when I get the time (should be this week) and then I'll merge. 🙂

@@ -36,7 +38,7 @@ struct priv_t
uint8_t *bootrom;

/* Colour palette for each BG, OBJ0, and OBJ1. */
uint16_t selected_palette[3][4];
uint16_t selected_palette[4 * 4];
Copy link
Owner

@deltabeard deltabeard Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed to 4 * 4 instead of 3 * 4 here and in other declarations?

Copy link
Owner

@deltabeard deltabeard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments then I'll be happy to merge. Sorry for the delay.

* to performing bit shifts or using a larger LUT.
*/
#ifndef PEANUT_GB_USE_NIBBLE_FOR_PALETTE
# define PEANUT_GB_USE_NIBBLE_FOR_PALETTE PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PEANUT_GB_USE_NIBBLE_FOR_PALETTE is defined to 0 by the user (e.g. compile time define) but the user enabled PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE, should there be an error here?
The comment should state that PEANUT_GB_USE_NIBBLE_FOR_PALETTE is required if PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE is enabled.

{
{ 0x7FFF, 0x5294, 0x294A, 0x0000 },
{ 0x7FFF, 0x5294, 0x294A, 0x0000 },
{ 0x7FFF, 0x5294, 0x294A, 0x0000 }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of changing [3][4] to [3*4]?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants