Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul "Pixel FIFO" article into "Rendering Internals" #379

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ISSOtm
Copy link
Member

@ISSOtm ISSOtm commented Oct 15, 2021

Also avoid describing SameBoy internals, instead relying on it when otherwise corroborated, or on schematics and/or test ROMs when possible.

Restructure the article to describe behavior more than components, especially in a way that is more friendly to someone not knowing what all the components are about.

Add a diagram, too, and move the mode timing diagram to the STAT article, where it belongs just as well, but where it will be more visible and thus more useful.

Fixes #377, fixes #408.

@ISSOtm ISSOtm added content Improvements or additions to documentation enhancement New feature or request labels Oct 15, 2021
src/Rendering_Internals.md Outdated Show resolved Hide resolved
src/Rendering_Internals.md Outdated Show resolved Hide resolved
src/Rendering_Internals.md Show resolved Hide resolved
src/Rendering_Internals.md Outdated Show resolved Hide resolved
src/Rendering_Internals.md Outdated Show resolved Hide resolved
@ISSOtm
Copy link
Member Author

ISSOtm commented Sep 24, 2022

I rebased the branch and applied some editorial changes; the article is still not finished, and should not be reviewed yet. (At least not before #350 is merged, so we can focus on that.)

Copy link
Member

@avivace avivace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just gave it a quick look.

Looks nice so far!

src/STAT.md Outdated
@@ -2,7 +2,7 @@

::: tip TERMINOLOGY

A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Pixel FIFO>).
A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>).
A *dot* is the shortest period of time over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>).

<text x="99" y="419">(9-bit tile ID, Y offset)</text>
<rect x="90" y="429" class="legend pxrow"/>
<text x="99" y="433">Pixel row (2 bytes)</text>
<text x="465" y="405" class="right">(Some arrows have been</text>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this text way smaller (and maybe in a angle?)

- The BG FIFO is not empty

Once both conditions are fulfilled, the OBJ FIFO takes over, discarding the pixels slices already fetched.
Note that if the BG FIFO is empty, the Pixel Slice Fetcher immediately switches to [Get tile ID](<#Get tile ID>) when refilling it, so the OBJ fetcher will wait for 6 additional dots.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my understanding of the sprite timing, the maximum "penalty" for a sprite occurring too soon in the BG slice fetch sequence is 5 dots (making that sprite contribute at most 11 dots to mode 3). I'm not clear on the technical reason behind this, though.

This understanding is corroborated by https://gbdev.io/pandocs/STAT.html#properties-of-stat-modes and https://www.reddit.com/r/EmuDev/comments/59pawp/gb_mode3_sprite_timing/

- The Pixel Slice Fetcher is attempting to [push pixels](<#Push pixels>)
- The BG FIFO is not empty

Once both conditions are fulfilled, the OBJ FIFO takes over, discarding the pixels slices already fetched.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What pixel slices are discarded? Are they refetched later, and when is there time for that? I assumed the BG fetching sequence latches the fetched slices somewhere until it's able to successfully push them to the BG FIFO (for which it always has to wait at least 2 dots anyway).

@kapoisu
Copy link

kapoisu commented Jan 13, 2023

I'm going to propose some feedback about why I've had a hard time understanding this document from the point of view of a novice.

Especially in the Sprite part, there are terms like "X coordinate" which I'm not sure what they refer to. From the aspect of implementation, I reset the counter of the fetcher when I enter the window. Hence I can't compare the X coordinate of the sprite to it and it's reasonable to think that these terms refer to a counter which tracks the number of pixels actually shifted out and pushed to the LCD.

The assumption above is valid only if I keep as less states as possible. What if I had an extra counter which tracks the number of pixels already fetched? The timing of the sprite checking would be changed from "after a pixel is popped" to "before a pixel is fetched." This may not be true (to be honest I'm not that confident) but it's actually a reasonable consequence - because I have to do the sprite checking, I introduce this extra counter.

That is to say, I suppose the term is a little bit too brief because I can consider all of the counters mentioned above sorts of "X coordinate."

In the OBJ fetcher part,

the OBJ fetcher waits to take control until two conditions are met:

  • The Pixel Slice Fetcher is attempting to push pixels
  • The BG FIFO is not empty

Firstly, what is the exact timing of "attempting to push pixels?" Is it the 6th, 7th, or 8th cycle of the fetcher? From my understanding, this would affect the number of cycles added to mode 3.

I've learned - from somewhere else - that there may be delay up to 11 cycles for each sprite. There are descriptions such as "shorten/lengthen mode 3 by n dots" spread across the document. My point is, they are not tidied up and summarized in an easy-to-understand way. How/Why these conditions contribute to the max 11-cycle delay and when these conditions are checked aren't stated clearly.

Use this page as an example. 172-289 dots? Readers are not going to figure out 289 = 12 (before shifter is filled) + 160 (pixels per scanline) + 7 (max(SCX % 8)) + 11 (sprite delay) * 10 (number of sprites) from pure imagination (and I don't even know why a possible window restart is not included.)

With all the ambiguities combined, I can't even have a strong guarantee of the correctness of the overall control flow.

@ISSOtm ISSOtm force-pushed the rendering-internals branch from bbdc0ef to 15009a4 Compare July 2, 2023 10:03
Every time both FIFOs are clocked, the selector decides whether to retain the pixel from the BG or OBJ FIFO.

The selection follows the following rules:
1. **In CGB Mode**, if [`LCDC` bit 0 (priority enable)](<#CGB Mode: BG and Window master priority>) is reset, pick the BG pixel.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule appears to be simply disabling OBJ, which duplicates the functionality of rule 3 and doesn't seem correct. I believe a correct rule would be to ignore rules 4 and 5.

@ISSOtm ISSOtm force-pushed the rendering-internals branch from 15009a4 to fe13a13 Compare March 6, 2024 13:24
@ISSOtm ISSOtm added the help wanted Extra attention is needed label Mar 6, 2024
Copy link
Contributor

@quinnyo quinnyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit of a noob with this part of the Game Boy, which is hopefully a good thing for someone reviewing this!
I've made a bunch of inline comments/suggestions that are all mostly writing/language based because I can't really correct technical points with much confidence.
Instead, I'll cover some general points and my issues with understanding the technical side here:

I can't get past "Pixel Slice Fetcher/Get Tile ID".

  • the Get Tile ID + BG + OBJ fetcher nested headings are confusing
  • The Pixel Slice Fetcher doesn't even do anything until Get Tile ID has been done?
  • we're getting the Tile ID but we're also getting attributes and doing ... something with them?
  • after the BG and OBJ fetchers have had a go, the Pixel Slice Fetcher does the actual pixel slice fetching -- which is only the Color ID part of the the tile?

Some of the points about FIFO refill priority and the related potential delays are repeated several times in slightly different ways. This makes the differences seem like they should be very meaningful but I can't discern what the meaning is.

The BG FIFO, BG Fetcher, BG Fetcher with Window Fetcher hat, OBJ FIFO, OBJ Fetcher terminology makes it really difficult to follow just because they all look/sound like the same words. Sometimes it seems like the two fetchers are part of their respective FIFO or vice versa.

I think I want a more clear distinction between procedural, architectural, conceptual information.
To be clear, the information mostly seems to be here, but the presentation sometimes jumps between these modes and I struggle to follow that. I'm sure others have both more and less difficulty with that than I do.

This is a really complicated thing you're trying to explain!


Once the last pixel has been output, the PPU releases the VRAM bus, and does nothing while it waits for the scanline to end.

The PPU embarks both a vertical counter (exposed as [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>)), *and* a horizontal counter, which will be referred to as "`LX`" henceforth.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"embarks" doesn't seem to be the right word to use here.

There seems to be only one place where LX is used (#### BG Fetcher) so I think this should move there. I added my suggested change there.

Suggested change
The PPU embarks both a vertical counter (exposed as [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>)), *and* a horizontal counter, which will be referred to as "`LX`" henceforth.

</tr>
</tbody>
</table>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain LX here instead of above.

Maybe I misunderstood the purpose, but I think this makes the point more clearly:

Suggested change
:::tip
`LX` refers to the PPU's horizontal counter, or *LCD X coordinate*, but it isn't the name of a register like its vertical counterpart [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>).
:::

src/Rendering_Internals.md Outdated Show resolved Hide resolved
@@ -0,0 +1,408 @@
# Rendering Internals

The Game Boy's PPU is the component responsible for feeding the LCD (= the screen) with pixels.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"(= the screen)" is very informal compared to the rest of the text. I don't think you need to explain what the LCD is / is for.

Suggested change
The Game Boy's PPU is the component responsible for feeding the LCD (= the screen) with pixels.
The Game Boy's PPU is the component responsible for feeding the LCD with pixels.

::: tip Terminology

A "dot" is the unit of time within the PPU.
One "dot" is one 4 MiHz cycle, i.e. a unit of time equal to 1 ∕ 4194304 of a second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to emphasise dot every time after introducing it. (There's many of these, not just this one)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree here


#### BG fetcher

During this step, a tilemap is sampled to determine which tile to fetch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This under the BG fetcher heading sounds like the Bg fetcher is the step.

Suggested change
During this step, a tilemap is sampled to determine which tile to fetch.
When the BG fetcher is active, a tilemap is sampled to determine which tile to fetch.


### Get tile ID

This step determines which background/window tile to fetch pixels from.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This step determines which background/window tile to fetch pixels from.
This step determines which tile to fetch pixels from.


:::

A byte is read from the computed address, and is forwarded to the Pixel Slice Fetcher as a tile ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A byte is read from the computed address, and is forwarded to the Pixel Slice Fetcher as a tile ID.
A tile ID is read from the computed address, and is forwarded to the Pixel Slice Fetcher.

Comment on lines +141 to +146
::: tip Raster effects

Interestingly, unlike e.g. the NES' PPU, great care has been taken to ensure that the BG fetcher re-reads as many registers as possible (`SCY`, `LCDC`, etc.).
This may have been insight from the former console, on which [proper "raster splits" are quite tricky](https://www.nesdev.org/wiki/PPU_scrolling#Split_X_scroll) due to a lot of internal caching.

:::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting but might be a bit out of scope?
It sounds like trivia, due to the framing with the NES PPU.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also believe this may be a bit out of scope right there, especially the second "speculation" part.


:::

This step takes 2 dots, with the VRAM access(es) being performed on the second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"this step" again -- is this different to the OBJ fetcher step?

Also avoid describing SameBoy internals, instead relying on it when
otherwise corroborated, or on schematics and/or test ROMs when possible.

Restructure the article to describe behavior more than components, especially
in a way that is more friendly to someone not knowing what all the components
are about.

Add a diagram, too, and move the mode timing diagram to the STAT article, where
it belongs just as well, but where it will be more visible and thus more useful.
@ISSOtm ISSOtm force-pushed the rendering-internals branch from 09c063f to 53c0f1d Compare December 17, 2024 21:44
Comment on lines +66 to +67
1. OAM is scanned for Y positions in range (based on [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>) and [`LCDC`](<#FF40 — LCDC: LCD control>)); the X coordinate is not checked!
2. The first 10 matches get their X and Y coordinates stored in an "OBJ slot".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this different depending on whether the software is running on a DMG (including CGB with OPRI set) or a CGB?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DMG and CGB have identical OAM scan behavior (only the Y coordinate is checked, and the first 10 in OAM order are matched) and OBJ fetch order (lower X first, or lower OAM index for the same X), the actual difference is how OBJ pixels are mixed in the FIFO.

DMG (including CGB with OPRI set) prioritizes all opaque pixels already in the OBJ FIFO when fetching a new OBJ, which means the fetch order (lower X etc.) determines the priority.

CGB prioritizes only the opaque pixels with a smaller OAM index than the new OBJ, which means the FIFO tracks the OAM index per pixel (or possibly the OBJ slot index, since those are in increasing OAM order and would allow cheaper FIFO storage and comparison hardware).

avivace and others added 2 commits January 7, 2025 10:50
Co-authored-by: Estus <git@estus.dev>
Co-authored-by: Quinn <3379314+quinnyo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
content Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Contradictions in PPU FIFO size Get Tile step during the FIFO needs correction / improvement
7 participants