-
-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul "Pixel FIFO" article into "Rendering Internals" #379
base: master
Are you sure you want to change the base?
Conversation
bae52a7
to
bbdc0ef
Compare
I rebased the branch and applied some editorial changes; the article is still not finished, and should not be reviewed yet. (At least not before #350 is merged, so we can focus on that.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just gave it a quick look.
Looks nice so far!
src/STAT.md
Outdated
@@ -2,7 +2,7 @@ | |||
|
|||
::: tip TERMINOLOGY | |||
|
|||
A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Pixel FIFO>). | |||
A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A *dot* is the shortest period over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>). | |
A *dot* is the shortest period of time over which the PPU can output one pixel: is it equivalent to 1 T-state on DMG or on CGB single-speed mode or 2 T-states on CGB double-speed mode. On each dot during mode 3, either the PPU outputs a pixel or the fetcher is stalling the [FIFOs](<#Rendering Internals>). |
src/imgs/ppu_overview.svg
Outdated
<text x="99" y="419">(9-bit tile ID, Y offset)</text> | ||
<rect x="90" y="429" class="legend pxrow"/> | ||
<text x="99" y="433">Pixel row (2 bytes)</text> | ||
<text x="465" y="405" class="right">(Some arrows have been</text> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this text way smaller (and maybe in a angle?)
- The BG FIFO is not empty | ||
|
||
Once both conditions are fulfilled, the OBJ FIFO takes over, discarding the pixels slices already fetched. | ||
Note that if the BG FIFO is empty, the Pixel Slice Fetcher immediately switches to [Get tile ID](<#Get tile ID>) when refilling it, so the OBJ fetcher will wait for 6 additional dots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my understanding of the sprite timing, the maximum "penalty" for a sprite occurring too soon in the BG slice fetch sequence is 5 dots (making that sprite contribute at most 11 dots to mode 3). I'm not clear on the technical reason behind this, though.
This understanding is corroborated by https://gbdev.io/pandocs/STAT.html#properties-of-stat-modes and https://www.reddit.com/r/EmuDev/comments/59pawp/gb_mode3_sprite_timing/
- The Pixel Slice Fetcher is attempting to [push pixels](<#Push pixels>) | ||
- The BG FIFO is not empty | ||
|
||
Once both conditions are fulfilled, the OBJ FIFO takes over, discarding the pixels slices already fetched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What pixel slices are discarded? Are they refetched later, and when is there time for that? I assumed the BG fetching sequence latches the fetched slices somewhere until it's able to successfully push them to the BG FIFO (for which it always has to wait at least 2 dots anyway).
I'm going to propose some feedback about why I've had a hard time understanding this document from the point of view of a novice. Especially in the Sprite part, there are terms like "X coordinate" which I'm not sure what they refer to. From the aspect of implementation, I reset the counter of the fetcher when I enter the window. Hence I can't compare the X coordinate of the sprite to it and it's reasonable to think that these terms refer to a counter which tracks the number of pixels actually shifted out and pushed to the LCD. The assumption above is valid only if I keep as less states as possible. What if I had an extra counter which tracks the number of pixels already fetched? The timing of the sprite checking would be changed from "after a pixel is popped" to "before a pixel is fetched." This may not be true (to be honest I'm not that confident) but it's actually a reasonable consequence - because I have to do the sprite checking, I introduce this extra counter. That is to say, I suppose the term is a little bit too brief because I can consider all of the counters mentioned above sorts of "X coordinate." In the OBJ fetcher part,
Firstly, what is the exact timing of "attempting to push pixels?" Is it the 6th, 7th, or 8th cycle of the fetcher? From my understanding, this would affect the number of cycles added to mode 3. I've learned - from somewhere else - that there may be delay up to 11 cycles for each sprite. There are descriptions such as "shorten/lengthen mode 3 by n dots" spread across the document. My point is, they are not tidied up and summarized in an easy-to-understand way. How/Why these conditions contribute to the max 11-cycle delay and when these conditions are checked aren't stated clearly. Use this page as an example. 172-289 dots? Readers are not going to figure out 289 = 12 (before shifter is filled) + 160 (pixels per scanline) + 7 (max(SCX % 8)) + 11 (sprite delay) * 10 (number of sprites) from pure imagination (and I don't even know why a possible window restart is not included.) With all the ambiguities combined, I can't even have a strong guarantee of the correctness of the overall control flow. |
bbdc0ef
to
15009a4
Compare
Every time both FIFOs are clocked, the selector decides whether to retain the pixel from the BG or OBJ FIFO. | ||
|
||
The selection follows the following rules: | ||
1. **In CGB Mode**, if [`LCDC` bit 0 (priority enable)](<#CGB Mode: BG and Window master priority>) is reset, pick the BG pixel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule appears to be simply disabling OBJ, which duplicates the functionality of rule 3 and doesn't seem correct. I believe a correct rule would be to ignore rules 4 and 5.
15009a4
to
fe13a13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit of a noob with this part of the Game Boy, which is hopefully a good thing for someone reviewing this!
I've made a bunch of inline comments/suggestions that are all mostly writing/language based because I can't really correct technical points with much confidence.
Instead, I'll cover some general points and my issues with understanding the technical side here:
I can't get past "Pixel Slice Fetcher/Get Tile ID".
- the Get Tile ID + BG + OBJ fetcher nested headings are confusing
- The Pixel Slice Fetcher doesn't even do anything until Get Tile ID has been done?
- we're getting the Tile ID but we're also getting attributes and doing ... something with them?
- after the BG and OBJ fetchers have had a go, the Pixel Slice Fetcher does the actual pixel slice fetching -- which is only the Color ID part of the the tile?
Some of the points about FIFO refill priority and the related potential delays are repeated several times in slightly different ways. This makes the differences seem like they should be very meaningful but I can't discern what the meaning is.
The BG FIFO, BG Fetcher, BG Fetcher with Window Fetcher hat, OBJ FIFO, OBJ Fetcher terminology makes it really difficult to follow just because they all look/sound like the same words. Sometimes it seems like the two fetchers are part of their respective FIFO or vice versa.
I think I want a more clear distinction between procedural, architectural, conceptual information.
To be clear, the information mostly seems to be here, but the presentation sometimes jumps between these modes and I struggle to follow that. I'm sure others have both more and less difficulty with that than I do.
This is a really complicated thing you're trying to explain!
|
||
Once the last pixel has been output, the PPU releases the VRAM bus, and does nothing while it waits for the scanline to end. | ||
|
||
The PPU embarks both a vertical counter (exposed as [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>)), *and* a horizontal counter, which will be referred to as "`LX`" henceforth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"embarks" doesn't seem to be the right word to use here.
There seems to be only one place where LX
is used (#### BG Fetcher) so I think this should move there. I added my suggested change there.
The PPU embarks both a vertical counter (exposed as [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>)), *and* a horizontal counter, which will be referred to as "`LX`" henceforth. |
</tr> | ||
</tbody> | ||
</table> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain LX here instead of above.
Maybe I misunderstood the purpose, but I think this makes the point more clearly:
:::tip | |
`LX` refers to the PPU's horizontal counter, or *LCD X coordinate*, but it isn't the name of a register like its vertical counterpart [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>). | |
::: | |
@@ -0,0 +1,408 @@ | |||
# Rendering Internals | |||
|
|||
The Game Boy's PPU is the component responsible for feeding the LCD (= the screen) with pixels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"(= the screen)" is very informal compared to the rest of the text. I don't think you need to explain what the LCD is / is for.
The Game Boy's PPU is the component responsible for feeding the LCD (= the screen) with pixels. | |
The Game Boy's PPU is the component responsible for feeding the LCD with pixels. |
::: tip Terminology | ||
|
||
A "dot" is the unit of time within the PPU. | ||
One "dot" is one 4 MiHz cycle, i.e. a unit of time equal to 1 ∕ 4194304 of a second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to emphasise dot every time after introducing it. (There's many of these, not just this one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree here
|
||
#### BG fetcher | ||
|
||
During this step, a tilemap is sampled to determine which tile to fetch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This under the BG fetcher heading sounds like the Bg fetcher is the step.
During this step, a tilemap is sampled to determine which tile to fetch. | |
When the BG fetcher is active, a tilemap is sampled to determine which tile to fetch. |
|
||
### Get tile ID | ||
|
||
This step determines which background/window tile to fetch pixels from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step determines which background/window tile to fetch pixels from. | |
This step determines which tile to fetch pixels from. |
|
||
::: | ||
|
||
A byte is read from the computed address, and is forwarded to the Pixel Slice Fetcher as a tile ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A byte is read from the computed address, and is forwarded to the Pixel Slice Fetcher as a tile ID. | |
A tile ID is read from the computed address, and is forwarded to the Pixel Slice Fetcher. |
::: tip Raster effects | ||
|
||
Interestingly, unlike e.g. the NES' PPU, great care has been taken to ensure that the BG fetcher re-reads as many registers as possible (`SCY`, `LCDC`, etc.). | ||
This may have been insight from the former console, on which [proper "raster splits" are quite tricky](https://www.nesdev.org/wiki/PPU_scrolling#Split_X_scroll) due to a lot of internal caching. | ||
|
||
::: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting but might be a bit out of scope?
It sounds like trivia, due to the framing with the NES PPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also believe this may be a bit out of scope right there, especially the second "speculation" part.
|
||
::: | ||
|
||
This step takes 2 dots, with the VRAM access(es) being performed on the second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"this step" again -- is this different to the OBJ fetcher step?
fe13a13
to
09c063f
Compare
Also avoid describing SameBoy internals, instead relying on it when otherwise corroborated, or on schematics and/or test ROMs when possible. Restructure the article to describe behavior more than components, especially in a way that is more friendly to someone not knowing what all the components are about. Add a diagram, too, and move the mode timing diagram to the STAT article, where it belongs just as well, but where it will be more visible and thus more useful.
09c063f
to
53c0f1d
Compare
1. OAM is scanned for Y positions in range (based on [`LY`](<#FF44 — LY: LCD Y coordinate \[read-only\]>) and [`LCDC`](<#FF40 — LCDC: LCD control>)); the X coordinate is not checked! | ||
2. The first 10 matches get their X and Y coordinates stored in an "OBJ slot". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this different depending on whether the software is running on a DMG (including CGB with OPRI set) or a CGB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DMG and CGB have identical OAM scan behavior (only the Y coordinate is checked, and the first 10 in OAM order are matched) and OBJ fetch order (lower X first, or lower OAM index for the same X), the actual difference is how OBJ pixels are mixed in the FIFO.
DMG (including CGB with OPRI set) prioritizes all opaque pixels already in the OBJ FIFO when fetching a new OBJ, which means the fetch order (lower X etc.) determines the priority.
CGB prioritizes only the opaque pixels with a smaller OAM index than the new OBJ, which means the FIFO tracks the OAM index per pixel (or possibly the OBJ slot index, since those are in increasing OAM order and would allow cheaper FIFO storage and comparison hardware).
Co-authored-by: Estus <git@estus.dev>
Co-authored-by: Quinn <3379314+quinnyo@users.noreply.github.com>
Also avoid describing SameBoy internals, instead relying on it when otherwise corroborated, or on schematics and/or test ROMs when possible.
Restructure the article to describe behavior more than components, especially in a way that is more friendly to someone not knowing what all the components are about.
Add a diagram, too, and move the mode timing diagram to the STAT article, where it belongs just as well, but where it will be more visible and thus more useful.
Fixes #377, fixes #408.