Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#7287: Decouple Cache Replacement Policies. #7314

Open
wants to merge 50 commits into
base: master
Choose a base branch
from

Conversation

oralmer
Copy link
Contributor

@oralmer oralmer commented Mar 2, 2025

Add cache_replacement_policy_t - An interface that allows defining a cache replacement policy used by caching_device_t

We now no longer need to re-implement a cache for each replacement policy, instead passing a unique_ptr to a policy. The policies themselves can implement the replacement procedures with the appropriate data structures. Any user can add his own replacement policies and pass them to any caching device.

Fixed all relevant uses and tests.

Removed all subclasses re-implementing policies.

Had to implement a small functionality change in fifo - Previously, the fifo cache policy in cache_fifo_t ignored invalidations, but now that invalidations are handled by the caching device and replacement by the policy, we can't completely ignore them. There don't seem to be any uses that depend on it, and I think this design makes more sense, but it does constitute a functionality change. Should we make an effort to support it? We can move the valid bits into the policy, for instance, but I personally think this is cleaner - Now it actually implements FIFO, rather than round robin.

There is a cache policy test in unit_test_cache_replacement_policy, but I can add another that looks directly at the cache policies if you prefer.

Note that some files show as "renamed" - They are actually deleted files, and new cache policies with similar names, and this confuses git.

Fixes #7287

Changed `block_size` to `long int` in order to allow larger pages in
TLB. Also added two minor fixes that came up while working on the
feature:
 - Argument name mismatches in `cache_lru_t` and `cache_fifo_t`
between the header and body.
 - Clearer error messages in `tlb_simulator_t`, so you can know
exactly what failed to build.

Fixes #7291
Samll fixes which were not enough to justify a seperate issue.
 - Fixed inconsistent argument names in cache_fifo and cach_lru - header
   vs implementation.
 - Improved error printing in tlb_t so you can know what actually failed
 - Added and `add_parent` method to `caching_device_t`
Add `cache_replacement_policy_t` - An interface that allows
defining a cache replacement policy used by `caching_device_t`

We now no longer need to re-implement a cache for each replacement policy,
instead passing a `unique_ptr` to a policy. The policies themselves can
implement the replacement procedures with the appropriate data
structures. Any user can add his own replacement policies and pass them
to any caching device.

Fixed all relevant uses and tests.

Removed all subclasses re-implementing policies.

Had to implement a small functionality change in `fifo` - Previously, the fifo cache
policy in `cache_fifo_t` ignored invalidations, but now that invalidations are handled by
the caching device and replacement by the policy, we can't completely
ignore them. There don't seem to be any uses that depend on it, and I
think this design makes more sense, but it does constitute a functionality
change. Should we make an effort to support it? We can move the valid
bits into the policy, for instance, but I personally think this is
cleaner - Now it actually implements FIFO, rather than round robin.

There is a cache policy test in `unit_test_cache_replacement_policy`,
but I can add another that looks directly at the cache policies if you
prefer.

Fixes #7287
@oralmer
Copy link
Contributor Author

oralmer commented Mar 2, 2025

@derekbruening I guess this begets an update to the release doc? How do you think it should look?

Copy link
Contributor

@brettcoon brettcoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get through all the files yet; I'll review more tomorrow.

@derekbruening
Copy link
Contributor

Going to defer to @brettcoon for the review here.

min_counter = get_caching_device_block(block_idx, way).counter_;
min_way = way;
}
if (get_caching_device_block(block_idx, way).tag_ == TAG_INVALID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code suggests cache_replacement_policy implementations can rely on all tags being valid. If that guarantee is intended, can you add a comment stating this in the cache_replacement_policy.h comments for get_next_way_to_replace()?

void
policy_bit_plru_t::eviction_update(int block_idx, int way)
{
// Nothing to update, when the way is accessed we will update it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rely on caching_device.cpp handling way selection when there are invalid lines? So when a line is evicted, the policy doesn't hear about it again until the replacement has already been loaded? It feels like the PLRU state for this way could influence a subsequent update of another way in this set.
I'm not entirely sure about that, but minimally I think a comment explaining why PLRU state doesn't need to be reset when a block is evicted would be very helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other comments, currently I assume that if we reached the policy then all ways are valid - caching device returns the first invalid way without consulting the policy otherwise. should we do something else? is it reasonable for the policy to be aware of valid\invalid ways?

Copy link
Contributor

@brettcoon brettcoon Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the policy might need to be aware of valid/invalid ways.
The thing I'm wrestling with is the difference between the initial policy state when all lines are invalid (from the constructor), versus the policy state after all lines have been invalidated after previously being valid, as well as any possible cache modifications that aren't communicated to policy. It seems that if eviction_update() doesn't need to do anything, then the policy state also should not require initialization. I guess in the LFU policy the eviction_update does update the state, so in that case the invalid-line state is consistent, so long as all invalidations call eviction_update().
I'll put a comment in the base file, but I think what I'm really looking for is a clearer statement of the contract between caching_device and cache_replacement_policy_t.

cache_fifo_test.access_and_check(addr_vec[ADDR_G], 2); // A G x H
cache_fifo_test.access_and_check(addr_vec[ADDR_B], 3); // A G B h
cache_fifo_test.access_and_check(addr_vec[ADDR_G], 1); // A f G H
cache_fifo_test.access_and_check(addr_vec[ADDR_B], 3); // A B G h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the new FIFO behaves differently than the old FIFO?

Copy link
Contributor Author

@oralmer oralmer Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I mention this in the CR description - The old fifo was a round-robin that was unaware of whether cache lines are valid or not. Now that the policy is split from the caching device, and the device handles invalid lines, this can't be implemented.

we can pass a vector of states per way or something along those lines to allow replacement policies to handle such cases, but this feels a bit odd to me - Why would you ever want to evict a valid line when there's an invalid one available for use? I lack domain knowledge here though, so LMK if this should change.

Base automatically changed from i7291-tlb-page-sizes to master March 4, 2025 07:48
@oralmer oralmer requested a review from brettcoon March 5, 2025 10:05
*
* Holds the necessary information to implement a cache replacement policy,
* and provides a replacement-specific get_next_way_to_replace() method for
* `caching_device_t`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand this comment with a more precise statement of how cache_replacement_polity_t is used, and the "contract" between the caching_device and this policy?
Specifically, please clarify:

  • cache_replacement_polity_t is notified of all cache accesses and evictions via access_update() and eviction_update(), respectively. (Is this accurate? Looking more, it seems the caching_device.cpp code may not forward invalidate() to the policy, which begs the question of what is the difference between invalidation and eviction, and does it matter?)
  • cache_replacement_policy_t::get_next_way_to_replace() is called only when all cache ways are valid and an eviction is required: if there are any invalid ways, caching_device picks which one to fill.

I believe this means that in the case of a new line being inserted in the cache, the only call into cache_replacement_policy will be to access_update() for the new line. But when a new line needs to replace an existing line, caching_device will make three calls into cache_replacement_policy: (1) get_next_way_to_replace to choose the line to evict, then (2) eviction_update, and then (3) access_update. Is that accurate?

Since I think the FIFO policy is the only one implemented that does something special for evictions, it's probably the only policy sensitive to invalidations right now.

Copy link
Contributor Author

@oralmer oralmer Mar 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded the comment, and now pass valid_ways to get_next_way_to_replace. This allows us to keep the same policy logic as before, without having duplicate data between the device and policy. Now, the device is in charge of tracking validity, and the policy just gets the state of the ways. Evictions and invalidations are still separate concepts, and the policy can handle each one differently, with evictions calling eviction_update and invalidations changing the valid_ways vector.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the updated comment is much clearer.

As for adding valid_ways to get_next_way_to_replace, I'm a little concerned about the performance cost of computing the valid mask before every call to get_next_way_to_replace(), especially since the ways are usually all valid and most policies don't need that information anyway. with this change, we now require two loops over the ways for every replacement when everything is valid.

Instead, I suggest going back to letting caching_device_t handle filling invalid ways, but adding an "invalidation_update()" method to this policy interface, and having caching_device_t call it whenever a line is invalidated. If the policy cares, it can track and/or update state as needed to reflect the invalidation. If the policy doesn't care about invalidations, we're only adding a bit of overhead to invalidation events. And I also think it's more consistent if the policy is notified of all cache state transactions, including invalidations.

I suspect most (all?) policies will treat eviction_update() exactly the same as invalidation_update(), so it's not actually clear a separate method is needed. I believe the only difference between eviction and invalidation is whether or not the line is going to be immediately replaced, suggesting the easiest thing might be to just use a single method for both; whether it's called eviction_update() or invalidation_update() doesn't matter to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added validation tracking to the replacement policy

This also forced us to add a validation_update function - since when using snoop with exclusive caches the cache entries can be moved between caches without generating evictions or accesses.

I think this is a bit more bug prone - Now you can reach an illegal state, where the tag is invalid but the policy thinks it it valid, or vice versa.

@oralmer oralmer requested a review from brettcoon March 9, 2025 15:38
*
* Holds the necessary information to implement a cache replacement policy,
* and provides a replacement-specific get_next_way_to_replace() method for
* `caching_device_t`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the updated comment is much clearer.

As for adding valid_ways to get_next_way_to_replace, I'm a little concerned about the performance cost of computing the valid mask before every call to get_next_way_to_replace(), especially since the ways are usually all valid and most policies don't need that information anyway. with this change, we now require two loops over the ways for every replacement when everything is valid.

Instead, I suggest going back to letting caching_device_t handle filling invalid ways, but adding an "invalidation_update()" method to this policy interface, and having caching_device_t call it whenever a line is invalidated. If the policy cares, it can track and/or update state as needed to reflect the invalidation. If the policy doesn't care about invalidations, we're only adding a bit of overhead to invalidation events. And I also think it's more consistent if the policy is notified of all cache state transactions, including invalidations.

I suspect most (all?) policies will treat eviction_update() exactly the same as invalidation_update(), so it's not actually clear a separate method is needed. I believe the only difference between eviction and invalidation is whether or not the line is going to be immediately replaced, suggesting the easiest thing might be to just use a single method for both; whether it's called eviction_update() or invalidation_update() doesn't matter to me.

* - When a way is evicted, `eviction_update()` is called on the evicted way, and
* `access_update()` is called on the new way immediately after.
* - When a way is invalidated, nothing is done - caching_device_t should keep track of
* which ways are valid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better, but as I suggest above, I think it would be even better to notify the policy of invalidations by also calling eviction_update() or invalidation_update() for these events.

// in a contiguous array, so we divide by the associativity to get the block
// index.
return block_idx / associativity_;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have noticed this before, but this method calls both its input and its output "block index", so it's not clear what "block index" means. The comment effectively says "we divide block index by associativity to get block index".
Since "block_idx" is used throughout caching_device_t to refer to the array index for a pointer to the first block in a set, we need a different name for what this is computing. I suggest something like "set_index".

In fact, it would be best if cache policy knew nothing about the underlying cache array organization. I suggest having this API deal only with set_index instead of block_idx in the argument lists, and letting caching_device_t handle the computation of set_index. It looks like the caching_device code actually starts with set_index and then multiplies it by associativity to get block_idx, so having the policy API take in set_index directly might actually be more efficient as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to line index

Regarding the change - I considered this, but felt it makes the code in caching_device more complex and error prone without saving too much work for the policy - Now you'll have to indices to manage in the caching device, while in the policy you only have one (though you do need to make sure to compute it)

int first_invalid_way = get_first_invalid_way(valid_ways);
if (first_invalid_way != -1) {
return first_invalid_way;
}
block_idx = get_block_index(block_idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, the dual meanings of "block_idx" are confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants