i#7287: Decouple Cache Replacement Policies. #7314

oralmer · 2025-03-02T13:15:54Z

Add cache_replacement_policy_t - An interface that allows defining a cache replacement policy used by caching_device_t

We now no longer need to re-implement a cache for each replacement policy, instead passing a unique_ptr to a policy. The policies themselves can implement the replacement procedures with the appropriate data structures. Any user can add his own replacement policies and pass them to any caching device.

Fixed all relevant uses and tests.

Removed all subclasses re-implementing policies.

Had to implement a small functionality change in fifo - Previously, the fifo cache policy in cache_fifo_t ignored invalidations, but now that invalidations are handled by the caching device and replacement by the policy, we can't completely ignore them. There don't seem to be any uses that depend on it, and I think this design makes more sense, but it does constitute a functionality change. Should we make an effort to support it? We can move the valid bits into the policy, for instance, but I personally think this is cleaner - Now it actually implements FIFO, rather than round robin.

There is a cache policy test in unit_test_cache_replacement_policy, but I can add another that looks directly at the cache policies if you prefer.

Note that some files show as "renamed" - They are actually deleted files, and new cache policies with similar names, and this confuses git.

Fixes #7287

Changed `block_size` to `long int` in order to allow larger pages in TLB. Also added two minor fixes that came up while working on the feature: - Argument name mismatches in `cache_lru_t` and `cache_fifo_t` between the header and body. - Clearer error messages in `tlb_simulator_t`, so you can know exactly what failed to build. Fixes #7291

Samll fixes which were not enough to justify a seperate issue. - Fixed inconsistent argument names in cache_fifo and cach_lru - header vs implementation. - Improved error printing in tlb_t so you can know what actually failed - Added and `add_parent` method to `caching_device_t`

Add `cache_replacement_policy_t` - An interface that allows defining a cache replacement policy used by `caching_device_t` We now no longer need to re-implement a cache for each replacement policy, instead passing a `unique_ptr` to a policy. The policies themselves can implement the replacement procedures with the appropriate data structures. Any user can add his own replacement policies and pass them to any caching device. Fixed all relevant uses and tests. Removed all subclasses re-implementing policies. Had to implement a small functionality change in `fifo` - Previously, the fifo cache policy in `cache_fifo_t` ignored invalidations, but now that invalidations are handled by the caching device and replacement by the policy, we can't completely ignore them. There don't seem to be any uses that depend on it, and I think this design makes more sense, but it does constitute a functionality change. Should we make an effort to support it? We can move the valid bits into the policy, for instance, but I personally think this is cleaner - Now it actually implements FIFO, rather than round robin. There is a cache policy test in `unit_test_cache_replacement_policy`, but I can add another that looks directly at the cache policies if you prefer. Fixes #7287

oralmer · 2025-03-02T15:07:22Z

@derekbruening I guess this begets an update to the release doc? How do you think it should look?

brettcoon

I didn't get through all the files yet; I'll review more tomorrow.

clients/drcachesim/simulator/cache_replacement_policy.h

clients/drcachesim/simulator/bit_plru.h

derekbruening · 2025-03-03T21:26:11Z

Going to defer to @brettcoon for the review here.

clients/drcachesim/simulator/cache.h

brettcoon · 2025-03-04T06:41:52Z

clients/drcachesim/simulator/caching_device.cpp

-            min_counter = get_caching_device_block(block_idx, way).counter_;
-            min_way = way;
-        }
+        if (get_caching_device_block(block_idx, way).tag_ == TAG_INVALID)


This code suggests cache_replacement_policy implementations can rely on all tags being valid. If that guarantee is intended, can you add a comment stating this in the cache_replacement_policy.h comments for get_next_way_to_replace()?

clients/drcachesim/simulator/caching_device.h

clients/drcachesim/simulator/policy_bit_plru.h

brettcoon · 2025-03-04T07:08:36Z

clients/drcachesim/simulator/policy_bit_plru.cpp

+void
+policy_bit_plru_t::eviction_update(int block_idx, int way)
+{
+    // Nothing to update, when the way is accessed we will update it.


Does this rely on caching_device.cpp handling way selection when there are invalid lines? So when a line is evicted, the policy doesn't hear about it again until the replacement has already been loaded? It feels like the PLRU state for this way could influence a subsequent update of another way in this set.
I'm not entirely sure about that, but minimally I think a comment explaining why PLRU state doesn't need to be reset when a block is evicted would be very helpful.

see other comments, currently I assume that if we reached the policy then all ways are valid - caching device returns the first invalid way without consulting the policy otherwise. should we do something else? is it reasonable for the policy to be aware of valid\invalid ways?

It seems like the policy might need to be aware of valid/invalid ways.
The thing I'm wrestling with is the difference between the initial policy state when all lines are invalid (from the constructor), versus the policy state after all lines have been invalidated after previously being valid, as well as any possible cache modifications that aren't communicated to policy. It seems that if eviction_update() doesn't need to do anything, then the policy state also should not require initialization. I guess in the LFU policy the eviction_update does update the state, so in that case the invalid-line state is consistent, so long as all invalidations call eviction_update().
I'll put a comment in the base file, but I think what I'm really looking for is a clearer statement of the contract between caching_device and cache_replacement_policy_t.

clients/drcachesim/simulator/policy_bit_plru.h

clients/drcachesim/simulator/policy_lru.cpp

clients/drcachesim/simulator/tlb_simulator.cpp

brettcoon · 2025-03-04T07:34:21Z

clients/drcachesim/tests/cache_replacement_policy_unit_test.cpp

-    cache_fifo_test.access_and_check(addr_vec[ADDR_G], 2); //     A G x H
-    cache_fifo_test.access_and_check(addr_vec[ADDR_B], 3); //     A G B h
+    cache_fifo_test.access_and_check(addr_vec[ADDR_G], 1); //     A f G H
+    cache_fifo_test.access_and_check(addr_vec[ADDR_B], 3); //     A B G h


So the new FIFO behaves differently than the old FIFO?

Yea, I mention this in the CR description - The old fifo was a round-robin that was unaware of whether cache lines are valid or not. Now that the policy is split from the caching device, and the device handles invalid lines, this can't be implemented.

we can pass a vector of states per way or something along those lines to allow replacement policies to handle such cases, but this feels a bit odd to me - Why would you ever want to evict a valid line when there's an invalid one available for use? I lack domain knowledge here though, so LMK if this should change.

clients/drcachesim/simulator/cache.h

brettcoon · 2025-03-08T01:09:58Z

clients/drcachesim/simulator/cache_replacement_policy.h

+ *
+ * Holds the necessary information to implement a cache replacement policy,
+ * and provides a replacement-specific get_next_way_to_replace() method for
+ * `caching_device_t`.


Can you expand this comment with a more precise statement of how cache_replacement_polity_t is used, and the "contract" between the caching_device and this policy?
Specifically, please clarify:

cache_replacement_polity_t is notified of all cache accesses and evictions via access_update() and eviction_update(), respectively. (Is this accurate? Looking more, it seems the caching_device.cpp code may not forward invalidate() to the policy, which begs the question of what is the difference between invalidation and eviction, and does it matter?)

cache_replacement_policy_t::get_next_way_to_replace() is called only when all cache ways are valid and an eviction is required: if there are any invalid ways, caching_device picks which one to fill.

I believe this means that in the case of a new line being inserted in the cache, the only call into cache_replacement_policy will be to access_update() for the new line. But when a new line needs to replace an existing line, caching_device will make three calls into cache_replacement_policy: (1) get_next_way_to_replace to choose the line to evict, then (2) eviction_update, and then (3) access_update. Is that accurate?

Since I think the FIFO policy is the only one implemented that does something special for evictions, it's probably the only policy sensitive to invalidations right now.

I expanded the comment, and now pass valid_ways to get_next_way_to_replace. This allows us to keep the same policy logic as before, without having duplicate data between the device and policy. Now, the device is in charge of tracking validity, and the policy just gets the state of the ways. Evictions and invalidations are still separate concepts, and the policy can handle each one differently, with evictions calling eviction_update and invalidations changing the valid_ways vector.

WDYT?

Thanks, the updated comment is much clearer.

As for adding valid_ways to get_next_way_to_replace, I'm a little concerned about the performance cost of computing the valid mask before every call to get_next_way_to_replace(), especially since the ways are usually all valid and most policies don't need that information anyway. with this change, we now require two loops over the ways for every replacement when everything is valid.

Instead, I suggest going back to letting caching_device_t handle filling invalid ways, but adding an "invalidation_update()" method to this policy interface, and having caching_device_t call it whenever a line is invalidated. If the policy cares, it can track and/or update state as needed to reflect the invalidation. If the policy doesn't care about invalidations, we're only adding a bit of overhead to invalidation events. And I also think it's more consistent if the policy is notified of all cache state transactions, including invalidations.

I suspect most (all?) policies will treat eviction_update() exactly the same as invalidation_update(), so it's not actually clear a separate method is needed. I believe the only difference between eviction and invalidation is whether or not the line is going to be immediately replaced, suggesting the easiest thing might be to just use a single method for both; whether it's called eviction_update() or invalidation_update() doesn't matter to me.

Added validation tracking to the replacement policy

This also forced us to add a validation_update function - since when using snoop with exclusive caches the cache entries can be moved between caches without generating evictions or accesses.

I think this is a bit more bug prone - Now you can reach an illegal state, where the tag is invalid but the policy thinks it it valid, or vice versa.

clients/drcachesim/simulator/tlb_simulator.cpp

clients/drcachesim/simulator/cache_replacement_policy.h

brettcoon · 2025-03-10T19:35:58Z

clients/drcachesim/simulator/cache_replacement_policy.h

+ *
+ * Holds the necessary information to implement a cache replacement policy,
+ * and provides a replacement-specific get_next_way_to_replace() method for
+ * `caching_device_t`.


Thanks, the updated comment is much clearer.

As for adding valid_ways to get_next_way_to_replace, I'm a little concerned about the performance cost of computing the valid mask before every call to get_next_way_to_replace(), especially since the ways are usually all valid and most policies don't need that information anyway. with this change, we now require two loops over the ways for every replacement when everything is valid.

Instead, I suggest going back to letting caching_device_t handle filling invalid ways, but adding an "invalidation_update()" method to this policy interface, and having caching_device_t call it whenever a line is invalidated. If the policy cares, it can track and/or update state as needed to reflect the invalidation. If the policy doesn't care about invalidations, we're only adding a bit of overhead to invalidation events. And I also think it's more consistent if the policy is notified of all cache state transactions, including invalidations.

I suspect most (all?) policies will treat eviction_update() exactly the same as invalidation_update(), so it's not actually clear a separate method is needed. I believe the only difference between eviction and invalidation is whether or not the line is going to be immediately replaced, suggesting the easiest thing might be to just use a single method for both; whether it's called eviction_update() or invalidation_update() doesn't matter to me.

brettcoon · 2025-03-10T19:37:46Z

clients/drcachesim/simulator/cache_replacement_policy.h

+ *  - When a way is evicted, `eviction_update()` is called on the evicted way, and
+ * `access_update()` is called on the new way immediately after.
+ *  - When a way is invalidated, nothing is done - caching_device_t should keep track of
+ * which ways are valid.


This is better, but as I suggest above, I think it would be even better to notify the policy of invalidations by also calling eviction_update() or invalidation_update() for these events.

brettcoon · 2025-03-10T22:35:37Z

clients/drcachesim/simulator/cache_replacement_policy.h

+        // in a contiguous array, so we divide by the associativity to get the block
+        // index.
+        return block_idx / associativity_;
+    }


I should have noticed this before, but this method calls both its input and its output "block index", so it's not clear what "block index" means. The comment effectively says "we divide block index by associativity to get block index".
Since "block_idx" is used throughout caching_device_t to refer to the array index for a pointer to the first block in a set, we need a different name for what this is computing. I suggest something like "set_index".

In fact, it would be best if cache policy knew nothing about the underlying cache array organization. I suggest having this API deal only with set_index instead of block_idx in the argument lists, and letting caching_device_t handle the computation of set_index. It looks like the caching_device code actually starts with set_index and then multiplies it by associativity to get block_idx, so having the policy API take in set_index directly might actually be more efficient as well.

Changed to line index

Regarding the change - I considered this, but felt it makes the code in caching_device more complex and error prone without saving too much work for the policy - Now you'll have to indices to manage in the caching device, while in the policy you only have one (though you do need to make sure to compute it)

brettcoon · 2025-03-10T22:52:04Z

clients/drcachesim/simulator/policy_bit_plru.cpp

+    int first_invalid_way = get_first_invalid_way(valid_ways);
+    if (first_invalid_way != -1) {
+        return first_invalid_way;
+    }
    block_idx = get_block_index(block_idx);


As mentioned above, the dual meanings of "block_idx" are confusing.

…ore appropriate lines

oralmer added 20 commits February 20, 2025 13:28

fix format in utils.h

a83b0bc

CR fixes - change to int64_t and add release dox

f33094e

fix base type for shift

bf265f7

fix format

145b0ec

Fixed return type - CI

2ef1000

Fixed return type - CI

42862d7

CR Fixes - grammer and link in release doc

bb2c486

remove link since this file cannot be accessed by doxygen

3ec1a36

merge i0-cleanups-and-fixes into i7291-tlb-page-sizes

2892ba7

CR - check int size

06adc52

Merged from master to fix alpine job

d53db1a

Self CR fixes

90afe0d

remove get parent

c613e38

merge from tlb-sizes

20952ff

formatting

36ff2d8

formatting

a61e251

formatting

4cec4df

oralmer requested review from derekbruening and brettcoon March 2, 2025 15:04

brettcoon reviewed Mar 3, 2025

View reviewed changes

oralmer added 6 commits March 3, 2025 06:54

Update total_size and num_lines types to int64_t

dce8dfe

Update return type int64_t

f977681

Update return type int64_t

e10e3e7

CR fixes - docs and rename to policy_*

a4b9328

CR fixes - docs and rename to policy_*

55f3223

CR fixes - docs and rename to policy_*

e923508

oralmer added 10 commits March 3, 2025 09:09

undo cast to size_t

0e09b82

undo cast to size_t

9168c9a

merge from tlb sizes

c6bce93

auto format

dd96e61

auto format

c33d782

auto format

fd0ce67

auto format

3274cb3

auto format

ef4a09d

Add set replace policy

d601acc

Add includes

37f5302

oralmer added 2 commits March 4, 2025 07:13

CR fixes - format and release doc

b108ea7

merge from tlb sizes

69bf150

brettcoon reviewed Mar 4, 2025

View reviewed changes

Base automatically changed from i7291-tlb-page-sizes to master March 4, 2025 07:48

oralmer added 2 commits March 4, 2025 08:53

CR fixes - sort includes, add comments, name changes

a7aa74d

merge master

f52c09e

oralmer requested a review from brettcoon March 5, 2025 10:05

brettcoon requested changes Mar 8, 2025

View reviewed changes

CR fixes - added valid bits vector

322b40c

oralmer requested a review from brettcoon March 9, 2025 15:38

oralmer added 5 commits March 9, 2025 15:40

Merge branch 'master' into i7287-extract-cache-policies

33a0440

Merge branch 'master' into i7287-extract-cache-policies

991a956

Fix test

0ff6300

Formatting

d049132

Fix imports

fd3aa95

brettcoon reviewed Mar 10, 2025

View reviewed changes

oralmer added 3 commits March 11, 2025 14:51

add in\validation tracking to the policy, and renamed blocks to the m…

2e6e9bc

…ore appropriate lines

format

cc03a23

spelling

d9b9cb4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i#7287: Decouple Cache Replacement Policies. #7314

i#7287: Decouple Cache Replacement Policies. #7314

oralmer commented Mar 2, 2025 •

edited

Loading

oralmer commented Mar 2, 2025

brettcoon left a comment

derekbruening commented Mar 3, 2025

brettcoon Mar 4, 2025

brettcoon Mar 4, 2025

oralmer Mar 4, 2025

brettcoon Mar 8, 2025 •

edited

Loading

brettcoon Mar 4, 2025

oralmer Mar 4, 2025 •

edited

Loading

brettcoon Mar 8, 2025

oralmer Mar 9, 2025 •

edited

Loading

brettcoon Mar 10, 2025

oralmer Mar 11, 2025

brettcoon Mar 10, 2025

brettcoon Mar 10, 2025

brettcoon Mar 10, 2025

oralmer Mar 11, 2025

brettcoon Mar 10, 2025

i#7287: Decouple Cache Replacement Policies. #7314

Are you sure you want to change the base?

i#7287: Decouple Cache Replacement Policies. #7314

Conversation

oralmer commented Mar 2, 2025 • edited Loading

oralmer commented Mar 2, 2025

brettcoon left a comment

Choose a reason for hiding this comment

derekbruening commented Mar 3, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brettcoon Mar 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oralmer Mar 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oralmer Mar 9, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oralmer commented Mar 2, 2025 •

edited

Loading

brettcoon Mar 8, 2025 •

edited

Loading

oralmer Mar 4, 2025 •

edited

Loading

oralmer Mar 9, 2025 •

edited

Loading