
Commit 7644f2d

writeup: Add info on benchmarks

1 parent 3360c16 commit 7644f2d

File tree

3 files changed: +74 -3 lines changed

benchmarks/gen-stupid.nu

+3 -3
@@ -16,9 +16,9 @@ let v0 = $"T0 { cyclic: NoneT0 {} }"
 print (1..<$numTypes | reduce -f $v0 {|t, acc| $"T($t) { cyclic: NoneT($t) {}, f: ($acc) }" })

 print $"fn triggerDecrs\(v: T($numTypes - 1)): int ="
-for type in 0..<$numTypes {
-  let ref = 0..<$type | each { ".f" } | str join
-  print $"\(let temp($type) = v($ref) in 0);"
+for nesting in 0..<$numTypes {
+  let ref = 0..<$nesting | each { ".f" } | str join
+  print $"\(let temp($nesting) = v($ref) in 0);"
 }
 print "0"


writeup/writeup.pdf

173 KB
Binary file not shown.

writeup/writeup.typ

+71
@@ -140,6 +140,77 @@ As a bonus, grouping objects by SCC and processing them separately also lets us

#smallcaps[Fred] was created to try out this algorithm. The implementation can be found at https://github.com/ysthakur/Fred. The language uses automatic reference counting and is compiled to C. Partly because it is compiled to C and partly because I made it, it involves copious amounts of jank. When I have time after finals, I will try to get rid of some of this awfulness, as well as document my code better, but in the meantime, #smallcaps[Fred] is mostly functional (functional as in alcoholic).

= Benchmarks

I would like to preface this section by noting that it is completely bogus and that you can safely skip it. These benchmarks should not be taken as evidence of anything. Apologies in advance for that. Nevertheless, I have used these benchmarks to convince myself that my algorithm is vastly superior to the base lazy mark scan algorithm. Feel free to do the same.

I have two benchmarks at the moment. They can be found in the #link("https://github.com/ysthakur/fred/tree/main/benchmarks")[`benchmarks`] folder.

The benchmarks work by running a piece of code many times, then looking at how much the processor's timestamp counter increased (using `rdtscp`) as well as how much processor time elapsed (using `clock()`). Since the code being timed is already run many times within each benchmark, I only recorded the times from a single run of each benchmark program, rather than running each program multiple times and noting the mean and range.
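
For concreteness, here's a minimal sketch (in C) of this style of measurement. This is illustrative only, not the harness the compiler actually emits: `runBenchmark()` is a hypothetical stand-in for the benchmark body, and `__rdtscp` assumes an x86 compiler that provides `x86intrin.h`.

```c
// Illustrative sketch only: runBenchmark() is a hypothetical stand-in for
// the generated benchmark code, which repeats the timed code many times.
#include <stdio.h>
#include <time.h>
#include <x86intrin.h> // provides __rdtscp on x86 compilers (GCC/Clang)

extern void runBenchmark(void); // hypothetical: the code being timed

int main(void) {
  unsigned int aux; // receives the core ID from rdtscp; unused here
  clock_t clockStart = clock();
  unsigned long long tscStart = __rdtscp(&aux);

  runBenchmark();

  unsigned long long tscEnd = __rdtscp(&aux);
  clock_t clockEnd = clock();

  printf("Timestamp counter: %llu\n", tscEnd - tscStart);
  printf("Clock (s): %f\n", (double)(clockEnd - clockStart) / CLOCKS_PER_SEC);
  return 0;
}
```
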
== `game.fred`

#link("https://github.com/ysthakur/fred/blob/main/benchmarks/game.fred")[Here's the code]. This program is supposed to be a game, except it does basically nothing. It demonstrates a case where normal lazy mark scan will unnecessarily scan a bunch of objects, but my algorithm won't.

It has the following types:
```kotlin
data Player
  = Player { store: Store }
  // This exists only to make the compiler think that Player can be involved in cycles
  | PlayerCyclic {
    mut player: Player
  }
data Store = Store { datums: Data }
data Data
  = DataCons {
    value: int,
    // This is mut only so the compiler thinks there can be a cycle at runtime
    mut next: Data
  }
  | DataNil {}
```
`Store` represents some kind of shared state or resources or something that all `Player` objects have a reference to. This sort of thing is probably more common in Java than in a functional language, but whatever.

This is what `game.fred` does:
1. Create a ginormous `Store` object
2. Do the following 50,000 times:
  1. Create a `Player` object
  2. Increment and decrement its refcount so that it's added to the list of PCRs
  3. Invoke `processAllPCRs()`

Each `processAllPCRs()` call above causes the `Player` object to be scanned. With my algorithm, the `Store` object won't also be scanned, because it's in a separate SCC. But with base lazy mark scan, the `Store` object has to be scanned every time, so it will be slower.
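
Here's a hedged sketch of where that difference comes from. This is not #smallcaps[Fred]'s actual generated C: the object layout and names are invented, and the refcount bookkeeping the real mark phase does is omitted. The point is only the cutoff condition: the mark phase refuses to follow edges that leave the SCC currently being processed, so the edge from a `Player` to the `Store` is never traversed.

```c
// Hedged sketch, not Fred's actual runtime code. Object layout is invented,
// and refcount bookkeeping is omitted; only the SCC cutoff is shown.
typedef struct Obj {
  int color;           // BLACK (live) or GRAY (possibly garbage)
  int scc;             // ID of the SCC this object's type belongs to
  int numFields;
  struct Obj **fields; // pointer fields only
} Obj;

enum { BLACK, GRAY };

// Like lazy mark scan's markGray, but it stops at SCC boundaries. Scanning
// a Player never reaches the Store, because Store is in a different SCC;
// plain lazy mark scan would traverse the Store's entire subgraph instead.
void markGray(Obj *obj, int scc) {
  if (obj == NULL || obj->color == GRAY || obj->scc != scc) return;
  obj->color = GRAY;
  for (int i = 0; i < obj->numFields; i++) {
    markGray(obj->fields[i], scc);
  }
}
```
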
Here are the results:
#table(
  columns: (auto, auto, auto),
  table.header([Lazy mark scan only?], [Timestamp counter], [Clock (s)]),
  [No], [74647376], [0.028586],
  [Yes], [29478684752], [11.289244]
)

I'd go into how my algorithm is orders of magnitude faster than base lazy mark scan, but this benchmark means basically nothing. The only thing it really tells you is that there can be cases where my algorithm is faster than lazy mark scan, but even calculations on a blackboard would've told you that. This benchmark doesn't help one get a sense of how much faster my algorithm would be in general.
== `stupid.fred`

If the previous benchmark wasn't artificial enough for you, this one definitely will be. I wanted to come up with something where my algorithm would perform worse than base lazy mark scan. This can happen when the overhead from inserting PCRs into the right (sorted) bucket is too high. You need a bunch of SCCs, and objects from higher SCCs often need to be added to the list of PCRs after objects from lower SCCs.

This is actually a situation that probably isn't uncommon in real codebases. If you have some long-lived object that's passed around everywhere, you probably have references to it being created all the time. I do believe escape analysis would help with, if not outright fix, many of those cases, though. Removing a PCR every time its refcount is incremented could also help here, although that has tradeoffs.

I, unfortunately, couldn't come up with a decent example by hand, so I wrote a script to do it for me. The script first generates 200 types, where each type $T_(i+1)$ has a field of type $T_i$. It then generates an object of type $T_199$ and goes from $T_199$ down to $T_0$, adding the object at each level to the list of PCRs. With base lazy mark scan, adding a PCR is a constant-time operation, but with my algorithm it takes linear time, since the object of type $T_i$ here has to walk past the $199 - i$ objects that were added before it.
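
To sketch where that linear factor comes from (again, illustrative C rather than the actual runtime code, and it collapses the per-SCC buckets into individual list entries):

```c
// Hedged sketch, not the actual runtime code: a PCR list kept sorted by
// the rank of the object's SCC in topological order.
typedef struct PCR {
  int sccRank; // position of the object's SCC in topological order
  struct PCR *next;
} PCR;

// Base lazy mark scan: O(1), just prepend.
PCR *insertUnsorted(PCR *head, PCR *pcr) {
  pcr->next = head;
  return pcr;
}

// My algorithm: O(n) in the worst case. stupid.fred adds PCRs in ascending
// rank order (T_199's SCC first, then T_198's, ...), so every insert walks
// the whole list before appending at the end.
PCR *insertSorted(PCR *head, PCR *pcr) {
  if (head == NULL || pcr->sccRank < head->sccRank) {
    pcr->next = head;
    return pcr;
  }
  PCR *cur = head;
  while (cur->next != NULL && cur->next->sccRank <= pcr->sccRank) {
    cur = cur->next;
  }
  pcr->next = cur->next;
  cur->next = pcr;
  return head;
}
```
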
All of the stuff described above is then run 50,000 times. Here are the results:
#table(
  columns: (auto, auto, auto),
  table.header([Lazy mark scan only?], [Timestamp counter], [Clock (s)]),
  [No], [27741037106], [10.623692],
  [Yes], [11054113602], [4.233204]
)

Again, all this tells you is that there are some cases where my algorithm can do worse than lazy mark scan.

= Conclusion

= Future work

#bibliography("writeup-bib.bib")
