riscv · bcstrongx · Oct 18, 2025 · Sep 4, 2025 · Sep 18, 2025 · Sep 22, 2025
diff --git a/src/body.adoc b/src/body.adoc
@@ -4,6 +4,21 @@
 
 The Sspesa extension defines a mechanism by which the hart will, upon Zihpm counter overflow, record a PC and metadata associated with the instruction to which the sampled event should be attributed.
 
+=== Sspesa Profile Types and Sources of Error
+
+Sspesa enables generating two types of performance profiles:
+
+* *Event-based performance profiles* in which the weight of an instruction in the profile is approximately proportional to the number of times it was subjected to the event. Event-based profiles are most useful when tracking down the instructions responsible for a problematic performance event. For example, if an application suffers from many L1 data cache misses, a sorted event-based profile obtained for the L1 data cache miss performance event provides an ordered list of the instructions that are responsible for most L1 data cache misses.
+* *Time-based performance profiles* in which the weight of an instruction in the profile is approximately proportional to its contribution to overall execution time. To obtain such a profile, Sspesa is configured to trigger on a cycle counter. Upon a cycle counter overflow, Sspesa records the PC of an instruction from which profiling software can approximate the instruction(s) whose execution time the core was exposing when the sample was taken. A sorted time-based profile orders an application's instructions according to their approximate impact on overall execution time and is hence most useful as a first-pass approach for identifying application hot spots.
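As a rough illustration of how profiling software might turn Sspesa samples into either profile type, the Python sketch below aggregates sampled PCs into a sorted profile. It assumes a hypothetical list of PC values (as might be read from `shpmspc` at each overflow interrupt) and a fixed sampling period, i.e., the counter is preset so that it overflows once every `period` events (event-based profile) or cycles (time-based profile), so each sample stands for roughly `period` occurrences. The function name `build_profile` and the sample values are hypothetical.

```python
from collections import Counter


def build_profile(sampled_pcs, period):
    """Aggregate sampled PCs into a profile sorted by estimated weight.

    sampled_pcs: PCs recorded at each counter overflow (hypothetical input).
    period: events (event-based profile) or cycles (time-based profile)
            between consecutive overflows; each sample stands for this many.
    Returns a list of (pc, estimated_weight) pairs, heaviest first.
    """
    counts = Counter(sampled_pcs)
    # Scale sample counts by the period to estimate absolute event/cycle counts.
    profile = [(pc, n * period) for pc, n in counts.items()]
    # Sort by weight, descending; break ties by PC for a deterministic order.
    profile.sort(key=lambda item: (-item[1], item[0]))
    return profile


if __name__ == "__main__":
    # Hypothetical PCs sampled on L1 data cache miss counter overflows.
    samples = [0x80001000, 0x80001010, 0x80001000, 0x80002040, 0x80001000]
    for pc, weight in build_profile(samples, period=10_000):
        print(f"{pc:#x}: ~{weight} events")
```

The same aggregation serves both profile types; only the counter that Sspesa triggers on (and hence the meaning of `period`) changes.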
+
+NOTE: _In contrast to Sspesa, Smpdis/Sspdis (see <<_precise_decoded_instruction_sampling_smpdissspdis>>) create profiles in which an instruction is represented proportionally to its execution count._
+
+Performance profiles obtained through Sspesa are approximate for two reasons. The first is systematic error (or bias), which reduces profile accuracy by attributing a counter overflow to a different instruction than the one that caused it. For event-based profiles, implementations are required to support precise attribution (i.e., no bias) because this is the core purpose of the Sspesa extension. For example, upon an overflow of the L1 data cache miss event counter, an Sspesa implementation must report the PC of the instruction that caused the counter overflow.
+
+Time-based profiles are more complicated because no specific instruction causes the cycle counter to overflow. However, a time-based profile is most useful when each instruction is represented in the profile in proportion to its impact on overall execution time. To create a bias-free profile, the implementation must therefore attribute each sample to the address(es) of the instruction(s) whose execution time the core is exposing (i.e., retiring) in the cycle the counter overflows; this is known as time-proportional attribution. Adopting a time-proportional attribution policy hence eliminates systematic error, but doing so may not always be desirable (e.g., due to implementation overheads). For this reason, Sspesa does not require time-proportional attribution and instead expects implementations to minimize attribution bias when creating time-based profiles.
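The following Python sketch illustrates the time-proportional attribution policy described above: when k instructions are being exposed (retired) in the sampled cycle, each receives 1/k of that sample's weight, so the accumulated weights approximate each instruction's share of execution time. The input format (one list of PCs per sample) and the function name are hypothetical, and the sketch simply assumes every sample names at least one instruction; how an implementation identifies those instructions is not modeled.

```python
from collections import defaultdict
from fractions import Fraction


def time_proportional_profile(samples):
    """Spread each cycle sample evenly over the instructions it names.

    samples: iterable of non-empty sequences of PCs; each sequence lists the
             instruction(s) the core was exposing execution time of (e.g.,
             retiring) in the cycle the counter overflowed (hypothetical input).
    Returns a dict mapping PC -> fraction of all samples attributed to it.
    """
    weights = defaultdict(Fraction)  # Fraction() starts at 0
    total = 0
    for pcs in samples:
        if not pcs:
            raise ValueError("each sample must name at least one instruction")
        share = Fraction(1, len(pcs))  # k instructions share one sample equally
        for pc in pcs:
            weights[pc] += share
        total += 1
    # Normalize so the weights sum to 1 (fraction of sampled execution time).
    return {pc: w / total for pc, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical: two instructions retire together in the first sampled cycle.
    profile = time_proportional_profile([[0x100, 0x104], [0x108], [0x108]])
    for pc, share in sorted(profile.items()):
        print(f"{pc:#x}: {share}")
```

Exact `Fraction` arithmetic is used only to keep the example's weights easy to check; a real tool would more likely accumulate floating-point values.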
+
+Statistical (or random) error is the second reason why Sspesa profiles are approximate. Its root cause is that Sspesa samples: its CSRs record only the instruction associated with each counter overflow rather than every event or cycle. Statistical error is thus unavoidable and affects event-based and time-based profiles equally. It is, however, typically negligible when sampling at 4 kHz (the default sampling frequency of perf). Unlike systematic error, statistical error decreases as the sampling frequency increases. If statistical error is perceived to be a problem, it can be mitigated by adopting a higher sampling frequency, at the cost of increased runtime overhead.
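To make the frequency trade-off concrete, the Python sketch below uses a simple statistical model, which is an assumption and not part of Sspesa: each sample is treated as an independent draw that lands on a given instruction with probability p, so the instruction's estimated profile fraction has standard error sqrt(p(1 - p)/n), where n is the number of samples. The function name, the 10-second runtime and the 1% fraction are illustrative values only.

```python
import math


def sampled_fraction_stderr(p, sampling_hz, runtime_s):
    """Standard error of an instruction's estimated profile fraction.

    Models each sample as an independent Bernoulli trial that lands on the
    instruction with probability p (a simplifying assumption), so the
    estimated fraction has standard error sqrt(p * (1 - p) / n), where
    n = sampling_hz * runtime_s is the number of samples collected.
    """
    n = sampling_hz * runtime_s
    if n <= 0:
        raise ValueError("need at least one sample")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # A hypothetical instruction responsible for 1% of samples in a 10 s run.
    for hz in (4_000, 16_000):
        se = sampled_fraction_stderr(0.01, hz, 10)
        print(f"{hz} Hz: 0.0100 +/- {se:.5f} ({se / 0.01:.1%} relative)")
```

Under this model the standard error shrinks only with the square root of the sample count, so quadrupling the sampling frequency halves it, while the number of overflow interrupts, and hence roughly the runtime overhead, grows in proportion to the frequency.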
+
 === CSRs
 
 ==== Hardware Performance Monitor Sample PC Register (`shpmspc`)
@@ -58,6 +73,8 @@ Access to `shpmsdata` matches that of `shpmspc` above, and `shpmsdata` captures
 _In modern, superscalar implementations, the microarchitecture may be optimized such that the full PC of each retired instruction is not maintained throughout the pipeline.  The `shpmsdata` register provides a standard means by which such implementations can provide precise attribution, using a reference PC (`shpmspc`) and custom metadata that can be used by implementation-specific software algorithms to discern the appropriate sample PC._
 ====
 
+NOTE: _When creating time-based profiles, the value in `shpmspc` can be combined with implementation-specific metadata in `shpmsdata` and Control Transfer Records (Ssctr) to account for instruction parallelism by obtaining the addresses of all instructions that retired in the cycle the sample was taken. If Ssctr is not available, implementations should strive to avoid bias when selecting the value for `shpmspc`._
+
 WARNING: _Should we say that software should default to using `shpmspc` as-is, if it does not know of any custom algorithm for using `shpmsdata`?  Though that could result in the appearance of support for precise-attribution that in fact is not precise.  Perhaps if `shpmsdata` is not hardcoded to 0 and no custom algorithm is reported then software shouldn't report support for precise attribution?_
 
 == Precise Local Counter Overflow Interrupt ISA Extension (Ssplcofi)