CPS-0028? | Approaches to call-by-need in UPLC #1150
kozross wants to merge 6 commits into cardano-foundation:master from […]
Conversation
Marked Triage for next CIP meeting: https://hackmd.io/@cip-editors/128 ... tagging @effectfully @colll78 in the meantime to kick-start some review.
@kozross much as posted in #1146 (comment), the CIP meeting consensus today was that your early-expressed goal of @kwxm (or equivalent, if there is one) providing an expert opinion about cost-related applicability of the CPS — without any practical objection over the suggested approach — would be required to proceed with this as a CPS candidate.
Also, I thought I had mentioned this before, but I had also tagged @effectfully above as someone who'd expressed deep interest & expertise in the details of Plutus semantics: a subject suggested by my own understanding of "laziness" in compilers. If there are additional or more appropriate choices then please by all means tag them here.
The costing section mentions that "the cost of computing a 'true' lazy computation should only be 'paid' once", but this only addresses the CPU side. Memoization is a time-space tradeoff: you're caching results to avoid recomputation. In UPLC's cost model, memory is budgeted separately from CPU. A few things are worth thinking about here: […]

It feels like any proposal here should address the memory dimension of costing with at least as much attention as the CPU dimension, especially given the resource-constrained setting that motivates the whole CPS. Should the Goals section include something explicit about memory behavior?
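To make the time-space tradeoff concrete, here is a sketch in plain Haskell (this is an illustration of the tradeoff only, not UPLC or its actual cost model; `fibMemo` and `fibNaive` are hypothetical names). Memoization pays the CPU cost of each subproblem once, but every cached result then occupies memory for the lifetime of the cache, which is exactly the dimension budgeted separately in UPLC:

```haskell
import qualified Data.Map.Strict as Map

-- Call-by-need style, with the cache made explicit: each subproblem is
-- computed once (CPU paid once), but its result is retained. The returned
-- cache size is the space cost.
fibMemo :: Integer -> (Integer, Int)
fibMemo n = let (v, cache) = go n Map.empty in (v, Map.size cache)
  where
    go :: Integer -> Map.Map Integer Integer -> (Integer, Map.Map Integer Integer)
    go k cache
      | k <= 1 = (k, cache)
      | otherwise = case Map.lookup k cache of
          Just v -> (v, cache)
          Nothing ->
            let (a, c1) = go (k - 1) cache
                (b, c2) = go (k - 2) c1
                v = a + b
            in (v, Map.insert k v c2)

-- Call-by-name style: shared subproblems are re-evaluated every time.
-- Returns the value and the number of evaluations performed (the time cost).
fibNaive :: Integer -> (Integer, Integer)
fibNaive n
  | n <= 1 = (n, 1)
  | otherwise =
      let (a, ca) = fibNaive (n - 1)
          (b, cb) = fibNaive (n - 2)
      in (a + b, 1 + ca + cb)
```

For n = 20, the memoized version retains 19 cached entries (space), while the naive version performs 21,891 evaluations (time): the same work is traded between the two budgets rather than eliminated.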
Sorry, I'm no longer a part of the Plutus team.
Is it any different from an application of a lambda though?
@Unisay - those are excellent points, thank you! Definitely something I had overlooked. I will amend the CPS text to discuss this in more detail, and mention the issues you've raised.

Edit: Having thought about it a little and done some experiments, there are two comments I want to add in response. The claim that "under strict evaluation, intermediates get consumed and freed right away" may be theoretically true, but this is definitely not how the CEK machine that evaluates UPLC works. To illustrate, I've put together a simple set of benchmarks comparing a 'fused' implementation of a three-way zip (`pzip3Direct`) against an 'indirect' one built out of two two-way zips (`pzip3Indirect`):

```haskell
module Main (main) where

import Data.Kind (Type)
import PlutusCore (Contains, DefaultUni)
import Plutarch.Prelude (Term, S, PBuiltinList (PNil, PCons),
                         PInteger, pconstant, (#), (:-->),
                         phoistAcyclic, plam, (#*), (#+), pfix, pmatch, PlutusRepr,
                         pcon, (#$), ppairDataBuiltin, pdata, PIsData, pfromData,
                         PBuiltinPair (PBuiltinPair))
import Test.Tasty (testGroup)
import Plutarch.Test.Bench (bench, defaultMain)
import Plutarch.Test.Utils (precompileTerm)

main :: IO ()
main = defaultMain $ testGroup "Zip" [
  testGroup "pzip3Direct" [
    bench "length 5" (precompileTerm (pzip3Direct f) # plistA5 # plistB5 # plistC5),
    bench "length 10" (precompileTerm (pzip3Direct f) # plistA10 # plistB10 # plistC10),
    bench "length 15" (precompileTerm (pzip3Direct f) # plistA15 # plistB15 # plistC15)
    ],
  testGroup "pzip3Indirect" [
    bench "length 5" (precompileTerm (pzip3Indirect f) # plistA5 # plistB5 # plistC5),
    bench "length 10" (precompileTerm (pzip3Indirect f) # plistA10 # plistB10 # plistC10),
    bench "length 15" (precompileTerm (pzip3Indirect f) # plistA15 # plistB15 # plistC15)
    ]
  ]

-- Test data

plistA5 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistA5 = pconstant (iota 1 5)

plistA10 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistA10 = pconstant (iota 1 10)

plistA15 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistA15 = pconstant (iota 1 15)

plistB5 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistB5 = pconstant (iota 6 10)

plistB10 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistB10 = pconstant (iota 6 15)

plistB15 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistB15 = pconstant (iota 6 20)

plistC5 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistC5 = pconstant (iota 21 25)

plistC10 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistC10 = pconstant (iota 21 30)

plistC15 :: forall (s :: S) . Term s (PBuiltinList PInteger)
plistC15 = pconstant (iota 21 35)

f :: forall (s :: S) . Term s (PInteger :--> PInteger :--> PInteger :--> PInteger)
f = phoistAcyclic $ plam $ \x y z -> x #* (y #+ z)

pzip3Direct :: forall (a :: S -> Type) (b :: S -> Type) (c :: S -> Type) (d :: S -> Type) (s :: S) .
  (DefaultUni `Contains` PlutusRepr a,
   DefaultUni `Contains` PlutusRepr b,
   DefaultUni `Contains` PlutusRepr c,
   DefaultUni `Contains` PlutusRepr d) =>
  Term s (a :--> b :--> c :--> d) ->
  Term s (PBuiltinList a :--> PBuiltinList b :--> PBuiltinList c :--> PBuiltinList d)
pzip3Direct g = pfix $ \self -> plam $ \xs ys zs -> pmatch xs $ \case
  PNil -> pcon PNil
  PCons x xs' -> pmatch ys $ \case
    PNil -> pcon PNil
    PCons y ys' -> pmatch zs $ \case
      PNil -> pcon PNil
      PCons z zs' -> pcon . PCons (g # x # y # z) $ self # xs' # ys' # zs'

pzip3Indirect :: forall (a :: S -> Type) (b :: S -> Type) (c :: S -> Type) (d :: S -> Type) (s :: S) .
  (DefaultUni `Contains` PlutusRepr a,
   DefaultUni `Contains` PlutusRepr b,
   DefaultUni `Contains` PlutusRepr c,
   DefaultUni `Contains` PlutusRepr d,
   PIsData b,
   PIsData c) =>
  Term s (a :--> b :--> c :--> d) ->
  Term s (PBuiltinList a :--> PBuiltinList b :--> PBuiltinList c :--> PBuiltinList d)
pzip3Indirect g = plam $ \xs ys zs ->
  pzip2 (plam $ \x yz -> pmatch yz $ \case
          PBuiltinPair y z -> g # x # pfromData y # pfromData z)
    # xs
    #$ pzip2 (plam $ \y z -> ppairDataBuiltin # pdata y # pdata z) # ys # zs

-- Helpers

iota :: Integer -> Integer -> [Integer]
iota start stop = [start, start + 1 .. stop]

pzip2 :: forall (a :: S -> Type) (b :: S -> Type) (c :: S -> Type) (s :: S) .
  (DefaultUni `Contains` PlutusRepr a,
   DefaultUni `Contains` PlutusRepr b,
   DefaultUni `Contains` PlutusRepr c) =>
  Term s (a :--> b :--> c) ->
  Term s (PBuiltinList a :--> PBuiltinList b :--> PBuiltinList c)
pzip2 g = pfix $ \self -> plam $ \xs ys -> pmatch xs $ \case
  PNil -> pcon PNil
  PCons x xs' -> pmatch ys $ \case
    PNil -> pcon PNil
    PCons y ys' -> pcon . PCons (g # x # y) $ self # xs' # ys'
```

These are the results when run: [benchmark results elided]. As we can see, […]

With regard to the adversarial use of thunks you describe, I agree with @effectfully: this is no different to having lambdas with big blowouts in memory use. Am I missing something?
The on-chain CEK machine is capable of freeing actual memory (not to the extent of the CESK machine), but yes, it's completely irrelevant, because MEM doesn't measure peak memory consumption.
You can ask Claude directly and avoid going through the low-bandwidth interface (@Unisay).
This raises (at least) a few questions: […]

I believe all the answers deserve to be documented.
In theory, MEM measures something like "total allocations that aren't 100% certain to be transient". In practice, MEM is fucked up and […]

None of that is relevant here though. The question is "is this any worse than applied lambdas?". I think the answer is "no", but I didn't really look into it.
The Plutus Report (publicly available) gives a hint about the original idea.
Overall, the proposal is sensible. @zliu41 has performed preliminary experiments where applying […]

Terminology: the proposal uses "true laziness" to refer to the case where applying […]. The standard term for this is "call by need".
@wadler - agreed, better to use the standard terminology. Will change.
It literally doesn't matter btw.
The summary of my evaluation is attached. TLDR: lazy delay/force is about 2x the cost of by-name delay/force. For well-optimized programs that don't have excessive unnecessary delay/force, the overhead is less than 3%. For poorly-optimized programs, the overhead can be about 10%, or even 90%+ in extreme cases. Incidentally, I find the motivation a bit weak, as I couldn't think of a good example where laziness is a clear win. I'll look into and try to understand the Boehm-Berarducci use case.
Please correct me if I'm wrong, but I've re-read the whole thread above & it doesn't sound like AI review is being recommended here for software tasks, but for CIP review itself. As an author I've never entered my own writing into an LLM and am not really happy with crawler LLMs going over it either. As a CIP editor it tells me nothing of use because, more often than not, what might be considered errors or inconsistencies in evolving standards documents vs. the hoard of LLM knowledge are exactly what makes these outlying statements useful and "true" in the long run. The current team of editors doesn't welcome AI in CIP review but doesn't discriminate against it either, as long as the AI is used to support a review rather than to simply produce it:
So I think that putting AI CIP review into this repository's CI would be unnecessary, wasteful, irrelevant, and perhaps even expensive (relatively: even at pennies per LLM token use, this would be head & shoulders over other free-tier services). However, I do think the open-source nature of CIP material would encourage AI enthusiasts to build CIP summaries, with perhaps a body of review and maybe developer highlights, on some aggregated web site using GitHub merged CIPs, PRs, and maybe even issues as a source. If you think any of this means our current CIP process is incomplete, you can open up a repo issue and/or contribute to this one with a related consideration, and in any case let's continue the discussion (if still interested) elsewhere (cc @Ryun1 @perturbing).
@kozross from my distantly underqualified level it seems like the welcome arrival of subject matter experts into this thread has provided editors (cc @Ryun1 @perturbing) with confirmation that this is a viable CPS with a practical scope for CIPs to be produced within this field. Therefore I'm adding it to the next CIP meeting agenda, where I feel sure we'll now be able to confirm this as a candidate & assign a CPS number: https://hackmd.io/@cip-editors/131. This would also provide 2 weeks to do some updates as indicated above (e.g. #1150 (comment)) & from further review.
As per @wadler's comment, I have revised all mentions of 'true' laziness to use the term 'call by need'.
rphair left a comment:
@kozross it's good to have updated the title with a better description of the proposed functionality... but we're still just considering Approaches in the CPS I think (?) and so the "feature" title would be more appropriate to an implementation CIP...
I haven't fully understood the BB example, but regarding this argument:
Can you not avoid reevaluation by forcing the delayed expression at the right time? For example, suppose there is a large and expensive expression that is used in two of the three branches, and in one of the branches it is used twice. […]

Does this not apply to the BB encoding example?
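The suggestion above can be sketched with a toy cost model in Haskell (a hypothetical illustration, not actual UPLC; `Delayed`, `useTwiceNaive`, and `useTwiceShared` are invented names): even under call-by-name delay/force, forcing the delayed expression once at the right time and binding the result means the evaluation cost is paid only once.

```haskell
-- A delayed computation, modelled as the cost of evaluating it together
-- with the value it would produce.
data Delayed a = Delayed Int a

-- Call-by-name force: pays the evaluation cost at every call site.
force :: Delayed a -> (a, Int)
force (Delayed c v) = (v, c)

-- Forcing at each use site: the cost is paid twice.
useTwiceNaive :: Delayed Integer -> (Integer, Int)
useTwiceNaive d =
  let (v1, c1) = force d
      (v2, c2) = force d
  in (v1 + v2, c1 + c2)

-- Forcing once "at the right time" and binding the result: paid once.
useTwiceShared :: Delayed Integer -> (Integer, Int)
useTwiceShared d =
  let (v, c) = force d
  in (v + v, c)
```

For a computation with cost 100 producing 21, the naive version pays 200 while the shared version pays 100, for the same value 42.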
rphair left a comment:
@kozross this was discussed at the CIP meeting today where it was confirmed, as I suggested before, that the practicality of this problem statement has been established by GitHub review since it was originally introduced at an earlier meeting: so we are happy to confirm this as a candidate & editors will follow the technical discussions here until subject matter experts seem to agree that this is ready for a final review.
Please rename the containing directory to CPS-0028 and update the "Rendered" link in the original post 🎉
I'd like to see the motivation strengthened a bit more. I'll give it some thought in the next few days, and possibly dig up some old discussions.
@zliu41 - it isn't clear how to do this in general. Let me give an example in context to illustrate why this is so difficult.

As part of current work on the Grumplestiltskin project, we have been considering elliptic curves over second-degree finite field extensions. These are even worse than the example given, as second-degree field extensions are pairs of finite field elements, which makes their operations (particularly multiplication and division) far more involved. In particular, they require yet more 'auxiliary values' (specifically a field irreducible), and each operation corresponds to quite a large number of builtins. As part of our work, we did a pedantic comparison of both direct and indirect representations, even given the issues described already. We have found that indirect representations are significantly better, but still have huge costs, caused by the precise problem I described before.

As to why this is hard to avoid, consider the case of elliptic curve point scaling. If we observe the implementation, we can see that the […]

Now, I might be wrong in this particular case, and it could be possible to avoid this. However, I certainly can't think of how. Furthermore, this is a case that's both extremely natural in this situation, and likely to be extremely bad. If you look at the scaling benchmarks, you can observe that the ex-unit and memory costs fly off the handle pretty quickly. Furthermore, the exponent in this context is tiny (64), whereas in practice, the required scalar will be a high-entropy random number which is far, far bigger. And this is just for verification: constructing the necessary bilinear pairing function would be many, many times worse than this.
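For concreteness, here is the general shape of multiplication in a degree-2 extension field Fp[u]/(u^2 - r) (a generic textbook sketch, not the Grumplestiltskin implementation; `fp2Mul`, `p`, and `r` are illustrative). Using (a + b·u)(c + d·u) = (ac + r·bd) + (ad + bc)·u, a single extension-field multiplication expands into several base-field operations, all of which must carry the auxiliary irreducible residue `r`, which is why intermediate reuse matters so much here:

```haskell
-- Multiplication in Fp[u]/(u^2 - r), where p is the field prime and r is
-- a quadratic non-residue mod p. Elements are pairs (a, b) meaning a + b*u.
fp2Mul :: Integer -> Integer -> (Integer, Integer) -> (Integer, Integer)
       -> (Integer, Integer)
fp2Mul p r (a, b) (c, d) =
  let ac = a * c `mod` p
      bd = b * d `mod` p
      -- Karatsuba-style sharing: (a+b)(c+d) - ac - bd = ad + bc,
      -- reusing the ac and bd intermediates to save a multiplication.
      cross = ((a + b) * (c + d) - ac - bd) `mod` p
  in ((ac + r * bd) `mod` p, cross)
```

For example, with p = 7 and r = 3, multiplying (1 + 2u) by (3 + 4u) gives (6 + 3u); note that `ac` and `bd` are each used twice, the kind of sharing that call-by-need would make automatic.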
rphair left a comment:
@kozross re: #1150 (comment) & earlier #1150 (review) the containing directory needs to be further renamed from CPS-28 to CPS-0028.
@rphair - done.
Regarding motivation:
This is true for general programming but it's unclear whether any real validators, whose role is to validate transactions, require memoization or tabulation. Also, many such algorithms require a data structure that can store thunks and supports efficient indexing, which UPLC doesn't have (builtin Array doesn't fit because it cannot store thunks).
I like this motivation personally but it's quite specific to Plinth.
This is a useful example, but it's a bit too complex, with many details that may not be relevant. It would be better to extract a general argument, and use this example as concrete evidence. With that said, I do think there's a decent motivation: I don't think laziness is strictly necessary to achieve low cost/size, but there's a good argument that it makes things more convenient. How about the following:

1. Convenience. To reuse the example I mentioned above: suppose there is a large and expensive expression that is used in two of the three branches, and in one of the branches it is used twice. In this case, you can indeed avoid duplicating code or work without laziness, but this is cumbersome and not ergonomic, not to mention that some surface languages may not even have delay/force constructs, making it impossible to write this code (I think this is the case with Aiken, but I'm not certain). Users would much rather write something like this (not just in Plinth but in any surface language): […] But if there's no laziness in UPLC, the burden would be on the compiler to optimize the latter into the former. I don't think any current compiler does this, and indeed, doing so is non-trivial (it would require a number of analyses, some of which may not be decidable) and may not preserve semantics. With lazy delay/force this becomes much easier. I think the BB encoding example is a specific, concrete example of this argument.

2. Code reuse. Here's a blog post by Augustsson that @effectfully pointed me to before, which argues that "Strict evaluation is fundamentally flawed for function reuse". Although validators are relatively small programs, code reuse is still very much relevant: the validation logic can frequently be expressed as a composition of list filtering, mapping, small predicates etc. Manually inlining or fusing things makes validators harder to audit and more prone to bugs. A number of general-purpose strict languages also support a limited form of laziness (e.g., Scala and OCaml), likely for reasons similar to 1 and 2. In particular, here's an example used by "Functional Programming in Scala" to motivate lazy lists: `List(1, 2, 3, 4).map(_ + 10).filter(_ % 2 == 0).map(_ * 3)`

3. Some related discussions/comments in the Cardano community.
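The Scala pipeline above has a direct analogue with Haskell's lazy lists (`pipeline` and `firstHit` are just illustrative names), where laziness additionally means the composed stages only ever produce the elements that are demanded:

```haskell
-- Same shape as the Scala example: map, then filter, then map.
pipeline :: [Int] -> [Int]
pipeline xs = map (* 3) (filter even (map (+ 10) xs))

-- Because lists are lazy, the same composed pipeline also works on an
-- infinite input when only a prefix is demanded.
firstHit :: Int
firstHit = head (pipeline [1 ..])
```

Here `pipeline [1, 2, 3, 4]` yields `[36, 42]`, and `firstHit` is `36` despite the unbounded input: the reusable pieces compose without anyone manually fusing them.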
This describes possible ways forward to achieve 'true' laziness as part of UPLC. This is meant to be an additional mechanism to `Delay` and `Force`, as these constructs have no notion of whether something has already been evaluated or not. What we are discussing specifically adds the capability of 'remembering' what has already been evaluated, similarly to how laziness works in GHC Haskell, for example.

Rendered
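The by-name/by-need distinction described above can be modelled in pure Haskell (a hypothetical sketch, not a proposed UPLC design; `Heap`, `forceName`, and `forceNeed` are invented names): a by-need force overwrites its cell with the computed value, so the evaluation cost is paid at most once, while a by-name force re-pays it on every use.

```haskell
import qualified Data.Map.Strict as Map

-- A cell is either a suspended computation (with its evaluation cost and
-- the value it would produce) or a cached, already-evaluated value.
data Cell = Suspended Int Integer | Cached Integer

type Heap = Map.Map Int Cell

-- Call-by-name Force: pays the evaluation cost on every use, never caches.
forceName :: Int -> Heap -> (Integer, Int)
forceName i heap = case heap Map.! i of
  Cached v      -> (v, 0)
  Suspended c v -> (v, c)

-- Call-by-need Force: pays the cost once, then 'remembers' by overwriting
-- the cell with the cached value.
forceNeed :: Int -> Heap -> (Integer, Int, Heap)
forceNeed i heap = case heap Map.! i of
  Cached v      -> (v, 0, heap)
  Suspended c v -> (v, c, Map.insert i (Cached v) heap)

-- Total cost of forcing the same cell twice under each discipline.
costTwiceName, costTwiceNeed :: Int
costTwiceName =
  let heap = Map.fromList [(0, Suspended 100 21)]
      (_, c1) = forceName 0 heap
      (_, c2) = forceName 0 heap
  in c1 + c2
costTwiceNeed =
  let heap0 = Map.fromList [(0, Suspended 100 21)]
      (_, c1, heap1) = forceNeed 0 heap0
      (_, c2, _)     = forceNeed 0 heap1
  in c1 + c2
```

Forcing the same cell twice costs 200 under call by name but 100 under call by need, at the price of the heap retaining the cached value (the memory dimension raised earlier in the thread).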