From aaebc28b0e35681b6bd4d23d7164c90a7da751ac Mon Sep 17 00:00:00 2001
From: Ilia Rodionov <hey@euony.me>
Date: Thu, 4 Sep 2025 12:52:34 -0600
Subject: [PATCH 1/4] CIP-? | Efficient scalars for BLS12-381

---
 CIP-?/README.md | 433 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 433 insertions(+)
 create mode 100644 CIP-?/README.md

diff --git a/CIP-?/README.md b/CIP-?/README.md
new file mode 100644
index 0000000000..78745e6c14
--- /dev/null
+++ b/CIP-?/README.md
@@ -0,0 +1,433 @@
+---
+CIP: ?
+Title: Efficient scalars for BLS12-381
+Status: Proposed
+Category: Plutus
+Authors:
+    - Ilia Rodionov <hey@euony.me>
+Implementors: []
+Discussions:
+    - https://github.com/cardano-foundation/CIPs/pull/?
+Created: 2025-09-11
+License: CC-BY-4.0
+---
+
+## Abstract
+
+The Chang upgrade in Cardano brought support for the BLS12-381 elliptic curve introduced
+in [CIP-381](https://cips.cardano.org/cip/CIP-0381).
+In accordance with the priority of keeping the number of primitives in Plutus Core as low as possible, 
+only new types and operations for elements of $G1$ and $G2$ groups were introduced, 
+as well as slightly restricted type `bls12_381_MlResult` for the target group $G_T$. 
+Neither the elements of the _base field_ $F_p$ nor _scalar field_ $F_r$ 
+got dedicated treatment on the grounds of the fact that all modular operations 
+can be naively performed with the existing `Integer` and its regular arithmetic.
+While this approach seems sensible (and arguably even should be preserved), it has several drawbacks.
+Firstly, Plutus loses the ability to use more efficient modular arithmetic
+over finite fields, which is exactly the reason why they are so popular in
+cryptography.
+Secondly, this complicates the implementation and cost model for some functions 
+that expect an element of $T_r$ represented as an `Integer` 
+(please refer to the "Cost model" section in CIP-133 for details).
+
+An important step towards the optimization of common operations over BLS12-381 is done by 
+[CIP-133](https://cips.cardano.org/cip/CIP-0133), which proposes 
+the effective way to do _multi-scalar multiplication_ (MSM).
+Scalar multiplication was the main bottleneck for many protocols 
+since operations in $G_1$ and $G_2$ are relatively expensive.
+But now, when we have more exunits freed by quick MSM, 
+the relative share of resources spent on operations over the scalars 
+might become even bigger.
+
+In practice, some cryptographic protocols need to perform quite extensive arithmetic
+over the _scalar field_ $F_r$, particularly when working with _polynomials in KZG commitments_.
+This CIP presents a motivational example, discusses related benchmarks,
+and considers different ways of evolving Plutus 
+toward efficient implementation of scalars for BLS12-381 curve.
+
+## Motivation: why is this CIP necessary?
+
+### Example use case: multiplying binomials
+
+As an example, let's consider _pairing-based cryptographic accumulators_ 
+[[SKBP22]](https://dl.acm.org/doi/pdf/10.1145/3548606.3560676).
+To verify a usually off-chain-calculated membership proof, 
+the validator needs to calculate
+$N+1$ coefficients $c_i$ for a _final polynomial_ $P(x)$ 
+by multiplying given $N$ _normalized binomials_ $B_i$:
+
+$$
+\begin{align}
+B_i(x) = x + a_i \qquad (i=1,\dots,n) \\
+P(x) = \prod_{i=1}^N B_i(x) = \sum_{i=0}^{N} c_i \cdot x^i
+\end{align}
+$$
+
+The straightforward way to solve this problem is the _schoolbook convolution_, 
+which comes with $O(n^2)$ complexity.
+This is what both
+[Haskell](https://github.com/perturbing/plutus-accumulator/blob/main/plutus-accumulator/src/Plutus/Crypto/BlsUtils.hs#L499-L505)
+and [Aiken](https://github.com/perturbing/plutus-accumulator/blob/main/aiken-bilinear-accumulator/lib/aiken_bilinear_accumulator/poly.ak#L3-L13) 
+implementations of [plutus-accumulator](https://github.com/perturbing/plutus-accumulator) use.
+It can be slightly improved by _the divide-and-conquer_ method by pairing binomials, multiplying them, and
+recursively multiplying the results, though it's still quadratic just with better coefficients.
+
+Although it may not be the best candidate for an on-chain setting,
+it's worth mentioning _[Number Theoretic Transform (NTT)](https://wiki.algo.is/Number%20theoretic%20transform)_, 
+which achieves sub-quadratic time complexity of $O(n \cdot \log n)$.
+BLS12-381 was specifically designed in so that 
+the multiplicative group ${F}_r^\times$ of size $(r-1)$
+has $2^{32}$-nd roots of unity enabling efficient FFT/NTT operations on large polynomials.
+
+The important thing is that both algorithms rely on _modular multiplication_ 
+over the _scalar field_.
+
+As soon as the coefficients for the final polynomial have been calculated 
+they are (typically) used for scalar multiplication,
+either by pointwise `bls12_381_g1_scalar_mul` and `bls12_381_g2_scalar_mul` built-ins
+or by recently added multi-scalar multiplication that computes the whole batch in one go
+using an optimized algorithm. 
+Due to the use of `Integer` for representing scalars,
+and since the results of those functions are the same
+for the whole _congruence class_ of scalars,
+they try to reduce scalar arguments implicitly 
+to fit them into the field by calling [mod operation](https://github.com/IntersectMBO/cardano-base/blob/6f9c20abdd3010e5a25356580cc968ba430101ad/cardano-crypto-class/src/Cardano/Crypto/EllipticCurve/BLS12_381/Internal.hs#L521).
+As mentioned in the beginning, this complicates the cost model
+but also pushes developers towards the _use of unbounded integers_.
+(An alternative and arguably clearer decision might be to throw if an argument falls out of the field.)
+So nowadays, developers have three options to accomplish this task:
+
+1. Skip the step of taking the modulus with `modInteger`, i.e., to use
+   unbounded integers for the whole calculation of the coefficients, and
+   exploit the fact that functions for scalar multiplication reduce scalars 
+   under the hood.
+   As we will see in a moment, this is the best way in terms of exunits.
+
+2. Take care of staying within the field when calculating coefficients and
+   pass integer values that represent proper scalars to the consuming functions.
+   Despite the intuition and common sense, this turns out to be more
+   expensive due to the cost of the modulo operation.
+   
+3. A variation of (2) is to use Aiken stdlib [Scalar](https://aiken-lang.github.io/stdlib/aiken/crypto/bls12_381/scalar.html#Scalar) 
+   module or similar thing in your language of choice 
+   that hides modular operation in a newtype, 
+   which obviously increases the expenditure of exunits even more.
+
+### Benchmarks
+
+The Aiken benchmarks for all three options can be found in the PR
+[here](https://github.com/perturbing/plutus-accumulator/pull/2).
+To keep track of all costs and their relative amounts in the use case being analyzed,
+let's break the whole thing down into three parts:
+
+* (a) Calculating coefficients of the final polynomial
+* (b) Producing commitment using scalar multiplication and group operation in $G_1$
+* (c) Running the Miller loop and verifying the results
+
+Part (a) performs the convolution to get the final coefficients using 
+the three different methods 1,2, and 3 described above over random samples of 1–30 binomials.
+We can observe the quadratic growth in all cases as expected.
+
+Not surprisingly, the first benchmark gives the best result of `1,67B cpu / 5,19 M mem`:
+only addition and multiplication are used, and although costs for both operations depend on the size of arguments,
+even over integers up to _10k-bit_ length, the total is relatively cheap:
+
+```aiignore
+ (a, 1) final_poly_int
+
+   memory units                                           cpu units
+   ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⢀⠕⡁ 5189564.0    ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⢀⠕⡁ 1669321088.0
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠊⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠊⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠒⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⣀⡠⠤⠔⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⣀⣀⡠⠤⠒⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⠥⠤⠤⠖⠲⠊⠍⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 29989.0      ⠥⠤⠤⠖⠲⠊⠍⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 8513737.0
+   1.0                                 30.0               1.0                                 30.0
+```
+
+If we try to stay in the field using the general `modInteger` after all operations with scalars,
+we pay more, since the modulo operation contributes more than we save up on the size of arguments
+for addition and multiplication, so we get extra `+0,18B cpu / +0,37M mem` exunits spent:
+
+```aiignore
+ (a, 2) final_poly_int_mod
+ 
+   memory units                                           cpu units
+   ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⢀⠕⡁ 5554829.0    ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⢀⠕⡁ 1847736704.0
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠊⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠊⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⣀⡠⠤⠔⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⣀⣀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⠥⠤⠤⠖⠲⠊⠍⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 31605.0      ⠥⠤⠤⠖⠲⠊⠍⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 9250599.0
+   1.0                                 30.0               1.0                                 30.0
+```
+
+And if we do the same using `Scalar` module from Aiken, we apparently will incur the same overhead
+plus additional exunits for the newtype, which turns out to be roughly in the same ballpark.
+This might not be very significant, but could make `Scalar` a tool that some may prefer 
+to avoid due to inefficiency:
+
+```aiignore
+ (a, 3) final_poly_scalar
+
+   memory units                                           cpu units
+   ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⢀⠕⡁ 5997541.0    ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⢀⠕⡁ 1956092032.0
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠃⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠒⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠔⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⠔⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⣀⡠⠔⠒⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⣀⡠⠔⠒⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⠥⠤⠤⠖⠩⠉⠍⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 45523.0      ⠥⠤⠤⠖⠲⠊⠍⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 12851804.0
+   1.0                                 30.0               1.0                                 30.0
+```
+
+Moving to part (b), which performs the calculation of the commitment in $G_1$ using scalar multiplication
+and the group operation, we may expect to see some proportional payoff when passing unbounded out-of-field arguments.
+Here we benchmark only the calculation of the commitment,
+the sampling of final unbound/bound coefficients is done in separate `Fuzzer`s, 
+i.e., outside the benchmark itself.
+The results show that for some reason, this expectation is generally correct, 
+but the difference is really very tiny. 
+For the biggest input of 30 elements we pay `0,1B cpu`, which is roughly 
+one-fourth of what we saved up earlier. 
+The memory usage is not affected.
+
+```aiignore
+ (b, 1) g1_commitment_unbound
+   memory units                                           cpu units
+   ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⣈⡠⠕⡁ 254920.0     ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⣈⡠⠕⡁ 5908456448.0
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠒⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠒⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠔⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠔⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⢀⡠⠒⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⢀⡠⠒⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⠥⠪⠉⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 22166.0      ⠥⠪⠉⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 430848992.0
+   1.0                                 30.0               1.0                                 30.0
+
+ (b, 2/3) g1_commitment_field
+   memory units                                           cpu units```
+   ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⣈⡠⠕⡁ 254920.0     ⡁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⠈⠀⠁⣈⡠⠕⡁ 5892875264.0
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠊⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠒⠁⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠒⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠒⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠔⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠤⠔⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠊⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠒⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠤⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⡁⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁              ⡁⠀⠀⠀⠀⠀⠀⠀⠀⡠⠔⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡁
+   ⠄⠀⠀⠀⠀⢀⡠⠒⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄              ⠄⠀⠀⠀⠀⢀⡠⠒⠊⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄
+   ⠂⠀⠀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂              ⠂⠀⠀⡠⠒⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂
+   ⠥⠪⠉⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 22166.0      ⠥⠪⠉⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠠⠀⠄⠁ 430848992.0
+   1.0                                 30.0               1.0                                 30.0
+```
+
+The last part (c) that performs the actual pairing doesn't depend on the input size 
+and has a constant cost of `1B / 5K mem`. 
+
+Overall, we have the following distribution of cpu exunits for the biggest input in the benchmark:
+
+|                 Part | Unbounded (1) | Percent | Reduced (2)   | Percent | Diffence     |  
+|---------------------:|---------------|---------|---------------|---------|--------------|  
+| **(a)** Coefficients | 1_669_321_088 | 19%     | 1_847_736_704 | 21%     | +178_415_616 |  
+|   **(b)** Commitment | 5_908_456_448 | 68%     | 5_892_875_264 | 67%     | −15_581_184  |  
+|      **(c)** Pairing | 1_098_158_336 | 13%     | 1_098_158_336 | 12%     | -            |  
+|           **Total:** | 8_675_935_872 | -       | 8_838_770_304 | −       | +162_834_432 |
+
+We haven't benchmarked _multi-scalar multiplication_ yet, but even using pointwise multiplication,
+we can see that the percent spent on part (a) calculating coefficients is quite substantial,
+and it will become bigger when part (b) becomes cheaper.
+
+### Montgomery multiplication
+
+Though `Integer`s can carry out required computations, for prime fields
+there exists much more efficient methods for multiplication and addition.
+[Montgomery multiplication](https://en.wikipedia.org/wiki/Montgomery_modular_multiplication)
+seems to be a viable alternative for such and similar use cases that use 
+multiplication over many scalars in one go since it's known to be much faster.
+
+We did [preliminary benchmarks](https://github.com/euonymos/bench-montgomery) for multiplication of binomials 
+to compare the optimized implementation in `blst` library (which is already used in Plutus for BLS12-381)
+with naive implementation.
+We used Rust bindings for `blst` and Rust-native `num-bigint` library. 
+The underlying bindings are the same as those used in `cardano-base` for `bslt`, 
+so we can expect similar behavior for the Haskell stack.
+Each benchmark was executed 1000 times on Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz.
+Values show average time and standard deviation.
+
+| Size | Montgomery Avg | Montgomery σ | Naive Avg    | Naive σ      | Speedup |
+|-----:|----------------|--------------|--------------|--------------|---------|
+|   10 | 101.734 µs     | 22.565 µs    | 685.302 µs   | 159.436 µs   | 6.74x   |
+|   15 | 138.966 µs     | 26.496 µs    | 953.121 µs   | 251.909 µs   | 6.86x   |
+|   20 | 186.083 µs     | 21.703 µs    | 1.363 ms     | 189.064 µs   | 7.33x   |
+|   25 | 241.934 µs     | 30.409 µs    | 2.096 ms     | 370.952 µs   | 8.67x   |
+|   30 | 298.265 µs     | 33.504 µs    | 2.926 ms     | 401.267 µs   | 9.81x   |
+|   31 | 310.631 µs     | 33.857 µs    | 3.108 ms     | 334.523 µs   | 10.00x  |
+|   32 | 328.822 µs     | 45.218 µs    | 3.461 ms     | 521.307 µs   | 10.52x  |
+|   35 | 355.684 µs     | 49.508 µs    | 4.142 ms     | 579.682 µs   | 11.65x  |
+|   40 | 396.741 µs     | 63.206 µs    | 5.474 ms     | 749.828 µs   | 13.80x  |
+|   45 | 472.785 µs     | 85.491 µs    | 7.602 ms     | 1.501 ms     | 16.08x  |
+|   50 | 534.158 µs     | 119.238 µs   | 10.226 ms    | 3.898 ms     | 19.14x  |
+|  100 | 1.090 ms       | 239.000 µs   | 35.546 ms    | 8.156 ms     | 32.61x  |
+|  200 | 3.042173ms     | 483.892µs    | 141.871796ms | 21.302688ms  | 46.64x  |
+|  300 | 6.040093ms     | 1.034338ms   | 319.940927ms | 50.53683ms   | 52.97x  |
+|  400 | 8.951528ms     | 901.017µs    | 500.398941ms | 49.487447ms  | 55.90x  |
+| 1000 | 49.006538ms    | 4.043325ms   | 3.044053658s | 223.957115ms | 62.12x  |
+
+The table shows that the performance improvement rises quickly with the number of binomials.
+The provided figures for Montgomery multiplication _include_ time for converting the initial
+vector of coefficients into the Montgomery form and back to integers in the end for the results.
+
+### Summary
+
+The way Plutus treats BLS12-381 scalars (and prime fields in general, for that matter) 
+is understandable but not quite satisfying for several reasons.
+
+The main argument showcased by this use case is that the significant share of work 
+(20% using pointwise multiplication, which will be bigger with MSM)
+can be done much faster and with less resource usage in terms of real performance.
+Even for on-chain-plausible sizes of inputs, Montgomery multiplication can give
+at least 30–40 times speed-up.
+
+Additionally, there are other reasons to consider:
+
+* It's counterintuitive since it clashes with the mental model of the finite field.
+* Representation of field elements as `Integers` (and even worse, peculiarities of the cost model as we saw)
+pushes developers to use unbounded integers, which may lead to misunderstanding and unexpected bugs.
+* It complicates the cost model by (mis)using `Integer` type for scalars.
+* `Scalar` module from Aiken buys some safety at the cost of additional expenses and likely won't be used.
+* The choice of `Integer` also may require additional conversions, 
+since the most probable source of scalars that scripts calculate is hashing functions
+that return byte strings.
+
+Incorporating the effective multiplication over the scalar field
+directly will streamline such operations, reduce transaction costs,
+thereby advancing the Plutus ecosystem in terms of functionality and dev experience.
+
+### Impact
+
+In Cardano, the multiplication of scalars in the BLS12-381 scalar field is used 
+by cryptographic primitives that need polynomial arithmetic over — mostly
+when dealing with polynomial commitments (KZG).
+
+Those primitives are used in many Cardano products, just to mention a few:
+
+- **Hydrozoa** (a brand-new layer-2 solution for Cardano) - 
+uses pairing-based cryptographic accumulators to commit to a set of L2 utxos 
+that can be withdrawn once a dispute is resolved in the rule-based regime of operation.
+More specifically, the withdrawal transaction calculates the hashes of the outputs
+and validates their membership in the accumulator using the proof provided.
+MSM and efficient scalars can boost the number of utxos withdrawn in one go significantly.
+- 
+- TBD: add more examples
+
+## Specification
+
+> This is all the _very preliminary_ description. I am willing to work on it if the community
+gives me the green light in general.
+
+The various BLS12-381-specific operations for the scalar field $F_r$, including Montgomery multiplication 
+are implemented in [blst](https://github.com/supranational/blst/blob/e99f7db0db413e2efefcfd077a4e335766f39c27/bindings/blst.h#L88-L105) library, 
+which is already a dependency of [cardano-base](https://github.com/IntersectMBO/cardano-base/blob/master/cardano-crypto-class/src/Cardano/Crypto/EllipticCurve/). 
+It has been used for implementing 
+[CIP-381](https://cips.cardano.org/cip/CIP-0381) 
+and [CIP-133](https://cips.cardano.org/cip/CIP-0133). 
+Basically, we would like to expose several additional functions from this library in the Plutus API.
+
+### New types definition
+
+To represent a scalar in $F_r$ stored in the Montgomery form, a new opaque type `bls12_381_fr`
+(which corresponds to `blst_fr` type) can be used along with introducing and eliminating from/to a _byte string_: 
+
+```
+bytestring_to_bls12_381_fr: [bool, 𝚋𝚢𝚝𝚎𝚜𝚝𝚛𝚒𝚗𝚐] -> bls12_381_fr
+bls12_381_fr_to_bytestring :: [bool, bls12_381_fr] -> 𝚋𝚢𝚝𝚎𝚜𝚝𝚛𝚒𝚗𝚐
+```
+
+The conversion is little-endian if the first argument is `false` and big-endian if it is `true`.
+We prefer not to choose the name `bls12_381_scalar` to avoid name clashes with existing functions
+for _scalar multiplication_ like `bls12_381_G1_scalarMul`.
+
+### Function definition
+
+In addition to the conversion functions mentioned in the previous section,
+we propose to define the only function for Montgomery modular multiplication
+**bls12_381_fr_mul** as follows:
+
+```
+bls12_381_fr_mul :: [bls12_381_fr, bls12_381_fr] -> bls12_381_fr
+```
+TBD: Scalar and multi-scalar multiplication in $G1$ and $G2$ are typical downstream functions 
+for `bls12_381_fr` values, but in the current Plutus, they use integers, i.e., double conversion is needed:
+`bls12_381_fr -> bytestring -> integer` to call them.
+The underlying `blst` functions take a pointer to raw bytes, i.e., `const byte *scalar`, so probably
+we should consider ways of simplifying this by either having scalar multiplication that works with
+byte arrays or providing a function to go directly from `bls12_381_fr` to `integer`.
+
+TBD: Additionally, we might consider adding some other functions that `blst` [provides](https://github.com/supranational/blst/blob/e99f7db0db413e2efefcfd077a4e335766f39c27/bindings/blst.h#L88-L105).
+
+### Cost model
+
+The computational impact of Montgomery multiplication is straightforward, since the values of
+type `bls12_381_fr` are statically limited to 255 bits, so for the newly added `bls12_381_fr_mul`
+function we can use a static cost model.
+
+TDB: Introduction of this type potentially allows simplifying cost models for some other functions.
+Currently, scalars have to be reduced modulo the order of the group before being passed to the `blst` 
+functions, see [cardano-base](https://github.com/IntersectMBO/cardano-base/blob/6f9c20abdd3010e5a25356580cc968ba430101ad/cardano-crypto-class/src/Cardano/Crypto/EllipticCurve/BLS12_381/Internal.hs#L521).
+
+## Rationale: how does this CIP achieve its goals?
+
+The availability of built-in functions in the Plutus language will provide a
+more efficient way to perform this important type of computation,
+bump the limits of operations that fit into a single transaction,
+and reduce costs.
+
+Implementing the Montgomery multiplication directly in Plutus should be technically possible
+with CIP-122, but will hardly bring any improvements mentioned and so is not advisable.
+
+## Path to Active
+
+### Acceptance Criteria
+
+We consider the following criteria to be essential for acceptance:
+
+- [ ] The PR for this functionality is merged in the Plutus repository.
+- [ ] This PR must include tests, demonstrating that it behaves as the specification requires in this CIP.
+- [ ] A benchmarked use case is implemented in the Plutus repository, demonstrating that realistic use of this primitive does, in fact, provide major cost savings.
+
+### Implementation Plan
+
+- [ ] IOG Plutus team consulted and accept the proposal.
+- [ ] Authors to provide preliminary benchmarks of naive vs. Montgomery multiplication for use cases in general and in Plutus:
+  - https://github.com/euonymos/bench-montgomery
+
+## Copyright
+
+This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode).

From 26c87ed042acf26c339fdf3aa12304da62e608c6 Mon Sep 17 00:00:00 2001
From: Robert Phair <rphair@cosd.com>
Date: Tue, 16 Sep 2025 16:53:52 +0545
Subject: [PATCH 2/4] adding PR number

---
 CIP-?/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CIP-?/README.md b/CIP-?/README.md
index 78745e6c14..dbcd4adc08 100644
--- a/CIP-?/README.md
+++ b/CIP-?/README.md
@@ -7,7 +7,7 @@ Authors:
     - Ilia Rodionov <hey@euony.me>
 Implementors: []
 Discussions:
-    - https://github.com/cardano-foundation/CIPs/pull/?
+    - https://github.com/cardano-foundation/CIPs/pull/1087
 Created: 2025-09-11
 License: CC-BY-4.0
 ---

From aeede1a65062ec8823cee1a6f3df263f662f381d Mon Sep 17 00:00:00 2001
From: Robert Phair <rphair@cosd.com>
Date: Wed, 15 Oct 2025 01:26:22 +0545
Subject: [PATCH 3/4] assign CIP number 166

---
 CIP-?/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CIP-?/README.md b/CIP-?/README.md
index dbcd4adc08..6227c985d0 100644
--- a/CIP-?/README.md
+++ b/CIP-?/README.md
@@ -1,5 +1,5 @@
 ---
-CIP: ?
+CIP: 166
 Title: Efficient scalars for BLS12-381
 Status: Proposed
 Category: Plutus

From 407117b7377a99d6128657833fdc1f0d93e44a8a Mon Sep 17 00:00:00 2001
From: Ilia Rodionov <hey@euony.me>
Date: Wed, 5 Nov 2025 18:02:09 -0600
Subject: [PATCH 4/4] CIP-0166 | move to CIP-0166 folder

---
 {CIP-? => CIP-0166}/README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {CIP-? => CIP-0166}/README.md (100%)

diff --git a/CIP-?/README.md b/CIP-0166/README.md
similarity index 100%
rename from CIP-?/README.md
rename to CIP-0166/README.md