-
Notifications
You must be signed in to change notification settings - Fork 17
Running | Data Representation
This tutorial will pick up where the introduction left off. It will discuss the data representation aspects of the miniboxing transformation, as explained in the Unifying Data Representation Transformations paper.
In the introduction, we created a class C:
class C[@miniboxed T](val t: T)
and noticed that its specialized variants C_L
and C_J
included a weird Tsp @storage[Long]
type in the minibox-inject
compiler phase, which was later transformed to Long
in minibox-commit
. Let us look at this process.
First of all, there are several miniboxing phases in the compiler pipeline:
$ mb-scalac -Xshow-phases
phase name id description
---------- -- -----------
... .. ...
uncurry 13 uncurry, translate function values to anonymous classes
minibox-inject 14
minibox-coerce 15
minibox-commit 16
tailcalls 17 replace tail calls by jumps
... .. ...
Here we see the main three phases introduced by the miniboxing plugin (there are another 3 which are introduced for purely technical reasons, to maintain compatibility with the rest of the compiler: pretyper
, posttyper
and hijacker
). The main tree phases map directly to the data representation mechanism phases:
-
minibox-inject
duplicates methods and classes and adds the@storage
annotation -
minibox-coerce
introduces explicit coercions between boxed and miniboxed values -
minibox-commit
gives the final semantics to annotated types and coercions
Having three transformation phases sounds like a lot of work for an otherwise trivial task: transforming a type parameter T
to Long
. To see why this is necessary, let us take an example:
object DR1 extends App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t
println(a)
}
foo(3.14)
}
Compiling this code will produce two versions of the method: foo
, the generic variant and foo_n_J
, that encodes primitive types in a long integer. The last call in the object, to foo(3.24)
will be rewritten to use foo_n_J
.
Yet, the more interesting part is how the val a: Any = t
statement is translated. If we simply replace T
by Long
, a call to foo
would not print 3.14
as expected, but the long integer encoding of the floating-point number 3.14
. This issue stems from the interaction between the miniboxed data representation and subtyping, which occurs when assigning the miniboxed value t
, encoded as a long integer, to the value a
, which is of type Any
, the top type in the Scala type system. Since Long
is a subtype of Any
a naive transformation would produce incorrect results and would not be caught by the type system.
Let us see how the miniboxing transformation handles this case (output simplified for readability):
$ mb-scalac DR1.scala -Xprint:uncurry,minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of uncurry]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
DR1.this.foo[Double](3.14)
}
}
[[syntax trees at end of minibox-inject]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = {
val a: Any = t;
println(a)
};
DR1.this.foo_n_J[Double](8, 3.14)
}
}
[[syntax trees at end of minibox-coerce]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = {
val a: Any = marker_minibox2box[T, Long](t);
println(a)
};
DR1.this.foo_n_J[Double](8, marker_box2minibox[Double, Long](3.14))
}
}
[[syntax trees at end of minibox-commit]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = {
val a: Any = MiniboxConversions.minibox2box[T](t, T_TypeTag);
println(a)
};
DR1.this.foo_n_J[Double](8, MiniboxConversions.double2minibox(3.14))
}
}
The first phase, minibox-inject
creates the a second version of the foo
method, named foo_n_J
, which will be transformed to use the miniboxing data representation. It also redirects the call foo(3.14)
to foo_n_J(DOUBLE, 3.14)
. At this point in the transformation, the signature of foo_n_J
includes T @storage[Long]
. This annotation signals that the type will be later represented as a long integer, but, at this stage, remains a generic type T
. Therefore the code val a: Any = t
is still correct, since T @storage[Long]
is compatible to T
, which in turn is a subtype of Any
. So far so good...
But the minibox-coerce
phase makes @storage[...]
annotated types incompatible with their direct counterparts, which, in turn, requires the introduction of explicit coercions. Specifically, the code val a: Any = t
is rewritten to val a: Any = marker_minibox2box[T, Long](t)
, since at this stage of the transformation, T @storage[Long]
is no longer a subtype of Any
, which is not annotated. As the name suggests, the coercion introduced at this point is a marker, not the final coercion.
The minibox-commit
phase commits to the actual alternative representation, which, in this case, is Long
. The signature of foo_n_J
becomes def foo_n_J[T](T_TypeTag: Byte, t: Long)
and the marker coercion is replaced by MiniboxConversions.minibox2box[T](t, T_TypeTag)
.
This three stage-transformation allows the miniboxing plugin to robustly, correctly and optimally transform any code, from simple examples to very complex library collection code, which uses all the language features, such as higher-kinded types, closures and implicits.
Let us now jump to another example:
object DR2 {
def foo[@miniboxed T](t: T): Unit = {
println(t.toString)
}
foo(2.71)
}
As before, simply transforming T
to Long
in the miniboxed variant of method foo
would not suffice, as the program output would be the long integer encoding 2.71
instead of the double-precision floating-point number 2.71
. This is again solved by using the data representation transformation mechanism:
$ mb-scalac DR2.scala -Xprint:uncurry,minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of uncurry]] // DR2.scala
package <empty> {
object DR2 extends Object {
def foo[@miniboxed T](t: T): Unit =
println(t.toString());
DR2.this.foo[Double](2.71)
}
}
[[syntax trees at end of minibox-inject]] // DR2.scala
package <empty> {
object DR2 extends Object {
def foo[@miniboxed T](t: T): Unit =
println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit =
println(t.toString());
DR2.this.foo_n_J[Double](8, 2.71)
}
}
[[syntax trees at end of minibox-coerce]] // DR2.scala
package <empty> {
object DR2 extends Object {
def foo[@miniboxed T](t: T): Unit =
println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit =
println(marker_minibox2box[T, Long](t).toString());
DR2.this.foo_n_J[Double](8, marker_box2minibox[Double, Long](2.71))
}
}
[[syntax trees at end of minibox-commit]] // DR2.scala
package <empty> {
object DR2 extends Object {
def foo[@miniboxed T](t: T): Unit =
println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit =
println(MiniboxDispatch.mboxed_toString(t, T_TypeTag));
DR2.this.foo_n_J[Double](8, MiniboxConversions.this.double2minibox(2.71))
}
}
In our example, the receiver of the toString
method call is a value with type T @storage[Long]
. Since changing the representation will also affect the semantics of methods called on a miniboxed value, the data representation mechanism will automatically introduce a coercion to the boxed representation, which can handle method calls. This can be seen in the foo_n_J
method translation in the minibox-coerce phase
:
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit =
println(marker_minibox2box[T, Long](t).toString());
What is even more interesting is what happens in the minibox-commit
phase:
def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit =
println(MiniboxDispatch.mboxed_toString(t, T_TypeTag));
Instead of boxing the value using a miniboxToBox
and calling toString
on the boxed value, the miniboxing plugin short-circuited this by having a special method for this, mboxed_toString
, which does not require boxing at all. This is critical to obtaining good performance in the case of arrays of miniboxed values, as was discussed in the original Miniboxing paper.
One thing you probably noticed is that the annotated type is T @storage[Long]
. Why explicitly put Long
, the storage type, in the annotation? Could it be because?... Yes, miniboxing can work with multiple storage types:
$ mb-scalac C.scala -Xprint:minibox-inject -P:minibox:two-way
[[syntax trees at end of minibox-inject]] // C.scala
package <empty> {
abstract trait C[@miniboxed T] extends Object {
def t(): T;
def t_D(T_TypeTag: Byte): T @storage[Double];
def t_J(T_TypeTag: Byte): T @storage[Long]
};
class C_D[Tsp] extends Object with C[Tsp] {
private[this] val C_D|T_TypeTag: Byte = _;
private[this] val t: Tsp @storage[Double] = _;
def t(): Tsp = C_D.this.t_D(C_D.this.C_D|T_TypeTag);
def t_D(T_TypeTag: Byte): Tsp @storage[Double] = C_D.this.t;
def t_J(T_TypeTag: Byte): Tsp @storage[Long] = C_D.this.t_D(T_TypeTag)
};
class C_J[Tsp] extends Object with C[Tsp] {
private[this] val C_J|T_TypeTag: Byte = _;
private[this] val t: Tsp @storage[Long] = _;
def t(): Tsp = C_J.this.t_J(C_J.this.C_J|T_TypeTag);
def t_D(T_TypeTag: Byte): Tsp @storage[Double] = C_J.this.t_J(T_TypeTag);
def t_J(T_TypeTag: Byte): Tsp @storage[Long] = C_J.this.t
};
class C_L[Tsp] extends Object with C[Tsp] {
private[this] val t: Tsp = _;
def t(): Tsp = C_L.this.t;
def t_D(T_TypeTag: Byte): Tsp @storage[Double] = C_L.this.t();
def t_J(T_TypeTag: Byte): Tsp @storage[Long] = C_L.this.t()
}
}
The -P:minibox:two-way
argument instructs the miniboxing plugin to use three encodings: generic, encoded in long and encoded in double. We won't go into the technical details of why this is important, but having double as a variant increases performance by at least 50% when using floating-point numbers.
The most interesting part in this translation is how the miniboxed class C_D
handles converting from one representation to another in the t_J
getter. In the injection phase, T
, T @storage[Long]
and T @storage[Double]
are all compatible:
class C_D[Tsp] extends Object with C[Tsp] {
private[this] val C_D|T_TypeTag: Byte = _;
private[this] val t: Tsp @storage[Double] = _;
def t(): Tsp = C_D.this.t_D(C_D.this.C_D|T_TypeTag);
def t_D(T_TypeTag: Byte): Tsp @storage[Double] = C_D.this.t;
def t_J(T_TypeTag: Byte): Tsp @storage[Long] = C_D.this.t_D(T_TypeTag)
};
In the minibox-coerce
phase, there is a new coercion added, marker_minibox2minibox
, which can convert between different storage types:
class C_D[Tsp] extends Object with C[Tsp] {
private[this] val C_D|T_TypeTag: Byte = _;
private[this] val t: Tsp @storage[Double] = _;
def t(): Tsp = marker_minibox2box[Tsp, Double](C_D.this.t_D(C_D.this.C_D|T_TypeTag));
def t_D(T_TypeTag: Byte): Tsp @storage[Double] = C_D.this.t;
def t_J(T_TypeTag: Byte): Tsp @storage[Long] = marker_minibox2minibox[Tsp, Double, Long](C_D.this.t_D(T_TypeTag))
};
The marker_minibox2minibox
contains all the necessary type information in its type arguments: marker_minibox2minibox[Tsp, Double, Long](...)
. The type first argument is the boxed type while the next two are the initial representation and the desired representation. This gives the next phase, minibox-commit
all the information necessary for transforming the tree:
class C_D[Tsp] extends Object with C[Tsp] {
private[this] val C_D|T_TypeTag: Byte = _;
private[this] val t: Double = _;
def t(): Tsp = MiniboxConversionsDouble.minibox2box[Tsp](C_D.this.t_D(C_D.this.C_D|T_TypeTag), C_D.this.C_D|T_TypeTag);
def t_D(T_TypeTag: Byte): Double = C_D.this.t;
def t_J(T_TypeTag: Byte): Long = MiniboxConversions.unreachableConversion[Nothing]("Double", "Long")
};
Surprisingly, the minibox-commit
phase translates the marker_minibox2minibox
to MiniboxConversions.unreachableConversion
. This translation is correct as the set of primitive types that are encoded into Long
and into Double
do not overlap. This means there is no primitive type that could be encoded using both Long
and Double
at the same time, thus, there is no need for such a conversion. Still, the data representation mechanism is flexible enough to accommodate this use case too.
We have shown how the miniboxing transformation uses the data representation transformation mechanism to add one or two alternative representations for the boxed primitives. Miniboxing is a well-established Scala compiler plugin that we encourage everyone to try and play around with. The miniboxing website describes other examples and tests that we transformed using the miniboxing plugin.
The next chapter, macrobenchmarks will show how well the effort of transforming methods and classes classes pays off for the miniboxing plugin.
You can continue with the following resources: