Skip to content

Latest commit

 

History

History
881 lines (714 loc) · 41.3 KB

terminology.md

File metadata and controls

881 lines (714 loc) · 41.3 KB

Generics: Terminology

Table of contents

Generic means compile-time parameterized

Generally speaking, when we talk about generics, either checked or template, we are talking about generalizing some language construct by adding a compile-time parameter, also called a generic parameter, to it. So:

  • a generic function is a function with at least one compile-time parameter, which could be an explicit argument to the function or deduced;
  • a generic type is a function with a compile-time parameter, for example a container type parameterized by the type of the contained elements;
  • a generic interface is an interface with a compile-time parameter.

This parameter broadens the scope of the language construct on an axis defined by that parameter, for example it could define a family of functions instead of a single one.

Note that different languages allow different things to be parameterized; for example, Rust supports generic associated types.

Checked versus template parameters

When we distinguish between checked and template generics in Carbon, it is on a parameter by parameter basis. A single function can take a mix of regular, checked, and template parameters.

  • Regular parameters, or "dynamic parameters", are designated using the "<name>: <type>" syntax (or "<value>").
  • Checked parameters are designated using :! between the name and the type (so it is "<name>:! <type>").
  • Template parameters are designated using "template <name>:! <type>".

The syntax for checked and template parameters was decided in questions-for-leads issue #565.

Expected difference between checked and template parameters:

Checked Template
bounded parametric polymorphism compile-time duck typing and ad-hoc polymorphism
constrained genericity optional constraints
name lookup resolved for definitions in isolation ("early") name lookup can use information from arguments (name lookup may be "late")
sound to typecheck definitions in isolation ("early") complete type checking may require information from calls (may be "late")
supports separate type checking; may also support separate compilation, for example when implemented using dynamic witness tables separate compilation only to the extent that C++ supports it
allowed but not required to be implemented using dynamic dispatch does not support implementation by way of dynamic dispatch, just static by way of instantiation
monomorphization is an optional optimization that cannot render the program invalid monomorphization is mandatory and can fail, resulting in the program being invalid

Polymorphism

Generics provide different forms of polymorphism than object-oriented programming with inheritance. That uses subtype polymorphism where different descendants, or "subtypes", of a base class can provide different implementations of a method, subject to some compatibility restrictions on the signature.

Parametric polymorphism

Parametric polymorphism (Wikipedia) is when a function or a data type can be written generically so that it can handle values identically without depending on their type. Bounded parametric polymorphism is where the allowed types are restricted to satisfy some constraints. Within the set of allowed types, different types are treated uniformly.

Compile-time duck typing

Duck typing (Wikipedia) is when the legal types for arguments are determined implicitly by the usage of the values of those types in the body of the function. Compile-time duck typing is when the usages in the body of the function are checked at compile-time, along all code paths. Contrast this with ordinary duck typing in a dynamic language such as Python where type errors are only diagnosed at runtime when a usage is reached dynamically.

Ad-hoc polymorphism

Ad-hoc polymorphism (Wikipedia), also known as "overloading", is when a single function name has multiple implementations for handling different argument types. There is no enforcement of any consistency between the implementations. For example, the return type of each overload can be arbitrary, rather than being the result of some consistent rule being applied to the argument types.

Templates work with ad-hoc polymorphism in two ways:

  • A function with template parameters can be specialized in C++ as a form of ad-hoc polymorphism.
  • A function with template parameters can call overloaded functions since it will only resolve that call after the types are known.

In Carbon, we expect there to be a compile error if overloading of some name prevents a checked-generic function from being typechecked from its definition alone. For example, let's say we have some overloaded function called F that has two overloads:

fn F[template T:! type](x: T*) -> T;
fn F(x: Int) -> bool;

A checked generic function G can call F with a type like T* that cannot possibly call the F(Int) overload for F, and so it can consistently determine the return type of F. But G can't call F with an argument that could match either overload.

Note: It is undecided what to do in the situation where F is overloaded, but the signatures are consistent and so callers could still typecheck calls to F. This still poses problems for the dynamic strategy for compiling generics, in a similar way to impl specialization.

Constrained genericity

We will allow some way of specifying constraints as part of a function (or type or other parameterized language construct). These constraints are a limit on what callers are allowed to pass in. The distinction between constrained and unconstrained genericity is whether the body of the function is limited to just those operations that are guaranteed by the constraints.

With templates using unconstrained genericity, you may perform any operation in the body of the function, and they will be checked against the specific types used in calls. You can still have constraints, but they are optional in that they could be removed and the function would still have the same capabilities. Constraints only affect the caller, which will use them to resolve overloaded calls to the template and provide clearer error messages.

With checked generics using constrained genericity, the function body can be checked against the signature at the time of definition. Note that it is still perfectly permissible to have no constraints on a type; that just means that you can only perform operations that work for all types (such as manipulate pointers to values of that type) in the body of the function.

Dependent names

A name is said to be dependent if it depends on some checked or template parameter. Note: this matches the use of the term "dependent" in C++, not as in dependent types.

Definition checking

Definition checking is the process of semantically checking the definition of parameterized code for correctness independently of any particular argument values. It includes type checking and other semantic checks. It is possible, even with templates, to check semantics of expressions that are not dependent on any template parameter in the definition. Adding constraints to template parameters and/or switching them to be checked allows the compiler to increase how much of the definition can be checked. Any remaining checks are delayed until instantiation, which can fail.

Complete definition checking

Complete definition checking is when the definition can be fully semantically checked, including type checking. It is an especially useful property because it enables separate semantic checking of the definition, a prerequisite to separate compilation. It also is a requirement for implementation strategies that don’t instantiate the implementation (for example, type erasure or dynamic-dispatch witness tables).

Early versus late type checking

Early type checking is where expressions and statements are type checked when the definition of the function body is compiled, as part of definition checking. This occurs for regular and checked-generic values.

Late type checking is where expressions and statements may only be fully typechecked once calling information is known. Late type checking delays complete definition checking. This occurs for template-dependent values.

Bindings

Binding patterns associate a name with a type and a value. This is used to declare function parameters, in let and var declarations, as well as to declare compile-time parameters for classes, interfaces, and so on. There are three kinds of binding patterns, corresponding to the three expression phases:

  • A runtime binding pattern binds to a dynamic value at runtime, and is written using a :, as in x: i32.
  • A symbolic binding pattern binds to a compile-time value that is not known when type checking, and is used to declare checked generic parameters. These binding use :!, as in T:! type.
  • A template binding pattern binds to a compile-time value that is known when type checking, and is used to declare template parameters. These bindings use the keyword template in addition to :!, as in template T:! type.

The last two binding patterns, which are about binding a compile-time value, are called compile-time binding patterns, and correspond to those binding patterns that use :!.

The name being declared, which is the identifier to the left of the : or :!, is called a binding, or more specifically a runtime binding, compile-time binding, symbolic binding, or template binding. The expression to the right defining the type of the binding pattern is called the binding type expression, a kind of type expression. For example, in T:! Hashable, T is the binding (a symbolic binding in this case), and Hashable is the binding type expression.

Types and type

A type is a value of type type. Conversely, type is the type of all types.

Expressions in type position, for example a binding type expression or the return type of a function, are implicitly cast to type type. This means that it is legal to put a value that is not a type where a type is expected, as long as it has an implicit conversion to type that may be performed at compile time.

Facet type

TODO: Documents using the obsolete term "type-of-type" should be updated to say "facet type" instead.

A facet type is a type whose values are some subset of the values of type, determined by a set of type constraints:

  • Interfaces and named constraints are facet types whose constraints are that the interface or named constraint is satisfied by the type.
  • The values produced by & operations between facet types and by where expressions are facet types, whose set of constraints are determined by the & or where expression.
  • type is a facet type whose set of constraints is empty.

A facet type is the type used when declaring some type parameter. It foremost determines which types are legal arguments for that type parameter. For template parameters, that is all a facet type does. For checked parameters, it also determines the API that is available in the body of the definition of the generic function, class, or other entity.

Facet

A facet is a value of a facet type. For example, i32 as Hashable is a facet, and Hashable is a facet type. Note that all types are facets, since type is considered a facet type. Not all facets are types, though: i32 as Hashable is of type Hashable not type, so it is a facet that is not a type. However, in places where a type is expected, for example in a binding type expression or after the -> in a function declaration, there is an automatic implicit conversion to type. This means that a facet may be used in those positions. For example, the facet i32 as Hashable will implicitly convert to (i32 as Hashable) as type, which is i32, in those contexts.

Type expression

A type expression is an expression that can be used as a type. In some cases, what is written in the source code is a value, like a facet or tuple of types, that is not a type but has an implicit conversion to type. In those cases, we are concerned with the type value after the implicit conversion.

Facet binding

We use the term facet binding to refer to the name introduced by a compile-time binding pattern (using :! with or without the template modifier) where the declared type is a facet type. In the binding pattern T:! Hashable, T is a facet binding, and the value of T is a facet.

Deduced parameter

A deduced parameter is listed in the optional [ ] section right after the function name in a function signature:

fn <name> [ <deduced parameters> ]( <explicit parameters> ) -> <return type>

Deduced arguments are determined as a result of pattern matching the explicit argument values (usually the types of those values) to the explicit parameters.

See more here.

Interface

An interface is an API constraint used in a function signature to provide encapsulation. Encapsulation here means that callers of the function only need to know about the interface requirements to call the function, not anything about the implementation of the function body, and the compiler can check the function body without knowing anything more about the caller. Callers of the function provide a value that has an implementation of the API and the body of the function may then use that API. In the case of a checked generic, the function may only use that API.

Structural interfaces

A "structural" interface is one where we say a type satisfies the interface as long as it has members with a specific list of names, and for each name it must have some type or signature. A type can satisfy a structural interface without ever naming that interface, just by virtue of having members with the right form.

Nominal interfaces

A "nominal" interface is one where we say a type can only satisfy an interface if there is some explicit statement saying so, for example by defining an impl. This allows "satisfies the interface" to have additional semantic meaning beyond what is directly checkable by the compiler. For example, knowing whether the Draw function means "render an image to the screen" or "take a card from the top of a deck of cards"; or that a + operator is commutative (and not, say, string concatenation).

We use the "structural" versus "nominal" terminology as a generalization of the same terms being used in a subtyping context.

Named constraints

Named constraints are "structural" in the sense that they match a type based on meeting some criteria rather than an explicit statement in the type's definition. The criteria for a named constraint, however, are less focused on the type's API and instead might include a set of nominal interfaces that the type must implement and constraints on the associated entities and interface parameters.

Associated entity

An associated entity is a requirement in an interface that a type's implementation of the interface must satisfy by having a matching definition. A requirement that the type define a value for a member constant is called an associated constant. If the type of the associated constant is a facet type, then it is called an associated facet, which corresponds to what is called an "associated type" in other languages (Swift, Rust). Similarly, an interface can have associated function, associated method, or associated class function.

Different types can satisfy an interface with different definitions for a given member. These definitions are associated with what type is implementing the interface. An impl defines what is associated with the type for that interface.

Rust uses the term "associated item" instead of associated entity.

Impl: Implementation of an interface

An impl is an implementation of an interface for a specific type, called the implementing type. It is the place where the function bodies are defined, values for associated constants, etc. are given. Implementations are needed for nominal interfaces; structural interfaces and named constraints define conformance implicitly instead of by requiring an impl to be defined. In can still make sense to explicitly implement a named constraint as a way to implement all of the interfaces it requires.

Extending an impl

A type that extends the implementation of an interface has all the named members of the interface as named members of the type. This means that the members of the interface are available by way of both simple member access and qualified member access expressions.

If a type implements an interface without extending, the members of the interface may only be accessed using qualified member access expressions.

Member access

There are two different kinds of member access: simple and compound. See the member access design document for the details. The application to generics combines compound member access with qualified names, which we call a qualified member access expression.

Simple member access

Simple member access has the from object.member, where member is a word naming a member of object. This form may be used to access members of interfaces when the type of object extends the implementation of that interface.

If String extends its implementation of Printable, then s1.Print() calls the Print method of Printable using simple member access. In this case, the name Print is used without qualifying it with the name of the interface it is a member of since it is recognized as a member of the type itself as well.

Qualified member access expression

Compound member access has the form object.(expression), where expression is resolved in the containing scope. A compound member access where the member expression is a simple member access expression, as in a.(context.b), is called a qualified member access expression. The member expression context.b may be the qualified member name of an interface member, that consists of the name of the interface, possibly qualified with a package or namespace name, a dot . and the name of the member.

For example, if the Comparable interface has a Less member method, then the qualified name of that member is Comparable.Less. So if String implements Comparable, and s1 and s2 are variables of type String, then the Less method may be called using the qualified member name by writing the qualified member access expression s1.(Comparable.Less)(s2).

This form may be used to access any member of an interface implemented for a type, whether or not it extends the implementation.

Compatible types

Two types are compatible if they have the same notional set of values and represent those values in the same way, even if they expose different APIs. The representation of a type describes how the values of that type are represented as a sequence of bits in memory. The set of values of a type includes properties that the compiler can't directly see, such as invariants that the type maintains.

We can't just say two types are compatible based on structural reasons. Instead, we have specific constructs that create compatible types from existing types in ways that encourage preserving the programmer's intended semantics and invariants, such as implementing the API of the new type by calling (public) methods of the original API, instead of accessing any private implementation details.

Subtyping and casting

Both subtyping and casting are different names for changing the type of a value to a compatible type.

Subtyping is a relationship between two types where you can safely operate on a value of one type using a variable of another. For example, using C++'s object-oriented features, you can operate on a value of a derived class using a pointer to the base class. In most cases, you can pass a more specific type to a function that can handle a more general type. Return types work the opposite way, a function can return a more specific type to a caller prepared to handle a more general type. This determines how function signatures can change from base class to derived class, see covariance and contravariance in Wikipedia.

In a generics context, we are specifically interested in the subtyping relationships between facet types. In particular, a facet type encompasses a set of type constraints, and you can convert a type from a more-restrictive facet type to another facet type whose constraints are implied by the first. C++ concepts terminology uses the term "subsumes" to talk about this partial ordering of constraints, but we avoid that term since it is at odds with the use of the term in object-oriented subtyping terminology.

Note that subtyping is a bit like coercion, except we want to make it clear that the data representation of the value is not changing, just its type as reflected in the API available to manipulate the value.

Casting is indicated explicitly by way of some syntax in the source code. You might use a cast to switch between type adaptations, or to be explicit where an implicit conversion would otherwise occur. For now, we are saying "x as y" is the provisional syntax in Carbon for casting the value x to the type y. Note that outside of generics, the term "casting" includes any explicit type change, including those that change the data representation.

In contexts where an expression of one type is provided and a different type is required, an implicit conversion is performed if it is considered safe to do so. Such an implicit conversion, if permitted, always has the same meaning as an explicit cast.

Coherence

A generics or interface system has the implementation coherence property, or simply coherence, if there is a single answer to the question "what is the implementation of this interface for this type, if any?" independent of context, such as the libraries imported into a given file.

This is enforced using two kinds of rules:

  • An orphan rule is a restriction on which files may declare a particular implementation. This is to ensure that the implementation is imported any time it could be used. For example, if neither the type nor the interface is parameterized, the orphan rule requires that the implementation must be in the same library as the interface or type. The rule is we don't allow an orphan implementation that is not with either of its parents (parent type or parent interface).
  • An overlap rule is a way to consistently select a single implementation when multiple implementations apply. In Carbon, overlap is resolved by picking a single implementation using a rule that picks the one that is considered most specialized. In Rust, by contrast, the overlap rule or overlap check instead produces an error if two implementations apply at once.

Adapting a type

A type can be adapted by creating a new type that is compatible with an existing type, but has a different API. In particular, the new type might implement different interfaces or provide different implementations of the same interfaces.

Unlike extending a type (as with C++ class inheritance), you are not allowed to add new data fields onto the end of the representation -- you may only change the API. This means that it is safe to cast a value between those two types without any dynamic checks or danger of object slicing.

This is called "newtype" in Rust, and is used for capturing additional information in types to improve type safety by moving some checking to compile time (1, 2, 3) and as a workaround for Rust's orphan rules for coherence.

Type erasure

"Type erasure" is where a type's API is replaced by a subset. Everything outside of the preserved subset is said to have been "erased". This can happen in a variety of contexts including both checked generics and runtime polymorphism. For checked generics, type erasure restricts a type to just the API required by the constraints on that type stated in the signature of the function.

An example of type erasure in runtime polymorphism in C++ is casting from a pointer of a derived type to a pointer to an abstract base type. Only the API of the base type is available on the result, even though the implementation of those methods still come from the derived type.

The term "type erasure" can also refer to the specific strategy used by Java to implement generics. which includes erasing the identity of type parameters. This is not the meaning of "type erasure" used in Carbon.

Archetype

A placeholder type is used when type checking a function in place of a generic type parameter. This allows type checking when the specific type to be used is not known at type-checking time. The type satisfies just its constraint and no more, so it acts as the most general type satisfying the interface. In this way the archetype is the supertype of all types satisfying the interface.

In addition to satisfying all the requirements of its constraint, the archetype also has the member names of its constraint. Effectively it is considered to extend the implementation of the constraint.

Extending an interface

An interface can be extended by defining an interface that includes the full API of another interface, plus some additional API. Types implementing the extended interface should automatically be considered to have implemented the narrower interface.

Witness tables

Witness tables are an implementation strategy where values passed to a generic type parameter are compiled into a table of required functionality. That table is then filled in for a given passed-in type with references to the implementation on the original type. The generic is implemented using calls into entries in the witness table, which turn into calls to the original type. This doesn't necessarily imply a runtime indirection: it may be a purely compile-time separation of concerns. However, it insists on a full abstraction boundary between the generic user of a type and the concrete implementation.

A simple way to imagine a witness table is as a struct of function pointers, one per method in the interface. However, in practice, it's more complex because it must model things like associated facets and interfaces.

Witness tables are called "dictionary passing" in Haskell. Outside of generics, a vtable is a witness table that witnesses that a class is a descendant of an abstract base class, and is passed as part of the object instead of separately.

Dynamic-dispatch witness table

For dynamic-dispatch witness tables, actual function pointers are formed and used as a dynamic, runtime indirection. As a result, the generic code will not be duplicated for different witness tables.

Static-dispatch witness table

For static-dispatch witness tables, the implementation is required to collapse the table indirections at compile time. As a result, the generic code will be duplicated for different witness tables.

Static-dispatch may be implemented as a performance optimization for dynamic-dispatch that increases generated code size. The final compiled output may not retain the witness table.

Instantiation

Instantiation is the implementation strategy for templates in both C++ and Carbon. Instantiation explicitly creates a copy of the template code and replaces the template components with the concrete type and its implementation operations. It allows duck typing and lazy binding. Instantiation implies template code will be duplicated.

Unlike static-dispatch witness tables and monomorphization (as in Rust), this is done before type checking completes. Only when the template is used with a concrete type is the template fully type checked, and it type checks against the actual concrete type after substituting it into the template. This means that different instantiations may interpret the same construct in different ways, and that templates can include constructs that are not valid for some possible instantiations. However, it also means that some errors in the template implementation may not produce errors until the instantiation occurs, and other errors may only happen for some instantiations.

Specialization

Template specialization

Specialization in C++ is essentially overloading, or ad-hoc polymorphism, in the context of a template. The template is overloaded to have a different definition for some subset of the possible template argument values. For example, the C++ type std::vector<T> might have a specialization std::vector<T*> that is implemented in terms of std::vector<void*> to reduce code size. In C++, even the interface of a templated type can be changed in a specialization, as happens for std::vector<bool>.

Checked-generic specialization

Specialization of checked generics, or types used by checked generics, is restricted to changing the implementation without affecting the interface. This restriction is needed to preserve the ability to perform type checking of generic definitions that reference a type that can be specialized, without statically knowing which specialization will be used.

While there is nothing fundamentally incompatible about specialization with checked generics, even when implemented using witness tables, the result may be surprising because the selection of the specialized generic happens outside of the witness-table-based indirection between the generic code and the concrete implementation. Provided all selection relies exclusively on interfaces, this still satisfies the fundamental constraints of generics.

Conditional conformance

Conditional conformance is when you have a parameterized type that has one API that it always supports, but satisfies additional interfaces under some conditions on the type argument. For example: Array(T) might implement Comparable if T itself implements Comparable, using lexicographical order.

Interface parameters and associated constants

Interface parameters and associated constants are both ways of allowing the types in function signatures in an interface to vary. For example, different stacks will have different element types. That element type would be used as the parameter type of the Push function and the return type of the Pop function. As in Rust, we can distinguish these by whether they are input parameters or output parameters:

  • An interface parameter is a parameter or input to the interface. That means they must be specified before an implementation of the interface may be determined.
  • In contrast, associated constants are outputs. This means that they are determined by the implementation, and need not be specified in a type constraint.

Functions using an interface as a constraint need not specify the value of its associated constants.

// Stack using associated facets
interface Stack {
  let ElementType:! type;
  fn Push[addr self: Self*](value: ElementType);
  fn Pop[addr self: Self*]() -> ElementType;
}

// Works on any type implementing `Stack`. Return type
// is determined by the type's implementation of `Stack`.
fn PeekAtTopOfStack[T:! Stack](s: T*) -> T.ElementType {
  let ret: T.ElementType = s->Pop();
  s->Push(ret);
  return ret;
}

class Fruit;
class FruitStack {
  // Implement `Stack` for `FruitStack`
  // with `ElementType` set to `Fruit`.
  extend impl as Stack where .ElementType == Fruit { ... }
}

Associated constants are particularly called for when the implementation of the interface determines the value, not the caller. For example, the iterator type for a container is specific to the container and not something you would expect a user of the interface to specify.

If you have an interface with parameters, a type can have multiple matching impl declarations for different combinations of argument values. As a result, interface parameters may not be deduced in a function call. However, if the interface parameters are specified, a type can only have a single implementation of the given interface. This unique implementation choice determines the values of associated constants.

For example, we might have an interface that says how to perform addition with another type:

interface AddWith(T:! type) {
  let ResultType:! type;
  fn Add[self: Self](rhs: T) -> ResultType;
}

An i32 value might support addition with i32, u16, and f64 values.

impl i32 as AddWith(i32) where .ResultType = i32 { ... }
impl i32 as AddWith(u16) where .ResultType = i32 { ... }
impl i32 as AddWith(f64) where .ResultType = f64 { ... }

To write a generic function requiring a parameter to be AddWith, there needs to be some way to determine the type to add to:

// ✅ This is allowed, since the value of `T` is determined by the
// `y` parameter.
fn DoAdd[T:! type, U:! AddWith(T)](x: U, y: T) -> U.ResultType {
  return x.Add(y);
}

// ❌ This is forbidden, can't uniquely determine `T`.
fn CompileError[T:! type, U:! AddWith(T)](x: U) -> T;

Once the interface parameters can be determined, that determines the values for associated constants, such as ResultType in the example. As always, calls with types for which no implementation exists will be rejected at the call site:

// ❌ This is forbidden, no implementation of `AddWith(Orange)`
// for `Apple`.
DoAdd(apple, orange);

The type of an interface parameters and associated constants is commonly a facet type, but not always. For example, an interface parameter that specifies an array bound might have an integer type.

Type constraints

Type constraints restrict which types are legal for a facet binding, like a facet parameter or associated facet. They help define semantics under which they should be called, and prevent incorrect calls.

In general there are a number of different type relationships we would like to express, for example:

  • This function accepts two containers. The container types may be different, but the element types need to match.
  • For this container interface we have associated facets for iterators and elements. The iterator type's element type needs to match the container's element type.
  • An interface may define an associated facet that needs to be constrained to implement some interfaces.
  • This type must be compatible with another type. You might use this to define alternate implementations of a single interfaces, such as sorting order, for a single type.

Note that type constraints can be a restriction on one facet parameter or associated facet, or can define a relationship between multiple facets.

References