This repo is a backend for the categorifier plugin, with the purpose of compiling Haskell to C. It contains a cartesian closed category, referred to as CCat, as well as code that converts morphisms in this category into C code.
At a high level, a Haskell function is compiled into a C function via the following steps:
- The categorifier plugin (i.e., frontend) maps the Hask morphism (i.e., the Haskell function)
into a CCat morphism.
- Hask is, roughly speaking, a category where objects are Haskell types and morphisms are Haskell functions.
- CCat is a category where objects are Haskell types, and a morphism from
A
toB
is a Haskell function fromTargetOb A
toTargetOb B
. The frontend is therefore a functor from Hask to CCat, whose action on objects is identity.
- The backend converts
TargetOb A -> TargetOb B
into a C function.
The main advantages of compiling Haskell into C this way, compared to the common alternative approach of using an embedded DSL, are:
- It allows the source Haskell code to be normal, vanilla Haskell. An eDSL-based
approach usually imposes many restrictions on how the source code is written
(for example, many functions and operators in
base
cannot be used). - It can potentially generate more readable C code. This is WIP, but a compiler plugin, being part of the compilation pipeline, has convenient access to information such as variable and function names, module names, function boundaries and so on, which can be taken advantage of to produce readable C code.
- It generates smaller and more efficient C code. We've compared the plugin and an eDSL based approach on the same Haskell code, and the plugin produces slightly smaller C code due to certain optimizations it has the opportunities to perform.
On the flip side, using a compiler plugin is more fragile than an eDSL-based approach, and it is harder to support multiple GHC versions. Also, it is harder to give a precise, formal description of the subset of Haskell that is supported. That being said, this approach has been tested extensively at Kittyhawk on its flight controller codebase. The stability turns out to be quite good, and the ability for control engineers to write normal Haskell is highly appreciated.
The readers are encouraged to refer to the examples to follow along this document.
To be able to compile a haskell function f :: Input -> Output
into a C function, the
following conditions must be met:
f
is monomorphic (or a polymorphic function whose type variables are all instantiated)Input
andOutput
have the following instances:Categorifier.C.CExpr.Cat.TargetOb.TargetOb
Categorifier.C.CTypes.CGeneric.Class.CGeneric
Categorifier.C.CTypes.GArrays.GArrays
Categorifier.Client.HasRep
GHC.Generics.Generic
Currently, these instances can be defined for the following types:
- Primitive types. All primitive types except
Bool
need to be wrapped inCategorifier.C.KTypes.C
. Specifically, the primitive types areBool
,C Int8
,C Int16
,C Int32
,C Int64
,C Word8
,C Word16
,C Word32
,C Word64
,C Float
, andC Double
.- The
C
here is a newtype wrapper. The reason most primitive types need to be wrapped is because some of their default Haskell instances behave differently than the corresponding C functions. As an example, in Haksell,min @Double 0.0 (-0.0) == 0.0
, while in C,fmin(0.0, -0.0) == -0.0
.
- The
- Product types where each field is a type that has the above instances.
- Enum types (i.e., sum types whose data constructors are all nullary) represented by
KEnum
. Maybe a
types represented byKMaybe
, wherea
has the above instances.
See examples/sum-types for an example of using KEnum
and KMaybe
.
At present, the above instances (except HasRep
and TargetOb
) cannot be written for
arbitrary sum types. Supporting non-recursive sum types is work-in-progress, while supporting
recursive sum types such as lists would require much greater effort.
The internal types involved in the definition of f
are not subject to these constraints.
For example, the body of f
may use lists or other sum types which don't have all of the above
instances. There are still some constraints though: the internal types are generally required to
have at least HasRep
and TargetOb
instances defined; and types used in branches
(e.g., T
in if b then x else y :: T
) need to have ConCat.Category.IfCat
instances.
Instead of compiling a Haskell function into one big C function, it is possible to produce
multiple smaller C functions. This can be done by wrapping each part for which a separate
C function is desired with kFunctionCall
.
For example, suppose the top-level function being compiled is f
, which calls g
:
f :: InputF -> OutputF
f = ... g ...
where
g :: InputG -> OutputG
g = <body>
To generate a separate function for g
, just replace g = <body>
with
g = kFunctionCall (Proxy @C) "separate_function_name" <body>
.
Note that this imposes additional restrictions on InputG
and OutputG
. They must
now meet the aforementioned criteria of supported types. Previously they are considered
internal types, and do not need to meet those criteria.
Doing so can reduce the compilation time of the generated C code, since compiling large C functions can be very slow.
An example can be found at examples/multiple-c-functions.
Compiling large functions is slow and memory intensive in most compiled languages.
The same holds true for categorifying large function using categorifier. It is
thus often desirable to categorify a large function in
smaller chunks. To do so, use Categorifier.Categorify.function
to categorify
the subparts. Each call to function
creates a Categorifier.Category.NativeCat
instance. For example, calling function
on function f :: InputF -> OutputF
in
module M
produces instance NativeCat C.Cat "M.f" InputF OutputF
, which contains
a method nativeK :: C.Cat InputF OutputF
. Then, if the plugin encounters M.f
, it
does not need to inline it, but can directly replace it with nativeK
.
An example can be found at examples/separate-categorification.
The Categorifier plugin takes as input a function that attempts to automatically interpret Haskell functions.
Recall that Categorifier transforms a Haskell function of type A -> B
(i.e., an arrow in the Hask category)
into an arrow in the target category, i.e., A `k` B
for some k
. If both A -> B
and A `k` B
are
instantiations of the same polymorphic function, then no work is needed to categorify it.
This is called automatic interpretation.
When using categorifier-c, k ~ Categorifier.C.CExpr.Cat.Cat
. Cat
is a newtype wrapper, and A `Cat` B
is isomorphic to TargetOb A -> TargetOb B
. Thus a function A -> B
can be auto-interpreted if A -> B
and TargetOb A -> TargetOb B
are the instantiations of the same polymorphic function.
An example can be found at examples/auto-interpret. In this example, the type of
function F.g
is g :: KType1 f => f Int32 -> f Word64
. We need to categorify g @C
, i.e., C Int 32 -> C Int64
,
into
TargetOb (C Int32) -> TargetOb (C Int64)
~ CExpr Int32 -> CExpr Int64
Both C
and CExpr
have KType1
instances, which means what we need is simply g @CExpr
. Because
g @C
and g @CExpr
are instantiations of the same polymorphic function g
, no compilation
is needed. The body of g
contains IO actions and unsafePerformIO
, which Categorifier can't normally compile,
but in this case it doesn't matter, because g
does not need to be inlined.
Another example of a function that can be automatically interpreted is
Linear.Vector.^+^ :: Num a => f a -> f a -> f a
Since TargetOb (Linear.V2.V2 a) = Linear.V2.V2 (TargetOb a)
, the result of categorifying (^+^) @V2 @(C Int64)
is simply (^+^) @V2 @(CExpr Int64)
, and again, no compilation is needed.
To enable automatic interpretation, we need to pass an auto-interpreter corresponding to the target category
to the Categorifier plugin. An auto-interpret for the Categorifier.C.CExpr.Cat.Cat
category is
Categorifier.C.UnconCat.tryAutoInterpret
. To use this auto-interpreter, add the following GHC option:
-fplugin-opt Categorifier:autointerpreter:Categorifier.C.UnconCat.tryAutoInterpret
Automatic interpretation and the aforementioned separate categorification share a similar idea, but serve
different purposes. Automatic interpretation is done automatically for eligible functions, does not require
Categorifier to compile the body of the functions, and can be applied to non-exported functions.
Separate categorization is more widely applicable - it does not require
A -> B
and A `k` B
to be instantiations of the same polymorphic function, but one needs to
explicitly categorify each subpart, and thus it cannot be used on non-exported functions.
Coming soon
It is possible to generate C code that calls external C/C++ functions. See examples/external-c-functions.
There are compatible direnv and Nix environments in the repo to make it easy to build, test, etc. everything with consistent versions to help replicate issues.
This repo is all formatted using Ormolu. Currently CI runs Ormolu 0.4.0.0, which can be installed by cabal install ormolu-0.4.0.0
. See the usage notes for how to best integrate it with your workflow. But don't let Ormolu get in the way of contributing - CI will catch the formatting, and we can help clean up anything.