-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] heterogeneous Irmin stores via effectful optics #909
Comments
That looks great! For extra clarity, it would be very useful if you could show how the two first examples can be encoded in this new scheme and what the API will look like for the end-user. |
@samoht: me know if you want to see extra detail in either of these two regards, or have thoughts about the FS layout API 🙂 |
I need to think more about this but just a few quick remarks:
Also, on a more general design level, I am not sure if we need to expose the lens and optic types to the end-user. I'd like to avoid having to explain to them what are lens/optics and how they work. I feel that the current definition are specialised enough that we could just fold them into the Tree signature (e.g. have Lens == Tree++) or something similar. What do you think? |
Quite 🙂 I was imagining providing both but gave the more general one in my example. I'll update the RFC to reflect this.
I haven't thought about this properly yet, which is why I chose to use the PPX for that example 😉 What we need is a sum where the cases are mapped onto disjoint sets of filenames. As a very rough initial guess, we could have something like: type file =
| Markdown of markdown
| Text of text
let file, Optic.[ markdown; text ] =
variant "desert_store" (fun markdown text -> function
| Markdown m -> markdown m
| Text t -> text t)
|~ case1 "markdown" markdown (fun m -> Markdown m)
|~ case1 "text" text (fun t -> Text t)
|> sealv_prism
|> directory (function
| `Markdown m -> m ^ ".md"
| `Text t -> t ^ ".txt") where val file : (file directory) Irmin.Type.t
val markdown : string -> (file directory, markdown) prism
val text : string -> (file directory, text) prism We could then access a particular file with: I suspect nicer solutions are possible with new sum type generic combinators specialised for this particular use-case. It seems likely to me that several backends would expose such additional combinators in order to achieve particular store structure, where this structure is visible to the user (FS, GraphQL etc.).
Yes, the idea is to allow arbitrary nesting of different node types in the general case (and then have individual backends impose restrictions on this when necessary). That schema could be encoded with something like. let cactus_dir, Optic.[ x; y ] =
variant "cactus" (fun x y -> function | X a -> x a | Y a -> y a)
|~ case1 "x" cactus_x (fun a -> X a)
|~ case1 "y" cactus_y (fun a -> Y a)
|> sealv_prism
|> directory (function
| `Markdown m -> m ^ ".x"
| `Text t -> t ^ ".y")
let desert_store, Lens.[ camels; cacti ] =
record "desert_store" (fun dromedary bacrian -> { dromedary; bactrian })
|+ field "camels" camel_store (fun s -> s.dromedary)
|+ field "cacti" cactus_dir (fun s -> s.bactrian)
|> sealr_lens
I'd like to avoid requiring an understanding of optics too, and think it's fairly easy to do so with some clever placement. The optics we need could be included in e.g. However, I think we should still expose the generic optics in e.g. |
Having read up a bit on optics I now understand better your proposal, I also think it's great ! If I understand correctly, you need this monad to compose optics? Does it mean that you cannot access |
I appreciate work on this topic. My usage of irmin is usually in the shape described above:
IIUC, the proposal above takes the first two use cases into consideration. Do you have any thoughts about the third use case? |
If you know all the keys, is it possible to enumerate them all to create a big record and use lenses? Would be great to have dynamic typed keys but I'am not sure how that could be typable. Or we need some kind of extensible records... |
I could create a |
@ioana: This section of my proposal is not particularly well explained (I'm not sure what exactly to call this pattern). There are two 'effects' at play here:
(* Monad of effects in the heap *)
type +'a io
val return : 'a -> 'a io
val bind : 'a io -> ('a -> 'b io) -> 'b io
(* ~Comonad to capture indirection via hashing/dereferencing *)
type +'a addr
val deref : 'a addr -> 'a io
val update : 'a addr -> ('a -> 'b) -> 'b addr io The point of bringing this up in the proposal is that the optics library will need to be parameterised over both internal and external effects in this way. For instance, we want to be able to compose lenses that have an internal val ( / ) : ('a, 'b, 'c addr, 'd addr) lens ->
('c, 'd, 'e addr, 'f addr) lens ->
('a, 'b, 'e addr, 'f addr) lens which requires parameterisation over the val view : ('s, 't, 'a addr, 'b addr) t -> 's -> 'a io There are examples of this sort of 'optics + effects' API from the Haskell world (https://hackage.haskell.org/package/lens-action-0.2.4/docs/Control-Lens-Action.html#t:Effective) but I don't know of any in OCaml (it's going to be syntactically heavyweight in OCaml because of the need for functors for higher-kinded polymorphism). |
@hannesm. For the third use-case, this scheme would encode that with a large product type as @samoht suggests. I intend to restrict the scope of heterogeneous stores to only statically known type structure, to avoid needing to reason about 'unsealed' extensible type witnesses in the Irmin core (I already find these difficult enough to work with when building the generics themselves). Not to say that extensible products/sums couldn't be provided; only that I think it introduces more complexity than could reasonably be implemented in a single iteration.
I'm not sure exactly what you mean here. The idea is that 'record' tree objects would have the same underlying implementation as standard homogeneous tree objects with |
thanks for explanation, I'm looking forward to adjust my code using optics. |
I was looking for the recipe of how to do this... I have a store where certain key paths should be treated as JSON with a known structure (and custom So +1 on this RFC proposal But my question is about the current workaround: type blob = Dromedary of dromedary | Bactrian of bactrian | Cactus of cactus
[@@deriving irmin]
let get_spike () =
Store.get s [ "cacti"; "dangerous"; "spike" ] >|= function
| Cactus c -> c
| _ -> assert false (* never occurs in a well-formed application *)
let set_camelus c =
Store.set ~info s [ "camels"; "bactrian"; "camelus" ] (Bactrian c)
(* must tag with `Bactrian` and promise never to give the wrong tag *) I can see how this works for get and set, but how does |
This is a mirror of a document available at CraigFe/ocaml-generics.
RFC - Heterogeneous Irmin stores via Effectful Optics
Problems with Irmin at present
Irmin is parameterised over a user-supplied data type to be used for
representing leaves (blobs) of the store. The blob type
b
must be passed with:b Type.t
, defined by hand via theIrmin.Type
combinators or via
ppx_irmin
, to allow Irmin to derive its ownpretty-printing, hashing and serialisation functions for
b
.b option Merge.t
used to resolve blob-level conflicts.With this API, all blobs in the store must have the same OCaml representation.
This leads to unpleasant interactions when attempting to store different types
of data in different regions of the store. In particular, we have observed two
different issues:
Application constraints quite often lead to certain regions of the Irmin store
having pre-determined type:
With the current API, the only thing to be done is to make a tagged union and
assert false
in the appropriate places:This works, but has three issues
opportunities for sharing. For example,
Dromedary
andBactrian
may havethe same run-time representation, but with this API they can never share blobs
in the content-addressable heap.
The type of a blob may also be determined by a suffix of the path used to
access it, for example:
Again, we can get around this at the application level by using a sum type for
the blobs:
but the three above are still prevalent. In this particular example, we might
particularly care about introducing serialised tags, since it would cause the
.md
files to no longer be parse-able as Markdown files when using afile-system backend for Irmin.
Proposed solution
Optical get/set API
A better API to solve problem #1 might look like the following:
where
cacti
,camels
,bactrian
are some form of 'first-class projection'that can be used to get/set a sub-component of a complex type. Such objects are
typically referred to as optics. In this case, the 'complex' type is
store as a product type defined by:
where
'b tree
is a standard Irmin tree object with a single blob type'b
andindexed by a sequence of 'steps' (optics over product types are known as
'lenses').
Similarly, we can imagine a better API for the file-system case:
This is effectively the same as what we were doing before, with two differences:
the sum type has been reified into the Irmin store (meaning that the
serialised blob need not contain a tag);
the assertion of the file's type is combined with the assertion that it exists
at the location
files/<name>.md
(saving some boilerplate code). In thiscase, the
md
value is a 'prism' (optic for sum types) that provides a sumprojection (i.e.
file -> markdown option
).Types of tree node
To generalise from these examples, the proposal is to generalise from having the
user define 'blob' types to defining the type structure of the entire store. We
will have three types of tree object, which (might) be nested arbitrarily inside
each other:
homogeneous tree with unbounded children per branch
(
'a tree -> string -> 'a tree
). This will be the underlying representationfor every store (glossing over some details about serialising tags for
non-suffix sum types). To the user, it appears like an abstract
tree
producttype with a pre-defined type combinator and lens constructor:
product nodes, defining a fixed number of components (via
Irmin.Type.record
) and accessed via lenses.sum nodes, defining a fixed number of cases (via
Irmin.Type.variant
) andaccessed via prisms.
This has the advantage of easily reproducing the old behaviour as
b tree
whilealso allowing arbitrary algebraic type structure to be encoded into the Irmin
store via generics. The old API can be preserved as a simplified
Contents
functor that uses
b tree
as the store type internally and hides all uses ofoptics.
Monad-parameterised optics
A key advantage of using optics as an Irmin interface is to represent the
indirection of the content-addressable store: accessing/updating the field of a
'store'-d record requires access to a content-addressable heap that may have its
own effect monad. We can provide an optics library that is parameterised over
these effects, and then specialise them for Irmin:
There are many such constructions that achieve the same expressivity: if anyone
has an idea of a principled way of selecting one, please let me know
🙂
Generic construction of optics
It is possible to derive the necessary lenses and prisms with generics, avoiding
introducing any additional boilerplate:
Another (more extensive) option might be to provide optic construction for
arbitrary generics.
Backend-imposed restrictions
Certain backends might want to impose restrictions / conventions about the type
structure contained in the store in ways that make sense for that specific data
representation. For example, the Git-compatible backends could restrict sum
types to a dedicated 'extension' lens that may only be applied at the end of
the path.
The solution should be functorised in such a way as to allow for this use-case.
The big picture
Disregarding backend-specific specialisations, the new API might look something
like this:
We need to provide a new API for homogeneous trees too:
Advantages
feature that has been under discussion for inclusion in Irmin
(Add Irmin.Type converter to concrete trees #478): the ability to define
'structured' blob types (where internal components can be shared across the
content-addressable heap).
Limitations
runtime overhead due to generics. Extracting a blob from a tree object
will now require generics, making it slower (except in the case where only
primitive nodes are used). As always with generics, we might expect
significant performance improvements with the use of multi-stage programming.
unable to precisely determine when access is partial. This API introduces
something new: certain accesses of an Irmin store may be guaranteed to
succeed, since the path from root to blob goes via no sum types. The lens API
has no way to track the totality of accesses (without more powerful type-level
programming features made possible by Track the purity of expressions to avoid the value restriction ocaml/RFCs#5).
The text was updated successfully, but these errors were encountered: