performance advice for dhall-kubernetes based package #1890
@ggilmore: I'm still investigating, but I'll report what I know so far and the various avenues that I'm exploring. First, make sure that you are using a newer version of … For reference, here is the timing I get when I …
For future reference, the only expression that you need to freeze for benchmarking purposes is the top-most one, since no transitive expressions are resolved in the event of a cache hit. I suspect (but have not confirmed) that most of the time is spent decoding the cached expression, which is 50 MB:

… which means a decoding speed of ~4 MB/s, i.e. roughly 12–13 seconds just to decode the 50 MB cache product. There are two main avenues that I'm exploring: reducing the size of cached expressions and improving decoding performance.
To answer your two initial questions: …
Hey @Gabriel439, thanks for looking into this! A few responses:
@ggilmore I am so happy not to be an isolated soul on this planet ;-) I have been so confused about the topic of performance/freezing that I opened this discourse discussion. IIRC, at the time I did realize it was pointless to freeze everything. It came as quite a surprise, mainly because all the generated dhall-packages repositories that I know about (dhall-kubernetes, dhall-openshift, dhall-argocd) seem to send the wrong message in that regard. At the end of the day, when you need to integrate these 3 different generated packages, you realize the current state is not a straightforward journey (see EarnestResearch/dhall-packages#62). Even while using a fork as a workaround, this kind of code will be in your way. To contribute in a positive way, I guess a generic API generator is the way to go, but you might not have the time/opportunity to do so. @Gabriel439 Thank you so much for looking at this.
I plan to document more about freezing idioms and when to use them in the Dhall manual, but I need to finish fixing a few warts in the import UX that I ran into while documenting things. I can give a quick summary here, though:
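As a sketch of the core idiom (based on the advice earlier in this thread; the hash below is the dhall-kubernetes `package.dhall` hash that appears later in this thread): only the top-most import needs a `sha256:…` integrity check for caching to kick in, since transitive imports are never resolved on a cache hit:

```dhall
-- Freeze only the top-level package import; everything it imports
-- transitively is covered by the cache hit on this one hash.
let kubernetes =
      https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/master/package.dhall
        sha256:d541487f153cee9890ebe4145bae8899e91cd81e2f4a5b65b06dfc325fb1ae7e

in  kubernetes.Deployment
```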
A semi-advanced example

The following code takes ~2.5–3 seconds to evaluate with … This function gets imported and used in a few other modules, massively increasing the time for those modules to evaluate. (The `let` bindings for `kubernetes`, `Prelude`, `secretVolumes`, `secretVolumeMounts`, and `secretsContainer` are elided from this excerpt.)

```dhall
let Container/withSecrets
: kubernetes.Container.Type -> kubernetes.Container.Type
= \(container : kubernetes.Container.Type) ->
container
// { volumeMounts = Some
( merge
{ Some =
\(volumeMounts : List kubernetes.VolumeMount.Type) ->
volumeMounts # secretVolumeMounts
, None = secretVolumeMounts
}
container.volumeMounts
)
}
let PodSpec/withSecrets
: kubernetes.PodSpec.Type -> kubernetes.PodSpec.Type
= \(podSpec : kubernetes.PodSpec.Type) ->
podSpec
// { volumes = Some
( merge
{ Some =
\(volumes : List kubernetes.Volume.Type) ->
volumes # secretVolumes
, None = secretVolumes
}
podSpec.volumes
)
, initContainers = Some
( merge
{ Some =
\(containers : List kubernetes.Container.Type) ->
containers # [ secretsContainer ]
, None = [ secretsContainer ]
}
podSpec.initContainers
)
, containers =
Prelude.List.map
kubernetes.Container.Type
kubernetes.Container.Type
Container/withSecrets
podSpec.containers
}
let DeploymentSpec/withSecrets
: kubernetes.DeploymentSpec.Type -> kubernetes.DeploymentSpec.Type
= \(spec : kubernetes.DeploymentSpec.Type) ->
spec
with template.spec = Some
( merge
{ Some = PodSpec/withSecrets
, None =
PodSpec/withSecrets
kubernetes.PodSpec::{
, containers = [] : List kubernetes.Container.Type
}
}
spec.template.spec
)
let withSecrets
: kubernetes.Deployment.Type -> kubernetes.Deployment.Type
= \(deployment : kubernetes.Deployment.Type) ->
deployment
// { spec = Some
( merge
{ Some = DeploymentSpec/withSecrets
, None =
DeploymentSpec/withSecrets
kubernetes.DeploymentSpec::{
, selector = kubernetes.LabelSelector::{=}
, template = kubernetes.PodTemplateSpec::{
, metadata = kubernetes.ObjectMeta::{
, name = "deployment-spec-with-secrets"
}
}
}
}
deployment.spec
)
}
in { withSecrets }
```

We have about 4 or 5 similarly complex functions that do deeply nested appends in this module. Including those brings the module's time to evaluate up to ~5 seconds. It does drop to 3 seconds when the dhall-k8s import is frozen, though. Running … Since …
Summary

…
Here is another measure that might help: …

Not sure why it takes almost 2 min to compute this hash.
I appreciate the additional examples, but they're no longer necessary. At this point I have enough information to reproduce the problem. The thing that will help move this issue forward is one of the two approaches I outlined previously: …
@Gabriel439 Thanks for your input. Just a little question: is …?
@PierreR: Any …
I wanted to give an update on the approach that I believe will ultimately address this problem. I'm primarily focusing on reducing the size of cached expressions because I believe that improving decoding performance is a dead end. My reasoning is that even if we could improve decoding performance by 10x (and I don't think we can any time soon), that would still not be the right order of magnitude for where Dhall needs to be in order to build out the Dhall analog of Helm.

I believe the most promising approach to reducing the size of cached expressions is to preserve `let` bindings, so that an expression like:

```dhall
let Age = Natural

in  λ(x : Age) → x + 0
```

… would normalize to:

```dhall
let Age = Natural

in  λ(x : Age) → x
```

… and have an inferred type of:

```dhall
let Age = Natural

in  ∀(x : Age) → Age
```

… instead of the current behavior, which is to normalize to:

```dhall
λ(x : Natural) → x
```

… and have an inferred type of:

```dhall
∀(x : Natural) → Natural
```

There are a few reasons I prefer this solution:
I'm still thinking through the implications and details of this change, and it would require proposing a change to the language standard, which might not necessarily be accepted. However, I believe the improved readability of normal forms will persuade other implementations to agree to the change. If this change were completed, I'm fairly confident it would fix the performance issues that people are reporting when creating both Kubernetes-related utilities and lists of Kubernetes resources. However, I would verify this performance improvement first before proposing the change.
Just another update on this. The original tack of preserving `let` bindings …
I haven't given up on the first approach entirely, but I'm now exploring a second, less invasive approach: trying to add a new type of integrity check (say, …). This would give a performance improvement for the same reason as the first approach I suggested: human-authored code tends to be small, so there won't be as much for the interpreter to decode.
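To illustrate why that helps (a hypothetical sketch, not from the original comment): the following file is a handful of lines as authored, but its fully resolved, β-normalized cache product inlines the entire dhall-kubernetes package wherever its types appear, so caching something closer to the source form would leave very little to decode:

```dhall
-- Hypothetical example: tiny as written, huge once resolved and
-- β-normalized, because kubernetes.Container.Type expands into a large
-- record type at every use site.
let kubernetes =
      https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/master/package.dhall
        sha256:d541487f153cee9890ebe4145bae8899e91cd81e2f4a5b65b06dfc325fb1ae7e

in  λ(container : kubernetes.Container.Type) → container
```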
I made some progress on debugging the performance problems with sourcegraph-dhall and it turns out that the issue might not be what I thought it was. I originally thought that the problem was storing β-normalized expressions in our cache, but that was only partly true. When I performed an experiment to store and cache non-normalized expressions, that did lead to a speed-up (~2x faster), but there was still a larger bottleneck that the optimization revealed. The problem appears to be that the same expression (i.e. the entirety of …) ends up being decoded multiple times, as the following example files illustrate:

```dhall
-- ./largeImport.dhall

{- Imagine that this file represents some large expression,
   like dhall-kubernetes or some other large package
-}
…
```

```dhall
-- ./A.dhall

let largeImport = ./largeImport.dhall

in  largeImport.foo
```

```dhall
-- ./B.dhall

let largeImport = ./largeImport.dhall

in  largeImport.bar
```

```dhall
-- ./C.dhall

let A = ./A.dhall

let B = ./B.dhall

in  { A, B }
```

If we attempt to cache … If my diagnosis is correct, then the solution is to actually change how we cache things (at least for the Haskell implementation's semi-semantic cache, and possibly also for the standard semantic cache, too), so that we permit caching unresolved expressions, so long as the imports are protected by integrity checks. Moreover, we can add integrity checks on the fly to the cached representation if their transitive dependencies are "pure". I'll use the above example to make things more concrete. Suppose that I want to cache …
Then when we need to resolve …
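As a sketch of what that cached representation might look like (hypothetical; the hashes below are placeholders, not real integrity checks): `./C.dhall`'s cache product would keep `./A.dhall` and `./B.dhall` unresolved, with integrity checks added on the fly, so the large package is stored and decoded only once in its own cache entry:

```dhall
-- Hypothetical cache product for ./C.dhall under the proposed scheme:
-- imports stay unresolved but are pinned by integrity checks.
let A = ./A.dhall sha256:0000000000000000000000000000000000000000000000000000000000000000

let B = ./B.dhall sha256:1111111111111111111111111111111111111111111111111111111111111111

in  { A, B }
```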
@Gabriel439 is there another implementation of Dhall (Rust or Go?) that scales better (to be used with …)?
@PierreR: I'm not sure. A different implementation might get a constant-factor improvement from faster CBOR decoding logic, but other than that the problem I'm describing would affect all implementations, because it's an issue with the language standard that needs to be fixed.
@Gabriel439 is there any trick that we might apply to the cache itself? We have been transitioning toward something more "monorepo"-like. Unfortunately, a full rebuild is now taking up to 30 min. Here are the workarounds we have applied so far:
Is there something else we could try before the issue gets resolved?
@PierreR: Not that I know of. The main issue (which is the one I'm trying to fix) is that Dhall interpreters do not permit cached expressions to contain unresolved imports, so even if you were to do cache surgery to factor things in the way I described, the interpreters would reject the cached expression.
@PierreR: Sorry, I missed your other suggestions when responding. There definitely is the option of trimming the imported package down to just the pieces you use:

```dhall
-- ./package-small.dhall

let kubernetes =
      https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/master/package.dhall sha256:d541487f153cee9890ebe4145bae8899e91cd81e2f4a5b65b06dfc325fb1ae7e

in  kubernetes.{ DaemonSet, Deployment, … }
```

… and then if you freeze and import … However, when creating …
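Presumably (an assumed usage sketch, not spelled out in the comment; the hash is a placeholder for whatever `dhall freeze` would compute) downstream modules would then freeze and import the trimmed package instead of the full one:

```dhall
-- Hypothetical downstream module importing the trimmed, frozen package
let kubernetes =
      ./package-small.dhall
        sha256:2222222222222222222222222222222222222222222222222222222222222222

in  kubernetes.Deployment
```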
I've been reading a number of these performance-related issues/discussions and feel like one source of problems is not being mentioned, at least explicitly. We are using dhall-kubernetes with 311 KB of source files (around 60 of them), plus the Prelude and dhall-kubernetes themselves, which amount to 11 MB. When run through dhall resolve/normalize, 300 MB results. We then encode this to CBOR so we can load it quickly in a tool which generates code using this library. Both generating the 300 MB and consuming it (…) are costly. The reason is that we use these types in the signatures of internal functions, and then combine their outputs to create the final YAML. Here's an example: …
We use these functions nested and compositionally, as we might when working with any typed functional programming language. Is this whole style of using Dhall just a complete no-no given Dhall's normalizing interpreter architecture? That is, can we leverage the "typed" aspect of Dhall, but not its "compositional" aspect?
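Since the original example was elided, here is a minimal hypothetical sketch of the style being described (the helper name, the local import path, and the assumption that `DeploymentSpec.Type` has a `replicas : Optional Natural` field are all illustrative): small typed helpers over dhall-kubernetes types, composed and nested to build the final resources:

```dhall
-- Hypothetical helper in the compositional style described above
let kubernetes = ./kubernetes/package.dhall

let withReplicas
    : Natural → kubernetes.DeploymentSpec.Type → kubernetes.DeploymentSpec.Type
    = λ(n : Natural) →
      λ(spec : kubernetes.DeploymentSpec.Type) →
        spec // { replicas = Some n }

in  withReplicas
```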
@arobertn: It is fixable, but it needs to be fixed at the import layer (specifically in hashing and caching expressions). My understanding is that if this didn't go through an import (i.e. if it was all hypothetically in one big Dhall file) then this wouldn't hit the inefficiencies that you're describing.

What I mean by fixing things at the import layer is that we need to extend the cache format so that imports protected with an integrity check don't need to be resolved when caching an expression. In other words, if you have an expression like:

```dhall
let k8s = ./aVeryLargePackage.dhall sha256:…

in  …
```

Then the cache could store the …
@Gabriel439 Thanks for your input. I've been trying to verify your suggestion that this relates to imports by constructing a set of small files that import each other, and a single combined file, and running both through … My belief, which I'm not expert enough to state with more confidence or precision than that, is that our problem comes from the fact that Dhall resolution essentially inlines all definitions down to primitive types everywhere, regardless of whether they are internal or external to the top-level file. When composite types, especially large ones like k8s.Container and friends, are used at multiple levels in signatures that then get expanded at all use points, recursively, things get out of hand. For instance, if I have a function (f : Bool → k8s.Container.Type) and I call it (directly or indirectly) three times inside some function g(), then rather than making the final representation consist of one definition of k8s.Container.Type, one definition of f referring to that, and three references to f, we get three repetitions of the full k8s.Container.Type. (You can see this happening in the example.) I don't have any compiler or parser knowledge, though, so I don't know if this is a fundamental necessity, an architectural choice, or an implementation detail in Dhall.
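A small hypothetical sketch of the expansion being described (reusing the `f`/`g` names from the comment; the field assignments and import path are assumptions): each use of `f` ends up carrying its own fully inlined copy of `k8s.Container.Type` in the resolved, normalized output, rather than a shared reference:

```dhall
-- Hypothetical: after resolution and β-normalization, each of the three
-- uses of `f` below repeats the full expansion of k8s.Container.Type.
let k8s = ./kubernetes/package.dhall

let f
    : Bool → k8s.Container.Type
    = λ(b : Bool) → k8s.Container::{ name = if b then "a" else "b" }

let g = [ f True, f False, f True ]

in  g
```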
@arobertn: Yeah, I should have clarified that my proposed change would include not only not resolving imports in cached expressions, but also not αβ-normalizing cached expressions either.
OK, I see. Anything I can do to help? Is there a branch somewhere or a design outline?
@arobertn: I just wrote up a design outline here: dhall-lang/dhall-lang#1185
Thanks, this makes the plan clear – but it raises questions in my mind whether it would help performance overall in our use case. The steps are:
To me it looks like dhall-lang/dhall-lang#1185 will greatly reduce time and space used by (2), but at the cost of the work being pushed to (3)? In other words, does the interpreter still need to construct the fully-normalized expression internally in order to execute it? If so, then dhall-lang/dhall-lang#1185 would not save us. (Is there some other more efficient sequence that would support (4) within the same 1-second/1GB as now? Note, we moved to this after just loading the Dhall into Haskell directly became prohibitively expensive.)
@arobertn: I can provide a bit more context that is necessary to understand the performance impact of these changes:

Haskell-specific details

The dhall-lang/dhall-lang#1185 change to the language standard would also entail a second change that is specific to the Haskell implementation. To provide some context, the Haskell implementation actually supports two caches: the standard semantic cache and a Haskell-specific semi-semantic cache.

The latter cache is also storing fully resolved and β-normalized cache products, so if dhall-lang/dhall-lang#1185 were to change the behavior of the standard cache then the Haskell implementation of Dhall would also need to make a matching change to the Haskell-specific cache to not resolve or β-normalize cache products (or perhaps disable that cache entirely) in order to fully realize the performance benefits.

Type-checking and normalization are "lazy"

In other words, type-checking and normalization (mostly) avoid duplicate work. To be totally pedantic, the only way to completely prevent any sort of duplicated work is to implement something like Lamping's algorithm for optimal lambda-calculus reduction (the book The Optimal Implementation of Functional Programming Languages covers this well). However, what Dhall and most other implementations based on normalization-by-evaluation do is a close enough approximation. Specifically, the Haskell implementation of Dhall avoids duplication of shared terms (particularly `let`-bound ones), so if you write an expression like:

```dhall
let k8s = aVeryLargeRecord
in  λ(x : k8s.A) → λ(y : k8s.B) → [ k8s.Resource.A x, k8s.Resource.B y ]
```

… then the type-checker and normalizer will only keep around one copy of `aVeryLargeRecord`. However, this "laziness" and avoidance of duplicate work does not work across import boundaries, because …

The current language standard enforces unnecessary duplication for cached imports

Whenever you import an expression, the language standard requires the cache product to be stored in a fully resolved and β-normalized form. This forces the interpreter to use much more memory (because it cannot easily deduplicate such imported expressions) and also much more CPU (because it now has to interpret each occurrence of the duplicated expression instead of sharing one occurrence in the type-checking/normalization context). For example, this unnecessary duplication actually makes the …
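To make the cross-import duplication concrete (a hypothetical sketch; the file name and placeholder hash are illustrative, not from the comment): if the same example is split across an import boundary, the current standard forces this file's cache product to inline the entire normalized package, losing the in-file sharing shown above:

```dhall
-- Hypothetical: the same expression as above, but with the large record
-- behind a frozen import. The cache product for this file inlines the
-- fully normalized package at every occurrence instead of referencing it.
let k8s =
      ./aVeryLargePackage.dhall
        sha256:3333333333333333333333333333333333333333333333333333333333333333

in  λ(x : k8s.A) → λ(y : k8s.B) → [ k8s.Resource.A x, k8s.Resource.B y ]
```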
… whereas if the cache products could store references to imports instead of inlining everything, then the type … Conceptually, implementing dhall-lang/dhall-lang#1185 means that if you use … So, to address your last comment: …
Forked from https://functionalprogramming.slack.com/archives/C82N1L0MB/p1592337474256100
My team is investigating Dhall to see if it'd be a good way to ship + customize our complex Kubernetes application on our internal clusters and for our customers to use. sourcegraph/deploy-sourcegraph-dhall-archived is my PoC implementation of our Kubernetes configuration. I like what I see so far, but I'm wondering how I can improve the render times that I'm seeing.
I've posted benchmarks in sourcegraph/deploy-sourcegraph-dhall-archived's README: https://github.com/sourcegraph/deploy-sourcegraph-dhall-archived/blob/a3f32062b10e08435c6ad73085a5df8c4c30c00d/README.md#benchmarks
Going off of https://discourse.dhall-lang.org/t/figuring-out-performance-bottlenecks/251/5, I'd like to call out that my normal form is apparently ~300x the size of the Prelude. I know that dhall-kubernetes has a lot of types, but is this size expected?
Note: I've vendored https://github.com/dhall-lang/dhall-kubernetes for sourcegraph/deploy-sourcegraph-dhall-archived following @Gabriel439's advice here: https://functionalprogramming.slack.com/archives/C82N1L0MB/p1592340512256900?thread_ts=1592337474.256100&cid=C82N1L0MB