tl;dr: A combinator data component that installs Pachyderm, a data lineage and pipelining solution.
Pachyderm is an open-source-driven solution that provides data lineage and pipelines. Data lineage is important for _provenance_: knowing the origin of downstream assets. In ML, the assets are often models, and the provenance describes how a model came to be. Precise knowledge of what a model was trained on is important for disaster recovery, auditing, and robustness.
Pipelines encode a process. This can be anything from automating pre-processing to training and deploying models. Pachyderm's solution is unique because it is backed by data lineage; that is, its pipelines are data-driven, not process-driven.
The fastest way to get started is to use the test drive functionality provided by TestFaster. Click on the "Launch Test Drive" button below (opens a new window).
Once the test drive has launched, click the two links to the left to get started with Pachyderm:
- Click the Jupyter link and launch the `demo.ipynb` notebook.
- Click on the Dashboard link to launch the Pachyderm Enterprise Dashboard.
Start by preparing your Kubernetes cluster using one of the infrastructure components or use your own cluster.

    module "pachyderm" {
      source = "combinator-ml/pachyderm/k8s"
      # Optional settings go here
    }

See the full configuration options below.
Name | Version |
---|---|
helm | ~> 2.1.2 |
kubernetes | ~> 2.2.0 |
null | ~> 3.1.0 |
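
The version constraints above can be pinned in the calling configuration. The following is a minimal sketch, assuming the three providers come from the `hashicorp` namespace of the public Terraform Registry; adjust the sources if your setup uses a mirror.

```hcl
# Sketch: pin the provider versions listed in the requirements table.
# The "hashicorp/..." sources are an assumption about where the providers are fetched from.
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.1.2"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.2.0"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.1.0"
    }
  }
}
```
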
Name | Version |
---|---|
helm | ~> 2.1.2 |
kubernetes | ~> 2.2.0 |
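
Both providers need to know how to reach your cluster. As a rough sketch, assuming your cluster credentials live in a kubeconfig file at `~/.kube/config` (a hypothetical location; use whatever your infrastructure component produces), the providers could be configured like this:

```hcl
# Sketch: point both providers at the same cluster.
# The kubeconfig path is an assumption, not something this module requires.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  # The 2.x Helm provider takes its cluster settings in a nested block.
  kubernetes {
    config_path = "~/.kube/config"
  }
}
```
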
No Modules.
Name |
---|
helm_release |
kubernetes_namespace |
Name | Description | Type | Default | Required |
---|---|---|---|---|
namespace | (Optional) The namespace to install the release into. | string | "pachyderm" | no |
values | (Optional) List of values in raw yaml to pass to helm. See https://github.com/pachyderm/helmchart/blob/master/pachyderm/values.yaml. | list(string) | [ | no |
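
To illustrate the two inputs above, here is a sketch of a module call that overrides both. The namespace `pachyderm-dev` and the file `pachyderm-values.yaml` are hypothetical names chosen for the example; the file is assumed to contain overrides in the format of the chart's `values.yaml`.

```hcl
# Sketch: install into a custom namespace and pass Helm values from a local file.
module "pachyderm" {
  source = "combinator-ml/pachyderm/k8s"

  # Hypothetical namespace; defaults to "pachyderm" when omitted.
  namespace = "pachyderm-dev"

  # Each list element is a raw YAML document passed to Helm.
  # The file name is an assumption for illustration.
  values = [
    file("${path.module}/pachyderm-values.yaml"),
  ]
}
```

Reading the YAML from a file with `file()` keeps the overrides in one place and avoids embedding multi-line strings in the Terraform code.
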
Name | Description |
---|---|
namespace | Namespace is the kubernetes namespace of the release. |
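
The `namespace` output can be consumed by the calling configuration. A minimal sketch, assuming the module was declared as `module "pachyderm"` and using a hypothetical output name `pachyderm_namespace`:

```hcl
# Sketch: re-export the namespace the release was installed into.
# The output name is an assumption for illustration.
output "pachyderm_namespace" {
  description = "Kubernetes namespace of the Pachyderm release."
  value       = module.pachyderm.namespace
}
```

After `terraform apply`, running `terraform output pachyderm_namespace` prints the value, which is handy when later commands need to target that namespace.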