Kubeflow pipelines built on top of the TensorFlow TFX library
This repository contains machine learning pipelines based on the TensorFlow TFX library. Every pipeline is designed to be deployed on an on-premise Kubernetes/Kubeflow cluster.
Each folder contains the code and data needed for its Kubeflow pipeline, plus a README that includes:
- pipeline general information
- data handling specific to running the pipeline on premise
- instructions for the interactive notebooks
- build and launch procedure
Further pipelines are welcome via pull request.
- iris - Complete pipeline for a simple (Keras) model on the IRIS dataset.
- cifar-10 - Complete pipeline for a CNN model on the CIFAR-10 dataset [NEEDS UPDATE].
- inat-2019 - Complete pipeline for a MobileNetV2 model on the iNaturalist 2019 dataset [NEEDS UPDATE].
Pipelines currently use custom TFX images from tfx-nvidia-gpu, which contain NVIDIA drivers for GPU usage.
The following prerequisites are needed to deploy this repo:
- Kubeflow version >=1.0
- TensorFlow >=2.1.0
- TensorFlow TFX ==0.21.1
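The version constraints above can be checked before deploying. The following is a minimal sketch, not part of this repo: the function names and the `REQUIREMENTS` table are hypothetical, and the installed versions are assumed to be passed in by the caller (for example from `pip list` output and the Kubeflow dashboard), since Kubeflow is not a Python package.

```python
import re

# Constraints copied from the prerequisites list above.
# The dictionary keys are illustrative labels, not package names to import.
REQUIREMENTS = {
    "kubeflow": (">=", "1.0"),
    "tensorflow": (">=", "2.1.0"),
    "tfx": ("==", "0.21.1"),
}


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0); any suffix after the numeric prefix is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed, operator, required):
    """Compare two version strings with '>=' or '=='."""
    have, want = parse_version(installed), parse_version(required)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want if operator == ">=" else have == want


def check_prerequisites(installed):
    """Return a list of problems; an empty list means every constraint is met."""
    problems = []
    for name, (operator, required) in REQUIREMENTS.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name}: not found (need {operator}{required})")
        elif not satisfies(version, operator, required):
            problems.append(f"{name}: found {version}, need {operator}{required}")
    return problems
```

For example, `check_prerequisites({"kubeflow": "1.0.2", "tensorflow": "2.1.0", "tfx": "0.21.4"})` reports only the TFX mismatch, because TFX is pinned to an exact version while the other two are lower bounds.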
A PersistentVolumeClaim called tfx-pvc is needed, so the cluster should have one ready before the pipelines are deployed.
Here is an example of a 100Gi claim using the local-path storageClass:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfx-pvc
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 100Gi
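If the claim size or storage class differs between clusters, the same manifest can be generated rather than edited by hand. This is a rough sketch under that assumption: `build_pvc_manifest` is a hypothetical helper, not something shipped in this repo, and its defaults simply mirror the example above.

```python
import json


def build_pvc_manifest(name="tfx-pvc", namespace="kubeflow",
                       storage_class="local-path", size="100Gi"):
    """Build a PersistentVolumeClaim manifest as a plain dictionary.

    Defaults reproduce the example claim; override them per cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be redirected to a file.
    print(json.dumps(build_pvc_manifest(), indent=2))
```

Since kubectl accepts JSON manifests as well as YAML, the printed output can be saved to a file and applied with `kubectl apply -f`.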
This repository must be cloned into the root of the tfx PersistentVolume before starting any pipeline.
Some Python libraries are needed. Install them with:
pip install -r requirements.txt
The requirements.txt file is in the root of this repo.