Get Started With Fluid

This document mainly describes how to deploy Fluid with Helm, and use Fluid to create a dataset and speed up your application.

Requirements

Kubernetes 1.14+

If you don't have a Kubernetes now, we highly recommend you use a cloud Kubernetes service. Usually, with a few steps, you can get your own Kubernetes Cluster. Here's some of the certified cloud Kubernetes services:
Note: While convenient, Minikube is not recommended to deploy Fluid due to its limited functionalities.
Kubectl 1.14+

Please make sure your kubectl is properly configured to interact with your Kubernetes environment.
Helm 3

In the following steps, we'll deploy Fluid with Helm 3

Deploy Fluid

Create namespace for Fluid
```
$ kubectl create ns fluid-system
```
Download the latest Fluid from Github release page

Deploy Fluid with Helm

$ helm install fluid fluid-<version>.tgz
NAME: fluid
LAST DEPLOYED: Tue Jul  7 11:22:07 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Check running status of Fluid

$ kubectl get po -n fluid-system
NAME                                         READY   STATUS    RESTARTS   AGE
alluxioruntime-controller-64948b68c9-zzsx2   1/1     Running   0          108s
csi-nodeplugin-fluid-2mfcr                   2/2     Running   0          108s
csi-nodeplugin-fluid-l7lv6                   2/2     Running   0          108s
dataset-controller-5465c4bbf9-5ds5p          1/1     Running   0          108s

Create a Dataset

Fluid provides cloud-native data acceleration and management capabilities, and use dataset as a high-level abstraction to facilitate user management. Here we will show you how to create a dataset with Fluid.

Create a Dataset object through the CRD file, which describes the source of the dataset.

$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
EOF

kubectl create -f dataset.yaml

Create an AlluxioRuntime CRD object to support the dataset we created. We use Alluxio as its runtime here.

$ cat<<EOF >runtime.yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
EOF

Create Alluxio Runtime with kubectl

kubectl create -f runtime.yaml

Next, we create an application to access this dataset. Here we will access the same data multiple times and compare the time consumed by each access.

$ cat<<EOF >app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: demo
  volumes:
    - name: demo
      persistentVolumeClaim:
        claimName: demo
EOF

Create Pod with kubectl

$ kubectl create -f app.yaml

Dive into the container to access data, the first access will take longer.

$ kubectl exec -it demo-app -- bash
$ du -sh /data/spark/spark-3.0.1-bin-without-hadoop.tgz
150M	/data/spark/spark-3.0.1-bin-without-hadoop.tgz
$ time cp /data/spark/spark-3.0.1-bin-without-hadoop.tgz /dev/null
real	0m13.171s
user	0m0.002s
sys	0m0.028s

In order to avoid the influence of other factors like page cache, we will delete the previous container, create the same application, and try to access the same file. Since the file has been cached by alluxio at this time, you can see that it takes significantly less time now.
```
$ kubectl delete -f app.yaml && kubectl create -f app.yaml
$ kubectl exec -it demo-app -- bash
$ time cp /data/spark/spark-3.0.1-bin-without-hadoop.tgz /dev/null
real	0m0.344s
user	0m0.002s
sys	0m0.020s
```

We've created a dataset and did some management in a very simple way. For more detail about Fluid, we provide several sample docs for you:

Speed Up Accessing Remote Files
Cache Co-locality for Workload Scheduling
Accelerate Machine Learning Training with Fluid

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

get_started.md

get_started.md

Get Started With Fluid

Requirements

Deploy Fluid

Create a Dataset

Files

get_started.md

Latest commit

History

get_started.md

File metadata and controls

Get Started With Fluid

Requirements

Deploy Fluid

Create a Dataset