This document discusses how to run a Docker Desktop deployment on a single laptop or desktop machine.
These instructions are intended for Mac or Windows experimenters. For Linux please see the [Linux Kubernetes local example](examples/local/README.md).
These instructions are generally intended for CPU users, however they can also apply to multiple GPUs within a single host if the NVIDIA Docker tooling is installed.
The motivation behind this style of runner deployment is for cases where Python based applications, or the frameworks and libraries they use, cannot scale beyond a single thread of execution or are not thread-safe.
- Docker Desktop multi runner deployment
- Table of Contents
- Introduction
- Pre-requisites
- Configuration and Deployment
- Using the Cluster
Using this document you will be able to run multiple studioml go runners on a single docker host.
Before using the following instructions experimenters will need to have the Docker Desktop 2.3+ service installed.
This option requires at least 8GB of memory in even minimal setups.
Any tools and servers used within the deployment are versioned through the dockerhub container registry and so their versions do not need to be specified explicitly.
Once Docker Desktop is installed use the Windows Start->Docker menu, or the Mac OSX menu bar item for Docker Desktop, to perform the following actions:

- Use the Preferences Resources tab to increase the amount of RAM allocated to Docker to at least 8GB.

- Activate the Kubernetes feature using the Preferences option in the menu. Once Kubernetes has initialized and is ready for use the menu should show a green light and the "Kubernetes is running" indication. For more details please see, https://docs.docker.com/desktop/.

- Use the Kubernetes menu item to check that the Kubernetes instance installed, and defaulted to, is the 'docker-desktop' instance.

- Export the kubectl configuration for your local cluster, see the instructions in the validation section below.
kubectl can be installed using the instructions found at https://kubernetes.io/docs/tasks/tools/install-kubectl/.
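As an example, on a Mac with Homebrew available kubectl can typically be installed using:

brew install kubectl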
Minio offers a client, called mc, for the file server running inside the docker cluster.
The quickstart guide details installation for Windows and Mac. For the Mac, Homebrew is used as shown:
brew install minio/stable/mc
docker context export default --kubeconfig ~/.kube/docker.kubeconfig
To validate your installation you can now leave the KUBE_CONFIG and KUBECONFIG environment variables as they are, or set them to point at your exported configuration file '~/.kube/docker.kubeconfig'; this will allow the kubectl tool to default to using your localhost to communicate with the cluster.
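For example, if you choose to point the variables at the exported file the active context can be confirmed as follows:

export KUBECONFIG=~/.kube/docker.kubeconfig
kubectl config current-context

The second command should report 'docker-desktop'.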
Now the kubectl command access can be tested as shown in the following Mac example:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
docker-desktop Ready master 2m12s v1.16.6-beta.0
$ kubectl describe nodes
Name: docker-desktop
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=docker-desktop
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 04 May 2020 15:17:10 -0700
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: docker-desktop
AcquireTime: <unset>
RenewTime: Mon, 04 May 2020 16:17:12 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 04 May 2020 16:16:23 -0700 Mon, 04 May 2020 15:17:08 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 04 May 2020 16:16:23 -0700 Mon, 04 May 2020 15:17:08 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 04 May 2020 16:16:23 -0700 Mon, 04 May 2020 15:17:08 -0700 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 04 May 2020 16:16:23 -0700 Mon, 04 May 2020 15:17:08 -0700 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.65.3
Hostname: docker-desktop
Capacity:
cpu: 6
ephemeral-storage: 61255492Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2038544Ki
pods: 110
Allocatable:
cpu: 6
ephemeral-storage: 56453061334
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1936144Ki
pods: 110
System Info:
Machine ID: cff33312-1793-4201-829d-010a1525d327
System UUID: fb714256-0000-0000-a61c-ee3a89604c3a
Boot ID: 1d42a706-7f4f-4c91-8ec9-fd53bf1351bc
Kernel Version: 4.19.76-linuxkit
OS Image: Docker Desktop
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.8
Kubelet Version: v1.16.6-beta.0
Kube-Proxy Version: v1.16.6-beta.0
Non-terminated Pods: (11 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
docker compose-78f95d4f8c-6lp49 0 (0%) 0 (0%) 0 (0%) 0 (0%) 58m
docker compose-api-6ffb89dc58-qgnpq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 58m
kube-system coredns-5644d7b6d9-2xr4r 100m (1%) 0 (0%) 70Mi (3%) 170Mi (8%) 59m
kube-system coredns-5644d7b6d9-vvpzk 100m (1%) 0 (0%) 70Mi (3%) 170Mi (8%) 59m
kube-system etcd-docker-desktop 0 (0%) 0 (0%) 0 (0%) 0 (0%) 58m
kube-system kube-apiserver-docker-desktop 250m (4%) 0 (0%) 0 (0%) 0 (0%) 58m
kube-system kube-controller-manager-docker-desktop 200m (3%) 0 (0%) 0 (0%) 0 (0%) 58m
kube-system kube-proxy-tdsn2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 59m
kube-system kube-scheduler-docker-desktop 100m (1%) 0 (0%) 0 (0%) 0 (0%) 58m
kube-system storage-provisioner 0 (0%) 0 (0%) 0 (0%) 0 (0%) 58m
kube-system vpnkit-controller 0 (0%) 0 (0%) 0 (0%) 0 (0%) 58m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (12%) 0 (0%)
memory 140Mi (7%) 340Mi (17%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 60m kubelet, docker-desktop Starting kubelet.
Normal NodeHasSufficientMemory 60m (x8 over 60m) kubelet, docker-desktop Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 60m (x8 over 60m) kubelet, docker-desktop Node docker-desktop status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 60m (x7 over 60m) kubelet, docker-desktop Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 60m kubelet, docker-desktop Updated Node Allocatable limit across pods
Normal Starting 59m kube-proxy, docker-desktop Starting kube-proxy.
Minio is used to create a storage server for runner clusters when AWS is not being used. This step will create a storage service with 10Gb of space. It uses the persistent volume claim feature to retain any data the server has been sent and to prevent restarts from losing the data. The following steps are a summary of what is needed to stand up the server:
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-pvc.yaml
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-deployment.yaml
kubectl create -f https://raw.githubusercontent.com/minio/minio/RELEASE.2020-05-16T01-33-21Z/docs/orchestration/kubernetes/minio-standalone-service.yaml
More detailed information is available from Minio Standalone Deployment.
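Once the above manifests have been applied the state of the minio deployment can be checked with, for example:

kubectl get pvc,deployments,services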
To create the cluster a Kubernetes deployment YAML file is used and can be applied using the 'kubectl apply -f [filename]' command. The deployment file can be obtained from this github project at examples/docker/deployment.yaml.
Before applying this file examine its contents and locate the studioml-go-runner-deployment deployment section, and then the resources subsection. The resources subsection contains the hardware resources that will be assigned to the studioml runner pod. Edit the resources to fit with your local machine's capabilities and the resources needed to run your workloads. The default 'replicas' value in the studioml-go-runner-deployment deployment section is set to 1 to reflect having a single runner.
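As an illustrative sketch only, and not the exact values that appear in the project file, a resources subsection for a CPU only host might look something like the following:

resources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"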
The runner will divide up the resources it has been allocated to service jobs arriving from your local 'studio run', or completion service. As jobs are received by the runner the work will be apportioned, and once the runner has allocated all of the resources it has available it will stop scheduling more workers until sufficient resources are released. On a single node there is no need to run more than one runner, except in testing situations and the like where there might be a functional requirement.
You should also examine the cpu and memory sizings to ensure that the runner deployment pods fit and can be run by the cluster; if not they will remain in a 'Pending' state. This can be done using the 'kubectl describe node' command and examining the hardware assigned to run the cluster.
Once you have checked the deployment file it can be applied as follows:
export KUBE_CONFIG=~/.kube/docker.kubeconfig
export KUBECONFIG=~/.kube/docker.kubeconfig
or
unset KUBE_CONFIG
unset KUBECONFIG
then
kubectl apply -f deployment.yaml
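Once applied the runner pod can be checked for a 'Running' status, for example:

kubectl get pods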
Having created the services you can validate access to your freshly deployed services as shown in the following example:
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20h
minio-service LoadBalancer 10.104.248.60 localhost 9000:30767/TCP 10m
rabbitmq-service LoadBalancer 10.104.168.157 localhost 15672:30790/TCP,5672:31312/TCP 2m22s
You will notice that the ports have been exposed to the localhost interface of your Mac or Windows machine. This allows you, for example, to use your browser to access minio on 'http://localhost:9000', using a username of 'minio' and password of 'minio123'. The rabbitMQ administration interface is on 'http://localhost:15672', username 'guest', and password 'guest'.
This is clearly an insecure deployment intended just for testing and benchmarking purposes. If you wish to deploy these services with your own usernames and passwords examine the YAML files used for the deployments and modify them with appropriate values for your situation.
For more information on exposing ports from Kubernetes please see, accessing an application in Kubernetes
There are two basic ways to get a sense of dynamic CPU and memory consumption.

- The first is to use 'docker stats'. This is the simplest and probably best approach, an example appears after this list.

- The second is to use the Kubernetes Web UI dashboard, more details below.
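For example, a one-shot snapshot of per container CPU and memory usage can be obtained using:

docker stats --no-stream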
If you wish to use dashboard style monitoring of your local cluster's resource consumption you can use the Kubernetes Dashboard, which has an introduction at Web UI (Dashboard), and detailed access and installation instructions at, https://github.com/kubernetes/dashboard.
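As an example only, at the time of writing the dashboard could be deployed and then reached through a local proxy using commands such as the following, please check the dashboard project for the currently recommended manifest version:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
kubectl proxy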
Having deployed the cluster we can now launch studio experiments using the localhost for our queue and for our storage. To do this your studioml config.yaml file should be updated to something like the following:
database:
    type: s3
    endpoint: http://minio-service.default.svc.cluster.local:9000
    bucket: metadata
    authentication: none

storage:
    type: s3
    endpoint: http://minio-service.default.svc.cluster.local:9000
    bucket: storage

cloud:
    queue:
        rmq: "amqp://guest:guest@rabbitmq-service.default.svc.cluster.local:5672/%2f?connection_attempts=30&retry_delay=.5&socket_timeout=5"

server:
    authentication: None

resources_needed:
    cpus: 1
    hdd: 10gb
    ram: 2gb

env:
    AWS_ACCESS_KEY_ID: minio
    AWS_SECRET_ACCESS_KEY: minio123
    AWS_DEFAULT_REGION: us-west-2

verbose: debug
In order to access the minio and rabbitMQ servers the host names being used will need to match between the experiment host where experiments are launched and the host names inside the compute cluster. To do this the /etc/hosts file of your local experiment host, typically edited using 'sudo vim /etc/hosts', will need the following line added:
127.0.0.1 minio-service.default.svc.cluster.local rabbitmq-service.default.svc.cluster.local
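Once the entry is in place the name resolution and port can be checked from the experiment host, for example an HTTP response from the minio health endpoint indicates the service is reachable:

curl -I http://minio-service.default.svc.cluster.local:9000/minio/health/live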
If you wish you can use one of the examples provided by the StudioML python client to test your configuration, github.com/studioml/studio/examples/keras. Doing this will look like the following example:
cd studio/examples/keras
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
studio run --lifetime=30m --max-duration=20m --gpus 0 --queue=rmq_kmutch --force-git train_mnist_keras.py
There are many ways to retrieve experiment results from the minio server.
The Minio Client (mc) mentioned as a prerequisite can be used to extract data from folders on the minio server recursively, as shown in the following example:
mc config host add docker-desktop http://minio-service.default.svc.cluster.local:9000 minio minio123
mc cp --recursive docker-desktop/storage/experiments experiment-results
It should be noted that the bucket names in the above example originate from the ~/.studioml/config.yaml file.
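The buckets and experiment folders can also be browsed before copying, for example:

mc ls docker-desktop/storage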
Additional information related to the minio client can be found at MinIO Client Complete Guide.
Copyright © 2020 Cognizant Digital Business, Evolutionary AI. All rights reserved. Issued under the Apache 2.0 license.