README documenting the generic EKS cluster for a JupyterHub deployment.
The purpose of this deployment is to create an AWS managed node cluster using the EKS service. The requirements for this deployment to work are:
- AWS Account
- AWS CLI setup
- VPC
- Subnets A, B, C (one in each availability zone)
- Internet Gateway
- Route associations
- HTTP / HTTPS security group
- NFS security group
- Compute nodes (using Amazon's Managed Nodes)
- Load balancer
- EFS for shared data
- EFS hub-db-dir
Note: To provision both EFS shares, the efs-generic.yaml CloudFormation template will need to be deployed twice. This is by design, since the hub-db-dir share may be longer lived than the shared-data share.
The deployment is performed with the following steps and tools:
- VPC - CloudFormation
- EFS - CloudFormation (x2)
- ec2 compute - eksctl
- nginx ingress - kubectl
- helm initialization - helm
- test jupyterhub - helm
VPC - CloudFormation
- Create new CloudFormation stack using vpc.yaml
- Continue through the installation, customizing the template parameters as needed
- Confirm the deployment completed successfully
EFS - CloudFormation
- Create new CloudFormation stack using the efs-generic.yaml template
- Add customizations per your requirements, using the VPC and subnet IDs from the VPC stack outputs (a CLI alternative is sketched after this list)
- Confirm the deployment completed successfully
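If you prefer the AWS CLI over the console, a stack can also be created from the command line. The command below is only a sketch: the stack name is arbitrary and the parameter key is a placeholder, since the real parameter names depend on the efs-generic.yaml template.
aws cloudformation create-stack \
  --stack-name jupyterhub-efs-shared \
  --template-body file://efs-generic.yaml \
  --parameters ParameterKey=VpcId,ParameterValue=vpc-0123456789abcdef0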
ec2 compute - eksctl
- Verify the AWS CLI is installed and set up
- Update eks.yaml or eks-spot-nodes.yaml.
A pricing strategy should be decided before creating a cluster in AWS. If you are looking for a long-term commitment with AWS, the standard eks.yaml may be the correct choice: it defaults to on-demand pricing, and you will need to create reservations after the cluster is created. To use spot pricing, use the eks-spot-nodes.yaml configuration file instead.
Once this has been decided, update the configuration file with the following fields, which are unique to your environment (a sketch mapping them onto an eksctl configuration follows the note below):
- vpc-id
- subneta-id
- subnetb-id
- subnetc-id
- availability zones
- ssh public key
Note: The CIDR for each subnet may need to be adjusted if you customized the subnets in the VPC CloudFormation deployment.
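For orientation, the fields above map onto an eksctl cluster configuration roughly as in the sketch below. Every ID, CIDR, the region, cluster name, and key path are placeholders; the actual eks.yaml / eks-spot-nodes.yaml files in this repository are authoritative and may be laid out differently.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: jupyterhub-cluster          # placeholder cluster name
  region: us-west-2                 # placeholder region
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
vpc:
  id: "vpc-0123456789abcdef0"       # vpc-id from the VPC stack output
  subnets:
    public:
      us-west-2a: { id: "subnet-aaaa1111", cidr: "10.0.0.0/24" }   # subneta-id
      us-west-2b: { id: "subnet-bbbb2222", cidr: "10.0.1.0/24" }   # subnetb-id
      us-west-2c: { id: "subnet-cccc3333", cidr: "10.0.2.0/24" }   # subnetc-id
nodeGroups:
  - name: workers
    instanceType: r5.xlarge         # default of eks.yaml is a single r5.xlarge
    desiredCapacity: 1
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/id_rsa.pub   # ssh public key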
- Run the following command, substituting the path to the configuration file you chose (eks.yaml, the smaller installation, may still require changes):
eksctl create cluster -f /path/to/files/eks.yaml
Note: You may want to update the eks.yaml file to use a different instance size or a different number of instances. The default in eks.yaml is a single r5.xlarge.
- Run the following command to verify the installation has completed (after the progress output finishes):
eksctl get cluster
The newly created cluster should now appear
nginx ingress - kubectl
- Run the following three commands to create the ingress controller on the cluster and in AWS:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.28.0/deploy/static/mandatory.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.28.0/deploy/static/provider/aws/service-l4.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.28.0/deploy/static/provider/aws/patch-configmap-l4.yaml
These three commands will create the ELB and then set up the required pods and ingress in the infrastructure.
NOTE: Since there is a load balancer in front of the service, the nginx-ingress pod needs to be configured to terminate SSL; use of kube-lego is deprecated. Configuring this using a Helm chart (version 2) is covered below. At this point, create a CNAME record for your deployment that points to the Amazon Elastic Load Balancer name; the record typically has the form host.domain.tld. This is important later in the deployment and assumes you have a domain to use for the deployment.
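If you need the ELB name for the CNAME record, it can usually be read from the ingress controller's Service once the commands above have finished. The namespace and Service name below are the ones created by the 0.28.0 manifests; verify them in your cluster before relying on this:
kubectl -n ingress-nginx get svc ingress-nginx -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'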
helm initialization - helm
Note: This is drawn from the excellent work of the Jupyter team. More documentation here: https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub/setup-helm.html
- Run the following command to update the security of the cluster:
kubectl create clusterrolebinding cluster-system-anonymous --clusterrole=cluster-admin --user=system:anonymous
- Create a service account for Helm in the cluster (assuming Helm is installed on the local workstation):
kubectl --namespace kube-system create serviceaccount tiller
- Grant cluster-admin permissions to the service account:
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
- Initialize the cluster and the client:
helm init --service-account tiller --wait
Note: If the cluster is already set up and you just need to reconnect from a different machine, you can initialize only the client using:
helm init --client-only
- Set up Tiller to communicate only within the cluster, as a better security practice:
kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'
- Verify that Helm is communicating with the Kubernetes service:
helm version
You should receive a response from both the client and the server.
test jupyterhub - helm
Note: The config.yaml and values.yaml files should be customized; they drive the deployment configuration. There are two main portions of the deployment: first, customizing the Helm chart values; second, deploying with Helm.
- Generate a random string to secure communication between the proxy and the pods:
openssl rand -hex 32
- Replace "proxy-secret-goes-here" in config.yaml and values.yaml with the returned value.
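In a standard zero-to-jupyterhub configuration this value lives under proxy.secretToken; assuming the files in this repository follow that layout, the relevant portion looks roughly like:
proxy:
  secretToken: "proxy-secret-goes-here"   # replace with the output of openssl rand -hex 32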
- Update the limits on each pod by changing these fields in the config and values files:

singleuser:
  cpu:
    guarantee: 2
    limit: 8
  memory:
    guarantee: 1G
    limit: 2G

This determines the resources each Jupyter pod will have, and may impact the overall cluster's usage if there are not enough resources.
- Determine whether you need a specific Docker image for the environment. If there are no specific needs, jupyter/datascience-notebook or jupyter/tensorflow-notebook may be used. The advantage of a custom image is having packages and environments set up and ready to go. A couple maintained by Cal Poly:
- calpolydatascience/datascience-base
- calpolydatascience/rstudio-base
- calpolydatascience/tensorflow-r

This is updated in this portion:
singleuser:
  image:
    name: jupyter/datascience-notebook
    tag: 45f07a14b422  # using latest version as of 11-12-2019

The default will be jupyter/datascience-notebook:45f07a14b422. This must be changed in both the config and values files.
- Update storage in the config and values files. Inside the singleuser storage configuration you will see extraVolumeMounts and extraVolumes. Update the portion labeled "amazon-efs-shared-storage-server" with the AWS EFS server that was created for shared data; an example of a server name is fs-12345678.efs.location.amazonaws.com. This should be updated in both the config and values files, as shown in the sketch below.
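In a zero-to-jupyterhub style configuration, that section looks roughly like this; the volume name, NFS path, and mount path below are illustrative, and the config and values files in this repository are authoritative:
singleuser:
  storage:
    extraVolumes:
      - name: shared-data                            # illustrative volume name
        nfs:
          server: amazon-efs-shared-storage-server   # e.g. fs-12345678.efs.us-west-2.amazonaws.com
          path: /shared
    extraVolumeMounts:
      - name: shared-data
        mountPath: /home/shared                      # illustrative mount point inside the pod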
- Update the authentication configuration in the config and values files.
Note: Pulling again from the Jupyter team. https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html
auth:
  admin:
    users:
      - GitHubAdminAccount
  github:
    callbackUrl: https://host.domain.tld/hub/oauth_callback
    clientId: XXXXXXX
    clientSecret: XXXXXXX
  type: github
There should always be at least one admin account; often there are additional admin accounts for the faculty and staff who deployed the service.
- Update host.domain.tld in both the config and values files. Under the ingress controller configuration, "host.domain.tld" is listed; replace it with the domain name that will be used. A couple of examples:
- class-quarter.domain.tld
- research-group.domain.tld
- workshop.domain.tld
- Customizations to values.yaml only:
Note: These apply only to values.yaml and will not take effect until a helm upgrade is run. To update:
- claimName: nfs-host-pvc
- secretName: host.domain.tld
What these updates do: claimName migrates the JupyterHub database directory to the EFS share created using CloudFormation; secretName points the hub at the imported SSL certificate.
- Updates to pvc.yaml. Lastly, pvc.yaml will need to be updated with your values, corresponding to the following:
- amazon-efs-shared-storage-server: AWS EFS server for shared storage
- amazon-efs-hub-storage-server: AWS EFS server for the JupyterHub database directory

This will create two PVs and two PVCs backed by shared storage instead of locally attached storage.
Note: This is in response to an issue with Availability Zones in AWS and locally attached storage. For more information a few articles are here: jupyterhub/zero-to-jupyterhub-k8s#870, https://discourse.jupyter.org/t/jupyterhub-hub-db-dir-pv-question/2157
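For reference, each PV/PVC pair in pvc.yaml is expected to look roughly like the sketch below, with the EFS DNS name substituted in. The names, size, and path are placeholders; the pvc.yaml shipped in this repository is authoritative.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv                # placeholder name
spec:
  capacity:
    storage: 10Gi                     # EFS does not enforce the size, but the field is required
  accessModes:
    - ReadWriteMany
  nfs:
    server: fs-12345678.efs.us-west-2.amazonaws.com   # amazon-efs-shared-storage-server
    path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc               # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                # bind to the manually created PV, not a dynamic class
  volumeName: shared-data-pv
  resources:
    requests:
      storage: 10Gi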
- Helm deploy. Back at the CLI where the cluster can be accessed, run the following commands:
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
This adds the repository to your available Helm charts and runs an update to pull it into the cache.
- JupyterHub install. We can now run the first installation command with Helm:
helm install -n name jupyterhub/jupyterhub --namespace namespace --values /path/to/files/config.yaml
This deploys a JupyterHub installation using the default configuration, with our updates applied on top via the config.yaml file. Typically the name of the deployment is the same as the namespace.
- Kubernetes secret. Now a Kubernetes secret will be created to make sure the configuration is SSL encrypted:
kubectl -n namespace create secret tls host.domain.tld --key=/path/to/files/domain.key --cert=/path/to/files/domain.crt
The host.domain.tld must match the secretName that is listed in the values.yaml configuration file for SSL to work.
- PVC deploy. Now we will create the PVs and PVCs using the file we modified earlier:
kubectl -n namespace apply -f /path/to/files/pvc.yaml
- PVC shares. Now that the PVCs have been created, the actual EFS shares need the correct top-level folders. The easiest way to do this is to connect to the Kubernetes node using SSH.
- Connect to the server via SSH.
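One way to do this, assuming the SSH public key from the eksctl configuration and the default Amazon Linux user (adjust the key path and address for your environment):
kubectl get nodes -o wide                          # note the node's EXTERNAL-IP
ssh -i ~/.ssh/id_rsa ec2-user@<node-external-ip>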
- Make sure either the EFS utilities or NFS utilities are installed on the system. This shouldn't be needed, since the node is an Amazon Linux server:
sudo yum install -y amazon-efs-utils
or
sudo yum install -y nfs-utils
- Create the mount directories:
mkdir /tmp/hub /tmp/shared
cd /tmp
- Mount shares:
sudo mount -t efs amazon-efs-hub-storage-server:/ /tmp/hub
sudo mount -t efs amazon-efs-shared-storage-server:/ /tmp/shared
Note that you will substitute the placeholders in the commands with your own values. Instead of using the whole server name, you only need the host portion: for a server named fs-12345678.efs.location.amazonaws.com, use just fs-12345678. For example, if the hub EFS storage is fs-12345678.efs.us-west-2.amazonaws.com and the shared storage is fs-87654321.efs.us-west-2.amazonaws.com, the commands would look like this:
sudo mount -t efs fs-12345678:/ /tmp/hub
sudo mount -t efs fs-87654321:/ /tmp/shared
- Create folders: now that the shares are mounted, create the subdirectories that will be mapped into the containers. These use the default server paths in config.yaml and values.yaml:
sudo mkdir /tmp/hub/host-hub /tmp/shared/shared
sudo chmod -R <mode> /tmp/hub && sudo chmod -R <mode> /tmp/shared

Replace <mode> with permissions that allow files and folders to be created inside these directories (for example, 777 for a fully open share).
- JupyterHub upgrade. Now that all of the other prerequisites are finished, we can tie everything together by upgrading the application with the values.yaml file:
helm upgrade --install name jupyterhub/jupyterhub --namespace namespace --values /path/to/files/values.yaml
- Test the application: you should now be able to browse to the host.domain.tld you set up at your DNS registrar. This will forward you to the ELB, then to the cluster and the application.
- Update the Helm chart. First, the Helm chart will need the updated ingress configuration:
jupyterhub:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/tls-acme: "true"
      kubernetes.io/ingress.class: nginx
    hosts:
      - YOUR-JUPYTERHUB-HOST-DOMAIN
    tls:
      - secretName: YOUR-JUPYTERHUB-HOST-DOMAIN
        hosts:
          - YOUR-JUPYTERHUB-HOST-DOMAIN
The "YOUR-JUPYTERHUB-HOST-DOMAIN" will need to be updated in this example on a per case by case basis.
- Create the Kubernetes secret. The Kubernetes secret is the SSL certificate being imported into the cluster. It is important to note that these should be cleaned up when a deployment is removed, and that they need to be created in the deployment's namespace, not in the kube-system or default namespace:
kubectl -n namespace create secret tls YOUR-JUPYTERHUB-HOST-DOMAIN --key=domain.key --cert=domain.crt
Note that the secret name is set to the FQDN, which should match the secretName in the Helm chart. It can be something other than the domain name, but this suffices since there is only one secret to handle in the deployment.
- Update the deployment. Once the chart has been updated, the deployment will need to be upgraded:
helm upgrade --install name jupyterhub/jupyterhub --namespace name --version=version --values=values.yaml
This should be a very quick update, and no user pods will be restarted. Once the process completes, it may take up to a minute for the updated SSL certificate to appear in the hub.
Note that this is what created the load balancer in the infrastructure and attached the instances to it. What needed to be run was:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.28.0/deploy/static/mandatory.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.28.0/deploy/static/provider/aws/service-l4.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.28.0/deploy/static/provider/aws/patch-configmap-l4.yaml
These three commands will create the ELB and then set up the required pods and ingress in the infrastructure.
NOTE: Since there is a load balancer in front of the service, the nginx-ingress pod needs to be configured to terminate SSL. Use of kube-lego is deprecated.