ray cluster and test file
crinagurev committed Nov 1, 2024
1 parent d2365df commit 6be3014
Showing 4 changed files with 168 additions and 0 deletions.
31 changes: 31 additions & 0 deletions docker/runner/README.md

In a Ray (or any distributed computing) cluster, the terms "head node" and "worker nodes" refer to the different roles containers play in the cluster. The head node is the master node of a Ray cluster; you typically have exactly one. Worker nodes are the containers that execute jobs in parallel; you can have as many worker nodes as you want.

-----------------------------------------------------------
To install KubeRay:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --namespace ray-system --create-namespace

ray-cluster.yaml was created based on the following guide:
https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html

To install and run Kubernetes and Minikube, you can follow this guide: https://medium.com/@areesmoon/installing-minikube-on-ubuntu-20-04-lts-focal-fossa-b10fad9d0511

Use 'pip install ray' to install Ray, and 'pip show ray' to check your installed version.
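If you need the installed version programmatically (for example, to compare it against the rayVersion pinned in ray-cluster.yaml), here is a minimal stdlib-only sketch; the helper name is illustrative:

```python
# Sketch: read the locally installed Ray version without importing Ray itself,
# e.g. to check it against the rayVersion used by the cluster (2.38.0 here).
from importlib.metadata import version, PackageNotFoundError

def installed_ray_version():
    """Return the installed Ray version string, or None if Ray is not installed."""
    try:
        return version("ray")
    except PackageNotFoundError:
        return None

if installed_ray_version() is None:
    print("Ray is not installed; run 'pip install ray'")
else:
    print(f"Installed Ray version: {installed_ray_version()}")
```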

----------------------------------------------------------------------------------
After you have both Kubernetes and Ray installed, use the following command to create the cluster: kubectl apply -f ray-cluster.yaml

'kubectl get pods' -> this will list the cluster's pods; 'kubectl get svc' will show the name of the head service.

Use the following to forward the dashboard port to the Ray head service: kubectl port-forward svc/<head service name> 8265:8265 (for this cluster the head service is roboteam-ray-cluster-head-svc).

This is the port used inside ray_jobs.py, where we submit jobs to Ray.
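ray_jobs.py performs this submission with the requests library. As a sketch of the same call using only the standard library (the helper names are illustrative; the endpoint is Ray's Jobs REST API on the forwarded dashboard port):

```python
import json
import urllib.request

RAY_DASHBOARD = "http://localhost:8265"  # the port-forwarded dashboard address

def build_job_request(entrypoint, dashboard=RAY_DASHBOARD):
    """Build (url, payload) for Ray's job submission REST endpoint."""
    url = f"{dashboard}/api/jobs/"
    payload = {"entrypoint": entrypoint}
    return url, payload

def submit_job(entrypoint, dashboard=RAY_DASHBOARD):
    """POST the job to the Ray dashboard; requires the port-forward to be active."""
    url, payload = build_job_request(entrypoint, dashboard)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Ray also ships a higher-level client, ray.job_submission.JobSubmissionClient, which wraps this REST API and may be preferable for real use.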






83 changes: 83 additions & 0 deletions docker/runner/ray-cluster.yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: roboteam-ray-cluster
spec:
  rayVersion: "2.38.0"
  # Head node configuration
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      metadata:
        labels:
          app: ray-head
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.38.0
            ports:
              - containerPort: 8265 # dashboard port
              - containerPort: 6379 # redis port
              - containerPort: 10001 # GCS server port
              - containerPort: 8000 # Serve port
            resources:
              requests:
                cpu: 300m
                memory: 256Mi
              limits:
                cpu: 600m
                memory: 512Mi
            command: ["/bin/bash", "-c", "--"]
            args: ["ray start --head --port=6379 --dashboard-host=0.0.0.0 --block"]
            livenessProbe:
              exec:
                command:
                  - bash
                  - -c
                  - "wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success"
              initialDelaySeconds: 90
              timeoutSeconds: 10
              periodSeconds: 30
              failureThreshold: 5
            readinessProbe:
              exec:
                command:
                  - bash
                  - -c
                  - "wget -T 10 -q -O- http://localhost:8265/api/gcs_healthz | grep success"
              initialDelaySeconds: 90
              timeoutSeconds: 10
              periodSeconds: 30
              failureThreshold: 5

  # Worker node configuration
  workerGroupSpecs:
    - groupName: worker-group
      replicas: 1 # Number of worker nodes
      rayStartParams:
        num-cpus: "1"
      template:
        metadata:
          labels:
            app: ray-worker
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.38.0
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
                limits:
                  cpu: 1000m
                  memory: 2Gi
              env:
                - name: RAY_HEAD_IP
                  value: "roboteam-ray-cluster-head-svc.default.svc.cluster.local"
              command: ["/bin/bash", "-c", "--"]
              args: ["ray start --address='roboteam-ray-cluster-head-svc.default.svc.cluster.local:6379' --block"]

32 changes: 32 additions & 0 deletions roboteam_ai/src/RL/example_ray_job.py
import ray

# Connect to the Ray cluster (starts a local instance if none is running)
ray.init()

# Case 1: the function will be distributed across the Ray cluster
@ray.remote
def compute_square(x):
    return x * x

# Case 2: the class will be distributed across the Ray cluster
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# Testing case 1:
future = compute_square.remote(5)
result = ray.get(future)
print(f"Result: {result}")

# Submitting multiple tasks in parallel
futures = [compute_square.remote(i) for i in range(10)]
results = ray.get(futures)
print(f"Results: {results}")

# Testing case 2, using the class:
counter = Counter.remote()
future_count = counter.increment.remote()
print(f"Count: {ray.get(future_count)}")
22 changes: 22 additions & 0 deletions roboteam_ai/src/RL/ray_jobs.py
import ray
import requests

# Alternative: connect to the head node directly to submit work
# (requires 'import os' and the RAY_ADDRESS environment variable):
# ray.init(address=os.getenv('RAY_ADDRESS'))

# @ray.remote
# def run_simulation(simulation_name, port):
#     print(f"Running simulation {simulation_name} on port {port}")
#     return f"Simulation {simulation_name} completed on port {port}"

# result = ray.get(compute_square.remote(4))
# print(result)

# Submit example_ray_job.py through the Ray Jobs REST API on the forwarded port
job_payload = {
    "entrypoint": "python3 example_ray_job.py"
}

response = requests.post("http://localhost:8265/api/jobs/", json=job_payload)
print(response.json())
