ERAG will be deployed on virtual machine(s). A customer reported that after cluster maintenance they had to restart the VMs and the application did not come back up.
The goal is to:
Set up a K8s cluster on a VM and install the full stack:
Prepare images and do a full install of ERAG, for example using:
./install_chatqna.sh --enforce-pss --auth --telemetry --deploy xeon_torch_llm_guard --ui
Verify all functionality
Reboot the VM and make sure all services are up and running:
Issues found (the VLLM and retriever pods did not come up, even though the StorageClass supports RWX):
Normal Scheduled 45m default-scheduler Successfully assigned chatqa/vllm-service-m-deployment-5b97f486d5-c2v5k to erag-1-00-worker-57krl-ltwcw-qz698
Normal SuccessfulAttachVolume 44m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-10a9b683-247d-4d26-bf93-528a832747d8"
Warning FailedMount 15m (x5 over 43m) kubelet MountVolume.SetUp failed for volume "pvc-10a9b683-247d-4d26-bf93-528a832747d8" : rpc error: code = Internal desc = error publish volume to target path: mount failed: exit status 32
mounting arguments: -t nfs4 -o hard,sec=sys,vers=4,minorversion=1 vfs001c012.cus.internal:/vsanfs/52ccfa6c-8ff2-cbc6-ad27-63960c62355f /var/lib/kubelet/pods/f13b445e-0767-499e-9b58-b9e58385c094/volumes/kubernetes.io~csi/pvc-10a9b683-247d-4d26-bf93-528a832747d8/mount
output: mount.nfs4: mounting vfs001c012.cus.internal:/vsanfs/52ccfa6c-8ff2-cbc6-ad27-63960c62355f failed, reason given by server: No such file or directory
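For the "all services are up and running" check after the reboot, a small filter over the pod list can help spot pods stuck like the ones above. This is only a sketch: the function name `not_running_pods` is made up for illustration, and it assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE NAME READY STATUS ...).

```shell
#!/bin/sh
# Hypothetical helper (not part of ERAG): reads `kubectl get pods -A --no-headers`
# output on stdin and prints "namespace/pod READY STATUS" for every pod that is
# neither Completed nor Running with all of its containers ready.
not_running_pods() {
    awk '{
        split($3, ready, "/")   # READY column, e.g. "0/1"
        healthy = ($4 == "Completed") || ($4 == "Running" && ready[1] == ready[2])
        if (!healthy) print $1 "/" $2, $3, $4
    }'
}

# Usage on a live cluster (not executed here):
#   kubectl get pods -A --no-headers | not_running_pods
```

An empty result would mean every pod reported by the cluster is either ready or completed. Any pod it lists can then be inspected with `kubectl describe pod` to see events such as the FailedMount warning above.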
Two telemetry pods did not come up:
Logs (monitoring/telemetry-logs-loki-gateway-6767655445-mdl4k:nginx):
/docker-entrypoint.sh: No files found in /docker-entrypoint.d/, skipping configuration
2025/02/14 10:38:17 [emerg] 1#1: host not found in resolver "coredns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
nginx: [emerg] host not found in resolver "coredns.kube-system.svc.cluster.local." in /etc/nginx/nginx.conf:33
stream closed EOF for monitoring/telemetry-logs-loki-gateway-6767655445-mdl4k (nginx)
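To narrow down the nginx error, it may help to pull the `resolver` hosts out of the gateway's nginx.conf and compare them with the Services that actually exist in `kube-system`. This is a sketch under assumptions: the function name `nginx_resolvers` is hypothetical, and where the rendered nginx.conf lives (for example a ConfigMap) is not stated in this report.

```shell
#!/bin/sh
# Hypothetical helper: reads an nginx config on stdin and prints the address
# arguments of every `resolver` directive, one per line.
nginx_resolvers() {
    awk '$1 == "resolver" {
        for (i = 2; i <= NF; i++) {
            arg = $i
            sub(/;$/, "", arg)        # drop the trailing semicolon
            if (arg ~ /=/) continue   # skip parameters such as valid=30s or ipv6=off
            if (arg != "") print arg
        }
    }'
}

# Usage (not executed here): feed it the gateway's nginx.conf, then compare
# the printed host with the Service names in kube-system:
#   nginx_resolvers < nginx.conf
#   kubectl get svc -n kube-system
```

If the printed host does not match any Service name, the resolver setting is the likely culprit. On some clusters the DNS Service is named `kube-dns` even though CoreDNS is the implementation. If the name does match, another possibility is that cluster DNS was not yet ready when the gateway started after the reboot.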