[Help] Disk creation fails for new VMs after configuring Ceph object storage #22307

Open
chenjacken opened this issue Mar 18, 2025 · 11 comments
Labels: question (Further information is requested), state/awaiting processing


@chenjacken

chenjacken commented Mar 18, 2025

Cluster: v3.11.9
Host OS: CentOS 7.9

Ceph is configured on the platform as block storage, and the configuration check passes.


When creating a VM, disk creation fails. The web log shows:

sync_status=>unknown: {"__reason__":"{\"error\":{\"class\":\"ClientError\",\"code\":499,\"details\":\"Get \\\"https://10.10.0.24:8885/disks/17db4fed-96e1-471b-85f5-e1641de77d13/465fcd93-94db-4bcf-8889-0381bc942a73/status\\\": dial tcp 10.10.0.24:8885: connect: no route to host\",\"request\":{\"headers\":{\"User-Agent\":\"yunioncloud-go/201708\",\"X-Auth-Token\":\"*\",\"X-Region-Version\":\"v2\",\"X-Request-Id\":\"129aa6-ba13c0\",\"X-Task-Id\":\"daef7cf2-8003-44db-872a-27d171e2872d\",\"X-Task-Notify-Url\":\"https://default-region:30888/tasks/daef7cf2-8003-44db-872a-27d171e2872d\",\"X-Yunion-Parent-Id\":\"0.0\",\"X-Yunion-Peer-Service-Name\":\"compute_v2\",\"X-Yunion-Remote-Addr\":\"10.10.0.24:8885\",\"X-Yunion-Span-Id\":\"0.0.0\",\"X-Yunion-Span-Name\":\"\",\"X-Yunion-Strace-Debug\":\"true\",\"X-Yunion-Strace-Id\":\"ccf3f365\"},\"method\":\"GET\",\"url\":\"https://10.10.0.24:8885/disks/17db4fed-96e1-471b-85f5-e1641de77d13/465fcd93-94db-4bcf-8889-0381bc942a73/status\"}}}","__stage__":"OnDiskSyncStatusComplete","__status__":"ERROR"}

From the log, there is a request to https://10.10.0.24:8885/disks/17db4fed-96e1-471b-85f5-e1641de77d13/465fcd93-94db-4bcf-8889-0381bc942a73/status. Is 10.10.0.24:8885 a host's IP? That IP belongs to a compute node, not the master host.

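I created several more VMs (placed on different hosts) to test, and they all fail with the same error. Why do the errors always come back from 10.10.0.24:8885? If I want more detailed logs, which pod's logs should I look at?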

Creating VMs on local storage works fine.
Thanks 🌹

chenjacken added the question (Further information is requested) label on Mar 18, 2025
@wanyaoqi
Member

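For shared storage, the platform selects one host to handle disk image caching for everything on that storage. Check the host log on the node whose IP appears in the error.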

@chenjacken
Author

I found the following, which includes ERROR entries:

[root@master1 ~]# kubectl logs default-host-gs9vd -n onecloud -c host --tail 100 -f 
[info 2025-03-10 12:42:10 isolated_device.getPassthroughGPUs(gpu.go:86)] filter address [], enableWhiteList: false
[warning 2025-03-10 12:42:17 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:16.0 \"Communication controller [0780]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family MEI Controller #1 [a1ba]\" -r09 \"ASUSTeK Computer Inc. [1043]\" \"Device [871e]\"", device: {}, error: device address is empty: {}
[info 2025-03-10 12:42:17 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address  is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2025-03-10 12:42:17 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:16.1 \"Communication controller [0780]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family MEI Controller #2 [a1bb]\" -r09 \"ASUSTeK Computer Inc. [1043]\" \"Device [871e]\"", device: {}, error: device address is empty: {}
[info 2025-03-10 12:42:17 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address  is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2025-03-10 12:42:17 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:16.4 \"Communication controller [0780]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family MEI Controller #3 [a1be]\" -r09 \"ASUSTeK Computer Inc. [1043]\" \"Device [871e]\"", device: {}, error: device address is empty: {}
[info 2025-03-10 12:42:18 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address  is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2025-03-10 12:42:18 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:1c.0 \"PCI bridge [0604]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family PCI Express Root Port #1 [a190]\" -rf9 \"\" \"\"", device: {}, error: device address is empty: {}
[info 2025-03-10 12:42:18 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address  is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[warning 2025-03-10 12:42:18 isolated_device.NewPCIDevice2(gpu.go:241)] fillPCIEInfo for line: "00:1c.3 \"PCI bridge [0604]\" \"Intel Corporation [8086]\" \"C620 Series Chipset Family PCI Express Root Port #4 [a193]\" -rf9 \"\" \"\"", device: {}, error: device address is empty: {}
[info 2025-03-10 12:42:19 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address  is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[info 2025-03-10 12:42:20 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address 02:00.0 is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[info 2025-03-10 12:42:21 isolated_device.(*PCIDevice).IsBootVGA(gpu.go:397)] PCI address 03:00.0 is boot_vga: /sys/devices/pci0000:00/0000:00:1c.3/0000:02:00.0/0000:03:00.0/boot_vga
[info 2025-03-10 12:43:08 ovnutils.configBridgeMtu.func1(ovnutils.go:42)] set brvpc MTU to 1500 success!
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initIsolatedDevices(hostinfo.go:2076)] probeSyncIsolatedDevices []
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage Ceph-HDD(rbd) mountpoint rbd:hddpool
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage Ceph-SSD(rbd) mountpoint rbd:
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage host_172.16.0.24_local_storage_4(local) mountpoint /opt/cloud/workspace/disks4
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage host_172.16.0.24_local_storage_3(local) mountpoint /opt/cloud/workspace/disks3
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage host_172.16.0.24_local_storage_2(local) mountpoint /opt/cloud/workspace/disks2
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage host_172.16.0.24_local_storage_1(local) mountpoint /opt/cloud/workspace/disks1
[info 2025-03-10 12:43:59 hostinfo.(*SHostInfo).initStoragesInternal(hostinfo.go:1943)] Storage host_172.16.0.24_local_storage_0(local) mountpoint /opt/cloud/workspace/disks
[warning 2025-03-10 12:43:59 storageman.(*SLocalStorage).SyncStorageInfo(storage_local.go:296)] get hardware info: storage: host_172.16.0.24_local_storage_0, [read model file: /sys/class/block/device/model: open /sys/class/block/device/model: no such file or directory, read vendor file: /sys/class/block/device/vendor: open /sys/class/block/device/vendor: no such file or directory]
[info 2025-03-10 12:43:59 storageman.(*SLocalStorage).SyncStorageInfo(storage_local.go:306)] Sync storage info b589d472-1e6c-46e8-8df6-deaf5908fa28/host_172.16.0.24_local_storage_0
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).onSyncStorageInfoSucc(hostinfo.go:2030)] storage id b589d472-1e6c-46e8-8df6-deaf5908fa28
[info 2025-03-10 12:44:00 storageman.(*SLocalStorage).SyncStorageInfo(storage_local.go:306)] Sync storage info 6e50a08c-66f8-49a1-89af-fd67f90982ad/host_172.16.0.24_local_storage_1
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).onSyncStorageInfoSucc(hostinfo.go:2030)] storage id 6e50a08c-66f8-49a1-89af-fd67f90982ad
[info 2025-03-10 12:44:00 storageman.(*SLocalStorage).SyncStorageInfo(storage_local.go:306)] Sync storage info 326bd719-9bc3-4873-82df-eec9190e6db9/host_172.16.0.24_local_storage_2
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).onSyncStorageInfoSucc(hostinfo.go:2030)] storage id 326bd719-9bc3-4873-82df-eec9190e6db9
[info 2025-03-10 12:44:00 storageman.(*SLocalStorage).SyncStorageInfo(storage_local.go:306)] Sync storage info a7265acb-72ac-4018-8366-f5ec45828e06/host_172.16.0.24_local_storage_3
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).onSyncStorageInfoSucc(hostinfo.go:2030)] storage id a7265acb-72ac-4018-8366-f5ec45828e06
[info 2025-03-10 12:44:00 storageman.(*SLocalStorage).SyncStorageInfo(storage_local.go:306)] Sync storage info 1d999d68-8ff9-44ed-84ab-5769aec524ea/host_172.16.0.24_local_storage_4
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).onSyncStorageInfoSucc(hostinfo.go:2030)] storage id 1d999d68-8ff9-44ed-84ab-5769aec524ea
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).onSucc(hostinfo.go:2208)] Host registration process success....
[info 2025-03-10 12:44:00 guestman.NewGuestCpuSetCounter(guesthelper.go:270)] cpusetcounter {"nodes":[{"cpu_count":40,"cpu_dies":[{"cpu_free":{"%!s(int=0)":2,"%!s(int=1)":2,"%!s(int=10)":2,"%!s(int=11)":2,"%!s(int=12)":2,"%!s(int=13)":2,"%!s(int=14)":2,"%!s(int=15)":2,"%!s(int=16)":2,"%!s(int=17)":2,"%!s(int=18)":2,"%!s(int=19)":2,"%!s(int=2)":2,"%!s(int=3)":2,"%!s(int=4)":2,"%!s(int=40)":2,"%!s(int=41)":2,"%!s(int=42)":2,"%!s(int=43)":2,"%!s(int=44)":2,"%!s(int=45)":2,"%!s(int=46)":2,"%!s(int=47)":2,"%!s(int=48)":2,"%!s(int=49)":2,"%!s(int=5)":2,"%!s(int=50)":2,"%!s(int=51)":2,"%!s(int=52)":2,"%!s(int=53)":2,"%!s(int=54)":2,"%!s(int=55)":2,"%!s(int=56)":2,"%!s(int=57)":2,"%!s(int=58)":2,"%!s(int=59)":2,"%!s(int=6)":2,"%!s(int=7)":2,"%!s(int=8)":2,"%!s(int=9)":2},"vcpu_count":0}],"node_id":0,"numa_huge_free_mem_size_kb":0,"numa_huge_mem_size_kb":0,"vcpu_count":0},{"cpu_count":40,"cpu_dies":[{"cpu_free":{"%!s(int=20)":2,"%!s(int=21)":2,"%!s(int=22)":2,"%!s(int=23)":2,"%!s(int=24)":2,"%!s(int=25)":2,"%!s(int=26)":2,"%!s(int=27)":2,"%!s(int=28)":2,"%!s(int=29)":2,"%!s(int=30)":2,"%!s(int=31)":2,"%!s(int=32)":2,"%!s(int=33)":2,"%!s(int=34)":2,"%!s(int=35)":2,"%!s(int=36)":2,"%!s(int=37)":2,"%!s(int=38)":2,"%!s(int=39)":2,"%!s(int=60)":2,"%!s(int=61)":2,"%!s(int=62)":2,"%!s(int=63)":2,"%!s(int=64)":2,"%!s(int=65)":2,"%!s(int=66)":2,"%!s(int=67)":2,"%!s(int=68)":2,"%!s(int=69)":2,"%!s(int=70)":2,"%!s(int=71)":2,"%!s(int=72)":2,"%!s(int=73)":2,"%!s(int=74)":2,"%!s(int=75)":2,"%!s(int=76)":2,"%!s(int=77)":2,"%!s(int=78)":2,"%!s(int=79)":2},"vcpu_count":0}],"node_id":1,"numa_huge_free_mem_size_kb":0,"numa_huge_mem_size_kb":0,"vcpu_count":0}],"numa_enabled":false}
[info 2025-03-10 12:44:00 guestman.(*SGuestManager).LoadExistingGuests(guestman.go:426)] Find existing guest 1202ef25-0a74-4fd2-8703-118b26b99d40
[info 2025-03-10 12:44:00 guestman.(*SGuestManager).LoadExistingGuests(guestman.go:426)] Find existing guest 43395b9d-2448-4d3b-823a-ccd4d1f6aa1b
[info 2025-03-10 12:44:00 guestman.(*SGuestManager).LoadExistingGuests(guestman.go:426)] Find existing guest 4947c82a-1943-4456-8e21-23e53db6e079
[info 2025-03-10 12:44:00 hostdhcp.(*SGuestDHCPServer).Start(dhcpserver.go:73)] SGuestDHCPServer starting ...
[info 2025-03-10 12:44:00 hostdhcp.(*SGuestDHCPServer).Start(dhcpserver.go:73)] SGuestDHCPServer starting ...
[info 2025-03-10 12:44:00 guestman.(*SGuestManager).Bootstrap(guestman.go:262)] Loading existing guests ...
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).ImportServer(qemu-kvm.go:973)] ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b) is running, pending_delete=false
[info 2025-03-10 12:44:00 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56101 success
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}, "capabilities": ["oob"]}}
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).onMonitorConnected(qemu-kvm.go:1312)] Monitor connected ...
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).ImportServer(qemu-kvm.go:973)] Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40) is running, pending_delete=false
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"qmp_capabilities"}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"query-version"}
[info 2025-03-10 12:44:00 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56102 success
[info 2025-03-10 12:44:00 guestman.(*SGuestManager).OnLoadExistingGuestsComplete(guestman.go:333)] Load existing guests complete...
[error 2025-03-10 12:44:00 hostinfo.(*SHostInfo).PutHostOnline(hostinfo.go:1659)] Host sys error: map[storages:[{storages 41d5f810-3b02-4fc0-8e34-f280af70af1f Ceph-SSD check storage accessible failed: output: stderr "unable to parse addrs in '[]'\n2025-03-10T12:43:59.916+0000 7ff1bd972700 -1 monclient: get_monmap_and_config cannot identify monitors to contact\n[errno 22] RADOS invalid argument (error connecting to the cluster)\n": exit status 1 2025-03-10 12:43:59.920566388 +0000 UTC m=+115.397311978}]]
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}, "capabilities": ["oob"]}}
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).onMonitorConnected(qemu-kvm.go:1312)] Monitor connected ...
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": {}}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"qmp_capabilities"}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"query-version"}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}}
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).onGetQemuVersion(qemu-kvm.go:1382)] Guest(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b) qemu version 4.2.0
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"human-monitor-command","arguments":{"command-line":"info status"}}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": {}}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}}
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).onGetQemuVersion(qemu-kvm.go:1382)] Guest(1202ef25-0a74-4fd2-8703-118b26b99d40) qemu version 4.2.0
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"human-monitor-command","arguments":{"command-line":"info status"}}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": "VM status: running\r\n"}
[info 2025-03-10 12:44:00 guestman.(*SGuestResumeTask).onConfirmRunning(guesttasks.go:1489)]43395b9d-2448-4d3b-823a-ccd4d1f6aa1b: onConfirmRunning status running
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).detachStartupTask(qemu-kvm.go:1830)]43395b9d-2448-4d3b-823a-ccd4d1f6aa1b: detachStartupTask
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"query-block-jobs"}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": "VM status: running\r\n"}
[info 2025-03-10 12:44:00 guestman.(*SGuestResumeTask).onConfirmRunning(guesttasks.go:1489)]1202ef25-0a74-4fd2-8703-118b26b99d40: onConfirmRunning status running
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": []}
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).detachStartupTask(qemu-kvm.go:1830)]1202ef25-0a74-4fd2-8703-118b26b99d40: detachStartupTask
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"query-block-jobs"}
[info 2025-03-10 12:44:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": []}
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).allocGuestNumaCpuset(qemu-kvm.go:2922)] alloc numa cpus map[0:{[4 58 44 56 0 1 2 6 8 15 9 10 12 55 52 7 13 19 41 42 43 16 46 51 54 49 50 3 14 17 18 47 48 57 45 53 59 5 11 40] 0 false} 1:{[39 61 62 25 31 37 34 70 75 20 22 24 73 78 23 63 65 60 71 72 74 76 21 26 32 69 77 33 38 66 35 36 64 67 79 28 29 30 27 68] 0 false}]
[error 2025-03-10 12:44:00 guestman.(*SGuestManager).OnVerifyExistingGuestsSucc(guestman.go:306)] verify_existing_guests return unknown server 5ac22023-77b5-48f8-815e-f9ce23a44a95 ???????
[info 2025-03-10 12:44:00 hostpinger.(*SHostPingTask).Start(hostpinger.go:81)] Start host pinger ...
[info 2025-03-10 12:44:00 app.ServeForeverExtended(app.go:60)] Start listen on https://0.0.0.0:8885, isMaster: true
[info 2025-03-10 12:44:00 hostinfo.(*SHostInfo).OnCatalogChanged(hostinfo.go:2411)] telegraf configuration change, to reload ...
[info 2025-03-10 12:44:00 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:847)] Use vnc port 3
[warning 2025-03-10 12:44:01 guestman.(*SKVMGuestInstance).StartMonitor(qemu-kvm.go:1120)] Guest 4947c82a-1943-4456-8e21-23e53db6e079 start monitor failed, can't get qmp monitor port or monitor path
[warning 2025-03-10 12:44:01 guestman.(*SKVMGuestInstance).scriptStart(qemu-kvm.go:2371)] Guest middleware(4947c82a-1943-4456-8e21-23e53db6e079) waiting monitor connect
[info 2025-03-10 12:44:01 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56103 success
[info 2025-03-10 12:44:01 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:887)] VM started middleware(4947c82a-1943-4456-8e21-23e53db6e079) ...
[info 2025-03-10 12:44:01 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:893)] Async start server middleware(4947c82a-1943-4456-8e21-23e53db6e079) success!
[info 2025-03-10 12:44:02 monitor.(*QmpMonitor).read(qmp.go:232)] Scan over middleware(4947c82a-1943-4456-8e21-23e53db6e079) ...
[info 2025-03-10 12:44:02 monitor.(*QmpMonitor).read(qmp.go:235)] QMP Disconnected middleware(4947c82a-1943-4456-8e21-23e53db6e079): read tcp 127.0.0.1:51670->127.0.0.1:56103: read: connection reset by peer
[error 2025-03-10 12:44:02 hostutils.TaskFailed(hostutils.go:94)] Reqeuest task failed missing task id, with reason(Async start server failed: read tcp 127.0.0.1:51670->127.0.0.1:56103: read: connection reset by peer)
[info 2025-03-11 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-12 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
Post "https://default-region:30888/hosts/32bc16ab-0e62-422c-8a79-2b9bcfe27094/ping": dial tcp 10.102.70.34:30888: connect: connection refused
[error 2025-03-12 04:24:07 hostpinger.(*SHostPingTask).Start(hostpinger.go:92)] host ping failed ping: {"error":{"class":"ServiceAbnormal","code":499,"data":{"fields":["compute"],"id":"%s service is abnormal, please check service status"},"details":"compute service is abnormal, please check service status"}}
Post "https://default-region:30888/hosts/32bc16ab-0e62-422c-8a79-2b9bcfe27094/ping": dial tcp 10.102.70.34:30888: connect: connection refused
[error 2025-03-12 04:24:28 hostpinger.(*SHostPingTask).Start(hostpinger.go:92)] host ping failed ping: {"error":{"class":"ServiceAbnormal","code":499,"data":{"fields":["compute"],"id":"%s service is abnormal, please check service status"},"details":"compute service is abnormal, please check service status"}}
[info 2025-03-13 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-14 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-15 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-16 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-17 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-18 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12
[info 2025-03-19 03:00:00 storageman.(*SLocalImageCacheManager).CleanImageCachefiles(imagecachemanager_local.go:212)] SLocalImageCacheManager /opt/cloud/workspace/disks/image_cache total size 9445MB storage 4767975MB ratio 0.001981 expect ratio 12

@chenjacken
Author

This entry looks important:
[error 2025-03-19 06:13:00 hostinfo.(*SHostInfo).PutHostOnline(hostinfo.go:1659)] Host sys error: map[storages:[{storages 41d5f810-3b02-4fc0-8e34-f280af70af1f Ceph-SSD check storage accessible failed: output: stderr "unable to parse addrs in '[]'\n2025-03-19T06:13:00.102+0000 7fbd43108700 -1 monclient: get_monmap_and_config cannot identify monitors to contact\n[errno 22] RADOS invalid argument (error connecting to the cluster)\n": exit status 1 2025-03-19 06:13:00.105624603 +0000 UTC m=+114.656992044}]]
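Lines like this are easy to miss among hundreds of `info` entries. A small Go sketch that keeps only entries of one level, assuming the `[level YYYY-MM-DD HH:MM:SS caller] message` layout seen in these logs; `filterLevel` is a hypothetical helper. Piping `kubectl logs` through `grep '^\[error'` gives a similar result without any code.

```go
package main

import (
	"bufio"
	"regexp"
	"strings"
)

// logLine is one parsed entry of the host agent log.
type logLine struct {
	Level, Time, Caller, Msg string
}

// linePattern matches "[level YYYY-MM-DD HH:MM:SS caller] message".
// The space after "]" is optional because some entries omit it.
var linePattern = regexp.MustCompile(`^\[(\w+) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([^\]]*)\]\s?(.*)$`)

// filterLevel (hypothetical helper) returns the entries whose level equals
// want, e.g. "error". Lines that do not match the pattern, such as the bare
// `Post "https://default-region:30888/..."` lines, are skipped.
func filterLevel(log, want string) []logLine {
	var out []logLine
	sc := bufio.NewScanner(strings.NewReader(log))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // some entries are very long
	for sc.Scan() {
		m := linePattern.FindStringSubmatch(sc.Text())
		if m == nil || m[1] != want {
			continue
		}
		out = append(out, logLine{Level: m[1], Time: m[2], Caller: m[3], Msg: m[4]})
	}
	return out
}
```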

Full logs:

[info 2025-03-19 06:13:00 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56101 success
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}, "capabilities": ["oob"]}}
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).ImportServer(qemu-kvm.go:973)] Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40) is running, pending_delete=false
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).onMonitorConnected(qemu-kvm.go:1312)] Monitor connected ...
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"qmp_capabilities"}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"query-version"}
[info 2025-03-19 06:13:00 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56102 success
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}, "capabilities": ["oob"]}}
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).onMonitorConnected(qemu-kvm.go:1312)] Monitor connected ...
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": {}}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"qmp_capabilities"}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"query-version"}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}}
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).onGetQemuVersion(qemu-kvm.go:1382)] Guest(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b) qemu version 4.2.0
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": {}}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"human-monitor-command","arguments":{"command-line":"info status"}}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}}
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).onGetQemuVersion(qemu-kvm.go:1382)] Guest(1202ef25-0a74-4fd2-8703-118b26b99d40) qemu version 4.2.0
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"human-monitor-command","arguments":{"command-line":"info status"}}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": "VM status: running\r\n"}
[info 2025-03-19 06:13:00 guestman.(*SGuestResumeTask).onConfirmRunning(guesttasks.go:1489)]43395b9d-2448-4d3b-823a-ccd4d1f6aa1b: onConfirmRunning status running
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).detachStartupTask(qemu-kvm.go:1830)]43395b9d-2448-4d3b-823a-ccd4d1f6aa1b: detachStartupTask
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"execute":"query-block-jobs"}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": "VM status: running\r\n"}
[info 2025-03-19 06:13:00 guestman.(*SGuestResumeTask).onConfirmRunning(guesttasks.go:1489)]1202ef25-0a74-4fd2-8703-118b26b99d40: onConfirmRunning status running
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).detachStartupTask(qemu-kvm.go:1830)]1202ef25-0a74-4fd2-8703-118b26b99d40: detachStartupTask
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"execute":"query-block-jobs"}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read ser370540797680(43395b9d-2448-4d3b-823a-ccd4d1f6aa1b): {"return": []}
[info 2025-03-19 06:13:00 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read Win2016-MGR-1(1202ef25-0a74-4fd2-8703-118b26b99d40): {"return": []}
[info 2025-03-19 06:13:00 guestman.(*SGuestManager).OnLoadExistingGuestsComplete(guestman.go:333)] Load existing guests complete...
[error 2025-03-19 06:13:00 hostinfo.(*SHostInfo).PutHostOnline(hostinfo.go:1659)] Host sys error: map[storages:[{storages 41d5f810-3b02-4fc0-8e34-f280af70af1f Ceph-SSD check storage accessible failed: output: stderr "unable to parse addrs in '[]'\n2025-03-19T06:13:00.102+0000 7fbd43108700 -1 monclient: get_monmap_and_config cannot identify monitors to contact\n[errno 22] RADOS invalid argument (error connecting to the cluster)\n": exit status 1 2025-03-19 06:13:00.105624603 +0000 UTC m=+114.656992044}]]
[info 2025-03-19 06:13:00 hostpinger.(*SHostPingTask).Start(hostpinger.go:81)] Start host pinger ...
[info 2025-03-19 06:13:00 app.ServeForeverExtended(app.go:60)] Start listen on https://0.0.0.0:8885, isMaster: true
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).allocGuestNumaCpuset(qemu-kvm.go:2922)] alloc numa cpus map[0:{[18 40 42 54 56 10 12 15 57 44 45 52 11 43 46 47 50 59 3 9 19 41 51 55 4 5 16 14 17 0 6 8 13 48 49 53 58 1 2 7] 0 false} 1:{[29 32 33 30 66 72 74 76 27 37 65 73 36 61 68 22 26 31 35 62 71 75 21 23 34 60 24 63 70 77 25 38 67 79 69 78 20 28 39 64] 0 false}]
[info 2025-03-19 06:13:00 hostinfo.(*SHostInfo).OnCatalogChanged(hostinfo.go:2411)] telegraf configuration change, to reload ...
[info 2025-03-19 06:13:00 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:847)] Use vnc port 3
[warning 2025-03-19 06:13:01 guestman.(*SKVMGuestInstance).StartMonitor(qemu-kvm.go:1120)] Guest 4947c82a-1943-4456-8e21-23e53db6e079 start monitor failed, can't get qmp monitor port or monitor path
[warning 2025-03-19 06:13:01 guestman.(*SKVMGuestInstance).scriptStart(qemu-kvm.go:2371)] Guest middleware(4947c82a-1943-4456-8e21-23e53db6e079) waiting monitor connect
[info 2025-03-19 06:13:01 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56103 success
[info 2025-03-19 06:13:01 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:887)] VM started middleware(4947c82a-1943-4456-8e21-23e53db6e079) ...
[info 2025-03-19 06:13:01 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:893)] Async start server middleware(4947c82a-1943-4456-8e21-23e53db6e079) success!
[info 2025-03-19 06:13:02 monitor.(*QmpMonitor).read(qmp.go:232)] Scan over middleware(4947c82a-1943-4456-8e21-23e53db6e079) ...
[info 2025-03-19 06:13:02 monitor.(*QmpMonitor).read(qmp.go:235)] QMP Disconnected middleware(4947c82a-1943-4456-8e21-23e53db6e079): read tcp 127.0.0.1:39634->127.0.0.1:56103: read: connection reset by peer
[error 2025-03-19 06:13:02 hostutils.TaskFailed(hostutils.go:94)] Reqeuest task failed missing task id, with reason(Async start server failed: read tcp 127.0.0.1:39634->127.0.0.1:56103: read: connection reset by peer)

@wanyaoqi
Member

It looks like this host cannot connect to Ceph. Try connecting to Ceph manually from the host.

@chenjacken
Author

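Got it, I'll reinstall the Ceph client.
Also, since shared storage selects one host to do disk image caching, can I change the configuration to specify which host that is?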

@chenjacken
Author

ceph -s works normally on this host, but the host log still shows this error:

[error 2025-03-19 07:56:35 hostinfo.(*SHostInfo).PutHostOnline(hostinfo.go:1659)] Host sys error: map[storages:[{storages 41d5f810-3b02-4fc0-8e34-f280af70af1f Ceph-SSD check storage accessible failed: output: stderr "unable to parse addrs in '[]'\n2025-03-19T07:56:34.783+0000 7f5708881700 -1 monclient: get_monmap_and_config cannot identify monitors to contact\n[errno 22] RADOS invalid argument (error connecting to the cluster)\n": exit status 1 2025-03-19 07:56:34.786149078 +0000 UTC m=+112.243867090}]]

@wanyaoqi
Member

Run ceph df and check; that is probably the command that fails.

@chenjacken
Author

Run ceph df and check; that is probably the command that fails.

The output is as follows:

[root@ser-a1-5 ~]# ceph df 
--- RAW STORAGE ---
CLASS     SIZE    AVAIL    USED  RAW USED  %RAW USED
hdd    437 TiB  437 TiB  23 GiB    23 GiB          0
ssd     22 TiB   22 TiB  15 GiB    15 GiB       0.07
TOTAL  459 TiB  459 TiB  38 GiB    38 GiB          0
 
--- POOLS ---
POOL                          ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics          1     1      0 B       84      0 B      0    145 TiB
ssdpool                        2  4096  2.3 GiB      642  6.9 GiB   0.03    7.1 TiB
hddpool                        3  4096     19 B        1   12 KiB      0    138 TiB
.rgw.root                      4     8  4.5 KiB       16  180 KiB      0    145 TiB
oss-store.rgw.buckets.index    5     8      0 B       22      0 B      0    145 TiB
oss-store.rgw.otp              6     8      0 B        0      0 B      0    145 TiB
oss-store.rgw.log              7     8   23 KiB      342  1.9 MiB      0    145 TiB
oss-store.rgw.control          8     8      0 B        8      0 B      0    145 TiB
oss-store.rgw.meta             9     8  3.3 KiB       19  192 KiB      0    145 TiB
oss-store.rgw.buckets.non-ec  10     8      0 B        0      0 B      0    145 TiB
oss-store.rgw.buckets.data    11   128   42 MiB      473  130 MiB      0    145 TiB
[root@ser-a1-5 ~]# 
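Since `ceph df` succeeds here, the next thing worth confirming is that the pool a storage record points at actually exists. A Go sketch that reads plain-text `ceph df` output like the above into a map from pool name to its MAX AVAIL column; it assumes the layout shown here (pool names without spaces, sizes printed as `<number> <unit>`). For scripting, `ceph df --format json` is a more stable source. `poolsFromCephDf` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"strings"
)

// poolsFromCephDf (hypothetical helper) parses the "--- POOLS ---" section of
// plain-text `ceph df` output and maps each pool name to its MAX AVAIL value.
// Assumes pool names contain no spaces and each size is "<number> <unit>".
func poolsFromCephDf(out string) map[string]string {
	pools := make(map[string]string)
	inPools := false
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "--- "):
			// Section banner: only the POOLS section is of interest.
			inPools = line == "--- POOLS ---"
			continue
		case !inPools || line == "" || strings.HasPrefix(line, "POOL "):
			continue // outside POOLS, blank line, or the column header
		}
		f := strings.Fields(line)
		if len(f) < 3 {
			continue // e.g. a trailing shell prompt
		}
		// The last two fields are MAX AVAIL's number and unit.
		pools[f[0]] = f[len(f)-2] + " " + f[len(f)-1]
	}
	return pools
}
```

Applied to the output above, `poolsFromCephDf(out)["ssdpool"]` would be `"7.1 TiB"`; a missing key means the pool named in the storage configuration does not exist on this cluster.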

@wanyaoqi
Member

output: stderr "unable to parse addrs in '[]'

From the error, it looks like the Ceph configuration being passed in is wrong. Please check it.
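`unable to parse addrs in '[]'` suggests the monitor address list handed to the Ceph CLI was empty. The startup log is consistent with an incomplete storage record: `Ceph-HDD(rbd)` mounts `rbd:hddpool`, while `Ceph-SSD(rbd)` mounts just `rbd:` with no pool. A hypothetical Go pre-flight check, assuming the storage record carries a `mon_host`-style string and a pool name (the real Cloudpods field names may differ), that would reject such a record before it is used:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// validateRBDConf (hypothetical helper) checks an RBD storage record.
// monHost is a monitor list such as "ip1:6789,ip2,ip3:3300"; pool is the RBD
// pool name. It returns the individual monitor addresses.
func validateRBDConf(monHost, pool string) ([]string, error) {
	if strings.TrimSpace(pool) == "" {
		return nil, errors.New("pool is empty")
	}
	v := strings.TrimSpace(monHost)
	if v == "" || v == "[]" {
		// The situation behind "unable to parse addrs in '[]'".
		return nil, errors.New("mon_host is empty: no monitor to contact")
	}
	if strings.HasPrefix(v, "[") {
		// msgr2 address vectors like "[v2:ip:3300,v1:ip:6789]" are out of scope here.
		return nil, errors.New("bracketed address vectors are not handled by this sketch")
	}
	// This sketch treats commas, semicolons and whitespace as separators.
	fields := strings.FieldsFunc(v, func(r rune) bool {
		return r == ',' || r == ';' || r == ' ' || r == '\t'
	})
	mons := make([]string, 0, len(fields))
	for _, f := range fields {
		host, port := f, ""
		if h, p, err := net.SplitHostPort(f); err == nil {
			host, port = h, p
		}
		if host == "" {
			return nil, fmt.Errorf("monitor %q has no host part", f)
		}
		if port != "" {
			if n, err := strconv.Atoi(port); err != nil || n < 1 || n > 65535 {
				return nil, fmt.Errorf("monitor %q has an invalid port", f)
			}
		}
		mons = append(mons, f)
	}
	if len(mons) == 0 {
		return nil, errors.New("mon_host contains only separators")
	}
	return mons, nil
}
```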

@chenjacken
Author

output: stderr "unable to parse addrs in '[]'

From the error, it looks like the Ceph configuration being passed in is wrong. Please check it.

That was it. I deleted the Ceph storage in the web console, created it again, and now it works.

@chenjacken
Author

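When creating a VM with Ceph selected for the disk, the creation status changes to "Deploy failed" with an error. After syncing the status, the VM is shown as "Powered off", and starting it then works normally.
The error message is: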

{
    "__reason__": "Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = Connect: failed start guest unable to get monitor info from DNS SRV with service name: ceph-mon\nqemu-system-x86_64: -drive file=rbd:ssdpool/8125da14-d132-44cf-8d83-3d45a3c642cb,if=none,id=drive_0,cache=none: error connecting: No such file or directory\n: exit status 1",
    "__stage__": "OnDeployGuestComplete",
    "__status__": "error"
}
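The error shows QEMU opening `rbd:ssdpool/8125da14-...` with no options, after which librados fell back to a DNS SRV lookup for `ceph-mon`. One plausible reading is that neither the drive string nor a `ceph.conf` visible to that QEMU process supplied monitor addresses. QEMU's legacy `rbd:pool/image[:key=value...]` filename syntax accepts Ceph options such as `mon_host` and `id`, with `:` inside a value escaped as `\:`. A hedged Go sketch of building such a string; `rbdDriveFile` is hypothetical, is not necessarily how Cloudpods constructs the drive, and leaves out authentication options such as `key` or `keyring`.

```go
package main

import (
	"errors"
	"strings"
)

// escapeRBD escapes characters that QEMU's legacy "rbd:" filename parser
// treats specially, so a value such as "10.0.0.1:6789" survives intact.
func escapeRBD(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return strings.ReplaceAll(s, ":", `\:`)
}

// rbdDriveFile (hypothetical helper) builds "rbd:pool/image:mon_host=...[:id=...]"
// so that librados is given monitors explicitly instead of relying on
// ceph.conf or a DNS SRV lookup.
func rbdDriveFile(pool, image string, mons []string, id string) (string, error) {
	if pool == "" || image == "" {
		return "", errors.New("pool and image are required")
	}
	if len(mons) == 0 {
		return "", errors.New("no monitors given: librados would fall back to other lookup methods")
	}
	escaped := make([]string, len(mons))
	for i, m := range mons {
		escaped[i] = escapeRBD(m)
	}
	var b strings.Builder
	b.WriteString("rbd:" + pool + "/" + image)
	// Monitors are joined with ';' because a ',' would have to be doubled
	// inside a QEMU -drive option string.
	b.WriteString(":mon_host=" + strings.Join(escaped, ";"))
	if id != "" {
		b.WriteString(":id=" + escapeRBD(id))
	}
	return b.String(), nil
}
```

On a shell command line, quote the whole `-drive` argument (for example with single quotes) so the backslashes reach QEMU unchanged.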
