AArch64: cxl-region-sysfs.sh: line 49: echo: write error: Numerical result out of range #278

Open
ikitayama opened this issue Feb 10, 2025 · 13 comments

ikitayama commented Feb 10, 2025

$ meson test cxl-region-sysfs.sh fails with the messages below, taken from the testlog.txt file:

[...]
+ uuidgen
+ nr_targets=8
+ echo 8
++ cat /sys/bus/cxl/devices/decoder2.3/interleave_granularity
+ r_ig=4096
+ echo 4096
+ echo 2147483648
/home/realm/projects/ndctl/test/cxl-region-sysfs.sh: line 49: echo: write error: Numerical result out of range
++ err 49
+++ basename /home/realm/projects/ndctl/test/cxl-region-sysfs.sh
++ echo test/cxl-region-sysfs.sh: failed at line 49
++ '[' -n '' ']'
++ exit 1
[...]

The kernel and the cxl_test module are built from cxl-next:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/

ndctl version is HEAD.

marc-hb commented Feb 11, 2025

I think I know why, stay tuned. Will share more in the next few days.

@ikitayama
Author

@marc-hb thanks! Are cxl-create-region.sh, cxl-xor-region.sh, cxl-qos-class.sh, and cxl-poison.sh affected too?

@ikitayama
Author

@marc-hb I don't understand why echoing 2 GB to the sysfs size entry fails with "out of range", while smaller values fail with "invalid argument". Writing 0 seems to work, but it is meaningless. The string-to-integer conversion shouldn't be the issue, should it?

marc-hb commented Feb 25, 2025

Sorry I got distracted. I just replied on the list:
https://lore.kernel.org/linux-cxl/43568B03-6832-4EB1-BF46-EF0F176509E2@linux.dev/T/#u

marc-hb commented Mar 6, 2025

https://lore.kernel.org/linux-cxl/43568B03-6832-4EB1-BF46-EF0F176509E2@linux.dev/T/#u

I spent a bit more time on this and realized that KASLR seems to be 100% x86-specific. So we're probably experiencing different issues; sorry for the noise. That error message actually seems quite generic. You need to collect more logs and information.
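
For context on how generic the message is: bash's echo prints the strerror() text for the errno of the failed write(2), so "Numerical result out of range" is ERANGE and "Invalid argument" is EINVAL, as returned by the sysfs attribute's store handler rather than by any conversion in the shell. A small sketch that maps the message back to the errno name when scanning test logs (errno_name is a hypothetical helper, not part of ndctl):

```shell
#!/bin/bash
# Hypothetical helper: map the strerror() text in a shell "write error"
# message back to its symbolic errno name.
errno_name() {
    case "$1" in
        *"Numerical result out of range"*) echo ERANGE ;;
        *"Invalid argument"*)              echo EINVAL ;;
        *)                                 echo UNKNOWN ;;
    esac
}

# Message taken from the failing test log in this issue.
errno_name "line 49: echo: write error: Numerical result out of range"
```

Which kernel condition produces ERANGE here is not established in this thread; the kernel log is needed for that.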

@ikitayama
Author

@marc-hb thanks for letting me know! I'm still looking at the host physical address allocation code, which ends up searching for a free area. Given that we only need a region of 8 x 256MB, is the kernel or QEMU having issues, or both?
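
The 8 x 256MB figure matches the value in the failing write from the test log. A quick check of the arithmetic in shell (the variable names are illustrative and not necessarily those used in cxl-region-sysfs.sh):

```shell
#!/bin/bash
nr_targets=8                         # "+ echo 8" in the test log
per_target=$((256 << 20))            # 256 MiB in bytes
region_size=$((nr_targets * per_target))
echo "$region_size"                  # prints 2147483648, the value written at line 49
```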

ikitayama commented Apr 7, 2025

This is a follow-up on this issue, again tested on a CXL-capable emulated system launched from run_qemu.sh.

testlog.txt

The setup_cxl() bash function is intact, although I had to edit other areas of the run_qemu.sh script, so I will try to look at the kernel that @jic23 maintains.

marc-hb commented Apr 7, 2025

Jonathan replied in pmem/run_qemu#150, you can find his account name there.

A trick to find anyone's account is to look at any commit they authored in any git repo hosted or mirrored on GitHub.

marc-hb commented Apr 7, 2025

testlog.txt

The kernel logs are very likely more useful.

@ikitayama
Author

Is this better?

root@localhost:~/ndctl/build# meson test cxl-region-sysfs.sh
ninja: Entering directory `/root/ndctl/build'
[1/55] Generating version.h with a custom command
[ 1168.911645] cxl_region region2: Bypassing cpu_cache_invalidate_memregion() for testing!
[ 1170.025972] calling  cxl_port_init+0x0/0x1000 [cxl_port] @ 1019
[ 1170.026559] initcall cxl_port_init+0x0/0x1000 [cxl_port] returned 0 after 279 usecs
[ 1170.033707] calling  cxl_acpi_init+0x0/0x1000 [cxl_acpi] @ 1019
[ 1170.066885] probe of port1 returned 0 after 23023 usecs
[ 1170.067394]  pci0000:bf: host supports CXL
[ 1170.098168] probe of port2 returned 0 after 29658 usecs  2/600s
[ 1170.098754]  pci0000:35: host supports CXL
[ 1170.100979] probe of ACPI0017:00 returned 0 after 66209 usecs
[ 1170.101537] initcall cxl_acpi_init+0x0/0x1000 [cxl_acpi] returned 0 after 67515 usecs
[ 1170.111145] calling  cxl_pmem_init+0x0/0x1000 [cxl_pmem] @ 1019
[ 1170.132105] probe of ndbus0 returned 0 after 1526 usecs
[ 1170.133293] probe of nvdimm-bridge0 returned 0 after 21130 usecs
[ 1170.134197] initcall cxl_pmem_init+0x0/0x1000 [cxl_pmem] returned 0 after 22250 usecs
[ 1170.137744] calling  cxl_mem_driver_init+0x0/0x1000 [cxl_mem] @ 1019
[ 1170.167006] probe of port3 returned 0 after 27343 usecs
[ 1170.174010] cxl_nvdimm pmem11: GPF: could not set dirty shutdown state
[ 1170.176910] probe of nmem0 returned 0 after 1145 usecs
[ 1170.177519] probe of pmem11 returned 0 after 3926 usecs
[ 1170.206636] probe of endpoint4 returned 0 after 28398 usecs
[ 1170.208604] probe of mem11 returned 0 after 70057 usecs
[ 1170.209787] cxl_nvdimm pmem12: GPF: could not set dirty shutdown state
[ 1170.225910] probe of nmem1 returned 0 after 2603 usecs
[ 1170.226936] probe of pmem12 returned 0 after 17298 usecs
[ 1170.271764] probe of endpoint5 returned 0 after 29547 usecs
[ 1170.272374] probe of mem12 returned 0 after 63332 usecs
[ 1170.277418] initcall cxl_mem_driver_init+0x0/0x1000 [cxl_mem] returned 0 after 138998 usecs
[ 1170.284687] calling  cxl_test_init+0x0/0x1000 [cxl_test] @ 1019
[ 1170.711571] platform cxl_host_bridge.0: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.714631] platform cxl_host_bridge.1: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.715683] platform cxl_host_bridge.2: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.716569] platform cxl_host_bridge.3: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.727275] platform cxl_host_bridge.0: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.757243] probe of port7 returned 0 after 28879 usecs
[ 1170.757940] platform cxl_host_bridge.0: host supports CXL
[ 1170.758165] platform cxl_host_bridge.1: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.760660] probe of port8 returned 0 after 1897 usecs
[ 1170.760917] platform cxl_host_bridge.1: host supports CXL
[ 1170.764436] platform cxl_host_bridge.2: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.783793] probe of port9 returned 0 after 18356 usecs
[ 1170.784508] platform cxl_host_bridge.2: host supports CXL
[ 1170.788123] platform cxl_host_bridge.3: host supports CXL (restricted)
[ 1170.795005] probe of ndbus1 returned 0 after 717 usecs
[ 1170.795767] probe of nvdimm-bridge1 returned 0 after 1758 usecs
[ 1170.798731] probe of cxl_acpi.0 returned 0 after 87790 usecs
[ 1170.824632] initcall cxl_test_init+0x0/0x1000 [cxl_test] returned 0 after 538728 usecs
[ 1170.832593] cxl_mock_mem cxl_mem.0: CXL MCE unsupported
[ 1170.842623] cxl_mock_mem cxl_mem.4: CXL MCE unsupported
[ 1170.851235] probe of port10 returned 0 after 2038 usecs
[ 1170.852016] cxl_mock_mem cxl_mem.2: CXL MCE unsupported
[ 1170.857719] cxl_mock_mem cxl_mem.5: CXL MCE unsupported
[ 1170.864625] cxl_mock_mem cxl_mem.1: CXL MCE unsupported
[ 1170.869519] probe of nmem2 returned 0 after 307 usecs
[ 1170.869550] probe of nmem3 returned 0 after 479 usecs
[ 1170.869676] probe of pmem0 returned 0 after 17586 usecs
[ 1170.870168] probe of pmem2 returned 0 after 13243 usecs
[ 1170.872743] probe of region2 returned 6 after 56 usecs
[ 1170.873131] cxl_mock_mem cxl_mem.4: Extended linear cache calculation failed rc:-2
[ 1170.875226] cxl_mock_mem cxl_mem.3: CXL MCE unsupported
[ 1170.878772] cxl_mock_mem cxl_mem.6: CXL MCE unsupported
[ 1170.893275] cxl_mock_mem cxl_mem.7: CXL MCE unsupported
[ 1170.899496] cxl_mock_mem cxl_mem.8: CXL MCE unsupported
[ 1170.903380] cxl_mock_mem cxl_mem.9: CXL MCE unsupported
[ 1170.907442] cxl_mock_mem cxl_rcd.10: CXL MCE unsupported
[ 1170.919345] probe of endpoint11 returned 0 after 48758 usecs
[ 1170.930629] probe of mem0 returned 0 after 81921 usecs
[ 1170.931396] probe of cxl_mem.4 returned 0 after 112290 usecs
[ 1170.932469] probe of port12 returned 0 after 57455 usecs
[ 1170.937940] Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x0-0x487e67fffffff]
[ 1170.938621] kmem dax2.0: mapping0: 0xfffffff010000000-0xfffffff02fffffff memory add failed
[ 1170.938908] kmem dax2.0: probe with driver kmem failed with error -7
[ 1170.939144] probe of dax2.0 returned 7 after 1368 usecs
[ 1170.948633] probe of endpoint16 returned 0 after 2475 usecs
[ 1170.949625] probe of nmem4 returned 0 after 719 usecs
[ 1170.949745] probe of mem10 returned 0 after 31080 usecs
[ 1170.950285] probe of pmem4 returned 0 after 4223 usecs
[ 1170.950888] probe of cxl_rcd.10 returned 0 after 72239 usecs
[ 1170.952332] probe of dax_region2 returned 0 after 29055 usecs
[ 1170.952724] probe of region2 returned 0 after 29714 usecs
[ 1170.953760] probe of endpoint15 returned 0 after 33541 usecs
[ 1170.954299] probe of mem2 returned 0 after 97812 usecs
[ 1170.955180] probe of cxl_mem.0 returned 0 after 151506 usecs
[ 1170.956418] probe of port14 returned 0 after 18871 usecs
[ 1170.957493] probe of port13 returned 0 after 81125 usecs
[ 1170.959361] probe of nmem5 returned 0 after 170 usecs
[ 1170.959712] probe of pmem6 returned 0 after 1169 usecs
[ 1170.960560] probe of nmem7 returned 0 after 294 usecs
[ 1170.960900] probe of pmem8 returned 0 after 1494 usecs
[ 1170.967410] probe of endpoint18 returned 0 after 11859 usecs
[ 1170.971395] probe of nmem6 returned 0 after 8898 usecs
[ 1170.971724] probe of pmem1 returned 0 after 13147 usecs
[ 1170.973271] probe of mem4 returned 0 after 98654 usecs
[ 1170.973766] probe of endpoint19 returned 0 after 13492 usecs
[ 1170.974100] probe of cxl_mem.1 returned 0 after 165966 usecs
[ 1170.975162] probe of port17 returned 0 after 28467 usecs
[ 1170.990984] probe of mem6 returned 0 after 105441 usecs
[ 1170.991973] probe of cxl_mem.6 returned 0 after 153482 usecs
[ 1170.992518] probe of nmem8 returned 0 after 262 usecs
[ 1170.992912] probe of pmem5 returned 0 after 1528 usecs
[ 1171.008019] probe of endpoint22 returned 0 after 2509 usecs
[ 1171.010738] probe of nmem9 returned 0 after 15358 usecs
[ 1171.017272] probe of pmem7 returned 0 after 24832 usecs
[ 1171.017568] probe of mem5 returned 0 after 141004 usecs
[ 1171.018595] probe of cxl_mem.3 returned 0 after 200129 usecs
[ 1171.019844] probe of nmem10 returned 0 after 214 usecs
[ 1171.020729] probe of pmem3 returned 0 after 23600 usecs
[ 1171.021539] probe of endpoint20 returned 0 after 46805 usecs
[ 1171.029229] probe of mem8 returned 0 after 124606 usecs
[ 1171.030245] probe of cxl_mem.8 returned 0 after 170447 usecs
[ 1171.030867] probe of endpoint23 returned 0 after 12984 usecs
[ 1171.031725] probe of mem7 returned 0 after 127008 usecs
[ 1171.032658] probe of cxl_mem.7 returned 0 after 185239 usecs
[ 1171.036191] probe of endpoint24 returned 0 after 14865 usecs
[ 1171.039651] probe of nmem11 returned 0 after 275 usecs
[ 1171.040281] probe of pmem9 returned 0 after 1676 usecs
[ 1171.053750] probe of endpoint25 returned 0 after 12892 usecs
[ 1171.068773] probe of mem3 returned 0 after 191752 usecs
[ 1171.073524] probe of mem9 returned 0 after 165319 usecs
[ 1171.087152] probe of cxl_mem.5 returned 0 after 254973 usecs00s
[ 1171.112416] probe of cxl_mem.9 returned 0 after 247756 usecs
[ 1171.268521] probe of endpoint21 returned 0 after 266319 usecs
[ 1171.281923] probe of mem1 returned 0 after 405914 usecs
[ 1171.348228] probe of cxl_mem.2 returned 0 after 533905 usecs
[ 1174.275907] probe of region5 returned 6 after 94 usecs   6/600s
1/1 ndctl:cxl / cxl-region-sysfs.sh        FAIL             6.28s   exit status 1
>>> MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 MALLOC_PERTURB_=116 NDCTL=/root/ndctl/build/ndctl/ndctl LD_LIBRARY_PATH=/root/ndctl/build/cxl/lib:/root/ndctl/build/ndctl/lib:/root/ndctl/build/daxctl/lib DATA_PATH=/root/ndctl/test ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 TEST_PATH=/root/ndctl/build/test DAXCTL=/root/ndctl/build/daxctl/daxctl /bin/bash /root/ndctl/test/cxl-region-sysfs.sh


Ok:                 0
Expected Fail:      0
Fail:               1
Unexpected Pass:    0
Skipped:            0
Timeout:            0

Full log written to /root/ndctl/build/meson-logs/testlog.txt

Could @jic23 comment on this too?
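
Since the log above is long, a filter along these lines may help narrow it down. This is only a sketch: filter_failures is a hypothetical helper, and the pattern is a guess based on the lines seen in this log (non-zero "probe of ... returned N" results, "failed", and "exceeds").

```shell
#!/bin/bash
# Hypothetical filter: keep kernel log lines that report a non-zero probe
# result, an explicit failure, or a range overflow.
filter_failures() {
    grep -E 'returned -?[1-9][0-9]* after|failed|exceeds'
}

# Example usage (not run here): dmesg | filter_failures
```

Applied to the log above, this keeps the region2 and region5 probes that returned 6, the dax2.0 probe that returned 7, and the "Hotplug memory ... exceeds maximum addressable range" and "memory add failed" lines, among others.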

marc-hb commented Apr 7, 2025

[ 1170.937940] Hotplug memory [0xfffffff010000000-0xfffffff030000000] exceeds maximum addressable range [0x0-0x487e67fffffff]

Interesting, I get a similar error when x86-specific KASLR is enabled by default. That's because KASLR reduces the maximum range available. You seem to land in a similar place in a different and unrelated way?
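
The check behind that message can be reproduced by hand from the two ranges in the log. A minimal sketch, comparing the addresses as hex strings because bash arithmetic is signed 64-bit and cannot represent 0xfffffff010000000 as a positive value (hex_gt is a hypothetical helper):

```shell
#!/bin/bash
# Hypothetical helper: succeed if unsigned hex address $1 is greater than $2.
# Works on the digit strings, so values above 2^63 are handled.
hex_gt() {
    local LC_ALL=C
    local a b
    a=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    b=$(printf '%s' "${2#0x}" | tr 'A-F' 'a-f')
    # Drop leading zeros so that string length reflects magnitude.
    while [[ $a == 0?* ]]; do a=${a#0}; done
    while [[ $b == 0?* ]]; do b=${b#0}; done
    if (( ${#a} != ${#b} )); then
        (( ${#a} > ${#b} ))
        return
    fi
    [[ $a > $b ]]
}

start=0xfffffff010000000   # start of the hotplug range in the log
limit=0x487e67fffffff      # end of the maximum addressable range in the log
if hex_gt "$start" "$limit"; then
    echo "hotplug start $start is above the addressable limit $limit"
fi
```

The start address has 16 significant hex digits against 13 for the limit, so the whole hotplug range sits far above what the kernel reports as addressable.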

ikitayama commented Apr 10, 2025

[...]
[ 1170.758165] platform cxl_host_bridge.1: Unsupported platform config, mixed Virtual Host and Restricted CXL Host hierarchy.
[ 1170.760660] probe of port8 returned 0 after 1897 usecs

@marc-hb @AlisonSchofield do you happen to know how the CXL setup on the q35 machine avoids this situation? The VM launched from the run_qemu.sh script has a CEDT, but I'm not sure whether it is a valid ACPI table or whether the kernel (driver) parses it correctly. I never edited the options set in setup_cxl(). The QEMU I'm using is the latest development version from @jic23. The kernel is cxl-next, which the Intel CXL folks maintain.
