Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

icinga2 may be removed from Debian armel, mips64el, ppc64el, and riscv64 #9954

Open
noloader opened this issue Jan 3, 2024 · 8 comments
Open

Comments

@noloader
Copy link

noloader commented Jan 3, 2024

Describe the bug

icinga2 may be removed from Debian armel, mips64el, ppc64el, and riscv64 due to Failure to Build from Source (FTBFS).

To Reproduce

  • armel:

/<>/lib/base/application.hpp:148:33: error: static assertion failed

https://buildd.debian.org/status/fetch.php?pkg=icinga2&arch=armel&ver=2.14.1-1&stamp=1703225119&raw=0

  • mips64el:

152 - base-methods_pluginnotificationtask/truncate_long_output (Failed)

https://buildd.debian.org/status/fetch.php?pkg=icinga2&arch=mips64el&ver=2.14.1-1&stamp=1704107092&raw=0

  • ppc64el:

152 - base-methods_pluginnotificationtask/truncate_long_output (Failed)

https://buildd.debian.org/status/fetch.php?pkg=icinga2&arch=ppc64el&ver=2.14.1-1&stamp=1703225525&raw=0

  • riscv64:

/usr/bin/ld: /usr/lib/riscv64-linux-gnu/libboost_coroutine.so.1.83.0: undefined reference to jump_fcontext' /usr/bin/ld: /usr/lib/riscv64-linux-gnu/libboost_coroutine.so.1.83.0: undefined reference to make_fcontext'

https://buildd.debian.org/status/fetch.php?pkg=icinga2&arch=riscv64&ver=2.14.1-1&stamp=1703250691&raw=0

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Your Environment

Include as many relevant details about the environment you experienced the problem in

  • Version used (icinga2 --version):
  • Operating System and version:
  • Enabled features (icinga2 feature list):
  • Icinga Web 2 version and modules (System - About):
  • Config validation (icinga2 daemon -C):
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

Additional context

I just lurk on some of the Debian mailing lists. I wanted to give the project a heads up. Also see https://lists.debian.org/debian-powerpc/2024/01/msg00001.html.

@julianbrost
Copy link
Contributor

  • armel:

/<>/lib/base/application.hpp:148:33: error: static assertion failed

buildd.debian.org/status/fetch.php?pkg=icinga2&arch=armel&ver=2.14.1-1&stamp=1703225119&raw=0

So looks like no lock-free implementation of std::atomic<uint32_t> available on that platform:

typedef Atomic<std::conditional_t<Atomic<double>::is_always_lock_free, double, uint32_t>> AtomicTs;
static_assert(AtomicTs::is_always_lock_free);

If someone wants to make it work, go ahead, but I'm not sure if it's worth the time for some old ARM systems.

  • mips64el:

152 - base-methods_pluginnotificationtask/truncate_long_output (Failed)

buildd.debian.org/status/fetch.php?pkg=icinga2&arch=mips64el&ver=2.14.1-1&stamp=1704107092&raw=0

  • ppc64el:

152 - base-methods_pluginnotificationtask/truncate_long_output (Failed)

buildd.debian.org/status/fetch.php?pkg=icinga2&arch=ppc64el&ver=2.14.1-1&stamp=1703225525&raw=0

That was recently added in #9887. Not sure what's going wrong on these architectures. I don't think we have easy access to such machines so help for debugging this is greatly appreciated.

  • riscv64:

/usr/bin/ld: /usr/lib/riscv64-linux-gnu/libboost_coroutine.so.1.83.0: undefined reference to jump_fcontext' /usr/bin/ld: /usr/lib/riscv64-linux-gnu/libboost_coroutine.so.1.83.0: undefined reference to make_fcontext'

buildd.debian.org/status/fetch.php?pkg=icinga2&arch=riscv64&ver=2.14.1-1&stamp=1703250691&raw=0

That actually seems to be an issue in Boost which, according to Debian bug #1059046, should already be fixed there.

@noloader
Copy link
Author

noloader commented Jan 3, 2024

Thanks @julianbrost,

Just one comment...

base-methods_pluginnotificationtask/truncate_long_output (Failed)

That was recently added in #9887. Not sure what's going wrong on these architectures. I don't think we have easy access to such machines so help for debugging this is greatly appreciated.

It will probably be easiest to use a KVM/QEMU virtual machine, though emulation will be a little slow. I endure it for a couple of projects I help with, like when I need a s390x machine.

The Gnulib folks recently added instructions for QEMU at https://git.savannah.gnu.org/gitweb/?p=gnulib/maint-tools.git;a=tree;f=platforms/environments/qemu. Once you setup a Debian based machine, switch from Bookworm/Stable to Unstable, then apt-get update && apt-get upgrade. That should get you to a similar machine that exhibits the issue.

And if you don't have time, then I understand that, too. I understand managing a project takes a lot of time.

@aurel32
Copy link

aurel32 commented Jan 3, 2024

Thanks for pointing at the commit that introduced the issue. Looking at it and the affected architectures, it could be linked to the fact that the page size on those architectures is not 4K.

@aurel32
Copy link

aurel32 commented Jan 3, 2024

The issue seems to be an confusion between MAX_ARG_STRLEN, which represents the maximum length of a single argument, and syconf(_SC_ARG_MAX), which is the maximum length of the command. Note that the maximum length defaults ot 1/4 of the stack size, and can therefore be smaller than MAX_ARG_STRLEN. This is the case on ppc64el which has a page size of 64kB. Therefore the fix for the plugin would be something like:

--- icinga2-2.14.1.orig/lib/methods/pluginnotificationtask.cpp
+++ icinga2-2.14.1/lib/methods/pluginnotificationtask.cpp
@@ -13,17 +13,12 @@
 #include "base/convert.hpp"

 #ifdef __linux__
-#      include <linux/binfmts.h>
 #      include <unistd.h>
-
-#      ifndef PAGE_SIZE
-//             MAX_ARG_STRLEN is a multiple of PAGE_SIZE which is missing
-#              define PAGE_SIZE getpagesize()
-#      endif /* PAGE_SIZE */
+#      define ARG_MAX sysconf(_SC_ARG_MAX)

 // Make e.g. the $host.output$ itself even 10% shorter to leave enough room
 // for e.g. --host-output= as in --host-output=$host.output$, but without int overflow
-const static auto l_MaxOutLen = MAX_ARG_STRLEN - MAX_ARG_STRLEN / 10u;
+const static auto l_MaxOutLen = ARG_MAX - ARG_MAX / 10u;
 #endif /* __linux__ */

 using namespace icinga;

Then the second issue, is that the test hard codes the way to trigger the issue by using a 1024*1024 size.

@noloader
Copy link
Author

noloader commented Jan 3, 2024

@aurel32,

That looks like a good patch for Debian to carry until the Icinga folks determine their path forward. It will clear FTBFS on two platforms. And coupled with the Boost fix that @julianbrost provided, that takes the total to three platforms.

I hate to see armel and armv7 left behind, but I think its good progress none the less.

@aurel32
Copy link

aurel32 commented Jan 3, 2024

The patch is only part of the fix, just the the plugin, not the testsuite. Also playing more with this, I noticed the problem is more complicated.

E2BIG happens when:

  • One argument is bigger than MAX_ARG_STRLEN
  • All arguments + environment is bigger than sysconf(_SC_ARG_MAX)

The default value of sysconf(_SC_ARG_MAX) on all modern Linux systems seems to be 2MiB. The value of MAX_ARG_STRLEN, changes with the page size, so it's 128KiB on systems with 4KiB pages (e.g. x86), 512KiB on systems with 16KiB pages (e.g. some mips systems), and 2MiB on systems with 64KiB pages (e.g. ppc64el).

So on an x86 system you can pass up to 16 arguments of length MAX_ARG_STRLEN, while on ppc64el, you can pass only one.

The current code and testsuite does not look at the number of arguments and assumes that MAX_ARG_STRLEN is the only limit.

@Al2Klimov
Copy link
Member

So looks like no lock-free implementation of std::atomic<uint32_t> available on that platform:

typedef Atomic<std::conditional_t<Atomic<double>::is_always_lock_free, double, uint32_t>> AtomicTs;
static_assert(AtomicTs::is_always_lock_free);

If someone wants to make it work, go ahead, but I'm not sure if it's worth the time for some old ARM systems.

What about locking instead of requiring a lock-free number? https://github.com/Icinga/icinga2/pull/8430/files#diff-f60819a3bc65b59dde8810c55c02f627b457f605fd27445e523fbd44618c1f10R157

@julianbrost
Copy link
Contributor

What about locking instead of requiring a lock-free number? #8430 (files)

The nice thing about atomic operations is that they can't block indefinitely. Using shared memory locking instead just adds a lot more room for error, especially given that we do so nice things like the following:

_exit(rc); // Yay, our static destructors are pretty much beyond repair at this point.

So if that would happen while that process held the lock, it would remain locked indefinitely.

Is there really a lack of 32 bit atomic operations on these architectures or is it just that those are not implemented in std::atomic for these platforms or that they have additional memory alignment requirements so that the always part of the check is the problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants