Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libomptrace segfaults when activating papi counters #125

Open
lucjaulmes opened this issue Oct 18, 2024 · 3 comments
Open

libomptrace segfaults when activating papi counters #125

lucjaulmes opened this issue Oct 18, 2024 · 3 comments

Comments

@lucjaulmes
Copy link

lucjaulmes commented Oct 18, 2024

Running Ubuntu 23.10 kernel 6.5.0-44 on Intel Xeon Gold 6230R (cascade lake), I’ve compiled extrae 4.2.3 and linked against apt-provided libpapi 7.0 as well as self-compiled libpapi 7.1, using gcc 13.2.0 and libgomp.

I’ve tested that:

  • this happens on all tested and even trivial openmp programs
  • libseqtrace seems to not have the same issue and creates a trace with counter values
  • papi works fine on its own, without instrumentation

Note that it doesn’t fail on single thread executions but fails as soon as 2 threads appear. The segfault appears to happen inside ioctl() which is called by PAPI_add_event internals, and the offending address is stack pointer ($rsp) - 8.

Would appreciate any help you can give me in debugging/avoiding this crash.


Here’s Extrae’s configure summary:

Package configuration for Extrae 4.2.3
-----------------------
Installation prefix: /home/ljaulmes/.local
Cross compilation: no
CC: gcc
CXX: g++
Binary type: 64 bits

MPI instrumentation: no
GASPI instrumentation: no
OpenMP instrumentation: yes, through LD_PRELOAD
    GNU OpenMP: yes
    IBM OpenMP: no
    Intel OpenMP: yes
    OMPT: yes
OpenSHMEM instrumentation: no
pThread instrumentation: yes
    Support for pthread_barrier_wait: yes
    Support for pthread_cond_* calls: yes
CUDA instrumentation: no
OpenCL instrumentation: no
OPENACC instrumentation: no
Java instrumentation: unsupported

Performance counters: yes
    Performance API: PAPI
    PAPI home: /usr
    Sampling support: yes

PEBS sampling: yes

libbfd available: yes (/usr/lib/x86_64-linux-gnu)
libiberty available: yes (/usr/lib/x86_64-linux-gnu)
zlib available: yes (/usr/local)
libxml2 available: yes (/usr)
BOOST available: no
callstack access: through libunwind (/usr)

Dynamic instrumentation: no

Optional features:
------------------
On-line analysis: no
Clock routine: POSIX / clock_gettime, but don't need to link against posix clock library explicitly
Heterogeneous support: no
Parallel merge: not available as MPI is not given
``̀`

Here’s a simple example:
``̀`c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from process: %d\n", thread_id);
    }

    return 0;
}

config file:

<?xml version='1.0'?>

<trace enabled="yes"
 home="$EXTRAE_HOME$"
 initial-mode="detail"
 type="paraver"
>

  <openmp enabled="yes" ompt="no">
  </openmp>

  <counters enabled="yes">
    <cpu enabled="yes" starting-set-distribution="1">
                <set enabled="yes" domain="all">
                        PAPI_TOT_CYC
                </set>
    </cpu>
  </counters>

</trace>

What I run:

gcc -fopenmp    test.c   -o test
env OMP_NUM_THREADS=${THREADS} EXTRAE_HOME=${EXTRAE_HOME} LD_PRELOAD=${EXTRAE_HOME}/lib/libomptrace.so EXTRAE_CONFIG_FILE=${PWD}/extrae.xml ./test

Output:

Extrae: WARNING! omp_get_thread_num_real is a NULL pointer. Did the initialization of this module trigger? Retrying initialization...
Welcome to Extrae 4.2.3
Extrae: Detected GOMP version is 4.5
Extrae: Detected and hooked OpenMP runtime: [GNU GOMP]
Extrae: OMP_NUM_THREADS set to 2
Extrae: Parsing the configuration file (/home/ljaulmes/tests/openmp/extrae.xml) begins
Extrae: Tracing package is located on /home/ljaulmes/.local
Extrae: Generating intermediate files for Paraver traces.
Extrae: PAPI domain set to ALL for HWC set 1
Extrae: HWC set 1 contains following counters < PAPI_TOT_CYC (0x8000003b) > - never changes
Extrae: Dynamic memory instrumentation is disabled.
Extrae: Basic I/O memory instrumentation is disabled.
Extrae: System calls instrumentation is disabled.
Extrae: Parsing the configuration file (/home/ljaulmes/tests/openmp/extrae.xml) has ended
Extrae: Intermediate traces will be stored in /home/ljaulmes/tests/openmp
Extrae: Tracing mode is set to: Detail.
Extrae: Successfully initiated with 1 tasks and 2 threads

Hello from process: 0
Segmentation fault (core dumped)

This is the gdb info:

Thread 2 "test" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe52f96c0 (LWP 3729330)]
0x00007ffff7f16895 in check_scheduability (ctl=ctl@entry=0x7fffe0000d80, idx=idx@entry=0, ctx=<optimized out>) at components/perf_event/perf_event.c:526
526    retval = ioctl( group_leader_fd, PERF_EVENT_IOC_RESET, 0);
(gdb) bt
#0  0x00007ffff7f16895 in check_scheduability (ctl=ctl@entry=0x7fffe0000d80, idx=idx@entry=0, ctx=<optimized out>) at components/perf_event/perf_event.c:526
#1  0x00007ffff7f16de3 in open_pe_events (ctl=0x7fffe0000d80, ctx=<optimized out>) at components/perf_event/perf_event.c:814
#2  _pe_update_control_state (ctl=0x7fffe0000d80, native=<optimized out>, count=<optimized out>, ctx=<optimized out>) at components/perf_event/perf_event.c:1660
#3  0x00007ffff7f02d73 in add_native_events (ESI=ESI@entry=0x7fffe0000c80, nevt=nevt@entry=0x7ffff7f53448 <_papi_hwi_presets+11848>, size=<optimized out>, 
    out=0x7fffe00151e0) at papi_internal.c:1297
#4  0x00007ffff7f04d41 in _papi_hwi_add_event (ESI=0x7fffe0000c80, EventCode=<optimized out>, EventCode@entry=-2147483589) at papi_internal.c:1415
#5  0x00007ffff7efd49d in PAPI_add_event (EventSet=<optimized out>, EventCode=-2147483589) at papi.c:2243
#6  0x00007ffff766b572 in HWCBE_PAPI_Init_Thread (time=5035646768046324, threadid=threadid@entry=1, forked=forked@entry=0)
    at ../../../../src/tracer/hwc/papi_hwc.c:612
#7  0x00007ffff7669394 in HWC_Read (tid=tid@entry=1, time=<optimized out>, store_buffer=store_buffer@entry=0x7fffe4af8b48)
    at ../../../../src/tracer/hwc/common_hwc.c:574
#8  0x00007ffff767c625 in Probe_OpenMP_UF_Entry (uf=uf@entry=93824992235958) at ../../../../../src/tracer/wrappers/OMP/omp-probe.c:88
#9  0x00007ffff7681a07 in Extrae_OpenMP_UF_Entry (uf=0x5555555551b6 <main._omp_fn>) at ../../../../../src/tracer/wrappers/OMP/omp-events.c:63
#10 0x00007ffff768238a in callme_par (par_helper_ptr=par_helper_ptr@entry=0x7fffffffe6b0) at ../../../../../src/tracer/wrappers/OMP/gnu-libgomp.c:448
#11 0x00007ffff7f8248e in gomp_thread_start (xdata=<optimized out>) at ../../../src/libgomp/team.c:129
#12 0x00007ffff7297b5a in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#13 0x00007ffff73285fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) info registers
rax            0x0                 0
rbx            0xa                 10
rcx            0x7ffff7325cfd      140737340660989
rdx            0x0                 0
rsi            0x2403              9219
rdi            0xa                 10
rbp            0x7fffe0000de8      0x7fffe0000de8
rsp            0x7fffe4af6f70      0x7fffe4af6f70
r8             0x0                 0
r9             0x0                 0
r10            0xffffffff          4294967295
r11            0x246               582
r12            0x0                 0
r13            0x7fffe0000d80      140736951487872
r14            0x0                 0
r15            0x1                 1
rip            0x7ffff7f16895      0x7ffff7f16895 <check_scheduability+85>
eflags         0x10246             [ PF ZF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x1c00              7168
k1             0x0                 0
k2             0xefffffff          4026531839
k3             0x0                 0
k4             0x0                 0
k5             0x0                 0
k6             0xb80170            12058992
k7             0x0                 0
fs_base        0x7fffe52f96c0      140737038489280
gs_base        0x0                 0
(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 2, _sifields = {_pad = {-458264728, 32767, 0 <repeats 26 times>}, _kill = {si_pid = -458264728, si_uid = 32767}, 
    _timer = {si_tid = -458264728, si_overrun = 32767, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = -458264728, si_uid = 32767, si_sigval = {
        sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = -458264728, si_uid = 32767, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
      si_addr = 0x7fffe4af6f68, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = 140737030090600, si_fd = 0}, _sigsys = {
      _call_addr = 0x7fffe4af6f68, _syscall = 0, _arch = 0}}}
(gdb) p $_siginfo.si_signo
$5 = 11
(gdb) p $_siginfo.si_errno
$6 = 0
(gdb) p $_siginfo._sifields._sigfault 
$7 = {si_addr = 0x7fffe4af6f68, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}

Let me know any other info you need.

@lucjaulmes
Copy link
Author

I tried Extrae 3.8.3 which did not have a crash, so went ahead and ran a git bisect. First bad commit appears to be 0df3e97. Reverting that commit on top of v4.2.3 fixes the issue.

@julianmorillo
Copy link
Contributor

julianmorillo commented Nov 13, 2024

Just to mention that I had the very same problem with libptcudampi.so (Extrae v4.2.3) and the solution proposed by @lucjaulmes worked for me too.

@valentin-seitz
Copy link

valentin-seitz commented Nov 21, 2024

This problem is also reproducible in the Marenostrum 5 installation available via the module extrae/4.2.3. When tracing with OpenMP and PAPI Counters extrae segfaults.
When reverting the commit 0df3e97 and doing my own installation I can successfully trace OpenMP with PAPI counters.
See #127 for a revert commit that solves the issue for me on Marenostrum 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants