Bug:(duplication) master cluster down when duplication checking a private log to load #1596

Closed
ninsmiracle opened this issue Sep 1, 2023 · 1 comment
Labels
type/bug This issue reports a bug.


@ninsmiracle
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!
The master cluster core dumps after a duplication task has been running for some time.

  1. What did you do?
    This core dump occurred in an online cluster. We started a duplication task, and after a few days some nodes began to core dump.

  2. What did you see instead?
    The backtrace shows that an error occurred while duplication was checking a plog file to load (a stage of duplication).

Program terminated with signal 6, Aborted.
#0  0x00007fbfd14e21d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) #0  0x00007fbfd14e21d7 in raise () from /lib64/libc.so.6
#1  0x00007fbfd14e38c8 in abort () from /lib64/libc.so.6
#2  0x00007fbfd65c963e in dsn_coredump ()
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/runtime/service_api_c.cpp:93
#3  0x00007fbfd63c47c3 in dsn::replication::log_file::log_file (
    this=0x52b031600,
    path=0x672f88718 "/home/work/ssd2/pegasus/alsgsrv-monetization-master/replica/reps/8.79.pegasus/plog/log.2889.96902391528", handle=<optimized out>,
    index=<optimized out>, start_offset=96902391528, is_read=<optimized out>)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/replica/log_file.cpp:166
#4  0x00007fbfd63c630e in dsn::replication::log_file::open_read (
    path=0x672f88718 "/home/work/ssd2/pegasus/alsgsrv-monetization-master/replica/reps/8.79.pegasus/plog/log.2889.96902391528", err=...)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/replica/log_file.cpp:92
#5  0x00007fbfd63de83a in dsn::replication::log_utils::open_read (path=...,
    file=...)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/replica/mutation_log_utils.cpp:43
#6  0x00007fbfd649feea in dsn::replication::load_from_private_log::find_log_file_to_start (this=this@entry=0x25734b7c0)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/replica/duplication/load_from_private_log.cpp:121
#7  0x00007fbfd64a0a40 in dsn::replication::load_from_private_log::run (
    this=0x25734b7c0)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/replica/duplication/load_from_private_log.cpp:100   
#8  0x00007fbfd6606631 in dsn::task::exec_internal (
    this=this@entry=0x13a800d950)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/runtime/task/task.cpp:176
#9  0x00007fbfd661bce2 in dsn::task_worker::loop (this=0x3535760)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/runtime/task/task_worker.cpp:224
#10 0x00007fbfd661be60 in dsn::task_worker::run_internal (this=0x3535760)
    at /home/jiashuo1/work/incubator-pegasus/src/rdsn/src/runtime/task/task_worker.cpp:204
#11 0x00007fbfd529aa2f in execute_native_thread_routine ()
   from /home/work/app/pegasus/alsgsrv-monetization-master/replica/package/bin/libdsn_utils.so
#12 0x00007fbfd30a5dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007fbfd15a473d in clone () from /lib64/libc.so.6
(gdb) quit
  3. What version of Pegasus are you using?
    Pegasus 2.4
@ninsmiracle ninsmiracle added the type/bug This issue reports a bug. label Sep 1, 2023
empiredan pushed a commit that referenced this issue Mar 28, 2024
…hey are being checked by duplication (#1597)

#1596

Using an atomic member to prevent plog files from being removed by GC
when the plog files are being checked by duplication.
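
For readers who want a picture of the idea in that commit message, here is a minimal sketch of an atomic member acting as a guard between GC and a duplication check. It is not the code from #1597. The class `plog_files` and its methods `try_begin_check`, `end_check` and `try_gc` are hypothetical names invented for illustration, and the sketch assumes that both sides are allowed to back off and retry later when the other one holds the flag.

```cpp
#include <atomic>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a replica's private log files.
// This is NOT the Pegasus mutation_log API, only an illustration.
class plog_files {
public:
    // Setup only: in this sketch, files are registered before any
    // concurrent GC or duplication check starts.
    void add(int index, const std::string &path) { _files[index] = path; }
    std::size_t count() const { return _files.size(); }

    // Duplication side: claim the files before inspecting them.
    // Returns false if GC currently holds the flag; the caller retries later.
    bool try_begin_check() { return try_claim(); }
    void end_check() { release(); }

    // GC side: remove files whose index is below first_index_to_keep.
    // Returns the number of files removed, and removes nothing while a
    // duplication check holds the flag.
    std::size_t try_gc(int first_index_to_keep) {
        if (!try_claim()) {
            return 0; // postpone GC to the next round
        }
        std::size_t removed = 0;
        for (auto it = _files.begin();
             it != _files.end() && it->first < first_index_to_keep;) {
            it = _files.erase(it);
            ++removed;
        }
        release();
        return removed;
    }

private:
    // Only one side can flip the flag from false to true at a time.
    bool try_claim() {
        bool expected = false;
        return _busy.compare_exchange_strong(expected, true,
                                             std::memory_order_acq_rel);
    }
    void release() { _busy.store(false, std::memory_order_release); }

    std::atomic<bool> _busy{false};
    std::map<int, std::string> _files;
};
```

With this shape, a GC round that starts while duplication is still checking files returns without deleting anything, so the file that duplication is about to open cannot vanish underneath it. Once `end_check()` has run, the next GC round proceeds as usual.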
@empiredan
Contributor

This issue has been fixed by #1597.
