Pantheon: Random crashes since upgrading to 23.11 #274999

Closed
OPNA2608 opened this issue Dec 17, 2023 · 8 comments
Labels
0.kind: bug (Something is broken), 6.topic: pantheon (The Pantheon desktop environment)

Comments

@OPNA2608
Contributor

OPNA2608 commented Dec 17, 2023

Describe the bug

After upgrading from 23.05 to 23.11, the rate of Pantheon crashes has gone from basically-never to once every few days. Some days I don't have any, other days it's 2-3 crashes.

I unfortunately can't pinpoint any particular situation that reproduces these crashes though. Here are some situations where I remember a crash happening afterwards:

  • Switching workspaces
  • Alt-tabbing between Discord and Element
  • Entering fullscreen mode on a YouTube video in Firefox

...but I cannot force a crash by aggressively spamming these actions.

Steps To Reproduce

Unsure, beyond "Seemingly normal Pantheon usage on 23.11".

Expected behavior

No crashes.

Screenshots

n/a, just the regular Pantheon "An error occurred, please log out" screen after it happens.

Additional context

I'm not 100% certain this is really a Pantheon issue. Pantheon itself seems to look & behave fine for hours without any graphics issues, but the apps that I remember using before crashes are all GPU-accelerated.

I've been busy with bisecting my way through our git history for the last week or so, to find the cause of severe graphics issues & corruption under Miriway after upgrading from 23.05 to 23.11. Is it possible this is really a GPU driver issue? For now, all I can say about this issue is that gala is crashing sometimes.

If it turns out to be hardware/GPU-specific, I'm using a Radeon RX 5700 XT.

Any advice on how to debug the crashes would be appreciated, in case this is completely unreproducible on your end.

Notify maintainers

Pantheon maintainers for now: @davidak @bobby285271

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.66-xanmod1, NixOS, 23.11 (Tapir), 23.11.1779.cf28ee258fd5`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(bt1cn): `"unstable"`
 - channels(root): `"nixos-23.11"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Add a 👍 reaction to issues you find important.

OPNA2608 added the "0.kind: bug" label on Dec 17, 2023
chewblacka added the "6.topic: pantheon" label on Dec 18, 2023
@OPNA2608
Contributor Author

OPNA2608 commented Jan 5, 2024

Some situations I have now spotted, after further use. I'm not sure if they're actually new issues caused by the bump, or really 23.05 issues re-spotted after messing with my settings / trying out other packages again.

  • All variants of eclipses.* crash the compositor 100% of the time when launched
  • Notification pop-ups have the possibility of crashing the compositor, but this doesn't always happen. I've seen this happen in the following situations:
    • while watching a YouTube video in fullscreen / in the process of toggling fullscreen mode
    • while playing a game in fullscreen
    • switching workspaces

@OPNA2608
Contributor Author

OPNA2608 commented Jan 7, 2024

In addition to the above, the crashes from the OP came back yesterday. After getting 6 crashes in a row doing the same things post-login during a pair programming meetup, I had to give up on my graphical setup. I'll just list what I did after logging in:

  • launch protonmail-bridge in kitty
  • launch thunderbird
  • switch workspace to the right
  • launch element-desktop
  • launch discord and join a call
  • press alt-tab
  • crash

Any suggestion for what I could do to provide more details would still be appreciated, because I know this isn't super helpful so far. Syslogs don't seem to have any details as-is, just a message that gala crashes along with the normal assertion failures.

@bobby285271
Member

Honestly, I don't think I can actually help fix an issue like this, since I don't know much beyond packaging, though Pantheon runs fine for me so far. Some random things I can think of:

  1. Does systemctl status bamfdaemon.service --user show a failure?
  2. Any Pantheon stuff in coredumpctl? If so, follow https://discourse.nixos.org/t/how-to-investigate-gnome-crashing/19726/2 to get a backtrace.
    • For projects written in Vala, you will need to override the package with VALAFLAGS = "-g"; to get correct line numbers.
  3. If you see criticals, you can set G_DEBUG=fatal-criticals to trigger a core dump when they appear, then go back to step 2.

Once you have a backtrace with debug symbols, if it points to lines in mutter's source files, you can check whether a later commit touches those lines, try backporting it to the mutter version Pantheon uses, and see how it goes.
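
The checks suggested in this comment could be scripted roughly as follows. This is only a sketch under some assumptions: the function names are made up for illustration, `coredumpctl --debugger-arguments` needs a reasonably recent systemd, the match string passed to `coredump_backtrace` depends on how the crashed process is named on your system, and the `nix-build` expression assumes a channel-based `<nixpkgs>` and that `pantheon.gala` accepts `VALAFLAGS` through `overrideAttrs`.

```shell
# Step 1: show the state of the user-level bamfdaemon unit.
bamf_status() {
    systemctl --user status --no-pager bamfdaemon.service
}

# Step 2: print a backtrace for the newest core dump matching $1
# (a PID, command name, or executable path) without interactive gdb.
coredump_backtrace() {
    coredumpctl --debugger-arguments="-batch -ex bt" debug "$1"
}

# Step 2, sub-point: build gala with Vala debug info so that its own
# frames can carry line numbers. Assumes <nixpkgs> resolves via NIX_PATH.
build_gala_debug() {
    nix-build -E 'with import <nixpkgs> {}; pantheon.gala.overrideAttrs (old: { VALAFLAGS = "-g"; })'
}
```

For step 3, `G_DEBUG=fatal-criticals` has to be present in the environment of the gala process itself, so it needs to be set wherever the session is started, not in an unrelated terminal.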

@OPNA2608
Contributor Author

After re-enabling desktop notifications and re-discovering that TB mail notifs can crash my session, I've re-membered this issue. 😅

I don't have the time to dive into this too deeply, but I'll throw in what I've found. When I'm updating to 24.11, I'll try to find the time for more.

  1. Did systemctl status bamfdaemon.service --user failed?

Not AFAICT, at least there's no message about the service explicitly failing in the syslog.

  2. Any Pantheon stuff in coredumpctl?

coredumpctl doesn't like how I manage my journal and is eternally-empty, but here's the core dump info:

Core was generated by `/nix/store/r8vybm779j16akdanp3hq6lmamfr4i7p-gala-7.1.3/bin/gala'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ff0cb0bd927 in meta_window_actor_get_meta_window (self=0x0) at ../src/compositor/meta-window-actor.c:521

warning: 521	../src/compositor/meta-window-actor.c: No such file or directory
[Current thread is 1 (Thread 0x7ff0c8a38000 (LWP 3464))]
(gdb) bt
#0  0x00007ff0cb0bd927 in meta_window_actor_get_meta_window (self=0x0) at ../src/compositor/meta-window-actor.c:521
#1  0x00000000004316d7 in gala_notification_stack_update_positions ()
#2  0x0000000000431b93 in gala_notification_stack_show_notification ()
#3  0x00007ff0cc0a7668 in g_closure_invoke (closure=0xe4844a0, return_value=0x0, n_param_values=2, param_values=0x7fff77e02bf0, invocation_hint=0x7fff77e02b40) at ../gobject/gclosure.c:834
#4  0x00007ff0cc0bbbcc in signal_emit_unlocked_R (node=node@entry=0x7fff77e02cc0, detail=detail@entry=0, instance=instance@entry=0xd3e2fd0, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fff77e02bf0) at ../gobject/gsignal.c:3888
#5  0x00007ff0cc0bd561 in signal_emit_valist_unlocked (instance=instance@entry=0xd3e2fd0, signal_id=signal_id@entry=351, detail=detail@entry=0, var_args=var_args@entry=0x7fff77e02e20) at ../gobject/gsignal.c:3520
#6  0x00007ff0cc0c32c2 in g_signal_emit_valist (instance=0xd3e2fd0, signal_id=351, detail=0, var_args=0x7fff77e02e20) at ../gobject/gsignal.c:3263
#7  0x00007ff0cc0c336f in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at ../gobject/gsignal.c:3583
#8  0x00007ff0cc0a7668 in g_closure_invoke (closure=0xd3a9b80, return_value=0x0, n_param_values=2, param_values=0x7fff77e030d0, invocation_hint=0x7fff77e03020) at ../gobject/gclosure.c:834
#9  0x00007ff0cc0bbbcc in signal_emit_unlocked_R (node=node@entry=0x7fff77e031a0, detail=detail@entry=0, instance=instance@entry=0xd242520, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7fff77e030d0) at ../gobject/gsignal.c:3888
#10 0x00007ff0cc0bd561 in signal_emit_valist_unlocked (instance=instance@entry=0xd242520, signal_id=signal_id@entry=147, detail=detail@entry=0, var_args=var_args@entry=0x7fff77e03300) at ../gobject/gsignal.c:3520
#11 0x00007ff0cc0c32c2 in g_signal_emit_valist (instance=0xd242520, signal_id=147, detail=0, var_args=0x7fff77e03300) at ../gobject/gsignal.c:3263
#12 0x00007ff0cc0c336f in g_signal_emit (instance=instance@entry=0xd242520, signal_id=<optimized out>, detail=detail@entry=0) at ../gobject/gsignal.c:3583
#13 0x00007ff0cb0c8bf3 in meta_display_notify_window_created (display=display@entry=0xd242520, window=window@entry=0x105d31c0) at ../src/core/display.c:1740
#14 0x00007ff0cb0f3033 in _meta_window_shared_new (display=display@entry=0xd242520, client_type=client_type@entry=META_WINDOW_CLIENT_TYPE_X11, surface=surface@entry=0x0, xwindow=xwindow@entry=50338264, existing_wm_state=existing_wm_state@entry=0, 
    effect=effect@entry=META_COMP_EFFECT_CREATE, attrs=<optimized out>) at ../src/core/window.c:1391
#15 0x00007ff0cb12d754 in meta_window_x11_new (display=display@entry=0xd242520, xwindow=50338264, must_be_viewable=must_be_viewable@entry=0, effect=effect@entry=META_COMP_EFFECT_CREATE) at ../src/x11/window-x11.c:3747
#16 0x00007ff0cb1157e8 in handle_other_xevent (x11_display=x11_display@entry=0xd27fb70, event=event@entry=0x7fff77e03920) at ../src/x11/events.c:1523
#17 0x00007ff0cb11656b in meta_x11_display_handle_xevent (event=0x7fff77e03920, x11_display=0xd27fb70) at ../src/x11/events.c:1984
#18 xevent_filter (xevent=0x7fff77e03920, event=<optimized out>, data=0xd27fb70) at ../src/x11/events.c:2031
#19 0x00007ff0cbd56edf in gdk_event_apply_filters (xevent=xevent@entry=0x7fff77e03920, event=event@entry=0xe647180, window=window@entry=0x0) at ../gdk/x11/gdkeventsource.c:79
#20 0x00007ff0cbd571da in gdk_event_source_translate_event (xevent=0x7fff77e03920, event_source=0xd256980) at ../gdk/x11/gdkeventsource.c:198
#21 _gdk_x11_display_queue_events (display=0xd1acd90) at ../gdk/x11/gdkeventsource.c:341
#22 0x00007ff0cbcf9350 in gdk_display_get_event (display=display@entry=0xd1acd90) at ../gdk/gdkdisplay.c:442
#23 0x00007ff0cbd56f72 in gdk_event_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../gdk/x11/gdkeventsource.c:363
#24 0x00007ff0cc14ff54 in g_main_dispatch (context=context@entry=0xcdac680) at ../glib/gmain.c:3344
#25 0x00007ff0cc152fd7 in g_main_context_dispatch_unlocked (context=0xcdac680) at ../glib/gmain.c:4152
#26 g_main_context_iterate_unlocked (context=0xcdac680, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4217
#27 0x00007ff0cc15388f in g_main_loop_run (loop=0xd3d7bd0) at ../glib/gmain.c:4419
#28 0x00007ff0cb0d87b5 in meta_context_run_main_loop (context=<optimized out>, error=0x7fff77e03b98) at ../src/core/meta-context.c:465
#29 0x000000000042fc57 in gala_main ()
#30 0x00007ff0caa6410e in __libc_start_call_main (main=main@entry=0x421980 <main>, argc=argc@entry=1, argv=argv@entry=0x7fff77e03fd8) at ../sysdeps/nptl/libc_start_call_main.h:58
#31 0x00007ff0caa641c9 in __libc_start_main_impl (main=0x421980 <main>, argc=1, argv=0x7fff77e03fd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff77e03fc8) at ../csu/libc-start.c:360
#32 0x00000000004219b5 in _start ()

Looks to me like it's similar-enough to the stack trace from elementary/gala#1727 (comment), although outside of the multi-task view (I also remember the issue mentioned with that though, so maybe the cause is close enough).

Fix for that issue is elementary/gala@186e9a3 in 8.0.0, but our 7.x version builds fine with it applied. If I notice this again in 24.11, I'll test if applying this changes anything.
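
In the backtrace above, the gala frames (for example `gala_notification_stack_update_positions ()`) carry no `file:line` information, while the mutter and GLib frames do. A small filter like the following could list the frames that still lack source locations, as a rough hint for which packages need debug info. It is a sketch only: `frames_without_source` is a hypothetical helper name, and it assumes gdb's default frame format, including long frames that wrap onto an indented second line.

```shell
# Read a gdb backtrace on stdin and print every frame that does not end
# in "at <file>:<line>". Indented continuation lines are joined to the
# frame they belong to before the check.
frames_without_source() {
    awk '
        function emit(f) {
            if (f !~ / at [^ ]+:[0-9]+[ \t]*$/) print f
        }
        /^#[0-9]+ / {
            if (frame != "") emit(frame)
            frame = $0
            next
        }
        frame != "" && /^[ \t]/ {
            frame = frame " " $0
            next
        }
        END {
            if (frame != "") emit(frame)
        }
    '
}
```

It can be fed a saved backtrace, e.g. `frames_without_source < gala-bt.txt`, where `gala-bt.txt` is whatever file the gdb output was saved to.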

@OPNA2608
Contributor Author

OPNA2608 commented Dec 2, 2024

New random crash after switching to 24.11, while trying to open an app from the starter. Throwing it in here for now.

↪ gdb /nix/store/5ng0gjhnzrz2l3ahci3d4rpf8jmlqh9m-gala-7.1.3/bin/.gala-wrapped /dev/shm/'core.\x2egala-wrapped.1000.005643e6c0554c53afb631428ccef1e6.3336.1733168187000000'
[...]
Core was generated by `/nix/store/5ng0gjhnzrz2l3ahci3d4rpf8jmlqh9m-gala-7.1.3/bin/gala'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f33250f380c in meta_window_focus (window=0x7504d50, timestamp=0) at ../src/core/window.c:4462

(gdb) bt
#0  0x00007f33250f380c in meta_window_focus (window=0x7504d50, timestamp=0) at ../src/core/window.c:4462
#1  0x00007f32fc03371b in wingpanel_interface_focus_manager_restore_focused_window () from /run/current-system/sw/lib/gala/plugins/libwingpanel-interface.so
#2  0x00007f32fc031906 in wingpanel_interface_dbus_server_restore_focused_window () from /run/current-system/sw/lib/gala/plugins/libwingpanel-interface.so
#3  0x00007f32fc031994 in _dbus_wingpanel_interface_dbus_server_restore_focused_window () from /run/current-system/sw/lib/gala/plugins/libwingpanel-interface.so
#4  0x00007f3325f20b78 in call_in_idle_cb (user_data=user_data@entry=0x7f330000f600) at ../gio/gdbusconnection.c:5458
#5  0x00007f33261c6c1e in g_idle_dispatch (source=0x7f3300010bb0, callback=0x7f3325f20a60 <call_in_idle_cb>, user_data=0x7f330000f600) at ../glib/gmain.c:6243
#6  0x00007f33261c9571 in g_main_dispatch (context=0x5b2fba0) at ../glib/gmain.c:3357
#7  g_main_context_dispatch_unlocked (context=context@entry=0x5b2fba0) at ../glib/gmain.c:4208
#8  0x00007f33261cb6b0 in g_main_context_iterate_unlocked (context=0x5b2fba0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4273
#9  0x00007f33261cc0df in g_main_loop_run (loop=0x63d5ed0) at ../glib/gmain.c:4475
#10 0x00007f33250de1f5 in meta_context_run_main_loop (context=<optimized out>, error=0x7ffc27c0fe08) at ../src/core/meta-context.c:465
#11 0x00000000004306e2 in gala_main ()
#12 0x00007f3324a4027e in __libc_start_call_main (main=main@entry=0x421980 <main>, argc=argc@entry=1, argv=argv@entry=0x7ffc27c10248) at ../sysdeps/nptl/libc_start_call_main.h:58
#13 0x00007f3324a40339 in __libc_start_main_impl (main=0x421980 <main>, argc=1, argv=0x7ffc27c10248, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc27c10238) at ../csu/libc-start.c:360
#14 0x00000000004219b5 in _start ()
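
When `coredumpctl` has nothing to offer and the core file is loaded by hand as above, the backtrace can also be captured non-interactively. A minimal sketch, assuming the caller supplies the paths to the crashed binary and the core file (`core_backtrace` is a hypothetical helper name):

```shell
# Print the backtrace of the crashing thread from a core file, then exit.
# $1: path to the binary that produced the core, $2: path to the core file.
core_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}
```

Redirecting the output to a file (`core_backtrace "$binary" "$core" > gala-bt.txt`) makes it easier to attach to an issue.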

@OPNA2608
Contributor Author

If I notice this again in 24.11, I'll test if applying this changes anything.

Same issue with 24.11 just now, will apply the patch locally and see if that helps.

@OPNA2608
Contributor Author

OPNA2608 commented Jan 4, 2025

With the following commits applied to pantheon.gala and notifications enabled, I haven't encountered a notifications-related crash (or any crash of Pantheon in general, AFAICR) in almost a month:

The latter required manual backporting, though maybe this could be worked around by also applying some commit in-between to fix what the failed hunk(s) are looking for.
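
For anyone wanting to try a similar local backport, a patch can be added to `pantheon.gala` through `overrideAttrs`. The sketch below is not the exact setup used here: the helper names are made up, the patch path is supplied by the caller, and it assumes a channel-based `<nixpkgs>` and an absolute patch path without spaces.

```shell
# Print a Nix expression that builds pantheon.gala with one extra patch.
# $1: absolute path to a local patch file (hypothetical, caller-supplied).
gala_patch_expr() {
    printf 'with import <nixpkgs> {}; pantheon.gala.overrideAttrs (old: { patches = (old.patches or []) ++ [ %s ]; })\n' "$1"
}

# Build the patched package; nix-build leaves a ./result symlink behind.
build_patched_gala() {
    nix-build -E "$(gala_patch_expr "$1")"
}
```

This only builds the package. Using it in a running session still requires wiring the same override into the system configuration, for example through an overlay.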

@OPNA2608
Contributor Author

With Gala 8.1 landed via #312449, I would expect this to be done on unstable.


3 participants