Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move luv_work_cleanup into loop gc #742

Merged
merged 2 commits into from
Dec 21, 2024
Merged

Conversation

truemedian
Copy link
Member

@truemedian truemedian commented Dec 19, 2024

atexit may be called after luv is unloaded and will segfault.

Associated #712

@truemedian truemedian changed the title move luv_work_cleanup into loop gcatexit may be move luv_work_cleanup into loop gc Dec 19, 2024
@squeek502
Copy link
Member

Are the submodule changes intentional?

@truemedian
Copy link
Member Author

No they most certainly are not, I thought I fixed that

atexit may be called after luv is unloaded and will segfault.
Copy link
Member

@zhaozg zhaozg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhaozg zhaozg merged commit 1120c33 into luvit:master Dec 21, 2024
14 checks passed
@clason
Copy link

clason commented Jan 24, 2025

Next necro-post (sorry!) This seems to have introduced an ASAN failure in Neovim tests:
https://github.com/neovim/neovim/actions/runs/12945549440/job/36109345313?pr=32174

Relevant log output:

ERR 2025-01-24T08:32:30.899 T3392.18644.0 loop_close:170: uv_loop_close() hang?
[--I] signal   0x55b8e47b6148
[-AI] async    0x55b8e47b5f90
[RA-] timer    0x50e000000200

(Not very useful; will need to spend more time tracking this down.)

@luukvbaal
Copy link

luukvbaal commented Jan 24, 2025

(Not very useful; will need to spend more time tracking this down.)

The relevant ASAN failures in that log are e.g. here: https://github.com/neovim/neovim/actions/runs/12945549440/job/36109345313?pr=32174#step:10:4618. (Search for sanitizer)

@truemedian
Copy link
Member Author

I'm not really sure I understand the test-flow of neovim, but there being many different ASAN logs from what appears to be the same test file, is neovim closing the primary Lua state and reinitializing from scratch between each test?

If this is actually a bug in Luv and not neovim just not calling lua_close on the primary thread (which I think is the only way for this bug to manifest if it wasn't before), then it's worth making this an issue so it can be properly tracked.

@clason
Copy link

clason commented Jan 25, 2025

Yes, the test setup is a bit involved; basically there are two Neovim instances: The one handling the tests (think of this as a lua interpreter with a custom stdlib) and the one that is executing the actual test case ("system under test") which is being started and observed by the test handler; the system under test is where the logs are coming from, and there's indeed a fresh instance for every test (hence many logs).

This is definitely a regression in v1.50.0; if there's a breaking change where something needs to be changed on Neovim's side, I would appreciate more detailed pointers.

@truemedian
Copy link
Member Author

truemedian commented Jan 25, 2025

The only thing this change does is move the work vm cleanup from an atexit hook (which causes problems in environments where atexit is called after unloading dynamic libraries) to the __gc hook for the luv loop (and the only way to release the luv loop is to close the lua state).

Therefore if previously neovim was relying on the atexit hook to clean up the work vms and not calling lua_close, then it needs to call lua_close on the primary state now.

If that doesn't solve the problem then this or something else introduced an actual problem and a real issue should be opened.

@lewis6991
Copy link

Neovim is calling lua_close here inside nlua_common_free_all_mem which is:

  • (for worker threads) set as the thread callback (via luv_set_thread_cb) here
  • (for the primary thread) called in nlua_free_all_mem here. This is called whenever Nvim exits and is required for the sanitizers to work.

So from what I can tell, Nvim is always calling lua_close where it needs to.

@lewis6991
Copy link

lewis6991 commented Jan 29, 2025

Debugged this a little further. I believe the issue is due to Nvim calling luv_set_loop() (to set our own loop object) right before luaopen_luv() and loop_gc is only set on loops created by Luv. Unfortunetly there's nothing Nvim can do to workaround this since:

  /* do cleanup in main thread */
  lua_getglobal(L, "_THREAD");
  if (lua_isnil(L, -1))
    luv_work_cleanup();
  lua_pop(L, 1);

isn't accessible.

Either the above snippet needs to be wrapped into a function and exposed as API, or luv_work_cleanup() needs to be exposed as API.

@Bilal2453
Copy link
Contributor

Can you open an issue for discussing this?

@lewis6991
Copy link

👍 I've opened #754

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants