Skip to content

Conversation

daurnimator
Copy link
Collaborator

@daurnimator daurnimator commented Sep 6, 2016

co = coroutine.create(function() end)
coroutine.resume(co)
cq = require"cqueues".new()
cq:attach(co)
cq:step()
lua: /home/daurnimator/src/cqueues/src/cqueues.c:2017: cqueue_resume: Assertion `nargs >= 0' failed.
Aborted (core dumped)

  - Instead of re-pushing all arguments, just use lua_insert to get thread below args
  - Need to check the new stack to ensure we have enough slots
src/cqueues.c Outdated
} else {
nargs = lua_gettop(T->L);
if (status != LUA_YIELD) {
if (status == LUA_OK && lua_getstack(T->L, 0, &(lua_Debug){}) > 0) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This little bit was adapted from the coroutine.status implementation: http://www.lua.org/source/5.3/lcorolib.c.html#luaB_costatus

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test should be status == LUA_OK && lua_getstack(...) <= 0, or even better status == LUA_OK && lua_getstack(...) != 1 IIUC, nargs is decremented because in the initial state we want to exclude the function that will be called from the number of arguments provided to lua_resume.

Alternatively, just remove the assertion and don't decrement nargs if it's 0. If we've been given a dead coroutine or a running coroutine then lua_resume() will return a descriptive error, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test should be status == LUA_OK && lua_getstack(...) <= 0, or even better status == LUA_OK && lua_getstack(...) != 1

Why? does >0 vs <= 0 matter?
==> What I have is the same as what the lua coroutine library has.

This branch is what to do if the coroutine is already running; e.g.:

local cq = cqueues.new()
local co = coroutine.create(function()
    print(cq:step())
end)
cq:attach(co)
coroutine.resume(co)

If we've been given a dead coroutine or a running coroutine then lua_resume() will return a descriptive error, right?

A running coroutine, yes.
However, a dead coroutine is not handled well. Note that lua itself checks for this: http://www.lua.org/source/5.3/lcorolib.c.html#auxresume

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? does >0 vs <= 0 matter?
==> What I have is the same as what the lua coroutine library has.

The original code was assuming that if status != LUA_YIELD then it must be a newly created thread. That's why it was decrementing nargs--to exclude the function at the bottom of the stack (on the first invocation, lua_resume is like lua_pcall). lua_getstack returns 1 if the thread is running. We want to know whether the thread is in the initial state. If you look at the code in lcorolib.c, the non-yielding, suspended condition is only reached if status == LUA_OK && lua_getstack() <= 0. But using a condition of != 1 better matches the official documentation for lua_getstack.

@daurnimator
Copy link
Collaborator Author

Changed.

@wahern
Copy link
Owner

wahern commented Sep 9, 2016

The issue is triggering the assert. The simplest solution is to not decrement nargs below 0. For everything else, lua_resume will report the error for us. Any reason why we're literally copying and reimplementing the code from ldo.c:resume? lua_resume will push the exact same error message onto the stack, and AFAICT we'll create the exact same error context (see the default: case for the switch on the status returned from lua_resume) afterward.

If it's intended for improved diagnostic messages, shouldn't we put the check into cqueue_wrap so that it returns an error immediately? That way one would know what code was attaching the bogus coroutine.

@daurnimator
Copy link
Collaborator Author

The issue is triggering the assert. The simplest solution is to not decrement nargs below 0. For everything else, lua_resume will report the error for us. Any reason why we're literally copying and reimplementing the code from ldo.c:resume? lua_resume will push the exact same error message onto the stack, and AFAICT we'll create the exact same error context (see the default: case for the switch on the status returned from lua_resume) afterward.

It just felt really "icky" doing that.... if (nargs > 0) nargs--; it seems like so much could go wrong.

If it's intended for improved diagnostic messages, shouldn't we put the check into cqueue_wrap so that it returns an error immediately? That way one would know what code was attaching the bogus coroutine.

That wouldn't catch all cases: someone could :attach a coroutine, then resume it outside of cqueues, then :step(). Therefore I don't think its worth adding a check there.

wahern added a commit that referenced this pull request Oct 17, 2016
wahern added a commit that referenced this pull request Oct 17, 2016
@wahern
Copy link
Owner

wahern commented Oct 19, 2016

Merged a different approach which removed the assertion. I was uncomfortable duplicating so much internal logic from the Lua VM.

But leaving open because it's still unresolved whether it's worthwhile to specifically detect and report a dead coroutine (i.e. one that has successfully terminated) for the diagnostic/debugging benefit.

Note 1: Cases other than a dead coroutine should be reported by lua_resume with the same messages as this pull request.
Note 2: The next release of Lua 5.3 will include a lua_resume which does dead coroutine detection, making the behavior more symmetric with coroutine.resume. Currently calling lua_resume on a dead coroutine generates an error message about calling a nil value, which is admittedly quite unhelpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants