[coq] Support for token-based interruption #509
Conversation
Actually I'm unsure that could work; indeed, the semantics of async effects are quite a bit more complex than the semantics of regular ones, IIUC (cf. https://arxiv.org/abs/2003.02110). @gadmm, is it feasible to think that async interruptions here (for tokens or limits) could be handled in OCaml 5 as an effect that could be resumed?
I think this needs more testing, so I will postpone to 0.1.8 for now, as I'd like to release 0.1.7 ASAP.
If the question is about asynchronously pausing/restarting instead of immediately cancelling, you might be overthinking it, as you can probably do this with (sys)threads using condition variables from an async callback (whether with signals & signal masks, or with an approach à la memprof-limits).
Yes, that's the question. Usually we have a very slow computation running. Let's assume for simplicity that all the computations here are pure (or properly guarded).
Would you mind explaining a bit more how that would work in the above case? I guess what you suggest is to have the async handler determine whether the computation should be interrupted or resumed, then act accordingly? But I'm unsure what the limitations there would be, for example if the handler needs to do IO before resuming. I was thinking of an interface more similar to effects, where the handler gets the continuation, which can be resumed (or not) at any arbitrary later point.
Using condition variables, you let the OS deal with scheduling. In this scenario there is no "async handler" and no need for one. There is only an async callback, which is called inside the thread to be paused/interrupted. Inside this async callback, it is possible to wait on a condition variable, as with `control_check` below:

```ocaml
type thread_status = Running | Paused | Interrupted

type control = {
  mutable status : thread_status;
  mutex : Mutex.t;
  cond : Condition.t;
}

let status = {
  status = Running;
  mutex = Mutex.create ();
  cond = Condition.create ();
}

(* from thread 1 *)
let control_set new_status =
  Mutex.lock status.mutex;
  status.status <- new_status;
  if new_status <> Paused then Condition.broadcast status.cond;
  Mutex.unlock status.mutex

(* from an async callback in thread 2 *)
let control_check () =
  Mutex.lock status.mutex;
  while status.status = Paused do
    Condition.wait status.cond status.mutex
  done;
  let interrupted = (status.status = Interrupted) in
  Mutex.unlock status.mutex;
  if interrupted then raise Sys.Break
```

Note that in your current approach, this involves changing memprof-limits itself to do something like that.
To really answer your question though:
Currently, it is not possible to perform an effect inside an async callback and handle it outside of the callback. There is no fundamental obstacle to implementing this, but: 1) this requires some work on the runtime to radically change how async callbacks run (essentially remove the need to switch from OCaml to C when running async actions), 2) you would need to convince upstream that this is something which they would like to have.
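For illustration, here is a minimal sketch (hypothetical names, not coq-lsp code; requires OCaml >= 5.0) of the effect-based interface discussed above: the computation performs a `Yield` effect at safe interruption points, and the handler receives the continuation and decides whether to resume or cancel it.

```ocaml
(* Sketch only: requires OCaml >= 5.0; all names are illustrative. *)
open Effect
open Effect.Deep

(* The computation performs [Yield] at safe interruption points. *)
type _ Effect.t += Yield : unit Effect.t

(* The handler gets the continuation [k] and can resume it with
   [continue], or cancel it by resuming it with an exception. *)
let run ~should_interrupt f =
  match_with f ()
    { retc = (fun v -> Some v);
      exnc = (fun e -> raise e);
      effc = (fun (type b) (eff : b Effect.t) ->
        match eff with
        | Yield ->
          Some (fun (k : (b, _) continuation) ->
            if should_interrupt () then discontinue k Sys.Break
            else continue k ())
        | _ -> None) }
```

Note this only works when the effect is performed by OCaml code running under the handler; as said above, performing an effect from inside an async callback and handling it outside is currently not possible.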
Hey, sorry; I recently realized that I wasn't getting notified of GitHub mentions. I'll test this right away, this seems super useful.
Hi @HazardousPeach , we actually made a lot of progress with this implementation; I'll push a new tree soon. After discussing with Guillaume (the author of the lib we use here), we'd like to make some significant changes. I'll be happy to merge this upstream once it gets a bit of testing. Note that for your use case, we have finally worked out (after your feedback and others') a protocol that should work much better than LSP, but it will use the coq-lsp engine underneath.
…ken.

- We now conditionally depend on `memprof-limits` on OCaml 4.x
- Each call that can be interrupted now additionally takes a `Token.t`

There is still more work to be done so this can be properly used. cc: #509
- We now conditionally depend on `memprof-limits` on OCaml 4.x
- Each call that can be interrupted now additionally can take a `Token.t`

There is still more work to be done so this can be properly used. cc: #509
For now the interruption token is still the global `Control.interrupt` variable, but we adapt the code to the token interface. This PR was part of #509.
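As a rough sketch of what "adapting the code to the token interface" can look like (hypothetical names, not coq-lsp's actual API): a token is an abstract handle that interruptible code polls explicitly, and for now it can be backed by a plain mutable flag, just like the global `Control.interrupt` variable.

```ocaml
(* Hypothetical sketch, not coq-lsp's actual Token module. *)
module Token : sig
  type t
  val create : unit -> t
  val set : t -> unit
  val is_set : t -> bool
end = struct
  type t = bool ref             (* for now, just a mutable flag *)
  let create () = ref false
  let set t = t := true
  let is_set t = !t
end

(* Interruptible code polls the token it was given, not a global. *)
let check (tok : Token.t) =
  if Token.is_set tok then raise Sys.Break

(* Example: an iteration that checks the token between steps. *)
let interruptible_iter tok work xs =
  List.iter (fun x -> check tok; work x) xs
```

The point of the indirection is that callers can later swap the backing store (a memprof-limits token, a per-request flag) without touching the interruptible code itself.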
Huh, so once I switched back to the 4.x OCaml compiler, I started getting the Prelude Not Found error again. To be sure, I created an opam switch from scratch with the following script, from inside the
but the error persists, even though the same commands worked fine on OCaml 5.x. Any ideas?
@HazardousPeach it seems your opam has weird rules, so maybe running [Indeed opam has to correctly pick the updated opam file from
Also check that the
My This failed the first time around because something wasn't right with how opam scheduled the package installs on the last command; the So I just ran
This seems to be because when From there I ran After all that though, interrupts still aren't working, with the same behavior as when I had OCaml 5.x. My current OCaml is 4.14.2, could that be too new for some reason? Or could something else be going on? For reference, memprof-limits is at 0.2.1
Oh, indeed, sorry; the problem with memprof-limits is annoying, I will update the instructions. Indeed you need the latest code. Thanks a lot for the testing. I guess the next step is for me to try to reproduce the issue; would it be possible to share a problematic example? Thanks again!
@HazardousPeach note that the examples in #139 #484 #487 work correctly with the new interruption backend, so maybe these are good tests on your side. The fix for Coq that I was waiting for has been merged, so I will complete the docs, tweak display and usability, and close this PR soon.
Hmm, so the example from #487 doesn't interrupt properly on my end, so maybe there's a problem with how my coq-lsp is set up. Could you post the OCaml and memprof-limits versions that you've tested with? Could also be something about the way I'm driving coq-lsp in python versus through VSCode.
And then running this script: That clones CompCert, builds it, starts a coq_serapy agent instance with coq-lsp as the backend, and then runs the commands from #487, checking the goals in between each. When it runs the final simpl, after a 30 second timeout, it rolls back the simpl and sends a new document without the simpl, and asks for goals again. On my side, that hangs, and it keeps rolling back a new sentence every 30 seconds.
@HazardousPeach , I have no problem interrupting the example from regular coq-lsp, so I cannot reproduce. Will check serapy; you can check with VSCode whether things work out well on your end (don't forget to pass the parameter).
So indeed things don't work in SerAPY; I'm trying to understand a bit more, however the way the code is structured now makes it pretty hard to debug (why is the code checking trace messages? Also, the way requests are handled seems strange).
Hi @HazardousPeach , I did some more debugging, and indeed, I cannot reproduce with the standard node LSP client. You can find the example here: #626. If you run the recommended command, the sequence serapy is trying works as expected. I wouldn't be surprised if this comes down to some low-level IO differences between Python and node, so IMHO the next step to debug is to be able to display what serapy is sending on the wire. I will stop this debug session for now, as I'm unsure how to best update the serapy code for this tracking.
I did some further checks, and actually the server is working fine; it seems that for some reason serapy gets deadlocked. The trace for `test.py` is:
and continues like this, with coq-lsp mostly idle between each. A wild guess is that serapy doesn't correctly recognize that the server cancelled request id 19 here, as the checking was interrupted before reaching the position in the document needed to serve the goal request. Anyways, after having a look at the code, I cannot recommend you folks implement an RL agent on top of LSP; it is just not a good fit. I'd look into using
Ah, huh. When I did my own tracing I didn't see those I've got some other tasks I have to catch up on, but I should be able to get back to this later in the week. When I do, I'll re-implement the tracing on my end and see if the behavior still differs, and I'll look into using However, the RL is already implemented fully on top of Relevant to this, interruption in |
So far I've done all my tests on I had to hack the server to realize it was working fine, though you should see that in the test case you posted CPU usage is 0%; however, when you pass
Yes, and it will be hugely more efficient. It's been on my mind to fix the problems you folks had with SerAPI for a while, but it took a long time to get the prerequisites ready. Now the main todo is multi-threading; I have a patch that could kind of work, however memprof-limits is not available in OCaml 5 :( However, I think for RL-like applications, multi-threading in the server is a lower priority than for UI development.
That would be amazing indeed, however I didn't manage to get it working, must be some silly thing. I wonder if @thierry-martinez had a chance to look into these problems.
Indeed it is hard for me to assess what would make sense in your case, but I think the problems w.r.t. requests will be similar for both petanque and LSP. We have just considered using JSON-RPC as the official transport for petanque as a server; in that case, it would share the request base with LSP. IMHO that's a very good thing. In general, you need to handle all RPC calls as async requests, and check the error cases properly. JSON-RPC provides reasonable settings for this and seems popular enough.
I think the issue in your backend was not due to interruptions per se but due to the cancellation of an async request. Note that this can happen in many cases, not only on interruption (for example, the server can inspect its queue and decide it has enough work to do). So yes, the issue is very likely to occur in petanque. Both
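A sketch of the client-side bookkeeping this implies (hypothetical code, not serapy's; the error code is LSP's `ErrorCodes.RequestCancelled`, -32800): pending requests are kept in a table keyed by JSON-RPC id, and a cancellation error response must resolve the pending entry instead of being dropped, otherwise the client waits on that request forever, which matches the deadlock seen above.

```ocaml
(* Hypothetical sketch of client-side async request bookkeeping. *)
type outcome =
  | Result of string        (* successful response payload *)
  | Cancelled               (* server cancelled the request *)
  | Failed of int           (* any other JSON-RPC error code *)

let request_cancelled = -32800  (* LSP's ErrorCodes.RequestCancelled *)

(* Pending requests, keyed by JSON-RPC id, mapped to their callback. *)
let pending : (int, outcome -> unit) Hashtbl.t = Hashtbl.create 16

let send_request id callback =
  Hashtbl.replace pending id callback
  (* ... write the request on the wire here ... *)

(* Called for every response read from the wire. *)
let on_response ~id ~error ~result =
  match Hashtbl.find_opt pending id with
  | None -> ()                        (* stale or unknown id: ignore *)
  | Some k ->
    Hashtbl.remove pending id;
    (match error, result with
     | Some code, _ when code = request_cancelled -> k Cancelled
     | Some code, _ -> k (Failed code)
     | None, Some r -> k (Result r)
     | None, None -> k (Failed 0))
```

The key design point is that every path through `on_response` removes the entry and invokes the callback, so a cancelled request unblocks whoever is waiting on it.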
Note that Guillaume is coding a Python backend for petanque, so maybe that could be plugged into your infra easily.
…s on a sane branch. This will prevent frustration from users.
Last bit of support has been added.
This very preliminary PR reifies the Coq API so we can pass an interruption / cancellation token around.
Two implementations of the API are provided:

- the global `Control.interrupt` variable present in Coq
- `memprof-limits` by @gadmm

The code is very preliminary and will likely be rewritten to better accommodate the API; however, the current version works, and testing is much appreciated! This is a delicate part of the system, so testing is badly needed.
In the examples we had, `memprof-limits` works superbly! In particular, it works in all the examples where polling fails :D

I think this should allow us to get rid of Coq polling eventually.
Some open issues:

- When a long-running computation (for example a `Qed`) is interrupted (for example the user does a search), we need to restart it from scratch. That's a big PITA. We could solve this in OCaml 5 both with Domains or with algebraic effects + continuations, however...
- `memprof-limits` is still not ported to OCaml 5, thus "Support for interrupting another domain" (ocaml/ocaml#11411) is still open.

Also, it didn't take long for us to find a bug in Coq and how they handle resources, cf. coq/coq#17760
Thanks to @gadmm for all the interest and help.
Also cc: @HazardousPeach
Bugs fixed when using the `memprof-limits` backend: #139 #484 #487