
[coq] Support for token-based interruption #509

Merged
merged 1 commit into from
May 28, 2024

Conversation

ejgallego
Owner

@ejgallego ejgallego commented Jun 21, 2023

This very preliminary PR reifies the Coq API so we can pass an interruption / cancellation token around.

Two implementations of the API are provided:

  • one using the "classical" Control.interrupt variable present in Coq
  • one using memprof-limits by @gadmm

The code is very preliminary and will likely be rewritten to better accommodate the API; however, the current version works, and testing is much appreciated! This is a delicate part of the system, so testing is badly needed.

In the examples we had, memprof-limits works superbly! In particular, it works in all the examples where polling fails :D

I think this should allow us to get rid of Coq polling eventually.

Some open issues:

  • While token-based interruption works pretty well, there are a few corner cases where the user experience is not optimal. In particular, when a command that takes a long time to run (like a Qed) is interrupted (for example, because the user does a search), we need to restart it from scratch. That's a big PITA. We could solve this in OCaml 5, either with Domains or with algebraic effects + continuations, however....
  • ... unfortunately memprof-limits is still not ported to OCaml 5, thus Support for interrupting another domain ocaml/ocaml#11411 is still open.

Also, it didn't take long for us to find a bug in how Coq handles resources, cf. coq/coq#17760

Thanks to @gadmm for all the interest and help.

Also cc: @HazardousPeach

Bugs fixed when using the memprof-limits backend: #139 #484 #487

@ejgallego
Owner Author

We could solve this in OCaml 5 both with Domains or algebraic effects + continuations, however....

Actually I'm unsure that could work; indeed, the semantics of async effects are quite a bit more complex than the semantics of regular ones, IIUC (cf. https://arxiv.org/abs/2003.02110).

@gadmm , is it feasible to think that async interruptions here (for token or limits) could be handled in OCaml 5 as an effect that could be resumed?

@ejgallego
Owner Author

I think this needs more testing, so I will postpone to 0.1.8 for now, as I'd like to release 0.1.7 ASAP.

@ejgallego ejgallego added this to the 0.1.8 milestone Jun 23, 2023
@gadmm

gadmm commented Jun 23, 2023

If the question is about asynchronously pausing/restarting instead of immediately cancelling, you might be overthinking it as you can probably do this with (sys)threads using condition variables from an async callback (whether with signals & signal masks, or with an approach à la memprof-limits).

@ejgallego
Owner Author

If the question is about asynchronously pausing/restarting instead of immediately cancelling

Yes, that's the question.

Usually we have a very slow Qed, which is busy with some large terms in conversion (so no polling is available).

With memprof-limits I can now interrupt the slow Qed, which is good, however after the interruption, the manager may decide that it'd like to resume the Qed computation.

Let's assume for simplicity that all the computations here are pure (or properly guarded).

you might be overthinking it as you can probably do this with (sys)threads using condition variables from an async callback (whether with signals & signal masks, or with an approach à la memprof-limits).

Would you mind explaining a bit more how that would work in the above case?

I guess what you suggest is to have the async handler determine whether the computation should be interrupted or resumed, then act accordingly? But I'm unsure what the limitations there would be, for example if the handler needs to do IO before resuming.

I was thinking of an interface more similar to effects, where the handler gets the continuation, which can be resumed or not at any later arbitrary point.

@gadmm

gadmm commented Jun 23, 2023

Using condition variables, you let the OS deal with scheduling. In this scenario there is no "async handler" and no need for one.

There is only an async callback, that is called inside the thread to be paused/interrupted. Inside this async callback, it is possible to wait on a condition variable as with control_check below:

type thread_status = Running | Paused | Interrupted

type control = {
  mutable status : thread_status;
  mutex : Mutex.t;
  cond : Condition.t;
}

let status =
  { status = Running; mutex = Mutex.create (); cond = Condition.create () }

(* from thread 1 *)
let control_set new_status =
  Mutex.lock status.mutex;
  status.status <- new_status;
  if new_status <> Paused then Condition.broadcast status.cond;
  Mutex.unlock status.mutex

(* from an async callback in thread 2 *)
let control_check () =
  Mutex.lock status.mutex;
  while status.status = Paused do
    Condition.wait status.cond status.mutex
  done;
  let interrupted = (status.status = Interrupted) in
  Mutex.unlock status.mutex;
  if interrupted then raise Sys.Break

Note that in your current approach, this would involve changing memprof-limits itself to do something like that.
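For readers who want to experiment with this pause/resume pattern outside of OCaml, here is a minimal sketch of the same idea using Python's threading.Condition; the names and structure are my own illustration, not part of coq-lsp or memprof-limits:

```python
import threading

RUNNING, PAUSED, INTERRUPTED = "running", "paused", "interrupted"

class Interrupted(Exception):
    """Raised in the worker when the controller asks it to stop."""

class Control:
    def __init__(self):
        self._status = RUNNING
        self._cond = threading.Condition()

    def set(self, new_status):
        # Called from the controlling thread ("thread 1" above).
        with self._cond:
            self._status = new_status
            if new_status != PAUSED:
                self._cond.notify_all()

    def check(self):
        # Called periodically from the worker ("thread 2" above): blocks
        # while paused, raises once interrupted, otherwise returns at once.
        with self._cond:
            while self._status == PAUSED:
                self._cond.wait()
            if self._status == INTERRUPTED:
                raise Interrupted()
```

As in the OCaml version, the OS scheduler does all the work: the worker simply parks itself on the condition variable until the controller flips the status back to running (or to interrupted).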

@gadmm

gadmm commented Jun 23, 2023

To really answer your question though:

is it feasible to think that async interruptions here (for token or limits) could be handled in OCaml 5 as an effect that could be resumed?

Currently, it is not possible to perform an effect inside an async callback and handle it outside of the callback. There is no fundamental obstacle to implementing this, but: 1) this requires some work on the runtime to radically change how async callbacks run (essentially remove the need to switch from OCaml to C when running async actions), 2) you would need to convince upstream that this is something which they would like to have.

@ejgallego ejgallego modified the milestones: 0.1.8, 0.1.9 Jul 7, 2023
@ejgallego ejgallego modified the milestones: 0.1.9, 0.3.0 Oct 2, 2023
@HazardousPeach
Collaborator

Hey sorry, recently realized that I wasn't getting notified on github mentions. I'll test this right away, this seems super useful.

@ejgallego
Owner Author

Hi @HazardousPeach, we actually made a lot of progress with this implementation; I'll push a new tree soon. After discussion with Guillaume (the author of the lib we use here), we'd like to make some significant changes.

I'll be happy to merge this upstream once it gets a bit of testing.

Note that for your use case, we have finally worked out (after your feedback and others') a protocol that should work much, much better than LSP, but it will use the coq-lsp engine underneath.

@HazardousPeach

This comment was marked as outdated.

ejgallego added a commit that referenced this pull request Mar 31, 2024
…ken.

- We now conditionally depend on `memprof-limits` on OCaml 4.x
- Each call that can be interrupted now additionally takes a `Token.t`

There is still more work to be done so this can be properly used.

cc: #509
@ejgallego ejgallego force-pushed the limits branch 2 times, most recently from 55db79f to 0925408 Compare April 2, 2024 15:39
ejgallego added a commit that referenced this pull request Apr 3, 2024
- We now conditionally depend on `memprof-limits` on OCaml 4.x
- Each call that can be interrupted now additionally can take a `Token.t`

There is still more work to be done so this can be properly used.

cc: #509
ejgallego added a commit that referenced this pull request Apr 3, 2024
- We now conditionally depend on `memprof-limits` on OCaml 4.x
- Each call that can be interrupted now additionally can take a `Token.t`

There is still more work to be done so this can be properly used.

cc: #509
ejgallego added a commit that referenced this pull request Apr 3, 2024
For now the interruption token is still the global `Control.interrupt`
variable, but we adapt the code to the token interface.

This PR was part of #509.
ejgallego added a commit that referenced this pull request Apr 3, 2024
For now the interruption token is still the global `Control.interrupt`
variable, but we adapt the code to the token interface.

This PR was part of #509.
ejgallego added a commit that referenced this pull request Apr 3, 2024
For now the interruption token is still the global `Control.interrupt`
variable, but we adapt the code to the token interface.

This PR was part of #509.
@ejgallego ejgallego modified the milestones: 0.3.0, 0.1.9 Apr 3, 2024
@HazardousPeach
Collaborator

Huh, so once I switched back to the 4.x OCaml compiler, I started getting the Prelude Not Found error again. To be sure, I created an opam switch from scratch with the following script, from inside the coq-lsp directory:

#!/usr/bin/env bash
set -e
opam switch create coq-lsp ocaml-base-compiler.4.14.2
eval $(opam env --switch=coq-lsp --set-switch)
opam install vendor/coq/coq{-core,-stdlib,ide-server,}.opam -y
opam install vendor/coq-serapi -y
opam install . -y

but the error persists, even though the same commands worked fine on OCaml 5.x.

Any ideas?

@ejgallego
Owner Author

@HazardousPeach is your vendor/coq repository clean?

opam has weird rules, so maybe running git clean -fxd in vendor/coq could help. It could also be that your opam version is old and has some pinning bugs or lacks some features?

[Indeed, opam has to correctly pick the updated opam file from vendor, and only modern opam versions do.]

@ejgallego
Copy link
Owner Author

Also check that the coq-lsp.opam file contains the rm -r vendor step, which is critical.

@HazardousPeach
Collaborator

HazardousPeach commented May 6, 2024

My vendor/coq is clean. My Opam version is 2.1.2, is that recent enough? But I didn't see rm -r vendor in coq-lsp.opam, and realized that my code was a bit behind (I hadn't updated since you added that I guess), so I fetched the latest pull request code, deleted the switch, and recreated it with the same script.

This failed the first time around because something wasn't right with how opam scheduled the package installs on the last command; the opam install of vendor/coq-core.opam (among others) first compiled without memprof-limits (I think), so when the opam install . ran, it decided to recompile coq-core with memprof-limits. But in doing so, it first removed the old version, then tried to build coq-lsp, then installed the new version of coq-core. So the coq-lsp build failed, since there was no coq-core for it to import at build time.

So I just ran opam install . again, and got the error:

# Error: Files <...>/.opam/coq-lsp/lib/coq-serapi/serlib/serlib.cmxa
#        and <...>/.opam/coq-lsp/lib/coq-core/lib/lib.cmxa
#        make inconsistent assumptions over implementation CErrors

This seems to be because when coq-core was rebuilt, coq-serapi was not, even though coq-serapi links against it at the OCaml level. Maybe some issue with how the vendored coq-serapi specifies dependencies?

From there I ran opam reinstall vendor/coq-serapi, and then opam install . again, and coq-lsp successfully installed.

After all that though, interrupts still aren't working, with the same behavior as when I had OCaml 5.x.

My current OCaml is 4.14.2, could that be too new for some reason? Or could something else be going on?

For reference, memprof-limits is at 0.2.1

@ejgallego
Owner Author

Oh indeed sorry, the problem with memprof-limits is annoying, I will update the instructions.

Indeed, you need the latest code. Thanks a lot for the testing; I guess the next step is for me to try to reproduce the issue. Would it be possible to share a problematic example?

Thanks again!

@ejgallego
Owner Author

@HazardousPeach note that examples in #139 #484 #487 work correctly with the new interruption backend, so maybe these are good tests on your side.

The fix for Coq that I was waiting for has been merged, so I will complete the docs, tweak display and usability, and close this PR soon.

@HazardousPeach
Collaborator

HazardousPeach commented May 7, 2024

Hmm, so the example from #487 doesn't interrupt properly on my end, so maybe there's a problem with how my coq-lsp is set up. Could you post the OCaml and memprof-limits versions that you've tested with? It could also be something about the way I'm driving coq-lsp in Python versus through VSCode.
You should be able to reproduce exactly what I'm running with

pip install git+https://github.com/HazardousPeach/coq_serapy.git

And then running this script:

https://pastebin.com/H5JW2HeR

That clones CompCert, builds it, starts a coq_serapy agent instance with coq-lsp as the backend, and then runs the commands from #487, checking the goals in between each. When it runs the final simpl, after a 30-second timeout, it rolls back the simpl and sends a new document without the simpl, and asks for goals again. On my side, that hangs, and it keeps rolling back a new sentence every 30 seconds.

@ejgallego
Owner Author

ejgallego commented May 10, 2024

@HazardousPeach , I have no problem interrupting the example from regular coq-lsp, so I cannot reproduce.

Will check serapy; you can check with VSCode whether things work out well on your end (don't forget to pass the parameter).

@ejgallego
Owner Author

So indeed things don't work in SerAPY. I'm trying to understand a bit more, however the way the code is structured now makes it pretty hard to debug (why is the code checking trace messages? Also, the way requests are handled seems strange).

@ejgallego
Owner Author

ejgallego commented May 10, 2024

Hi @HazardousPeach , I did some more debug, and indeed, I cannot reproduce with the standard node LSP client.

You can find the example here: #626

If you run the recommended command, the sequence serapy is trying works as expected.

I wouldn't be surprised if this is down to some low-level IO Python vs node differences, so IMHO the next step to debug is to be able to display what serapy is sending on the wire.

I will stop this debug session for now, as I'm unsure how to best update the serapy code for this tracking.

@ejgallego
Owner Author

ejgallego commented May 10, 2024

I did some further checks, and actually the server is working fine; it seems that for some reason serapy gets deadlocked.

The trace for `test.py` is:

...
=> "textDocument/didChange"  { [version: 20] "... unfold map. simpl."
=> "proof/goals" { id : 19 } { [version N/A] }
    [process_queue]: Serving Request: textDocument/didChange
    [process_queue]: Serving Request: proof/goals
    [request]: postponing rq : 19
<= "textDocument/publishDiagnostics" { [version: 20], partial=true }
=> "textDocument/didChange"  { [version: 21] "... unfold map."
    [process_queue]: Serving Request: textDocument/didChange
<= {"jsonrpc":"2.0","id":19,"error":{"code":-32802,"message":"Request got old in server"}}
    [process_queue]: Serving Request: proof/goals
    [request]: postponing rq : 20
<= "textDocument/publishDiagnostics" { [version: 21], partial=true }
<= "textDocument/publishDiagnostics" { [version: 21], partial=false }
    [serving]: rq: 20 | {k:pos | l: 17, c: 11 v:  p: true}
=> "proof/goals" { id : 20 } { answer... }

and continues like this, with coq-lsp mostly idle between each didChange notification.

A wild guess is that serapy doesn't correctly recognize that the server cancelled request id 19 here, as checking was interrupted before reaching the position in the document needed to serve the goal request.

Anyways, after having a look at the code, I cannot recommend that you folks implement an RL agent on top of LSP; it is just not a good fit. I'd look into using petanque.

@HazardousPeach
Collaborator

Ah, huh. When I did my own tracing I didn't see those proof/goals responses, maybe my setup has some difference causing an issue. If so, then I wouldn't have had a chance to experience the correct cancelling behavior, so my handling code would be untested and not ready for the correct behavior.

I've got some other tasks I have to catch up on, but I should be able to get back to this later in the week. When I do, I'll re-implement the tracing on my end and see if the behavior still differs, and I'll look into using petanque as a backend instead. It certainly seems like an easier api for what I'm doing, and the ocaml-in-python api would make it even more attractive (though the readme says it doesn't work yet).

However, the RL is already implemented fully on top of coq_serapy with the coq-serapi backend, the only issue is that backtracking is too expensive for smart search to work as well as it should in coq-serapi (but we're under submission anyway). The coq-lsp backend support was meant to fix this issue, and the support is complete except for interruption; the RL algorithm functions fully on proof searches where interruption is not necessary. So, I'll have to assess whether it makes sense to build a petanque backend right away, since the coq-lsp logic is already fully implemented (except for handling interrupt responses properly).

Relevant to this, interruption in petanque works the same as interruption in coq-lsp, right? So, if there's an issue with my setup causing problems interrupting coq-lsp, then that same issue would affect petanque?

@ejgallego
Owner Author

Ah, huh. When I did my own tracing I didn't see those proof/goals responses, maybe my setup has some difference causing an issue. If so, then I wouldn't have had a chance to experience the correct cancelling behavior, so my handling code would be untested and not ready for the correct behavior.

So far I've done all my tests on main and they worked fine; your backend also passed the parameters correctly.

I had to hack the server to realize it was working fine, though you should see that in the test case you posted, CPU usage is 0%; however, when you pass --int_backend=Coq, it should go to 100% CPU use.

It certainly seems like an easier api for what I'm doing

Yes, and it will be hugely more efficient. It's been on my mind to fix the problems you folks had on SerAPI for a while, but it took a long time to get the pre-requisites ready.

Now the main todo is multi-threading; I have a patch that could kind of work, but memprof-limits is not available in OCaml 5 :( However, I think for RL-like applications, multi-threading in the server is a lower priority than for UI development.

and the ocaml-in-python api would make it even more attractive (though the readme says it doesn't work yet).

That would be amazing indeed, however I didn't manage to get it working, must be some silly thing. I wonder if @thierry-martinez had a chance to look into these problems.

I'll have to assess whether it makes sense to build a petanque backend right away, since the coq-lsp logic is already fully implemented (except for handling interrupt responses properly).

Indeed it is hard for me to assess what would make sense in your case, but I think the problems w.r.t. requests will be similar for both petanque and LSP.

We have just considered using JSON-RPC as the official transport for petanque as a server, so in this case, it would share the request base with LSP. IMHO that's a very good thing.

In general, you need to handle all RPC calls as async requests and check the error cases properly. JSON-RPC provides reasonable settings for this and seems popular enough.
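As a concrete illustration, here is a hedged Python sketch of the kind of reply classification a client would need; the helper name is my own, and only the -32802 code and its "Request got old in server" message come from the trace earlier in this thread:

```python
import json

# coq-lsp answers a request it decided to cancel with JSON-RPC error code
# -32802 ("Request got old in server"), as seen in the trace above.
SERVER_CANCELLED = -32802

def classify_reply(raw: str):
    """Return ('result' | 'cancelled' | 'error', request id) for a reply.

    A client that treats 'cancelled' like a missing reply (instead of
    dropping the pending request) will wait forever, which matches the
    deadlock observed with serapy.
    """
    msg = json.loads(raw)
    if "error" in msg:
        kind = "cancelled" if msg["error"]["code"] == SERVER_CANCELLED else "error"
        return kind, msg["id"]
    return "result", msg["id"]
```

The point is that cancellation is an ordinary, expected outcome of any async request, so the pending-request table must be cleaned up on it just as on a successful result.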

Relevant to this, interruption in petanque works the same as interruption in coq-lsp, right? So, if there's an issue with my setup causing problems interrupting coq-lsp, then that same issue would affect petanque?

I think the issue in your backend was not due to interruptions per se, but due to the cancellation of an async request. Note that this can happen in many cases, not only on interruptions (for example, the server can inspect its queue and decide it has got enough work to do).

So yes, the issue is very likely to occur in petanque.

Both coq-lsp and petanque are kind of flexible w.r.t. interruptions; for now, they are set to interrupt when a new command has been received, but they could do otherwise, including having an explicit Interrupt command.

@ejgallego
Owner Author

Note that Guillaume is coding a python backend for petanque, so maybe that could be plugged into your infra easily.

…s on a sane branch.

This will prevent frustration from users.
@ejgallego
Owner Author

Last bit of support has been added.

@ejgallego ejgallego merged commit ae6ec4d into main May 28, 2024
13 checks passed
@ejgallego ejgallego deleted the limits branch May 28, 2024 11:49