Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an unsafe method that allows the user to recover the allocator from the memory. #34

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Determinant
Copy link

@Determinant Determinant commented Jul 26, 2024

Hi team, we're a startup working on a platform which relies on a software-defined persistent (non-volatile) memory. So that means a program (in our case a piece of WASM code) always has its memory persisted. Talc currently is a perfect fit because it is lightweight and can help us manage the allocation for these "magical" variables. But in order to get this to work, we added a method called recover_from_existing_heap that reconstructs the state of Talc based on the same span of the memory it previously worked on.

We feel this could be an interesting and general feature to talc, especially for the non-volatile memory use case. Would like to see if you're interested in merging the code.

@SFBdragon
Copy link
Owner

SFBdragon commented Jul 27, 2024

Hello, thanks for reaching out,

I'm happy to support this but I'd like to do so a little differently if possible. My main concerns are

a. API usability when combined with multiple arenas. This function will cause a lot of strange behavior and then crash if someone uses the wrong Span returned by talc, as the in-heap metadata is only present in the first successfully-established arena. talc doesn't need users to persist every arena, or any arena. It just needs its old in-heap metadata pointer.
b. API usability when combined with custom OOM handlers. On WASM talc technically doesn't need to persist the state of the WasmHandler, not doing do may just increase arena fragmentation slightly, once. However it's not quite as simple for ones like ClaimOnOom or a user-written custom OomHandler.
c. I'm nervous about getting into ABI-compatiblity territory. You only need this to recover on the same build, right? (Rust's lack of ABI guarantees would make life difficult, otherwise, I presume.)
d. I'd like to preserve counters if possible, as having a fairly disjoint feature space sounds unappealing. But this isn't a blocking issue. (Placing the counters into the heap is an option as well, of course.)

Getting around all this seems doable to me, depending on your constraints.

To get a feel for what solutions are viable to you:

  1. You seem to have access to a previous Span during recovery. Can you persist other data too? (I'm suppose saving the bytes of the talc struct isn't an option?)
  2. To clarify, are you saving+restoring state or are you recovering from an unknown state? If the latter, could failure have occurred during allocation?
  3. Is calling something like fn export_internal_data(&self) -> returndata { ... } and saving the returndata (a struct with public variables or a byte array perhaps) possible:
    • Just after getting the first Span?
    • Just before/after persisting the WASM program's memory?

Overall, I like the idea. In my mental model of what I'd personally like to use talc for, this has a clear use case. However I'd like to try get this to a point where it's very clear how to use this API correctly even if you're using other features of talc.

Let me know what would and wouldn't work for you.


Aside, I'm working on an overhaul to talc, for a bunch of reasons. Depending on how this feature is implemented, this might force that overhaul to become a separate crate. I'm not against that, but being able to carry over whatever recovery API/mechanism we decide on may be ideal regardless. Just something on my mind; it shouldn't affect what we decide here.

@SFBdragon
Copy link
Owner

@Determinant I see you've dropped a 👍 on my comment. I'm blocked on a lack of knowledge about what your intentions/constraints are, just in case it wasn't clear. If you still intend to get back to me though, no rush.

@Determinant
Copy link
Author

Sorry I'm afk over the weekend and because you asked a bunch of great questions, I'd like to address them properly, so I thumbed up and very much appreciated the responsiveness!

@Determinant
Copy link
Author

Determinant commented Jul 29, 2024

a/b. This is also my fear and not surprised if this PR needs more work. I feel currently I use talc in the simplest way (single arena, as in my case that suffices).
c. Yes, because we'll ship the self-contained WASM in the form of bytecode, which has its ABI already established, so when it recovers the state, it does so for the same code.
d. Right, I feel that there should be a way to recover counters as well. Previously, to quickly "hack" a version that we can use in our current in-development prototype, I just decided to ignore this case as I knew little about talc's internal mechanism of counters (and when I began tweaking the code, the counter wasn't introduced).

  • You seem to have access to a previous Span during recovery. Can you persist other data too? (I'm suppose saving the bytes of the talc struct isn't an option?) -- if the talc struct has a constant size and contains no further indirection/heap allocation on the host (and stick to that promise in the future), this is in fact a great option: I can reserve a fixed area in my persistent memory to accommodate the struct, but would otherwise be hard and a "chicken-and-egg" issue if the struct itself has a dynamic layout

  • To clarify, are you saving+restoring state or are you recovering from an unknown state? If the latter, could failure have occurred during allocation? -- we're just saving+restoring the state

  • Is calling something like fn export_internal_data(&self) -> returndata { ... } and saving the returndata (a struct with public variables or a byte array perhaps) possible -- it is possible as we can try to schedule a call before releasing the WASM instance (and its associative memory). To recover, we currently also has a initial setup function call made into WASM instance before running anything else.

Thanks for being open on this discussion. talc is a great library to use from my perspective and it looks like the devs behind it are even better. :) I don't have any strong opinion as for how this could be eventually implemented as this is just a humble use case that we need to make it usable in our system. I'm happy to try persisting/laying out the Talc struct directly to see if that works. Previously my fear is about the guarantee (static size, etc.) but if that part is in good care, I actually prefer this solution.

@SFBdragon
Copy link
Owner

SFBdragon commented Aug 1, 2024

if the talc struct has a constant size and contains no further indirection/heap allocation on the host (and stick to that promise in the future), this is in fact a great option: I can reserve a fixed area in my persistent memory to accommodate the struct, but would otherwise be hard and a "chicken-and-egg" issue if the struct itself has a dynamic layout

Cool. Talc has a few practical limitations that mean saving the bytes should be fine if you know what OomHandler is being used:

  1. Talc and wrapping types must be usable in static globals. This requires that they are Sized. OomHandler structs are stored in the struct too so they're also required to be sized.
  2. All unsized/indirect talc data is stored in its own heap(s), as that's the only place talc knows it's "safe" to store extra data, besides the struct itself.
  3. The only alternative cross-platform viable data storage is static data which is a bad idea if someone created two instances of one allocator (which will be supported unless a platform integration forbids it but WASM does not).

So unless the OomHandler holds OS resource handles or something like that, it should be fine. The WasmHandler (OomHandler for WASM) won't do anything like this, as far as I can foresee. One option to get around the possibility of that changing would be to implement your own WasmHandler (i.e. just copy Talc's current one to start with).

Here's my thinking when it comes to OomHandlers:

  • They're very flexible, you can store pretty much anything in them and do whatever inside of them (on global state and the OomHandler's own persisted state), and they should stay that way. I'd like to avoid artificially constraining their flexibility as well.
  • If people make OomHandlers with weird requirements or anything like that, it's fine, but its their responsibility to handle those requirements. If someone wants weird requirements, instead of forcing those requirements on all OomHandlers, they should maintain their own.

And I don't think forcing all OomHandler implementation to be session-independent is a great idea.

Is calling something like fn export_internal_data(&self) -> returndata { ... } and saving the returndata (a struct with public variables or a byte array perhaps) possible

(I'm much more against this idea now, because it means Talc needs to force all OomHandlers to permit this.)

However, I also feel like making people maintain their own versions of pre-made OomHandlers just because I (or another contributor) might break something is not ideal...

And we'd need to guarantee Talc's (OomHandler-dependent) session-independence anyway. (I'm happy to commit to that indefinitely, I think. Talc itself is supposed to be very independent. No std, no OS, no platform dependency, etc. I see no reason why Talc should couple itself with session-lifetime state.)


Ensuring WasmHandler is permanently session-independent is another matter though.

I'm nervous about

  • accidentally breaking things
  • how to roll out an intentionally breaking change in this regard

One thing I'm thinking about it making a type alias type SessionIndependentWasmHandler = WasmHandler; and documenting/testing that type instead. (The name could use some work.) If I break the property of WasmHandler I can just create a drop-in implementation for SessionIndependentWasmHandler.


I've still got some fuzzy lingering concerns... I want to think about this more, but not tonight.

I've got a few alternative ideas but none of them seem significantly better...

Ah well. I'll keep thinking about it and get back to you.

@SFBdragon
Copy link
Owner

I'm a bit swamped at the moment. Feel free to copy the talc bytes and transmute it back for now. It will (should) work.

I'm currently planning to document the properties that allow for this and test it. There might be a few API changes, but nothing that will be procedurally distinct from getting some data T. That must be used to re-instate it. (Same as transmuting bytes.)

I'll leave this issue open and get back to you if there are any changes to make once I commit to something, but in the meantime I'll look after the correctness of transmuting the bytes. Everything will be in the Talc struct or the Talc heap(s).


I'd like to avoid committing to anything until V5 is done, though I plan to keep this feature a possibility on WebAssembly without any major procedural differences to save the state of Talc. (If all goes to plan, I'll make it easier to make this guarantee.)

@Determinant
Copy link
Author

Determinant commented Aug 8, 2024

I went through the detailed thoughts and have a better idea of what's tricky in this case. Fortunately, for our current use case, OOM handler is not an issue because we tend to use a very simple one that's pretty independent (currently, just error because we initially already give abundant memory space to Talc so if the user goes beyond that, it should be an error in our case anyway).

As one step further, I'm putting Talc to a special reserved portion of the persistent memory, so in this case, it requires even less promise as for its byte encoding.

Appreciate the help and effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants