Add an unsafe method that allows the user to recover the allocator from the memory. #34

Determinant · 2024-07-26T22:55:58Z

Hi team, we're a startup working on a platform which relies on a software-defined persistent (non-volatile) memory. So that means a program (in our case a piece of WASM code) always has its memory persisted. Talc currently is a perfect fit because it is lightweight and can help us manage the allocation for these "magical" variables. But in order to get this to work, we added a method called recover_from_existing_heap that reconstructs the state of Talc based on the same span of the memory it previously worked on.

We feel this could be an interesting and general feature to talc, especially for the non-volatile memory use case. Would like to see if you're interested in merging the code.

…rsisted memory

SFBdragon · 2024-07-27T10:35:11Z

Hello, thanks for reaching out,

I'm happy to support this but I'd like to do so a little differently if possible. My main concerns are

a. API usability when combined with multiple arenas. This function will cause a lot of strange behavior and then crash if someone uses the wrong Span returned by talc, as the in-heap metadata is only present in the first successfully-established arena. talc doesn't need users to persist every arena, or any arena. It just needs its old in-heap metadata pointer.
b. API usability when combined with custom OOM handlers. On WASM talc technically doesn't need to persist the state of the WasmHandler, not doing do may just increase arena fragmentation slightly, once. However it's not quite as simple for ones like ClaimOnOom or a user-written custom OomHandler.
c. I'm nervous about getting into ABI-compatiblity territory. You only need this to recover on the same build, right? (Rust's lack of ABI guarantees would make life difficult, otherwise, I presume.)
d. I'd like to preserve counters if possible, as having a fairly disjoint feature space sounds unappealing. But this isn't a blocking issue. (Placing the counters into the heap is an option as well, of course.)

Getting around all this seems doable to me, depending on your constraints.

To get a feel for what solutions are viable to you:

You seem to have access to a previous Span during recovery. Can you persist other data too? (I'm suppose saving the bytes of the talc struct isn't an option?)
To clarify, are you saving+restoring state or are you recovering from an unknown state? If the latter, could failure have occurred during allocation?
Is calling something like fn export_internal_data(&self) -> returndata { ... } and saving the returndata (a struct with public variables or a byte array perhaps) possible:
- Just after getting the first Span?
- Just before/after persisting the WASM program's memory?

Overall, I like the idea. In my mental model of what I'd personally like to use talc for, this has a clear use case. However I'd like to try get this to a point where it's very clear how to use this API correctly even if you're using other features of talc.

Let me know what would and wouldn't work for you.

Aside, I'm working on an overhaul to talc, for a bunch of reasons. Depending on how this feature is implemented, this might force that overhaul to become a separate crate. I'm not against that, but being able to carry over whatever recovery API/mechanism we decide on may be ideal regardless. Just something on my mind; it shouldn't affect what we decide here.

SFBdragon · 2024-07-28T22:21:58Z

@Determinant I see you've dropped a 👍 on my comment. I'm blocked on a lack of knowledge about what your intentions/constraints are, just in case it wasn't clear. If you still intend to get back to me though, no rush.

Determinant · 2024-07-29T00:20:40Z

Sorry I'm afk over the weekend and because you asked a bunch of great questions, I'd like to address them properly, so I thumbed up and very much appreciated the responsiveness!

Determinant · 2024-07-29T22:19:38Z

a/b. This is also my fear and not surprised if this PR needs more work. I feel currently I use talc in the simplest way (single arena, as in my case that suffices).
c. Yes, because we'll ship the self-contained WASM in the form of bytecode, which has its ABI already established, so when it recovers the state, it does so for the same code.
d. Right, I feel that there should be a way to recover counters as well. Previously, to quickly "hack" a version that we can use in our current in-development prototype, I just decided to ignore this case as I knew little about talc's internal mechanism of counters (and when I began tweaking the code, the counter wasn't introduced).

You seem to have access to a previous Span during recovery. Can you persist other data too? (I'm suppose saving the bytes of the talc struct isn't an option?) -- if the talc struct has a constant size and contains no further indirection/heap allocation on the host (and stick to that promise in the future), this is in fact a great option: I can reserve a fixed area in my persistent memory to accommodate the struct, but would otherwise be hard and a "chicken-and-egg" issue if the struct itself has a dynamic layout
To clarify, are you saving+restoring state or are you recovering from an unknown state? If the latter, could failure have occurred during allocation? -- we're just saving+restoring the state
Is calling something like fn export_internal_data(&self) -> returndata { ... } and saving the returndata (a struct with public variables or a byte array perhaps) possible -- it is possible as we can try to schedule a call before releasing the WASM instance (and its associative memory). To recover, we currently also has a initial setup function call made into WASM instance before running anything else.

Thanks for being open on this discussion. talc is a great library to use from my perspective and it looks like the devs behind it are even better. :) I don't have any strong opinion as for how this could be eventually implemented as this is just a humble use case that we need to make it usable in our system. I'm happy to try persisting/laying out the Talc struct directly to see if that works. Previously my fear is about the guarantee (static size, etc.) but if that part is in good care, I actually prefer this solution.

SFBdragon · 2024-08-01T20:42:27Z

if the talc struct has a constant size and contains no further indirection/heap allocation on the host (and stick to that promise in the future), this is in fact a great option: I can reserve a fixed area in my persistent memory to accommodate the struct, but would otherwise be hard and a "chicken-and-egg" issue if the struct itself has a dynamic layout

Cool. Talc has a few practical limitations that mean saving the bytes should be fine if you know what OomHandler is being used:

Talc and wrapping types must be usable in static globals. This requires that they are Sized. OomHandler structs are stored in the struct too so they're also required to be sized.
All unsized/indirect talc data is stored in its own heap(s), as that's the only place talc knows it's "safe" to store extra data, besides the struct itself.
The only alternative cross-platform viable data storage is static data which is a bad idea if someone created two instances of one allocator (which will be supported unless a platform integration forbids it but WASM does not).

So unless the OomHandler holds OS resource handles or something like that, it should be fine. The WasmHandler (OomHandler for WASM) won't do anything like this, as far as I can foresee. One option to get around the possibility of that changing would be to implement your own WasmHandler (i.e. just copy Talc's current one to start with).

Here's my thinking when it comes to OomHandlers:

They're very flexible, you can store pretty much anything in them and do whatever inside of them (on global state and the OomHandler's own persisted state), and they should stay that way. I'd like to avoid artificially constraining their flexibility as well.
If people make OomHandlers with weird requirements or anything like that, it's fine, but its their responsibility to handle those requirements. If someone wants weird requirements, instead of forcing those requirements on all OomHandlers, they should maintain their own.

And I don't think forcing all OomHandler implementation to be session-independent is a great idea.

Is calling something like fn export_internal_data(&self) -> returndata { ... } and saving the returndata (a struct with public variables or a byte array perhaps) possible

(I'm much more against this idea now, because it means Talc needs to force all OomHandlers to permit this.)

However, I also feel like making people maintain their own versions of pre-made OomHandlers just because I (or another contributor) might break something is not ideal...

And we'd need to guarantee Talc's (OomHandler-dependent) session-independence anyway. (I'm happy to commit to that indefinitely, I think. Talc itself is supposed to be very independent. No std, no OS, no platform dependency, etc. I see no reason why Talc should couple itself with session-lifetime state.)

Ensuring WasmHandler is permanently session-independent is another matter though.

I'm nervous about

accidentally breaking things
how to roll out an intentionally breaking change in this regard

One thing I'm thinking about it making a type alias type SessionIndependentWasmHandler = WasmHandler; and documenting/testing that type instead. (The name could use some work.) If I break the property of WasmHandler I can just create a drop-in implementation for SessionIndependentWasmHandler.

I've still got some fuzzy lingering concerns... I want to think about this more, but not tonight.

I've got a few alternative ideas but none of them seem significantly better...

Ah well. I'll keep thinking about it and get back to you.

SFBdragon · 2024-08-06T06:15:09Z

I'm a bit swamped at the moment. Feel free to copy the talc bytes and transmute it back for now. It will (should) work.

I'm currently planning to document the properties that allow for this and test it. There might be a few API changes, but nothing that will be procedurally distinct from getting some data T. That must be used to re-instate it. (Same as transmuting bytes.)

I'll leave this issue open and get back to you if there are any changes to make once I commit to something, but in the meantime I'll look after the correctness of transmuting the bytes. Everything will be in the Talc struct or the Talc heap(s).

I'd like to avoid committing to anything until V5 is done, though I plan to keep this feature a possibility on WebAssembly without any major procedural differences to save the state of Talc. (If all goes to plan, I'll make it easier to make this guarantee.)

Determinant · 2024-08-08T19:36:34Z

I went through the detailed thoughts and have a better idea of what's tricky in this case. Fortunately, for our current use case, OOM handler is not an issue because we tend to use a very simple one that's pretty independent (currently, just error because we initially already give abundant memory space to Talc so if the user goes beyond that, it should be an error in our case anyway).

As one step further, I'm putting Talc to a special reserved portion of the persistent memory, so in this case, it requires even less promise as for its byte encoding.

Appreciate the help and effort!

add recover_from_existing_heap method to allow using talc over a pe…

984833f

…rsisted memory

polarathene mentioned this pull request Oct 7, 2024

docs: Revise notes for example std_global_allocator.rs #36

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an unsafe method that allows the user to recover the allocator from the memory. #34

Add an unsafe method that allows the user to recover the allocator from the memory. #34

Determinant commented Jul 26, 2024 •

edited

Loading

SFBdragon commented Jul 27, 2024 •

edited

Loading

SFBdragon commented Jul 28, 2024

Determinant commented Jul 29, 2024

Determinant commented Jul 29, 2024 •

edited

Loading

SFBdragon commented Aug 1, 2024 •

edited

Loading

SFBdragon commented Aug 6, 2024

Determinant commented Aug 8, 2024 •

edited

Loading

Add an unsafe method that allows the user to recover the allocator from the memory. #34

Are you sure you want to change the base?

Add an unsafe method that allows the user to recover the allocator from the memory. #34

Conversation

Determinant commented Jul 26, 2024 • edited Loading

SFBdragon commented Jul 27, 2024 • edited Loading

SFBdragon commented Jul 28, 2024

Determinant commented Jul 29, 2024

Determinant commented Jul 29, 2024 • edited Loading

SFBdragon commented Aug 1, 2024 • edited Loading

SFBdragon commented Aug 6, 2024

Determinant commented Aug 8, 2024 • edited Loading

Determinant commented Jul 26, 2024 •

edited

Loading

SFBdragon commented Jul 27, 2024 •

edited

Loading

Determinant commented Jul 29, 2024 •

edited

Loading

SFBdragon commented Aug 1, 2024 •

edited

Loading

Determinant commented Aug 8, 2024 •

edited

Loading