panic in a no-unwind function leads to not dropping local variables #123231
Nominating for t-lang discussion to get their vibe on this question. |
Wow that was years ago. What's the take-away from that discussion? |
As pointed out in the discussion, changing how we insert abort to functions is sufficient to change the observed behaviour of the implementation, but the key is to decide what's the allowed behaviour of any implementation. Options:
Some additional complexity involves a foreign exception. For example, if we have a mixture of stack frames with C++ and Rust then any specification may result in surprises. E.g. when a nounwind frame is introduced by a C++ noexcept function, it's up to C++ personality function to decide whether a Rust panic may trigger C++ cc @rust-lang/wg-ffi-unwind |
Judging from prior discussion, "no-unwind context" is a very specific technical term here and not something visible in Rust? Or what exactly do you mean?
Why? If by this you mean the current behavior, I find it undesirable and confusing and thus hindering debugging. Or do you mean that the panic will somehow predict whether it will during its unwinding hit a no-unwind stack frame and then change behavior early on? That's spooky-action-at-a-distance, so I also don't think that's desirable.
That's a pretty poor argument IMO, our job is to provide consistent and predictable semantics to our users whenever that is possible with reasonable performance.
For this issue I only care about Rust panics. |
Let me explain in greater detail; this will be long. For unwinding, there are a few types of cases for nounwind/noexcept/whatever:
Unwinders can do either single-phase or two-phase unwinding. In the two-phase case, the unwinder first walks the stack without calling any destructors to find the frame that will catch the exception, and then starts a second unwind that runs cleanups. If a catching frame cannot be found, the unwinding process fails at phase 1 (let's ignore forced unwind for now). The Itanium C++ ABI unwinder has two phases. So if you have C code (compiled without unwind tables) that calls into Rust
A C++ noexcept function is of case (2) in GCC. With GCC's personality function implementation, phase 1 will consider a noexcept function as a catching frame, and complete (stop at this frame) without failing. When the unwind reaches the noexcept frame, no cleanup is performed and termination happens immediately (similar to Rust's behaviour today).
A C++ noexcept function is of case (3) in Clang because it doesn't yet support encoding the information to be exposed to the personality function. noexcept is codegened as
There's a subtle difference between (2) and (3) w.r.t. optimisation. Say we have this code:
#include <cstdio>
struct D {
    ~D() {
        fprintf(stderr, "Drop!\n");
    }
};
static void foo() {
    D d;
    throw "";
}
static void bar() noexcept {
    D d;
    foo();
}
int main() {
    bar();
}
In both GCC and Clang, you will get one
The Rust behaviour today is very similar to Clang's behaviour in the example. Now to answer your questions
Yes, I mean this. Since unwinding has two phases, phase 1 can determine whether unwinding is possible and can skip phase 2 entirely. It's already the case that a Rust panic will cause no unwind at all if it escapes into functions with no unwind tables, or, with MSVC SEH, into cleanup code. It's helpful for debugging because all the stack frames are intact so you can inspect all frames upon abort. Currently when we unwind, hit an
Given FFI is very important regard to We certainly can define the behaviour to be "all destructors" being executed. It'll simply be requiring adding If leaving this unspecified allows better optimisation w.r.t. landing pad sizes, then IMO we should allow such optimisation given that a panic escaping to unwindable FFI interface is very rare and almost always a bug (given abort is imminent). If one wants all destructors to be called, they can very easily implement that behaviour with catch_unwind.
This will happen with a Rust panic in the presence of foreign frames, as well as with a foreign exception passing through Rust frames. I think it'll be less consistent if our specification of Rust panic behaviour depends on whether a foreign frame is present or not. |
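As a rough illustration of the catch_unwind suggestion above, here is a minimal sketch of an FFI entry point that lets a panic unwind to a catch_unwind boundary (so every local on the way is dropped) and only then aborts. The names `Noisy`, `run_to_boundary`, `work_that_panics` and `ffi_entry` are hypothetical and not taken from the thread or from any PR; the sketch assumes the default panic=unwind.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts destructor runs so the behaviour can be checked programmatically.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type whose destructor is observable.
struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        eprintln!("dropping {}", self.0);
    }
}

fn work_that_panics() {
    let _inner = Noisy("inner");
    panic!("boom");
}

// Hypothetical helper: runs `body`, lets any panic unwind up to this frame
// (dropping the locals on the way), and reports whether `body` completed.
fn run_to_boundary<F: FnOnce()>(body: F) -> bool {
    panic::catch_unwind(AssertUnwindSafe(body)).is_ok()
}

// Hypothetical FFI entry point. The panic never escapes this frame: it is
// caught first, and the abort is then requested explicitly. Not called from
// `main` below because it would terminate the process.
#[allow(dead_code)]
extern "C" fn ffi_entry() {
    let completed = run_to_boundary(|| {
        let _outer = Noisy("outer");
        work_that_panics();
    });
    if !completed {
        std::process::abort();
    }
}

fn main() {
    let completed = run_to_boundary(|| {
        let _outer = Noisy("outer");
        work_that_panics();
    });
    println!("completed: {completed}, drops: {}", DROPS.load(Ordering::SeqCst));
}
```

With this shape, whether the destructors run no longer depends on how the compiler handles the nounwind frame itself, at the cost of writing the boundary by hand.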
I'll reply in full later, for now I have a clarification question:
The Itanium C++ ABI unwinder has two phases.
Itanium is dead, why should we care?
2-phase unwinding sounds like a lot of unnecessary complexity to me.^^ But I guess they had their reasons.
|
The Itanium C++ ABI is the abi used by gcc and clang on most non-windows targets. Rust already uses Itanium EH for panics on the same set of targets. |
Basically every UNIX uses the same unwinder ABI as replacement for the old SjLj unwinder ABI which had non-zero overhead even when not throwing any exceptions. arm32 iOS is the only SjLj target we support(ed). |
Thanks for explaining the Itanium thing. While exploring what C++ does is interesting, I don't think C++ is necessarily a good guiding star to follow. We tend to value cross-platform consistency and predictability much more than C++ does. Having the number of drops depend on the optimization level sounds completely unacceptable to me. So, ignoring what C++ does -- what are the downsides to saying that consistently, everything must be dropped until the boundary, i.e. even in the last stackframe?
What exactly does this mean? You are assuming that I know what all these words mean. :) Can you state this in terms of what the Rust programmer sees as end-to-end behavior?
That argument applies to all panics. You are suggesting to make debugging better for some small subclass of panics. I don't think it's worth doing this only for "panics that happen to lead to an abort later". In fact I think that makes debugging worse because for some panics you'll see the full stack and for some you won't. Instead, just set a breakpoint on some symbol inside the panic machinery. (AFAIK we have a dedicated symbol for that?) Or set panic=abort. In both cases the debugger will reliably trap before unwinding begins.
I think consistency with C++ is just as often something we explicitly don't want as we disagree with the C++ design philosophy. I also doubt most C++ programmers will even know that this is how C++ behaves, so the consistency only helps those few people that know the ins and outs of how unwinding is implemented.
All panics are always a bug. The question is whether those landing pad size wins are worth it for the extra confusion that inconsistent behavior will cause. And as I said above I think making this opt-level-dependent is completely unacceptable. That would mean if I see a panic in my release build and then try to debug it in a debug build it will behave completely differently! Maybe it's okay to say that behavior can differ between targets and between Rust versions, but I don't think we want any more variability than that. |
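The comment above suggests trapping before unwinding begins via a debugger breakpoint or panic=abort. A related in-process option, not mentioned in the thread and offered here only as a sketch, is a panic hook: the hook installed with `std::panic::set_hook` runs before any frame is unwound, so no destructor has run yet when it fires. `Noisy`, `install_hook` and `panics_with_local` are hypothetical names.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of destructors that have run so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);
// Value of DROPS observed from inside the panic hook.
static DROPS_SEEN_BY_HOOK: AtomicUsize = AtomicUsize::new(usize::MAX);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Installs a hook that records how many destructors had run at the moment
// the panic was raised, i.e. before unwinding starts.
fn install_hook() {
    panic::set_hook(Box::new(|info| {
        DROPS_SEEN_BY_HOOK.store(DROPS.load(Ordering::SeqCst), Ordering::SeqCst);
        eprintln!("hook ran before unwinding: {info}");
    }));
}

fn panics_with_local() {
    let _n = Noisy;
    panic!("oops");
}

fn main() {
    install_hook();
    let before = DROPS.load(Ordering::SeqCst);
    let result = panic::catch_unwind(panics_with_local);
    println!(
        "panicked: {}, drops seen by hook: {}, drops after unwind: {}",
        result.is_err(),
        DROPS_SEEN_BY_HOOK.load(Ordering::SeqCst) - before,
        DROPS.load(Ordering::SeqCst) - before,
    );
}
```

This does not help with the coredump-only scenario raised in the thread, since the hook still has to be compiled into the program.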
@CAD97 also makes a good point:
If we allow the "unwind phase 1" to determine that unwinding will be skipped entirely, we end up with a In contrast, today we get a pretty nice error, where first there's a regular panic message and then when it hits a nounwind function (or drop, with the flag enabled), it prints a secondary message explaining why the panic was turned into an abort. (At least we get that on Linux. No idea if we reliably get it on all targets.) |
That won't work if you didn't have a debugger attached from the start, but are relying on a coredump produced at the point of the SIGILL. |
It's difficult for me to object to this, because I think the principle is good. But I'm not sure that I agree it applies to specifying the precise behavior of a program that is already in the process of early termination. A primary goal expressed to the working group from the start was to preserve unwind implementation flexibility, at least in terms of what is formally guaranteed. In fact, RFC-2945 originally did not even guarantee that a foreign exception entering Rust via It was also expressed that one downside of proposed changes to The person inside the Rust project who expressed these concerns is no longer active in the project. But I nevertheless think we should be very cautious about introducing strong guarantees around the "abort" behavior unless we are very confident that they can be upheld on every platform on which we might wish for Rust to run, without jeopardizing performance.
Unfortunately, I'm not sure this is actually true, especially in the context of cross-FFI unwinding. One simple counterexample is allocator exhaustion: yes, this can often indicate a memory leak, but it's also possible that someone is simply running too much on a particular device. |
As I have cheerily mentioned several times: I work on a library that catches longjmps, translates them into panics, and then translates them back into longjmps, using a baroque mechanism for making this actually conform to Rust's expected control flow semantics. A panic does not necessarily indicate a bug in the code that anyone using my library can actually control, because they do not necessarily have that much input into when the C code decides it wants to throw its home-baked "exceptions", and the alternatives to panicking when we run into these tend to be... worse. So I would turn it around to a different angle: Even assuming it is "always" a bug, what should anyone do about it? |
I said this in reply to a claim that "panics that will abort are a bug, therefore we can do weird semantics that make little sense unless one has studied unwinding ABIs for years". (I may have rephrased the argument a bit. ;) I don't think that's a valid argument, because sane behavior is important even in the presence of bugs. That's why UB is so nasty, it's the kind of bug where we don't have sane behavior any more. But here we're talking about cases which are explicitly not UB, they abort in a safe way, and I really don't think we should have UB-level of "spooky action at a distance" here -- something like the GCC behavior described above where with more optimizations, fewer destructors run. In terms of being able to debug and make sense of the situation, that's almost as bad as the nastiness one can see with UB. We have to ensure that will never happen.
The problem is, your definition of "being in the process of early termination" requires predicting the future. That goes entirely against the basic principles of an operational semantics, where we define step by step what happens. I would like for unwinding to be a step-by-step process that just proceeds stack frame by stack frame. That would be a sane semantics people can understand and Miri can implement. But having to predict whether we will abort due to a condition that only becomes apparent later is a complete mess. Basically, I am objecting to including anything like 2-phase-unwind into the opsem of Rust. 2-phase-unwind is an implementation detail, I don't see good reason why it should be in the spec. And without 2-phase-unwind, it must be the case that a panic that will abort 10 stack frames down, and a panic that will not abort, behave the same, since we can't predict the future. (And even worse than having 2-phase-unwind in the spec would be having it in the spec only sometimes. That's just a nightmare scenario. And it seems like only some targets do 2-phase-unwind so we couldn't even make the spec say that we always do 2-phase unwind.) |
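To make the "stack frame by stack frame" model above concrete, here is a small sketch for the ordinary case where the panic is caught and nothing aborts: destructors run innermost frame first, one frame at a time. The names `Noisy`, `inner`, `middle`, `outer` and `drop_order_after_panic` are hypothetical, and the sketch says nothing about what happens when a nounwind frame is involved, which is exactly the open question in this issue.

```rust
use std::panic;
use std::sync::{Mutex, MutexGuard};

// Records the order in which destructors run.
static LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

// Locks the log, tolerating a poisoned mutex so the helper is safe to call
// from destructors that run during unwinding.
fn log() -> MutexGuard<'static, Vec<&'static str>> {
    LOG.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        log().push(self.0);
    }
}

fn inner() {
    let _g = Noisy("inner");
    panic!("unwind");
}

fn middle() {
    let _g = Noisy("middle");
    inner();
}

fn outer() {
    let _g = Noisy("outer");
    middle();
}

// Runs the three nested frames under catch_unwind and returns the order in
// which their locals were dropped.
fn drop_order_after_panic() -> Vec<&'static str> {
    log().clear();
    let result = panic::catch_unwind(outer);
    assert!(result.is_err());
    log().clone()
}

fn main() {
    // prints ["inner", "middle", "outer"]
    println!("{:?}", drop_order_after_panic());
}
```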
(I think this is not really related to this discussion, but I can't help but reply.^^) I agree the Also this is not like an effect system. An effect system would track which functions may or may not unwind, and just reject the code when you call a may-unwind function in an Using an optimized ABI is totally a possible role of an effect system, just like it is a role of a type system to enable optimized data representations. (Or did you mean to say "not really part of the proper role of an ABI"?) |
honestly I think, ironically, that C++ sometimes is a good model for unwinding... ...specifically, MSVC. my understanding is Windows adopts a different approach than the Itanium ABI unwinding, from the ground level up: the by-default behavior is to unify all mechanisms of nonlocal control flow for all languages compiled on it. that means it doesn't matter if you are a Rust panic, C++ exception, C longjmp, or even bare assembly: you get to participate in structured exception handling. throwing an exception, longjmp, and so on are all the same unwinding mechanism, by default, so everything works the same and everyone can catch and rethrow and in general understand each other's errors, even if C only sees all exceptions as the implementation of Visual C++ does have a compile option that allows C++ code to choose whether the try-catch interacts with the same exceptions C does by default, for reasons that are not clear to me. I believe the C++ code can still use for platforms that have this quirk it is probably very useful to preserve it. |
This means that, for it to work with Itanium EH (again, used by almost every non-windows platform) and exceptions thrown by C++, the abort edge in an |
This is necessary for some OS APIs. In Rust we have to resort to C shims to use such APIs, which isn't great. |
I am primarily concerned with unwinding from Rust panics, not exceptions triggered by other languages. I don't have as strong opinions on how C++ exceptions should behave as they pass through Rust stack frames. AFAIK we don't currently say much about what the rules even are there? |
SEH rules are indeed nice and consistent, but we aren't ever going to have a world where they're the only rules.
Frankly, I'd expect them to interact with destructors the same way for consistency, both as a user, and an implementor here. |
From #123231 (comment):
Don't we expose handlers via My mental model here aligns pretty closely with @RalfJung's I think: if the "extra cost" to making Drop behave the same way (e.g., if I'm debugging and stick an eprintln! in a Drop, I want to see it, even if the program aborts some amount of frames later!) regardless of whether there's some extern "C" function somewhere is just a bit of extra code in the binary, then I'd happily pay that price. Presumably, that code can be removed if LLVM (or Rust, via an effect system eventually) is able to statically prove a lack of Drop-requiring objects, too, and is necessary in every other function defined in Rust, right? |
It's EH Table size, and function size mostly IIRC. Compared to |
I believe LLVM optimizations are behavior-preserving, even in the presence of unwinding. |
I was about to say MIR inlining does not preserve behavior, but it seems like it does. I have no idea why, though -- after inlining, why do we execute the drop of |
It's because extern "C" functions today generate a terminator that immediately aborts |
Thanks @RalfJung, I couldn't explain it better.
I think this is the correct description of the current behaviour.
The MIR inlining happens after the aborting unwinding call MIR pass, which explains this behaviour. |
Right, after the "aborting unwinding call MIR pass", which for semantics purposes counts as a part of MIR generation, the code is
Unwinding in MIR behaves like a normal control-flow edge. You can have some platform-dependent behaviors when calling panic causes an abort instead of an unwind, but in that case you never start unwinding. |
I'd hope so.
FTR, C++ promises all-or-nothing for when destructors are run before a std::terminate (though it's per throw). If an exception would hit a terminate boundary (noexcept function, top of stack, or destructor called during stack unwinding), the compiler gets to choose whether or not stack unwinding occurs, and then, if it does, it requires it to unwind the stack straight up to the unwind boundary - it doesn't allow the compiler to only unwind certain frames. It can, however, distinguish per-exception, so the case of throw inside a noexcept function with no catch could cause an immediate abort even when a throw one call deep would unwind then abort, but in the one-call-deep case, it can't unwind the callee then see the noexcept in the caller, skip its destructors, and terminate immediately.
…On Mon, Aug 26, 2024 at 09:39 Ariel Ben-Yehuda ***@***.***> wrote:
I am not sure if this accurately describes current behavior -- with
transformations like outlining, could it be the case that some destructors
in a stack frame run and others do not?
I *believe* LLVM optimizations are behavior-preserving, even in the
presence of unwinding.
|
(In lccc for |
I think "unwind edges are normal control-flow edges, panic might abort for implementation-defined reasons" is the right model of thinking about things. Which probably means we want to take the hack.md option (1) or (2) because the Rust 1.81 control flow edges are weird. |
So there are actually two reasons for destructors being skipped?
In that case I think we should definitely change MIR generation to remove item 2 from this list.
Doesn't that contradict the observations made here?
Which hack.md? |
Yes |
Ah, so @nbdd0121 is preparing a summary as well, great. :)
I would say behavior is quite different today between that and In contrast, with |
Theoretically the destructor that we called is not the boundary that does not permit unwind, but the cleanup blocks. Similarly, the |
https://eel.is/c++draft/except.terminate#2:
|
Yeah, I just found that. I stand corrected in relation to non-throwing exception specifications. Though, this sentence sticks out to me
|
I don't agree with that framing. From an end user perspective, there's no such thing as a "cleanup block" -- that's an implementation detail. What matters is: when a destructor gets called during unwinding, it may not itself unwind. An unwind out of a destructor called during unwinding leads to an immediate abort. Those semantics make a lot of sense intuitively at a high level, without going into MIR details --and they match the currently implemented semantics. It is only the |
At the end of the drop glue, not destructor. e.g. this (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=dfa1c8e8df671512b6cc9bf1aae964e9):
struct Noisy1(Noisy2);
struct Noisy2(u32);
impl Drop for Noisy1 {
    fn drop(&mut self) {
        eprintln!("Noisy1(Noisy2({}))", self.0.0);
        panic!("unwind 2")
    }
}
impl Drop for Noisy2 {
    fn drop(&mut self) {
        eprintln!("Noisy2({})", self.0)
    }
}
fn main() {
    let _m = Noisy2(0);
    let _n = Noisy1(Noisy2(1));
    panic!("unwind")
}
Prints:
Only the |
When I say "destructor" I think I mean what you call "drop glue": I am referring to |
yes standard confusion between "destructor=Drop::drop" and "destructor=drop_in_place". Does the reference have a standard for the naming there (I think it does, calling I find everything other than "drop impls are always nounwind" or "double-panics insta-abort" weird in some cases, and "drop impls are always nounwind" and "double-panics insta-abort" have their own disadvantages (tho I'm still not convinced they are not the right solution). But this is a digression. |
I think there are two questions here:
Do we have any examples of real code that was depending on the current behavior? As I understand it, the point of this stabilization is to take code that was always technically UB but had no way to be correctly written, so I think we should try to be accommodating here. |
I think that "any given instance of unwinding might abort for implementation-defined reasons" is descriptive, but that as long as unwinding does not abort, unwinding control flow should be very well-defined. And if it's well-defined, I think the version after #129582 is a more obvious control flow than the version in 1.81. |
I think we should guarantee that we run all destructors during unwind, and leave room for "unwind might fail to initiate and abort immediately instead" to account for 2-phase unwinding. That's reasonably easy to understand. This is currently not the case but #129582 implements that, IIUC. I'm not aware of any code that would rely on the abort happening "early", i.e. skipping some destructors. On current stable, the destructor in the OP example actually does run, so the proposed guarantee (implemented by #129582) is also closer to the status quo than what happens in current beta.
It should implement the guarantee. :) |
@nbdd0121 what are the downsides of #129582?
The last point doesn't seem to apply for this PR as it is entirely target-independent. C++ AFAIK allows these destructors to be run, but gcc and clang decide against it. I would guess most C++ programmers are not aware of this, and it seems unlikely anyone would rely on this. Code size of course is increased if we generate more unwinding code, but people that build their code with |
As long as "unwind might fail to initiate and abort immediately instead" is allowed:
|
This would be compliant with the specification, but I don't think the code size improvements in that case are worth it unless proven otherwise. |
@rustbot labels -I-lang-nominated We now have nominated: So we can handle this in that nomination. |
Make destructors on `extern "C"` frames to be executed This would make the example in rust-lang#123231 print "Noisy Drop". I didn't mark this as fixing the issue because the behaviour is yet to be spec'ed. Tracking: - rust-lang#74990
Raised by @CAD97:
Reproducing example:
I would expect "Noisy Drop" to be printed, but it is not.
IMO it'd make most sense to guarantee that with panic=unwind, this destructor is still called. @nbdd0121 however said they don't want to guarantee this.
What is the motivation for leaving this unspecified? The current behavior is quite surprising. If I understand @CAD97 correctly, we currently could make "Noisy Drop" be executed by tweaking the MIR we generate.
Tracking: