Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SlotMap.compute can lead to deadlock in ThreadSafeSlotMapContainer #1746

Conversation

nabacg
Copy link
Contributor

@nabacg nabacg commented Dec 2, 2024

New SlotMap compute method introduced in b7ece52 can lead to a Deadlock during re-definition of existing object properties when used in combination with ThreadSafeSlotMapContainer.

The problem occurs, because ThreadSafeSlotMapContainer obtains and holds a write lock for the entire duration of method execution, including SlotComputer lambda logic and in certain conditions (details below) ScriptableObject.defineOwnProperty call to getPropertyDescriptor will attempt to obtain a read lock on the same slotMap instance from inside SlotComputer lambda body, as marked on below code snippet :

public abstract class ScriptableObject
... 
    protected void defineOwnProperty(
            Context cx, Object id, ScriptableObject desc, boolean checkValid) {

        Object key = null;
        int index = 0;
...
slotMap.compute(                            <<< WRITE LOCK AQUIRED 
        key,
        index,
        (k, ix, existing) -> {
            if (checkValid) {
                ScriptableObject current =
                        existing == null ? null : 
                               existing.getPropertyDescriptor(cx, this);  <<< READ LOCK  NEEDED
                checkPropertyChange(id, current, desc);
            }
... 
 });                                         <<< WRITE LOCK RELEASED  
...

Conditions necessary for this problem to occur:

  • use of ThreadSafeSlotMapContainer, controlled by Context.FEATURE_THREAD_SAFE_OBJECTS which is not ON by default
  • property re-definition, since we only call existing.getPropertyDescriptor if existing != null.
  • finally, the object in question (target of property definition) doesn't have the correct parent scope defined. This is likely to occur if it's created in Java, like in provided tests. Deadlock occurs when:
    • existing.getPropertyDescriptor(cx, this) is called during property redefinition, it will be used as the "global scope" when Property Descriptor object is initialised.
    • This means that same this object is passed to ScriptRuntime.getBuiltinPrototype as scope
    • and eventually an attempt is made to read a property on this object
      - which requires read lock on the same instance of ThreadSafeSlotMapContainer that slotMap.compute is already locking.

This PR adds 2 test cases reproducing the same issue, both with Context.FEATURE_THREAD_SAFE_OBJECTS switched ON to force use of ThreadSafeSlotMapContainer:

  • new DeadlockReproTest, the most minimal repro we could come up with
  • new test case in PropertyTest, with identical code to existing redefineGetterProperty test, showing that just changing SlotMapContainer causes the deadlock to occur.

@nabacg nabacg closed this Dec 2, 2024
@nabacg
Copy link
Contributor Author

nabacg commented Dec 2, 2024

Closing to prevent CI deadlocks

@nabacg
Copy link
Contributor Author

nabacg commented Dec 2, 2024

@gbrail @rbri could you kill this run please? it's not going to finish due to deadlock ^. I sadly forgot that even draft PR will kick off a CI run.

@p-bakker
Copy link
Collaborator

p-bakker commented Dec 2, 2024

Cancelled

@p-bakker p-bakker reopened this Dec 2, 2024
@p-bakker p-bakker closed this Dec 2, 2024
@p-bakker p-bakker reopened this Dec 2, 2024
@rbri
Copy link
Collaborator

rbri commented Dec 2, 2024

@p-bakker sorry i guess it is too late now

@gbrail
Copy link
Collaborator

gbrail commented Dec 3, 2024

This is a great find -- thanks, and I'm going to spend a bit of time understanding it since I introduced a deadlock!

This is the first time I've seen the new "var" keyword in a PR. (I know, it's not really a "keyword.") It was introduced in Java 10 so it's fair game for us to use AFAIK!

@rbri
Copy link
Collaborator

rbri commented Dec 3, 2024

This is the first time I've seen the new "var" keyword in a PR.

Not correct, because core-js is still on java 8 i had to 'backport' already some var's after the merges. For me it will be fine to live without var - i think it makes our complicated code a bit more readable at most places.

boolean isSameGetterFunction(Object function) {
if (function == Scriptable.NOT_FOUND) {
return true;
} else if (getter == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a matter of taste - but there is no else needed because of the return

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

return slot;
}

private ScriptableObject getBuiltInDescriptor(String name) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe getBuiltInDataDescriptor is a better name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and changed.

@aardvark179
Copy link
Contributor

This is a great find -- thanks, and I'm going to spend a bit of time understanding it since I introduced a deadlock!

This is the first time I've seen the new "var" keyword in a PR. (I know, it's not really a "keyword.") It was introduced in Java 10 so it's fair game for us to use AFAIK!

There were already three vars in ScriptableObject#ensureLambdaAccessorSlot in lines I touched with this change, so I figured it was fine to use it, and since source version is set to 11.

I actually find var much easier when reading complex code as the type, or meaning, is usually clear from the variable, and it often reduces noise when refactoring, but I understand tastes differ, and I'll change it if this is a big issue.

@nabacg
Copy link
Contributor Author

nabacg commented Dec 4, 2024

There were already three vars in ScriptableObject#ensureLambdaAccessorSlot in lines I touched with this change,

That's most likely me, introduced as part of #1577 , so it's fairly recent. I personally also find var more readable, but happy to refactor those methods especially if it's causing problems during "backports"..

@nabacg
Copy link
Contributor Author

nabacg commented Dec 4, 2024

Doing some more testing on this, if I set Context.FEATURE_THREAD_SAFE_OBJECTS=true and make ThreadSafeSlotMapContainer by used by default, I still get deadlocks in 262 tests, even on this branch.

Following suites are the problem:

  • Array/prototype/forEach/15.4.4.18-7-b-6.js
  • Array/prototype/forEach/15.4.4.18-7-c-i-19.js
  • Array/prototype/forEach/15.4.4.18-2-12.js

They all have a variation of below Object.defineProperty call

  Object.defineProperty(Object.prototype, "1", {  
    get: function() {  
      return 6.99;  
    },  
    configurable: true  
  });  

which passes

  • Object.prototype as 1 argument, target of property definition
  • descriptor as 3 argument, a literal object, so it's prototype is Object.prototype ( the 1 argument)
  • ScriptableObject.defineOwnProperty -> calls applyDescriptorToAttributeBitset with desc (the 3rd argument above) from inside .compute lambda ( holds the write lock)
    • applyDescriptorToAttributeBitset internally calls desc.getProperty
      - desc will perform property lookup on itself, then it's prototype (1 argument above)
      - prototype's slotMap .query requires a read lock on the same slotMap instance

We're working on fixing this too, I mostly wanted to share the new details.

@nabacg
Copy link
Contributor Author

nabacg commented Dec 4, 2024

Having said that, I do worry that we don't catch those ThreadSafeSlotMapContainer related issues earlier in the process, because it's fairly easy to introduce problems like this even with fairly innocent looking refactoring.

Would running a CI build with Context.FEATURE_THREAD_SAFE_OBJECTS switched ON in parallel to current runs something worth considering ?

…fineProperty to avoid deadlocks, that happen when current instance of ScriptableObject is used as scope inside slotMap.compute lambda.
@nabacg nabacg marked this pull request as ready for review December 5, 2024 17:28
@nabacg
Copy link
Contributor Author

nabacg commented Dec 5, 2024

with d350e81 (which funnily was mostly refactoring my own code from #1577 ), I can finally pass all tests with Context.FEATURE_THREAD_SAFE_OBJECTS switched ON by default, without deadlocking anything.

@gbrail
Copy link
Collaborator

gbrail commented Dec 6, 2024

OK -- let me look at this again, and resolve three things:

First -- I agree -- "var" is fine! We moved to Java 11 minimum in this release so there should be no problems (and TBH I don't know how "var" shows up in the bytecode so it might matter even less).

Second -- we should update the docs for SlotMap to indicate that the lambda in "compute" must not lock the slot map.

Third -- there is another way to resolve this, because I assume that the "deadlock" you mention is a problem with the non-recursive nature of the lock. We could replace all that "stamped lock" stuff in ThreadSafeSlotMapContainer with simple "synchronized" blocks, which are recursive and which optimize out pretty well when there's no contention anyway.

But if you're good with this change, I'll look at it and it's also fine.

Copy link
Collaborator

@gbrail gbrail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! There are changes in some fundamental places here so I looked at the code coverage of the changed code (you can get this with the "testCodeCoverageReport" gradle task, instructions in the README) and I see a few places where stuff that's changed in this PR isn't covered.

Can you see if you can add some test cases to your new tests (or old ones if you'd like) so that we hit those spots in testing?

If it's onerous I'm happy to hear about it, but I'm hoping that it's not, and a bit more coverage would help in this case. Thanks!

// it's dangerous to use `this` as scope inside slotMap.compute.
// It can cause deadlock when ThreadSafeSlotMapContainer is used

return replaceExistingLambdaSlot(cx, name, existing, newSlot);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking at code coverage, since you're changing some fundamental stuff here, and I see that line 1758 doesn't get covered. Do you think that you could devise a test case for that and add it to your new test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, 474c3ba should take care of it.


if (ScriptRuntime.isSymbol(id)) {
if (id instanceof SymbolKey) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, looking at coverage, nothing from line 957 to 961 seems to get covered either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a little more convoluted: as of right now, I think it's impossible for this code path to be executed, thus it will be challenging to test it. This is because it's only called from IdScriptableObject::defineOwnProperty ( here and here) and before the call, we've already established that key must be a CharSequence, so it can't a Symbol. So maybe the solutions is to remove this branch?

Having said that, this doesn't feel right and perhaps it's an issue with current implementation of IdScriptableObject::defineOwnProperty, since it's only handling String (or CharSequence) keys and essentially delegating Symbol key'ed properties to it's base class (ScriptableObject). It's very consistent with how getOwnPropertyDescriptor implementation, which does try to find built in descriptor using both Strings and Symbols (here ). I believe this currently works mostly, because we always check base class for property descriptor first, but it's a little confusing why we need that symbol lookup.

One could also imagine, someday in the future, someone creating a new built-in NativeX class extending IdScriptableObject that defines a Symbol keyed property, that's shouldn't be redefined ( i.e. configurable: false ) and current implementation of IdScriptableObject::defineOwnProperty would not check it ( only Strings are handled) and allow base class implementation to override it.

I appreciate it's a little theoretical, but wanted to share this and ask for opinions on whether we should address this? Also if addressing this deserves a separate PR or should I add it here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks -- I suppose one way we could deal with this could be to assert, and in the future if someone wants to implement a subclass that needs it, we'd fix it -- although in many cases we are trying to get rid of IdScriptableObject as it doesn't yield the performance benefit it once did. If you're not comfortable with that, however, I'd also be OK leaving this as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally wouldn't mind leaving it as is for now, I don't see an obvious place to assert that wouldn't break existing functionality or be completely redundant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then I'm satisfied, thanks!

@gbrail gbrail merged commit 80933c5 into mozilla:master Dec 11, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants