Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The idea of CertPolicyBase::CertServer is vulnerable to race conditions #2

Open
phonexicum opened this issue Jul 4, 2024 · 10 comments · May be fixed by #7 or #6
Open

The idea of CertPolicyBase::CertServer is vulnerable to race conditions #2

phonexicum opened this issue Jul 4, 2024 · 10 comments · May be fixed by #7 or #6

Comments

@phonexicum
Copy link
Contributor

phonexicum commented Jul 4, 2024

Hi, first of all, thank you for this project, it is awesome.

problem description

During development of policy module for the ADCS, I run into a problem of race conditions.
It turns out the ADCS (certsrv.exe) service under high load may execute ICertPolicy::VerifyRequest concurrently in different threads.
And if one will use in VerifyRequest some shared resources (for example CertPolicyBase.CertServer), it may result in unpredictable behavior.

Examples of errors could be:

Exception: System.Runtime.InteropServices.COMException (0x8000FFFF): CCertServerPolicy::EnumerateExtensions: Catastrophic failure Unexpected method call sequence. 0x8000ffff (-2147418113 E_UNEXPECTED)
   at CERTCLILib.CCertServerPolicyClass.EnumerateExtensions()
   at ADCS.CertMod.Managed.CertServerModule.GetRequestExtensions() in D:\a\1\s\ADCS.CertMod.Managed\CertServerModule.cs:line 142
...
Log Name:      Application
Source:        Microsoft-Windows-CertificationAuthority
Date:          19.06.2024 18:06:08
Event ID:      90
Task Category: None
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      redacted
Description:
506.1084.90: Active Directory Certificate Services detected an exception at address 0x00007FFA4FE449B9.  Flags = 0x00000000.  The exception is The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s. 0xc0000005 (NT: 0xc0000005 STATUS_ACCESS_VIOLATION).
Log Name:      Application
Source:        ESENT
Date:          19.06.2024 18:02:03
Event ID:      902
Task Category: Transaction Manager
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      redacted
Description:
certsrv.exe (58428,D,17) The database engine detected multiple threads illegally using the same database session to perform database operations. 
    SessionId: 0x0000019EF38E2620 
    Session-context: 0x00000000 
    Session-context ThreadId: 0x0000000000000000 
    Current ThreadId: 0x00000000000068E8 
    Session-trace: 

Eventually errors result in a broken ADCS service, and manual service restart was required.

The problem could be simply solved: I just didn't used CertPolicyBase.CertServer provided by abstact class and instead created and initialized new instance of CertServerModule in every VerifyRequest function call, without any sharing between threads.


problem interpretation

For a reason described above, current approach (of providing access to the property CertPolicyBase.CertServer for a developer) looks error prone.

I believe, that the project ADCS-SID-Extension-Policy-Module has the same problems with race conditions, which will only reveal itself under high load, because the CertServer is used in a couple of places at Policy.cs file.

I believe, additionally race conditions may result in:

  • double-free at CertServerModule::FinalizeContext() (e.g. at policy module)
  • memory leak at CertServer.InitializeContext(Context); (e.g. at base.VerifyRequest) in case when two threads initialize CertServer, but only memory allocated by second thread will be freed later.
    (unfortunately, this InitializeContext seems to be unavoidable if the developer wish to use for example microsoft's native policy module)

proposition

I propose to completely remove CertServer property from abstract class CertPolicyBase. If the developer requires the object to read or edit properties of a certificate, he must instantiate new object in each VerifyRequest thread.

@Crypt32
Copy link
Collaborator

Crypt32 commented Jul 4, 2024

I deliberately provided CertServer protected property instead of forcing implementers to instantiate it every time for several reasons:

  1. reduce chances that implementer mess with appropriate server module type, i.e. when you call CertServerModule.CreateExit in policy module and CertServerModule.CreatePolicy in exit module.
  2. Down the road, CertServerModule creates some COM objects and method definitions extractions into actions and funcs (https://github.com/PKISolutions/ADCS-CertMod/blob/master/ADCS.CertMod.Managed/CertServerExitPolicyManaged.cs) which will negatively affect performance and add heavy memory management (heap/COM allocations/releases) under heavy load. In addition, CertServer property will have to implement IDisposable and cause memory leaks if object created inside VerifyRequest is not properly disposed. For simplicity purposes, I created this object as singleton with same lifetime as CA process.

I did test this module under some load, but didn't encounter race conditions. Maybe my tests weren't sufficient. But anyways, even in current implementation it is relatively easy to address race condition issue by using locks:

public class MyPolicyModule : CertPolicyBase {
    readonly Object _myLock = new();
    // the rest is omitted for brevity

    public override PolicyModuleAction VerifyRequest(String strConfig, Int32 Context, Int32 bNewRequest, Int32 Flags) {
        lock (_myLock) {
            // implement your logic
            // it may be worth to copy parameters into local variables as soon as possible.
        }
    }
    // the rest is omitted for brevity
}

Of course, this will slightly slow things down, because multiple threads will stay in queue while awaiting lock release and execute one-by-one (synchronously). I think that this approach should reduce race condition chances. What do you think?

@phonexicum
Copy link
Contributor Author

But, according to your proposal, as a developer, I am now placed between two sacrificial choices:

  • I have to implement the locks and make my policy module "partially" single-threaded;
  • I am not implementing any locks, run my code multi-threaded and hope I will never hit any race-condition error on CA again;

For sure, ignoring errors is not my choice.
But do I have an opportunity to develop race-condition -free multi-threaded policy plugin? - It looks like "no", because:

  • The CertPolicyBase::VerifyRequest as its first action performs CertServer.InitializeContext(Context); at CertPolicyBase.cs
    • It means that to make my plugin race-condition safe, I must protect the base.VerifyRequest call using locks.
      • But, Microsoft's native module executes significant time. And guarding it using locks makes the whole policy module single-threaded for most of its part.

Thus, we hit the "chicken and egg" problem:

  • You have provided the CertServer interface for a developer to increase the module's performance under heavy load (which will negatively affect performance and add heavy memory management (heap/COM allocations/releases) under heavy load)
  • But under heavy load the race-condition problem shows out;
  • And to fix the race-condition, we have to make the code "effectively" single-threaded, at least around the base.VerifyRequest call and around some part of the currently developed policy module plugin;

From the considerations described previously, I see two consequences:

  1. The interface CertServer is currently prone to race-condition issue.
    However, the interface is useful according to the reasons you have described.
    To me the conclusion is simple: the CertServer should become safe. Perhaps, it can be achieved by some thread synchronization technique, for example, locking the object in CertServerModule::InitializeContext and releasing it in CertServerModule::FinalizeContext.
    Or maybe you will find some other thread synchronization solution more appropriate here.

  2. Just like I explained, currently I must execute base.VerifyRequest under lock to make myself thread-safe, although I don't use CertServer in my code and the original Microsoft's native module doesn't use CertServer object either.
    I could write the safe multi-threaded code if I only could execute funcVerifyRequest.Invoke(strConfig, Context, bNewRequest, Flags); (from here) without messing up with CertServer (from here). However the property funcVerifyRequest is "private" and not available in a derived class.

    I also understand that backward compatibility is very important for projects like this, therefore can some new interface (similar to base.VerifyRequest) be provided to make it possible to completely avoid CertServer initialization and usage to write multi-threaded and safe code?

@Crypt32
Copy link
Collaborator

Crypt32 commented Jul 9, 2024

Look, Microsoft CA's throughput is much higher than your policy module's throughput, because CA is COM server (native code), your policy module is .NET runtime (just wrapped in CCW). .NET runtime is always slower than native code, this is not debatable. Not to mention that there is additional overhead in marshalling data between COM and .NET runtime (in both directions), which slows things even more. So the answer to your question:

But do I have an opportunity to develop race-condition -free multi-threaded policy plugin?

is "no", unfortunately and it is not because how I designed this framework.

Under high load, .NET runtime is a bottleneck and you may run out of memory if high load is sustained for reasonable period, because CA will make calls faster than your ability to process. This is a classic problem. If you are looking for a policy module which performance would be comparable to CA performance, you would write it using native code (C++). Then you can have fully multi-threaded and thread safe policy/exit module. .NET is a RAD platform that allows you to implement your arbitrary logic quickly. However this convenience comes at cost: performance. I put all reasonable efforts to make the code as performant as possible, by keeping the balance with maintainability and easiness/usefulness. When I hit such performance problem, I prefer to slow things down rather than allowing uncontrolled parallel thread creation and crashing entire server when it runs out of memory.


  1. Or maybe you will find some other thread synchronization solution more appropriate here.

thread synchronization turns your code (in critical sections) into single-threaded code by definition. It doesn't matter much how exactly you implement this synchronization. Though, it may be possible to make some kind of controlled thread pool. Say, have 100 threads (with singleton CertServer instance in each) in the pool and allow to run up to 100 threads in parallel (each thread will have its own context) and block new thread creation when pool is exhausted until some threads are released. This should boost throughput to some extent. The question then, what is the "optimal" number of parallel threads and how to test this. As you may already noticed, it is not easy to make policy module testing (both, logic and performance).

2. Just like I explained, currently I must execute base.VerifyRequest under lock to make myself thread-safe, although I don't use CertServer in my code and the original Microsoft's native module doesn't use CertServer object either.
I could write the safe multi-threaded code if I only could execute funcVerifyRequest.Invoke(strConfig, Context, bNewRequest, Flags); (from here) without messing up with CertServer (from here). However the property funcVerifyRequest is "private" and not available in a derived class.

You are not required to use protected CertServer object. Inside your VerifyRequest override, you can create your own instance of CertServerModule by calling CertServerModule.CreatePolicy or use anything else at your choice. You take all the responsibility to ensure thread safety in your VerifyRequest override then.

@phonexicum
Copy link
Contributor Author

phonexicum commented Jul 15, 2024

You are not required to use protected CertServer object. Inside your VerifyRequest override, you can create your own instance of CertServerModule by calling CertServerModule.CreatePolicy or use anything else at your choice. You take all the responsibility to ensure thread safety in your VerifyRequest override then.

That is the first thing I have done, after I hit errors on ADCS due to the problem of race-condition.
But it is not enough to just "not use CertServer".
If I want to call the native microsoft's policy module, I am forced to write the following code to make my implementation of VerifyRequest thread-safe:

public class MyPolicyModule : CertPolicyBase {
    readonly Object _myLock = new();
    // the rest is omitted for brevity

    public override PolicyModuleAction VerifyRequest(String strConfig, Int32 Context, Int32 bNewRequest, Int32 Flags) {
        lock (_myLock) {
            PolicyModuleAction nativeResult = base.VerifyRequest(strConfig, Context, bNewRequest, Flags);
            // base.VerifyRequest contains: "CertServer.InitializeContext(Context);" which is NOT thread-safe
            CertServer.FinalizeContext();
        }
        // ...
        // my logic goes here, and it is thread-safe
        // ...
    }
    // the rest is omitted for brevity
}

Currently I see no way to call microsoft's native policy module using ADCS-CertMod framework without using locks.
And it results in a partially single-threaded policy module.

@Crypt32
Copy link
Collaborator

Crypt32 commented Jul 15, 2024

Currently I see no way to call microsoft's native policy module using ADCS-CertMod framework without using locks.

As described above, it is not recommended to have individual and separate instance of CertServerModule because of performance and memory overhead under heavy load which may lead to server crash, which is not a desirable outcome. The only viable solution I see is either, make VerifyRequest completely single-threaded, or introduce a pool of CertServerModule objects which will make the method thread-safe to certain extent and then turn VerifyRequest single-threaded if pool is exhausted. For example, Microsoft CLM policy module uses a pool of CertServerPolicy objects:
image
and then in VerifyRequest they do:
image
and this is what I'm suggesting. Though it isn't something possible without introducing breaking changes. The idea is that VerifyRequest will no longer be virtual and separate abstract method will be introduced. Let's say this:

// not virtual anymore
public PolicyModuleAction VerifyRequest(String strConfig, Int32 Context, Int32 bNewRequest, Int32 Flags) {
    CertServerModule certServer = pool.GetNext();
    CertServer.InitializeContext(Context);
    PolicyModuleAction nativeResult = funcVerifyRequest.Invoke(strConfig, Context, bNewRequest, Flags);
    try {
        return VerifyRequest(certServer, nativeResult);
    } finally {
        certServer.FinalizeContext();
        pool.Release(certServer);
    }
}

// can duplicate original parameters if it makes sense
protected abstract PolicyModuleAction VerifyRequest(CertServerModule certServer, PolicyModuleAction nativeResult);

And then your implementation of abstract VerifyRequest will be thread-safe by default and resources are released automatically, no need to do it yourself.

@phonexicum
Copy link
Contributor Author

Directly calling funcVerifyRequest.Invoke(strConfig, Context, bNewRequest, Flags); indeed will be a sufficient workaround for me to completely avoid usage of CertServer.
However, the funcVerifyRequest property is private (not protected), and therefore not available to me.

@Crypt32
Copy link
Collaborator

Crypt32 commented Jul 16, 2024

I have no plans to expose this func to inheritors, this is to ensure that this method is not misused. In addition, I want to ensure that this func is always called before running your own code.

@phonexicum
Copy link
Contributor Author

Can a new interface be introduced into the CertPolicyBase class?
To provide a developer the ability to call funcVerifyRequest.Invoke in a multi-threaded way without touching thread-unsafe CertServer property?

For example:

    public PolicyModuleAction VerifyRequest2(String strConfig, Int32 Context, Int32 bNewRequest, Int32 Flags) {
        return funcVerifyRequest.Invoke(strConfig, Context, bNewRequest, Flags);
    }

Thus a developer will be able to call base.VerifyRequest2(strConfig, Context, bNewRequest, Flags) without the need to make the code partially single-threaded to make it thread-safe:

public class MyPolicyModule : CertPolicyBase {
    // readonly Object _myLock = new(); // NOT NEEDED any more
    // the rest is omitted for brevity

    public override PolicyModuleAction VerifyRequest(String strConfig, Int32 Context, Int32 bNewRequest, Int32 Flags) {
        // lock (_myLock) { // NOT NEEDED any more
            PolicyModuleAction nativeResult = base.VerifyRequest2(strConfig, Context, bNewRequest, Flags);
            // the call is thread-safe, because thread-unsafe `CertServer` is not used
            // CertServer.FinalizeContext();
        // }
        // ...
        // my logic goes here, and it is thread-safe
        // ...
    }
    // the rest is omitted for brevity
}

@phonexicum phonexicum linked a pull request Aug 18, 2024 that will close this issue
@phonexicum
Copy link
Contributor Author

I have made the draft PR for my last proposition.

@Crypt32 Crypt32 linked a pull request Dec 5, 2024 that will close this issue
@Crypt32 Crypt32 linked a pull request Dec 5, 2024 that will close this issue
@Crypt32
Copy link
Collaborator

Crypt32 commented Dec 5, 2024

@phonexicum sorry for late response. I've thought about a couple of solutions for this issue and still the best one is the approach provided in this comment. Here is a PR that implements this: #7

I'm still not convinced that exposing private delegate (verifyRequest) alone is a good idea, because more often than not you still need to access CertServerModule to read request details. Updated PR automatically handle CertServerModule.Finalizecontext() call, so you are no longer required to acquire or release context, it is already done for you. In addition, a call to verifyRequest private function is done for you and its result is then passed to abstract method you implement.

Please let me know what you think about this implementation. p.s. I acknowledge that your use case doesn't necessary require CertServerModule, but I would say that it is corner case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants