Skip to content

feat: improve NTLM fallback#627

Open
Pavlo Myroniuk (TheBestTvarynka) wants to merge 24 commits intomasterfrom
feat/improve-ntlm-fallback
Open

feat: improve NTLM fallback#627
Pavlo Myroniuk (TheBestTvarynka) wants to merge 24 commits intomasterfrom
feat/improve-ntlm-fallback

Conversation

@TheBestTvarynka
Copy link
Collaborator

@TheBestTvarynka Pavlo Myroniuk (TheBestTvarynka) commented Mar 16, 2026

Hi,

I improved the NTLM fallback. Now, when the first Kerberos initialize_security_context method fails, we fallback to NTLM. This approach is similar to FreeRDP's approach: https://github.com/FreeRDP/FreeRDP/blob/1328f8292ae9a4fa1d4e560b5049b636995b2e91/winpr/libwinpr/sspi/Negotiate/negotiate.c#L759-L774

We fallback to NTLM when Kerberos returns any of the following error codes:

  • ErrorKind::TimeSkew: The time skew on KDC and client machines is too big.
  • ErrorKind::NoAuthenticatingAuthority: The Kerberos returns this error type when there is a problem with the network client. For example, the network client cannot connect to the KDC, or the KDC proxy returns an error.
    Also, the KDC_ERR_WRONG_REALM Kerberos error code is mapped to this error type, so if the client is misconfigured and tries to get a TGT for a wrong realm, we will fallback to NTLM as well.
  • ErrorKind::CertificateUnknown: The KDC proxy certificate is invalid.
  • ErrorKind::UnknownCredentials: The Kerberos client returns this error kind when KDC replies with KDC_ERR_S_PRINCIPAL_UNKNOWN error code.

I also added new tests and improved the validation steps for existing tests to ensure NTLM fallback works well.

fixes #618

cc Richard Markiewicz (@thenextman)

@thenextman
Copy link
Member

Benoît Cortier (@CBenoit) I know you are a busy guy, but if possible it would be great to prioritize integrating these changes and into a nuget release. This is a big problem for me right now and I want to ship 😰

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the Negotiate (SPNEGO) client to perform an “optimistic” Kerberos preflight and fall back to NTLM for a defined set of Kerberos failure kinds, with new client/server tests validating the negotiated protocol and token reuse behavior.

Changes:

  • Add optimistic Kerberos token generation in Negotiate and NTLM fallback for specific Kerberos ErrorKinds.
  • Track/reuse the first Kerberos token (when applicable) and expose test-only hooks/constants under __test-data.
  • Expand Kerberos client/server tests with context validators and explicit fallback test cases.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/sspi/client_server/mod.rs Tightens test gating to require network_client + __test-data.
tests/sspi/client_server/kerberos/network_client.rs Adds a failing NetworkClient mock to drive fallback tests.
tests/sspi/client_server/kerberos/mod.rs Refactors SPNEGO test harness, adds validators, and introduces NTLM fallback test cases.
tests/sspi/client_server/kerberos/context_validator.rs New step-based validation helpers for negotiated protocol/token reuse assertions.
src/utils.rs Gates write_error_chain behind network_client; adjusts Kerberos error mapping for WRONG_REALM.
src/negotiate/mod.rs Adds first_kdc_token, mech-type-driven package disabling, and an NTLM fallback helper.
src/negotiate/client.rs Implements optimistic Kerberos attempt + NTLM fallback list and token reuse on the second call.
src/lib.rs Re-exports fallback error kinds under __test-data for integration tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Member

@CBenoit Benoît Cortier (CBenoit) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got you! Richard Markiewicz (@thenextman)
I’ll actively review today, and hopefully this should be merged by the time you wake up tomorrow morning!

Pavlo Myroniuk (@TheBestTvarynka) The direction seems very good to me. I’m double checking along the FreeRDP sources you linked; useful stuff!

Copy link
Member

@CBenoit Benoît Cortier (CBenoit) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really no expert about CredSSP, so I’ll mainly act as a talking rubber duck to help you catch small issues (as usual 😂).

Ideally, it would be great to add an inline comment to document rationale at any place I comment on, so there is no room for interpretation, and I understand this is an important piece of code.

General remark: the fallback decision in FreeRDP seems to be local to Negotiate, not implemented by broadening generic Kerberos error mapping. In FreeRDP, the first valid mechanism is tried optimistically, and if it errors, it is marked unusable for this exchange and Negotiate continues with the next one:

On our side, the global remap is in src/utils.rs, the map_keb_error_code_to_sspi_error() funciton, where KDC_ERR_WRONG_REALM is changed to ErrorKind::NoAuthenticatingAuthority.

I feel the FreeRDP approach is cleaner and easier to maintain that ours here.

suggestion: Keep the NTLM downgrade decision in Negotiate rather than encoding more of it in generic Kerberos error mapping. (It’s hard to follow the flow.)

Related to the above, the fallback policy implemented in this PR seems to be narrower than FreeRDP’s. I pointed that in a comment below.

Comment on lines +18 to +32
/// If the Kerberos context returns an Error, then we check for `error_type` and see if we can fallback to NTLM.
/// We fallback to NTLM in following cases:
/// - [ErrorKind::TimeSkew]: The time skew on KDC and client machines is too big.
/// - [ErrorKind::NoAuthenticatingAuthority]: The Kerberos returns this error type when there is a problem with network client.
/// For example, the network client cannot connect to the KDC, or the KDC proxy returns an error.
/// Also, the `KDC_ERR_WRONG_REALM` Kerberos error code is mapped to this error type, so if the client is misconfigured
/// and tries to get a TGT for a wrong realm, we will fallback to NTLM as well.
/// - [ErrorKind::CertificateUnknown]: The KDC proxy certificate is invalid.
/// - [ErrorKind::UnknownCredentials]: The Kerberos client returns this error kind when KDC replies with `KDC_ERR_S_PRINCIPAL_UNKNOWN` error code.
pub(crate) const NTLM_FALLBACK_ERROR_KINDS: [ErrorKind; 4] = [
ErrorKind::TimeSkew,
ErrorKind::NoAuthenticatingAuthority,
ErrorKind::CertificateUnknown,
ErrorKind::UnknownCredentials,
];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question:  As I mentioned in the main comment, this fallback policy is narrower than FreeRDP as far as I can see. This may be a valid policy choice, but maybe we should document a bit more clearly the rationale for it as it’s not the same as FreeRDP’s more local/mechanism-level fallback behavior. Any diverging behavior may be treated as a bug if not documented.

Reference:

Please, decide explicitly whether we want true FreeRDP-like fallback semantics or this narrower allowlist.

Comment on lines +111 to +116
/// On the first `initialize_security_context` call we call the Kerberos `initialize_security_context` to get the initial token.
/// This is an advanced mechanism for protocol negotiation. For example, if KDC does not work properly or time skew is too big,
/// then we fallback to NTLM and continue the NLA.
/// But if the first token is generated successfully (e.g. Kerberos works properly), then we save this token and reuse it on
/// the second `initialize_security_context` call.
first_kdc_token: Option<Vec<u8>>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I’m double checking this and will follow up with another comment later.

Comment on lines 581 to 582
#[test]
fn spnego_kerberos_dce_style() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: It seems we are missing a #[cfg(test)] mod tests module in here

@CBenoit
Copy link
Member

Benoît Cortier (CBenoit) commented Mar 17, 2026

I double-checked the optimistic token flow, and I think this is the main remaining piece that is materially different from FreeRDP.

In FreeRDP, the optimistic Kerberos token is sent directly in the first SPNEGO message. In our code, we first generate that token, save it in first_kdc_token, and then send it only on the second client step.

One thing that made this clearer to me is that we actually have both SPNEGO fields on our side too: mech_token and response_token. They are different fields for different stages of the exchange. mech_token is for the first message, and response_token is for the later messages. So this is not just a naming difference with FreeRDP. The real difference is that FreeRDP uses the first-message field for the optimistic token, while this PR replays it later through the later-message field.

I’m sure this still interoperates correctly, but it is not especially close to the FreeRDP flow. I think it also explains why the U2U path is slightly more awkward: in our code, the first-message mech_token is already being used for the U2U-specific path, so the preflight Kerberos token cannot cleanly go there too. That is why we end up needing extra logic like cloning the Kerberos context and special-casing token reuse. Also Copilot caught an edge-case that we may or may not need to care about, but still highlights the added complexity of the extra state and book-keeping: #627 (comment)

Because of the above, I think we should either:

  • move the normal Kerberos path closer to FreeRDP and put the optimistic token directly in the initial SPNEGO message, while keeping U2U as its own explicit path, or
  • keep the current design, but document clearly that it is intentionally different from FreeRDP and strengthen the tests around this replay behavior.

About testing: I think this matters because the current PR weakens the U2U coverage. We used to validate the real U2U request shape ("[p]reviously, we were able to validate the presence of the additional ticket and enc-tkt-in-skey flag"), and now we mostly validate cached-token state instead. If we move closer to the FreeRDP-style flow, then I think we should restore the stronger U2U request validation. If we keep the current replay-based design, then we should add stronger tests for that exact design instead, so we still have confidence in the tricky paths. Is it something we could do, or that’s tricky?

Let me know if that makes sense, and what you think we should do here! If you go through all the comments and address anything you think is worth addressing, I’m confident we can merge the PR 🙂

@CBenoit
Copy link
Member

Benoît Cortier (CBenoit) commented Mar 17, 2026

note: The commit type should be fix if we’re fixing a behavior. My understanding is that the behavior we have on master is borderline a bug, or is problematic to some extent.

@CBenoit
Copy link
Member

Benoît Cortier (CBenoit) commented Mar 17, 2026

Pavlo Myroniuk (@TheBestTvarynka) Please, ping me again when this is ready for a second pass / merge! Thank you!

@TheBestTvarynka
Copy link
Collaborator Author

Pavlo Myroniuk (TheBestTvarynka) commented Mar 18, 2026

Yes, sure.

I changed the implementation, and now we send the first Kerberos/NTLM token alongside the first SPNEGO token (as you suggested before and as FreeRDP does). I did not know we could do that, but it works well on my dev-environment. Unfortunately, this also changes the MIC behavior. In some cases, we do not need to do a mechListMIC exchange. Today, I am refactoring and dev-testing that. Hope it didn’t cause any issues

@TheBestTvarynka
Copy link
Collaborator Author

Pavlo Myroniuk (TheBestTvarynka) commented Mar 19, 2026

Benoît Cortier (@CBenoit), I made many changes after the last review. Let me explain some decisions here and answer the rest of the questions.

so I’ll mainly act as a talking rubber duck to help you catch small issues (as usual 😂).

It's always valuable feedback for me 😊 My vision often blurs when I work on a specific feature for a long time, and I may miss obvious mistakes. I always look forward to your review ☺️

question:  As I mentioned in the main comment, this fallback policy is narrower than FreeRDP as far as I can see. This may be a valid policy choice, but maybe we should document a bit more clearly the rationale for it as it’s not the same as FreeRDP’s more local/mechanism-level fallback behavior.

It was done on purpose and was discussed in Slack with Richard Markiewicz (@thenextman). Currently, we've decided to explicitly list cases when we want to downgrade to NTLM. We may change this in the future.

we need to update the CI configuration so it runs tests with the __test-data feature enabled

I will close the created sub-issue because I rewrote the implementation, and we no longer need this feature to run tests.

In FreeRDP, the optimistic Kerberos token is sent directly in the first SPNEGO message. In our code, we first generate that token, save it in first_kdc_token, and then send it only on the second client step.

I fixed this. Now we send the first optimistic NTLM/Kerberos token in the first SPNEGO message. It allowed me to remove negotiate.first_krb_token buffer and avoid Kerberos cloning.

But on the other hand, it made the implementation more complicated 😅 It turned out that in some cases, the SPNEGO MIC exchange (mechListMIC tokens) is optional. Even more, its behavior is more unintuitive than I expected. Based on my observations, we must (server returns error when we sent MIC token, but it does not expect it) skip MIC exchange when the client selects Kerberos without U2U and without the USE_DCE_STYLE flag. In all other cases, we need to perform the MIC exchange.

I added the mic_required flag to the Negotiate implementation to track when the MIC exchange is needed.

To make the Negotiate module implementation more generic and consistent, I moved the TGT_REQ/REP (TGT exchange needed for Kerberos U2U auth) to the Kerberos module.

Fortunately, we had many tests that cover cases where MIC is required and when it is not. This is one of the cases when I am glad I wrote these tests back then.

Feel free to ask any questions and make suggestions. If there's anything questioning you, it's worth clarifying or documenting in the code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Improve NTLM fallback

5 participants