feat: improve NTLM fallback#627
feat: improve NTLM fallback#627Pavlo Myroniuk (TheBestTvarynka) wants to merge 24 commits intomasterfrom
Conversation
|
Benoît Cortier (@CBenoit) I know you are a busy guy, but if possible it would be great to prioritize integrating these changes and into a nuget release. This is a big problem for me right now and I want to ship 😰 |
There was a problem hiding this comment.
Pull request overview
This PR extends the Negotiate (SPNEGO) client to perform an “optimistic” Kerberos preflight and fall back to NTLM for a defined set of Kerberos failure kinds, with new client/server tests validating the negotiated protocol and token reuse behavior.
Changes:
- Add optimistic Kerberos token generation in Negotiate and NTLM fallback for specific Kerberos
ErrorKinds. - Track/reuse the first Kerberos token (when applicable) and expose test-only hooks/constants under
__test-data. - Expand Kerberos client/server tests with context validators and explicit fallback test cases.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/sspi/client_server/mod.rs | Tightens test gating to require network_client + __test-data. |
| tests/sspi/client_server/kerberos/network_client.rs | Adds a failing NetworkClient mock to drive fallback tests. |
| tests/sspi/client_server/kerberos/mod.rs | Refactors SPNEGO test harness, adds validators, and introduces NTLM fallback test cases. |
| tests/sspi/client_server/kerberos/context_validator.rs | New step-based validation helpers for negotiated protocol/token reuse assertions. |
| src/utils.rs | Gates write_error_chain behind network_client; adjusts Kerberos error mapping for WRONG_REALM. |
| src/negotiate/mod.rs | Adds first_kdc_token, mech-type-driven package disabling, and an NTLM fallback helper. |
| src/negotiate/client.rs | Implements optimistic Kerberos attempt + NTLM fallback list and token reuse on the second call. |
| src/lib.rs | Re-exports fallback error kinds under __test-data for integration tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Got you! Richard Markiewicz (@thenextman)
I’ll actively review today, and hopefully this should be merged by the time you wake up tomorrow morning!
Pavlo Myroniuk (@TheBestTvarynka) The direction seems very good to me. I’m double checking along the FreeRDP sources you linked; useful stuff!
There was a problem hiding this comment.
I am really no expert about CredSSP, so I’ll mainly act as a talking rubber duck to help you catch small issues (as usual 😂).
Ideally, it would be great to add an inline comment to document rationale at any place I comment on, so there is no room for interpretation, and I understand this is an important piece of code.
General remark: the fallback decision in FreeRDP seems to be local to Negotiate, not implemented by broadening generic Kerberos error mapping. In FreeRDP, the first valid mechanism is tried optimistically, and if it errors, it is marked unusable for this exchange and Negotiate continues with the next one:
On our side, the global remap is in src/utils.rs, the map_keb_error_code_to_sspi_error() funciton, where KDC_ERR_WRONG_REALM is changed to ErrorKind::NoAuthenticatingAuthority.
I feel the FreeRDP approach is cleaner and easier to maintain that ours here.
suggestion: Keep the NTLM downgrade decision in Negotiate rather than encoding more of it in generic Kerberos error mapping. (It’s hard to follow the flow.)
Related to the above, the fallback policy implemented in this PR seems to be narrower than FreeRDP’s. I pointed that in a comment below.
| /// If the Kerberos context returns an Error, then we check for `error_type` and see if we can fallback to NTLM. | ||
| /// We fallback to NTLM in following cases: | ||
| /// - [ErrorKind::TimeSkew]: The time skew on KDC and client machines is too big. | ||
| /// - [ErrorKind::NoAuthenticatingAuthority]: The Kerberos returns this error type when there is a problem with network client. | ||
| /// For example, the network client cannot connect to the KDC, or the KDC proxy returns an error. | ||
| /// Also, the `KDC_ERR_WRONG_REALM` Kerberos error code is mapped to this error type, so if the client is misconfigured | ||
| /// and tries to get a TGT for a wrong realm, we will fallback to NTLM as well. | ||
| /// - [ErrorKind::CertificateUnknown]: The KDC proxy certificate is invalid. | ||
| /// - [ErrorKind::UnknownCredentials]: The Kerberos client returns this error kind when KDC replies with `KDC_ERR_S_PRINCIPAL_UNKNOWN` error code. | ||
| pub(crate) const NTLM_FALLBACK_ERROR_KINDS: [ErrorKind; 4] = [ | ||
| ErrorKind::TimeSkew, | ||
| ErrorKind::NoAuthenticatingAuthority, | ||
| ErrorKind::CertificateUnknown, | ||
| ErrorKind::UnknownCredentials, | ||
| ]; |
There was a problem hiding this comment.
question: As I mentioned in the main comment, this fallback policy is narrower than FreeRDP as far as I can see. This may be a valid policy choice, but maybe we should document a bit more clearly the rationale for it as it’s not the same as FreeRDP’s more local/mechanism-level fallback behavior. Any diverging behavior may be treated as a bug if not documented.
Reference:
Please, decide explicitly whether we want true FreeRDP-like fallback semantics or this narrower allowlist.
src/negotiate/mod.rs
Outdated
| /// On the first `initialize_security_context` call we call the Kerberos `initialize_security_context` to get the initial token. | ||
| /// This is an advanced mechanism for protocol negotiation. For example, if KDC does not work properly or time skew is too big, | ||
| /// then we fallback to NTLM and continue the NLA. | ||
| /// But if the first token is generated successfully (e.g. Kerberos works properly), then we save this token and reuse it on | ||
| /// the second `initialize_security_context` call. | ||
| first_kdc_token: Option<Vec<u8>>, |
There was a problem hiding this comment.
note: I’m double checking this and will follow up with another comment later.
| #[test] | ||
| fn spnego_kerberos_dce_style() { |
There was a problem hiding this comment.
note: It seems we are missing a #[cfg(test)] mod tests module in here
|
I double-checked the optimistic token flow, and I think this is the main remaining piece that is materially different from FreeRDP. In FreeRDP, the optimistic Kerberos token is sent directly in the first SPNEGO message. In our code, we first generate that token, save it in One thing that made this clearer to me is that we actually have both SPNEGO fields on our side too: I’m sure this still interoperates correctly, but it is not especially close to the FreeRDP flow. I think it also explains why the U2U path is slightly more awkward: in our code, the first-message Because of the above, I think we should either:
About testing: I think this matters because the current PR weakens the U2U coverage. We used to validate the real U2U request shape ("[p]reviously, we were able to validate the presence of the additional ticket and enc-tkt-in-skey flag"), and now we mostly validate cached-token state instead. If we move closer to the FreeRDP-style flow, then I think we should restore the stronger U2U request validation. If we keep the current replay-based design, then we should add stronger tests for that exact design instead, so we still have confidence in the tricky paths. Is it something we could do, or that’s tricky? Let me know if that makes sense, and what you think we should do here! If you go through all the comments and address anything you think is worth addressing, I’m confident we can merge the PR 🙂 |
|
note: The commit type should be |
|
Pavlo Myroniuk (@TheBestTvarynka) Please, ping me again when this is ready for a second pass / merge! Thank you! |
|
Yes, sure. I changed the implementation, and now we send the first Kerberos/NTLM token alongside the first SPNEGO token (as you suggested before and as FreeRDP does). I did not know we could do that, but it works well on my dev-environment. Unfortunately, this also changes the MIC behavior. In some cases, we do not need to do a |
c40ee56 to
d6fde46
Compare
|
Benoît Cortier (@CBenoit), I made many changes after the last review. Let me explain some decisions here and answer the rest of the questions.
It's always valuable feedback for me 😊 My vision often blurs when I work on a specific feature for a long time, and I may miss obvious mistakes. I always look forward to your review
It was done on purpose and was discussed in Slack with Richard Markiewicz (@thenextman). Currently, we've decided to explicitly list cases when we want to downgrade to NTLM. We may change this in the future.
I will close the created sub-issue because I rewrote the implementation, and we no longer need this feature to run tests.
I fixed this. Now we send the first optimistic NTLM/Kerberos token in the first SPNEGO message. It allowed me to remove But on the other hand, it made the implementation more complicated 😅 It turned out that in some cases, the SPNEGO MIC exchange ( I added the To make the Negotiate module implementation more generic and consistent, I moved the TGT_REQ/REP (TGT exchange needed for Kerberos U2U auth) to the Kerberos module. Fortunately, we had many tests that cover cases where MIC is required and when it is not. This is one of the cases when I am glad I wrote these tests back then. Feel free to ask any questions and make suggestions. If there's anything questioning you, it's worth clarifying or documenting in the code |
Hi,
I improved the NTLM fallback. Now, when the first Kerberos
initialize_security_contextmethod fails, we fallback to NTLM. This approach is similar to FreeRDP's approach: https://github.com/FreeRDP/FreeRDP/blob/1328f8292ae9a4fa1d4e560b5049b636995b2e91/winpr/libwinpr/sspi/Negotiate/negotiate.c#L759-L774We fallback to NTLM when Kerberos returns any of the following error codes:
ErrorKind::TimeSkew: The time skew on KDC and client machines is too big.ErrorKind::NoAuthenticatingAuthority: The Kerberos returns this error type when there is a problem with the network client. For example, the network client cannot connect to the KDC, or the KDC proxy returns an error.Also, the
KDC_ERR_WRONG_REALMKerberos error code is mapped to this error type, so if the client is misconfigured and tries to get a TGT for a wrong realm, we will fallback to NTLM as well.ErrorKind::CertificateUnknown: The KDC proxy certificate is invalid.ErrorKind::UnknownCredentials: The Kerberos client returns this error kind when KDC replies withKDC_ERR_S_PRINCIPAL_UNKNOWNerror code.I also added new tests and improved the validation steps for existing tests to ensure NTLM fallback works well.
fixes #618
cc Richard Markiewicz (@thenextman)