Performance optimizations for EscapingUtilities#13426
Performance optimizations for EscapingUtilities#13426DustinCampbell wants to merge 17 commits intomainfrom
Conversation
- Convert Fact-based tests to Theory with InlineData - Replace xUnit assertions with Shouldly - Remove empty XML doc comments and #nullable disable - Use expression-bodied test methods
Cover UnescapeAll, Escape, EscapeWithCaching, ContainsEscapedWildcards, and round-trip scenarios with MemoryDiagnoser to establish a performance baseline before optimization.
Many thanks to @JeremyKuhne for providing this BufferScope<T> implementation.
- Make public-facing methods `public` (was `internal`) - Enable nullable reference types; add `[return: NotNullIfNotNull]` to `UnescapeAll` and `Escape` - Use file-scoped namespace - Merge `Escape` + `EscapeWithCaching` into `Escape(string? value, bool cache = false)` - Inline `AppendEscapedChar` and `AppendEscapedString` helpers into `Escape` - Rename field `s_unescapedToEscapedStrings` → `s_escapedStringCache` - Rename parameters to `value` consistently across all public methods - Rename local variables for clarity (`percentIndex`, `startIndex`, `endIndex`, `hi`/`lo`, `specialCharIndex`, etc.) - Use collection expression for `s_charsToEscape`; target-typed `new` for cache dictionary - Use expression bodies for simple members (`HexDigitChar`, `Escape`, etc.) - Use relational pattern matching in `TryDecodeHexDigit`; list patterns in `ContainsEscapedWildcards` - Use `do...while` in `UnescapeAll` since `percentIndex` is already known non-negative on entry - Replace `ch / 0x10` with `ch >> 4` in nibble extraction - Clean up XML doc comments throughout - Remove stale and redundant inline comments
- Address unnecessary allocation when a string contains a '%' but no valid escape sequence. - Add test for edge case when a string contains a '%' but no valid escape sequence and 'trim' is set to true. In that case, the trimmed string should still be returned.
- Add System.Buffers.SearchValues<char> field initialized from s_charsToEscape (#if NET) - Extract IndexOfAnyEscapeChar helper to abstract the platform difference, keeping Escape itself free of #if guards - Fall back to char[] / IndexOfAny on .NET Framework
- All chars in s_charsToEscape fall within the ASCII range ['$' (0x24) .. '@' (0x40)] - Encode membership as a 29-bit uint bitmask indexed by (c - '$') - Replace O(n×k) IndexOfAny array scan with an O(n) range check + bit test per char - Eliminates managed-to-native transition overhead on each IndexOfAnyEscapeChar call - No change on .NET (SearchValues path unchanged)
- Capture the result of the initial IndexOfAnyEscapeChar(value) fast-path check rather than discarding it and repeating the same scan at the start of the loop - Switch from while(true)/break to do...while since specialCharIndex >= 0 is established before entering the loop, eliminating a redundant branch per iteration
Collect all special char positions in a first pass using RefArrayBuilder<int>, compute the exact output length, then write directly into a freshly-allocated string with no intermediate buffer:
- On .NET: string.Create writes directly into the output string without an intermediate buffer
- On .NET Framework: new string('\0', length) + Buffer.MemoryCopy via unsafe fixed pointers for fast, native-speed chunk copies
- Extract cache operations into TryGetFromCache and AddToCache helpers
- Remove StringBuilderCache dependency from Escape entirely
- Compute trim bounds (startIndex/endIndex) before searching for '%' so the initial scan and inner-loop scan skip leading/trailing whitespace - Scope the inner-loop IndexOf to [percentIndex+1, endIndex) to match - Extract GetDefaultResult static local to deduplicate the no-escape- sequences return path (used in both the early-exit and sb-is-null cases)
- Add HexConverter with a 256-entry hex digit table (ReadOnlySpan<byte> on .NET; static byte[] on Framework — the Framework JIT can't elide bounds checks on ReadOnlySpan<byte> like the .NET Core JIT can) - TryDecodeHexDigit delegates to HexConverter.FromChar
e06a889 to
2ff888d
Compare
There was a problem hiding this comment.
Pull request overview
This pull request refactors Microsoft.Build.Shared.EscapingUtilities hot paths (Escape, UnescapeAll) and adds supporting low-allocation infrastructure (pooled buffer + ref-struct array builder) plus benchmarks and expanded unit tests to validate the new behavior and performance characteristics.
Changes:
- Rewrite
EscapingUtilities.EscapeandEscapingUtilities.UnescapeAllto reduce allocations and improve throughput (including new fast paths and cache-aware escape). - Add new low-allocation helper types (
BufferScope<T>,RefArrayBuilder<T>,HexConverter,TypeInfo<T>) plus polyfills needed for newer language/runtime features across TFMs. - Add/expand benchmarks and unit tests for escaping/unescaping and the new infrastructure.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/Framework/EscapingUtilities.cs | Rewrites escape/unescape implementations for fewer allocations and faster scanning/encoding. |
| src/Framework/Utilities/BufferScope.cs | Adds pooled-buffer ref struct to support temporary storage with minimal allocations. |
| src/Framework/Collections/RefArrayBuilder.cs | Adds ref-struct builder over BufferScope<T> for efficient index collection / array building. |
| src/Framework/Utilities/HexConverter.cs | Adds lookup-table hex decoding helper used by unescape fast path. |
| src/Framework/Utilities/TypeInfo.cs | Adds type inspection helper used to decide whether to clear pooled arrays on return. |
| src/Framework/Polyfills/UnscopedRefAttribute.cs | Adds polyfill / forwarder for UnscopedRefAttribute across target frameworks. |
| src/Framework/Polyfills/StringExtensions.cs | Adds string null/empty/whitespace extension helpers used by updated code paths. |
| src/MSBuild.Benchmarks/EscapingUtilitiesBenchmark.cs | Adds BenchmarkDotNet coverage for Escape, UnescapeAll, and related helpers. |
| src/Framework.UnitTests/EscapingUtilities_Tests.cs | Expands and modernizes escaping/unescaping tests (Shouldly + theories). |
| src/Framework.UnitTests/BufferScopeTests.cs | Adds test coverage for pooled buffer behavior, growth, pinning, and disposal. |
| src/Framework.UnitTests/RefArrayBuilder_Tests.cs | Adds extensive coverage for builder operations (add/insert/remove/range/etc.). |
| src/Framework.UnitTests/TypeInfoTests.cs | Adds coverage for TypeInfo<T>.IsReferenceOrContainsReferences() behavior and caching. |
| src/Utilities/TaskItem.cs | Updates call site to use Escape(..., cache: true) instead of EscapeWithCaching. |
| src/Shared/TaskParameter.cs | Updates call site to use Escape(..., cache: true) instead of EscapeWithCaching. |
| src/Build/BackEnd/TaskExecutionHost/TaskExecutionHost.cs | Updates call site to use Escape(..., cache: true); removes trailing whitespace. |
| src/Utilities.UnitTests/StringExtensions_Tests.cs | Disambiguates StringExtensions.Replace call site due to new Microsoft.Build.StringExtensions. |
|
Looks great. Trying to better understand the intent here. Did this path show up hot on a perf trace somewhere? |
I added some quick-and-dirty and found that these methods are called a lot. In my tests, I built a significant part of Roslyn and found 3,000,000+ invocations of these methods. I'm definitely intending to reduce the number of times these are called, but I decided to improve the performance of the methods themselves as a first step. |
Ensure that Insert(...) only takes the fast path if "_count < _scope.Length" to potentially shifting past the end of the scope. Added tests to cover this issue, which is more likely to appear with a stack-allocated buffer is used, since a rented array might actually be larger than the requested minimum length.
|
Sweet. Are these changes already noticeable end-to-end in the roslyn build (the eval part of it)? |
What metric did you mean? Obviously, this change is reducing milliseconds and lowering allocations. Many such changes in aggregate would likely show wall clock improvements. |
I missed adding string resources to SR.resx when cherry-picking RefArrayBuilder from another local branch.
Tip
I recommend reviewing this pull request commit-by-commit. I made sure that each commit is distinct, cohesive and has a detailed description.
This PR rewrites and optimizes
EscapingUtilities.EscapeandEscapingUtilities.UnescapeAll— two methods called heavily throughout MSBuild evaluation — to reduce allocations and improve throughput on both .NET and .NET Framework 4.7.2.The largest wins are in
Escape(especially the no-special-chars fast path) and strings with invalid escape sequences, with speed improvements of up to 3.3× on .NET 10.0 and 3.0× on .NET Framework 4.8.1, and theUnescapeAllallocation for strings with invalid escape sequences eliminated entirely on both runtimes.Highlights
🚀 Speed Improvements (.NET 10.0)
Escape_NoSpecialCharsEscape_FewSpecialCharsEscape_ManySpecialCharsUnescapeAll_InvalidEscapeSequencesEscapeWithCaching_FewSpecialCharsEscapeWithCaching_ManySpecialCharsRoundTrip_EscapeThenUnescape🧹 Allocation Reductions (.NET 10.0)
UnescapeAll_InvalidEscapeSequences📊 .NET Framework 4.8.1
UnescapeAll_InvalidEscapeSequencesEscape_NoSpecialCharsEscape_ManySpecialCharsEscape_FewSpecialCharsRoundTrip_EscapeThenUnescapeThe
UnescapeAll_InvalidEscapeSequencesallocation is also eliminated on .NET Framework (40 B → 0 B, 100% reduction).Detailed Summary of Changes
Infrastructure
BufferScope<T>: a new ref struct that manages a stack-allocated buffer withArrayPool<T>fallback for heap overflow, enabling low-allocation temporary storage.RefArrayBuilder<T>: a new ref struct for cheaply building arrays usingBufferScope<T>, used by the Escape two-pass algorithm.EscapingUtilities: BenchmarkDotNet benchmarks coveringEscapeandUnescapeAllacross typical MSBuild string patterns.EscapeoptimizationsSearchValues<char>on .NET: replaces theIndexOfAny(char[])scan with aSearchValues<char>-based scan, enabling vectorized SIMD search.IndexOfAnywith a bitmask scan on .NET Framework: all 9 escapable characters fall in the ASCII range['$'...'@'], so a 29-bit bitmask replaces the O(k) per-character array scan with a single range check + bit test.IndexOfAnyEscapeCharresult: avoids a redundant scan pass by feeding the initial hit directly into the collection loop.RefArrayBuilder<int>, then a second pass allocates a single exact-sized string (viastring.Createon .NET; unsafe fixed+Buffer.MemoryCopyon .NET Framework), eliminating theStringBuilderentirely.UnescapeAlloptimizationsStringBuilderlazily: theStringBuilderis only rented fromStringBuilderCachewhen the first decodable%XXsequence is actually found, so strings with%signs that aren't valid escape sequences return the original string with zero allocations.IndexOf('%')to the active trim window: when trim: true, the percent-sign search is bounded to[startIndex, endIndex)computed from the trimmed range, avoiding scanning whitespace that will be discarded.TryDecodeHexDigitarithmetic withHexConverterlookup table: delegates to the internalHexConverter.FromCharlookup table, replacing the multi-branch switch with a single table lookup.Test improvements
EscapingUtilities_Tests: migrated to[Theory]/[InlineData]withShouldlyassertions and removed redundant test helpers.UnescapeAll,UnescapeAll(trim: true),Escape, round-tripEscape↔Unescape, andContainsEscapedWildcards.EscapingUtilitiesin preparation for performance work: modernized XML docs, renamed parameters for clarity, and tightened nullability annotations ahead of the algorithmic changes.Benchmarks
Initial Results (baselines)
.NET 10.0
.NET Framework 4.8.1
Final Results
.NET 10.0
.NET Framework 4.8.1