diff --git a/README.md b/README.md
index f0bf6392..cdf5ddeb 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ and highly efficient implementation.
 * **🔎 Minimal** - a succinct yet expressive API with few options and no hidden
 changes to input or output. What you read/write is what you get. E.g. by default
 there is no "automatic" escaping/unescaping of quotes. For automatic unescaping
-of quotes see [SepReaderOptions](#sepreaderoptions).
+of quotes see [SepReaderOptions](#sepreaderoptions) and [Unescaping](#unescaping).
 * **🚀 Fast** - blazing fast with both architecture specific and cross-platform
 SIMD vectorized parsing incl. 64/128/256/512-bit paths e.g. AVX2, AVX-512 (.NET
 8.0+), NEON. Uses [csFastFloat](https://github.com/CarlVerret/csFastFloat) for
@@ -40,6 +40,8 @@ with [detailed benchmarks](#comparison-benchmarks) to prove it.
 * **🗑️ Zero allocation** - intelligent and efficient memory management allowing
 for zero allocations after warmup incl. supporting use cases of reading or
 writing arrays of values (e.g. features) easily without repeated allocations.
+* **✅ Thoroughly tested** with great code coverage and special focus on testing
+  edge cases including randomized [fuzz testing](https://en.wikipedia.org/wiki/Fuzzing).
 * **🌐 Cross-platform** - works on any platform, any architecture supported by
 .NET. 100% managed and written in beautiful modern C#.
 * **✂️ Trimmable and AOT/NativeAOT compatible** - no problematic reflection or
@@ -300,6 +302,49 @@ public bool DisableColCountCheck { get; init; } = false;
 public bool Unescape { get; init; } = false;
 ```
 
+#### Unescaping
+While great care has been taken to ensure Sep unescaping of quotes is both correct
+and fast, there is always the question of how does one respond to invalid input.
+
+The below table tries to summarize the behavior of Sep vs CsvHelper and Sylvan.
+Note that all do the same for valid input. There are differences for how invalid
+input is handled. For Sep the design choice has been based on not wanting to
+throw exceptions and to use a principle that is both reasonably fast and simple.
+
+| Input | Valid | CsvHelper | CsvHelper¹ | Sylvan | Sep² |
+|-|-|-|-|-|-|
+| `a` | True | `a` | `a` | `a` | `a` |
+| `""` | True | | | | |
+| `""""` | True | `"` | `"` | `"` | `"` |
+| `""""""` | True | `""` | `""` | `""` | `""` |
+| `"a"` | True | `a` | `a` | `a` | `a` |
+| `"a""a"` | True | `a"a` | `a"a` | `a"a` | `a"a` |
+| `"a""a""a"` | True | `a"a"a` | `a"a"a` | `a"a"a` | `a"a"a` |
+| `a""a` | False | EXCEPTION | `a""a` | `a""a` | `a""a` |
+| `a"a"a` | False | EXCEPTION | `a"a"a` | `a"a"a` | `a"a"a` |
+| `·""·` | False | EXCEPTION | `·""·` | `·""·` | `·""·` |
+| `·"a"·` | False | EXCEPTION | `·"a"·` | `·"a"·` | `·"a"·` |
+| `·""` | False | EXCEPTION | `·""` | `·""` | `·""` |
+| `·"a"` | False | EXCEPTION | `·"a"` | `·"a"` | `·"a"` |
+| `a"""a` | False | EXCEPTION | `a"""a` | `a"""a` | `a"""a` |
+| `"a"a"a"` | False | EXCEPTION | `aa"a"` | `a"a"a` | `aa"a` |
+| `""·` | False | EXCEPTION | `·` | `"` | `·` |
+| `"a"·` | False | EXCEPTION | `a·` | `a"` | `a·` |
+| `"a"""a` | False | EXCEPTION | `aa` | EXCEPTION | `a"a` |
+| `"a"""a"` | False | EXCEPTION | `aa"` | `a"a` | `a"a"` |
+| `""a"` | False | EXCEPTION | `a"` | `"a` | `a"` |
+| `"a"a"` | False | EXCEPTION | `aa"` | `a"a` | `aa"` |
+| `""a"a""` | False | EXCEPTION | `a"a""` | `"a"a"` | `a"a"` |
+| `"""` | False | | | EXCEPTION | `"` |
+| `"""""` | False | `"` | `"` | EXCEPTION | `""` |
+
+`·` (middle dot) is whitespace to make this visible
+
+¹ CsvHelper with `BadDataFound = null`
+
+² Sep with `Unescape = true` in `SepReaderOptions`
+
+
 #### SepReader Debuggability
 Debuggability is an important part of any library and while this is still a work
 in progress for Sep, `SepReader` does have a unique feature when looking at it
diff --git a/src/Sep.ComparisonBenchmarks/Program.cs b/src/Sep.ComparisonBenchmarks/Program.cs
index 14891779..5dfa7afb 100644
--- a/src/Sep.ComparisonBenchmarks/Program.cs
+++ b/src/Sep.ComparisonBenchmarks/Program.cs
@@ -33,9 +33,12 @@
 
 log($"{Environment.Version} args: {args.Length} versions: {GetVersions()}");
 
-UnescapeCompare.CompareUnescape();
 #if DEBUG
-return;
+// Consider where to move this perhaps a new ComparisonTest project
+if (Debugger.IsAttached)
+{
+    UnescapeCompare.CompareUnescape();
+}
 #endif
 
 await PackageAssetsTestData.EnsurePackageAssets().ConfigureAwait(true);
diff --git a/src/Sep.ComparisonBenchmarks/UnescapeCompare.cs b/src/Sep.ComparisonBenchmarks/UnescapeCompare.cs
index a6dd41dd..b1422629 100644
--- a/src/Sep.ComparisonBenchmarks/UnescapeCompare.cs
+++ b/src/Sep.ComparisonBenchmarks/UnescapeCompare.cs
@@ -108,7 +108,7 @@ public static void CompareUnescape()
         sb.AppendLine();
         sb.AppendLine($"¹ CsvHelper with `BadDataFound = null`");
         sb.AppendLine();
-        sb.AppendLine($"² Sep with `{nameof(SepReaderOptions.Unescape)} = true`");
+        sb.AppendLine($"² Sep with `{nameof(SepReaderOptions.Unescape)} = true` in `{nameof(SepReaderOptions)}`");
         var text = sb.ToString();
         Trace.WriteLine(text);