Commit 398e362

add .markdownlint.json and fix few issues
1 parent a274471 commit 398e362

2 files changed: 30 additions and 22 deletions


.markdownlint.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+  "MD012": false,
+  "MD013": false,
+  "MD022": false,
+  "MD031": false
+}
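
For readers unfamiliar with the rule IDs, markdownlint also accepts named aliases in its configuration. The fragment below is a sketch of an equivalent config, assuming a recent markdownlint release (older releases spell the third alias `blanks-around-headers`); it is not part of this commit.

```json
{
  "no-multiple-blanks": false,
  "line-length": false,
  "blanks-around-headings": false,
  "blanks-around-fences": false
}
```

Here `no-multiple-blanks` is MD012 (multiple consecutive blank lines), `line-length` is MD013, `blanks-around-headings` is MD022 (headings surrounded by blank lines), and `blanks-around-fences` is MD031 (fenced code blocks surrounded by blank lines).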

README.md

Lines changed: 24 additions & 22 deletions
@@ -61,7 +61,7 @@ few MBs. 💾
 pragmatic approach towards this especially with regards to quoting and line
 ends. See section [RFC-4180](#rfc-4180).
 
-[Example](#Example) | [Naming and Terminology](#naming-and-terminology) | [API](#application-programming-interface-api) | [Limitations and Constraints](#limitations-and-constraints) | [Comparison Benchmarks](#comparison-benchmarks) | [Example Catalogue](#example-catalogue) | [RFC-4180](#rfc-4180) | [FAQ](#frequently-asked-questions-faq) | [Public API Reference](#public-api-reference)
+[Example](#example) | [Naming and Terminology](#naming-and-terminology) | [API](#application-programming-interface-api) | [Limitations and Constraints](#limitations-and-constraints) | [Comparison Benchmarks](#comparison-benchmarks) | [Example Catalogue](#example-catalogue) | [RFC-4180](#rfc-4180) | [FAQ](#frequently-asked-questions-faq) | [Public API Reference](#public-api-reference)
 
 ## Example
 ```csharp
@@ -253,7 +253,7 @@ That is, to use `SepReader` follow the points below:
 var colNames = header.NamesStarting("GT_");
 var colIndices = header.IndicesOf(colNames);
 ```
-1. Enumerate rows. One row at a time.
+1. Enumerate rows. One row at a time.
 1. Access a column by name or index. Or access multiple columns with names and
 indices. `Sep` internally handles pooled allocation and reuse of arrays for
 multiple columns.
@@ -398,7 +398,7 @@ If you are hovering over `row` then this will show something like:
 ```
 2:[5..9] = "B;\"Apple\r\nBanana\r\nOrange\r\nPear\""
 ```
-This has the format shown below.
+This has the format shown below.
 ```
 <ROWINDEX>:[<LINENUMBERRANGE>] = "<ROW>"
 ```
@@ -553,7 +553,7 @@ CollectionAssert.AreEqual(expected, actual);
 This means you are still parsing the double (which is magnitudes slower than
 getting just the key) for all rows. Imagine if this was an array of floating
 points or similar. Not only would you then be parsing a lot of values you would
-also be allocated 99x arrays that aren't used after filtering with `Where`.
+also be allocated 99x arrays that aren't used after filtering with `Where`.
 
 Instead, you should focus on how to express the enumeration in a way that is
 both efficient and easy to read. For example, the above could be rewritten as:
@@ -709,7 +709,7 @@ That is, to use `SepWriter` follow the points below:
 1. Use `Set` to set the column value either as a `ReadOnlySpan<char>`, `string`
 or via an interpolated string. Or use `Format<T>` where `T : IFormattable`
 to format `T` to the column value.
-1. Row is written when `Dispose` is called on the row.
+1. Row is written when `Dispose` is called on the row.
 > Note this is to allow a row to be defined flexibly with both column
 > removal, moves and renames in the future. This is not yet supported.
 
@@ -738,10 +738,10 @@ public bool WriteHeader { get; init; } = true;
 Sep is designed to be minimal and fast. As such, it has some limitations and
 constraints, since these are not needed for the initial intended usage:
 
-* Automatic escaping and unescaping quotes is not supported. Use
+* Automatic escaping and unescaping quotes is not supported. Use
 [`Trim`](https://learn.microsoft.com/en-us/dotnet/api/system.memoryextensions.trim)
 extension method to remove surrounding quotes, for example.
-* Comments `#` are not directly supported. You can skip a row by:
+* Comments `#` are not directly supported. You can skip a row by:
 ```csharp
 foreach (var row in reader)
 {
@@ -753,28 +753,28 @@ constraints, since these are not needed for the initial intended usage:
 }
 ```
 This does not allow skipping a header row starting with `#` though.
-* `SepWriter` is not yet fully featured and one cannot skip writing a header
+* `SepWriter` is not yet fully featured and one cannot skip writing a header
 currently.
 
 ## Comparison Benchmarks
 To investigate the performance of Sep it is compared to:
 
-* [CsvHelper](https://github.com/JoshClose/csvhelper) - *the* most commonly
+* [CsvHelper](https://github.com/JoshClose/csvhelper) - *the* most commonly
 used CSV library with a staggering
 ![downloads](https://img.shields.io/nuget/dt/csvhelper) downloads on NuGet. Fully
 featured and battle tested.
-* [Sylvan](https://github.com/MarkPflug/Sylvan) - is well-known and has
+* [Sylvan](https://github.com/MarkPflug/Sylvan) - is well-known and has
 previously been shown to be [the fastest CSV libraries for
 parsing](https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers)
 (Sep changes that 😉).
-* `ReadLine`/`WriteLine` - basic naive implementations that read line by line
+* `ReadLine`/`WriteLine` - basic naive implementations that read line by line
 and split on separator. While writing columns, separators and line endings
 directly. Does not handle quotes or similar correctly.
 
 All benchmarks are run from/to memory either with:
 
-* `StringReader` or `StreamReader + MemoryStream`
-* `StringWriter` or `StreamWriter + MemoryStream`
+* `StringReader` or `StreamReader + MemoryStream`
+* `StringWriter` or `StreamWriter + MemoryStream`
 
 This to avoid confounding factors from reading from or writing to disk.
 
@@ -807,6 +807,7 @@ than that. Or how many *times* more bytes are allocated in `Alloc Ratio`.
 
 ### Runtime and Platforms
 The following runtime is used for benchmarking:
+
 * `NET 8.0.X`
 
 The following platforms are used for benchmarking:
@@ -830,25 +831,25 @@ The following platforms are used for benchmarking:
 ### Reader Comparison Benchmarks
 The following reader scenarios are benchmarked:
 
-* [NCsvPerf](https://github.com/joelverhagen/NCsvPerf) from [The fastest CSV
+* [NCsvPerf](https://github.com/joelverhagen/NCsvPerf) from [The fastest CSV
 parser in
 .NET](https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers)
-* [**Floats**](#floats-reader-comparison-benchmarks) as for example in machine learning.
+* [**Floats**](#floats-reader-comparison-benchmarks) as for example in machine learning.
 
 Details for each can be found in the following. However, for each of these 3
 different scopes are benchmarked to better assertain the low-level performance
 of each library and approach and what parts of the parsing consume the most
 time:
 
-* **Row** - for this scope only the row is enumerated. That is, for Sep all
+* **Row** - for this scope only the row is enumerated. That is, for Sep all
 that is done is:
 ```csharp
 foreach (var row in reader) { }
 ```
 this should capture parsing both row and columns but without accessing these.
 Note that some libraries (like Sylvan) will defer work for columns to when
 these are accessed.
-* **Cols** - for this scope all rows and all columns are enumerated. If
+* **Cols** - for this scope all rows and all columns are enumerated. If
 possible columns are accessed as spans, if not as strings, which then might
 mean a string has to be allocated. That is, for Sep this is:
 ```csharp
@@ -859,8 +860,8 @@ time:
 var span = row[i].Span;
 }
 }
-```
-* **XYZ** - finally the full scope is performed which is specific to each of
+```
+* **XYZ** - finally the full scope is performed which is specific to each of
 the scenarios.
 
 Additionally, as Sep supports multi-threaded parsing via `ParallelEnumerate`
@@ -1094,7 +1095,7 @@ With `ParallelEnumerate` and server GC Sep is **>4x faster than Sylvan and up to
 `NCsvPerf` does not examine performance in the face of quotes in the csv. This
 is relevant since some libraries like Sylvan will revert to a slower (not SIMD
 vectorized) parsing code path if it encounters quotes. Sep was designed to
-always use SIMD vectorization no matter what.
+always use SIMD vectorization no matter what.
 
 Since there are two extra `char`s to handle per column, it does have a
 significant impact on performance, no matter what though. This is expected when
@@ -1312,7 +1313,7 @@ efficient `ParallelEnumerate` is, but bear in mind that this is for the case of
 repeated micro-benchmark runs.
 
 It is a testament to how good the .NET and the .NET GC is that the ReadLine is
-pretty good compared to CsvHelper regardless of allocating a lot of strings.
+pretty good compared to CsvHelper regardless of allocating a lot of strings.
 
 ##### AMD.Ryzen.9.5950X - FloatsReader Benchmark Results (Sep 0.4.6.0, Sylvan 1.3.7.0, CsvHelper 31.0.2.15)
 
@@ -1531,7 +1532,8 @@ Ask questions on GitHub and this section will be expanded. :)
 ### SepWriter FAQ
 
 ## Links
-* [Publishing a NuGet package using GitHub and GitHub Actions](https://www.meziantou.net/publishing-a-nuget-package-following-best-practices-using-github.htm)
+
+* [Publishing a NuGet package using GitHub and GitHub Actions](https://www.meziantou.net/publishing-a-nuget-package-following-best-practices-using-github.htm)
 
 ## Public API Reference
 ```csharp
