
Comparing omd to other Markdown parsing tools

The benchmark input is the README of `markdown-it` (a popular JavaScript Markdown parser) concatenated with itself 1000 times, resulting in a Markdown file of 307K lines with good feature coverage and denser-than-average markup.
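
For reference, an input of this shape is easy to regenerate. A minimal sketch, assuming the markdown-it README has been saved locally as test1.md (the file names below are the ones reused in the process-granularity experiment further down this page):

# build the 307K-line benchmark input: 1000 copies of the 307-line README
$ for i in $(seq 1 1000); do cat test1.md; done > test1000.md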

Benchmark results (time + peak memory usage):

| omd 2.x | marked | markdown-it | omd 1.x | kramdown | pandoc |
| --- | --- | --- | --- | --- | --- |
| 0.73s (116Mio) | 1.30s (257Mio) | 1.99s (374Mio) | 4.73s (330Mio) | 7.97s (327Mio) | 33.99s (1870Mio) |

The results show that omd 2.x is competitive with state-of-the-art tools from the web community (it is faster and consumes less memory), that omd 1.x was slower, and that pandoc is noticeably slower still. None of the implementations appear to make any effort to stream the input to reduce memory usage, which I guess is not worth it for typical short Markdown documents.
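
For context, each measurement pairs wall-clock time with peak memory. A command along the lines of the one below would produce such a pair; this is an illustration using the same GNU time format string as the experiments further down the page, not the exact script behind the table (%e is elapsed seconds, %M is peak resident memory in KiB):

# illustrative: time + peak memory for one tool on the concatenated input
$ /usr/bin/time -f "%e (%MKio)" ./omd.exe test1000.md > /dev/null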

Details:

  • benchmark date: December 2020
  • Markdown-it is a popular JavaScript Markdown parser, built to be extensible yet fast. I used the current master branch (7b8969c), run with node v10.8.0.
  • Marked is a JavaScript implementation that was “built for speed”. I used the current master (c8783a3c), run with node v10.8.0.
  • Pandoc is a versatile format-conversion tool. I used version 2.9.2.1. Different markdown dialects supported by Pandoc have different parsing performance, so I used the simplest dialect, markdown_strict.
  • kramdown is a popular Ruby Markdown engine, used by default in Jekyll. (Different dialects are supported; I used the simplest dialect, markdown. Ruby 2.7.2.)
  • The “omd 2.x” version (bc07ade77) is essentially the current master branch (December 2020).
  • The “omd 1.x” version (9de620c) is a commit after the last 1.x release, the first I found that would compile fine on my machine.

In-process vs. Multi-process workflows

If you have a lot of average-sized Markdown files, is there a speed difference between processing all of them in a single omd invocation and invoking a separate process for each? The answer depends on your operating system. Below are ballpark numbers from December 2020, on a recent Linux machine; test1.md is the markdown-it README (307 lines, dense markup), test10.md contains 10 copies of it (3070 lines), etc.

# one process per input file
$ /usr/bin/time -f "%e (%MKio)" bash -c 'for i in $(seq 1 1000); do ./omd.exe ../test1.md; done > /dev/null'
2.09 (3876Kio)

# one process for 10 input files
$ /usr/bin/time -f "%e (%MKio)" bash -c 'for i in $(seq 1 100); do ./omd.exe $(for j in $(seq 1 10); do echo ../test1.md; done); done > /dev/null'
0.71 (5644Kio)

# one process for a 10x-larger input
$ /usr/bin/time -f "%e (%MKio)" bash -c 'for i in $(seq 1 100); do ./omd.exe ../test10.md; done > /dev/null'
0.67 (6348Kio)

# 100x-larger inputs
$ /usr/bin/time -f "%e (%MKio)" bash -c 'for i in $(seq 1 10); do ./omd.exe ../test100.md; done > /dev/null'
0.69 (17316Kio)

# 1000x-larger inputs
$ /usr/bin/time -f "%e (%MKio)" bash -c 'for i in $(seq 1 1); do ./omd.exe ../test1000.md; done > /dev/null'
0.74 (118880Kio)

The results suggest that processing 307 lines of Markdown takes about as long as the process-invocation overhead itself: launching 10 processes for 10 input files is 3x slower than launching one process on those 10 input files, or on a single file that concatenates them. On the other hand, larger groupings (per 100, per 1000) do not make a noticeable difference. (Very large inputs see a small slowdown, probably due to the increased memory footprint.)
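
A practical takeaway if you are converting many small files: batch them into a single omd invocation rather than spawning one process per file. A minimal sketch, with the directory layout as an assumption (merging every file's HTML into one output stream is fine for benchmarking, though a real build would route outputs per file):

# one omd process for all Markdown files under the current directory
$ find . -name '*.md' -print0 | xargs -0 ./omd.exe > /dev/null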