Lightweight Markdown Parser in Java: library or command line tool
JParsedown is a lightweight single-file library for converting Markdown to HTML. It is a translation of the Parsedown PHP library (version 1.8.0-beta-5) and preserves its features:
- One file
- No dependencies
- GitHub flavored
- Fast
- MIT licence
The library is compatible with Java 7 and later.
Additional features of JParsedown that are not (yet) available in the original Parsedown:
- GitHub-compatible Header IDs
- Page title detection
- Optional MD links conversion
Source file: JParsedown.java
JAR file: jparsedown-1.0.4.jar (50.8 KB)
JParsedown parsedown = new JParsedown();
System.out.println(parsedown.text("Hello _Parsedown_!")); // prints: <p>Hello <em>Parsedown</em>!</p>
You can also parse inline markdown only:
System.out.println(parsedown.line("Hello _Parsedown_!")); // prints: Hello <em>Parsedown</em>!
See Parsedown Security page.
GitHub automatically generates anchor IDs for each header in a Markdown file, which makes it easier to reference individual sections and to build a table of contents. JParsedown attempts to generate the same IDs, so intra-page links in the rendered HTML page work as they do on GitHub.
For example, ## Header IDs
creates the following HTML:
<h2 id="header-ids">Header IDs</h2>
and can be referenced as follows:
[Header IDs](#header-ids)
ID generation in JParsedown follows these rules:
- The header text is converted to lower case.
- Special HTML characters like – are removed.
- All characters other than letters, numbers, underscore, or whitespaces are removed.
- Whitespaces are replaced with dashes (-).
- ID is URL-encoded to handle Unicode letters.
- Duplicate IDs have a dash and a number appended: header-ids, header-ids-1, header-ids-2, etc.
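The rules above can be sketched roughly as follows. This is an illustrative reading of the list, not the JParsedown source: the class name HeaderIdSketch and the method idFor are hypothetical, "special HTML characters" is assumed to mean HTML entities such as &ndash;, and each whitespace character is assumed to become one dash.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch of the ID rules; not the JParsedown implementation. */
final class HeaderIdSketch {
    // How many times each base ID has been produced so far.
    private final Map<String, Integer> seen = new HashMap<String, Integer>();

    String idFor(String headerText) {
        // Rule 1: lower case.
        String id = headerText.toLowerCase(Locale.ROOT);
        // Rule 2 (assumption): drop HTML entities such as &ndash; or &#8211;.
        id = id.replaceAll("&[a-z0-9#]+;", "");
        // Rule 3: keep only letters, numbers, underscore and whitespace.
        id = id.replaceAll("[^\\p{L}\\p{N}_\\s]", "");
        // Rule 4 (assumption): one dash per whitespace character.
        id = id.replaceAll("\\s", "-");
        // Rule 5: URL-encode so that non-ASCII letters are safe in links.
        try {
            id = URLEncoder.encode(id, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        // Rule 6: the first occurrence is kept as is, repeats get -1, -2, ...
        Integer count = seen.get(id);
        seen.put(id, count == null ? 1 : count + 1);
        return count == null ? id : id + "-" + count;
    }

    public static void main(String[] args) {
        HeaderIdSketch ids = new HeaderIdSketch();
        System.out.println(ids.idFor("Header IDs")); // header-ids
        System.out.println(ids.idFor("Header IDs")); // header-ids-1
    }
}
```

One instance is meant to be used per page, so that the duplicate counter starts fresh for every document. The sketch follows the list literally, so a hyphen already present in a header is removed by rule 3; the library itself may treat that case differently.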
JParsedown provides the title string, available after calling the text() method:
JParsedown parsedown = new JParsedown();
parsedown.text("# My Title\n\nMore text...");
System.out.println(parsedown.title); // prints: My Title
The string contains the best candidate for HTML page title, which is the first highest level header. For example, if the page has no level-1 header, but has several level-2 headers, the first of them will be the title.
If the page does not contain any headers, title will be null.
Note: The Markdown in the title is not stripped or processed.
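The "first highest level header" rule can be illustrated with a small standalone sketch. It is not how JParsedown detects the title internally: the class TitleSketch and the method detectTitle are hypothetical, and the sketch assumes ATX-style headers only (lines starting with one to six # characters), ignoring underlined headers and code blocks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch of title selection; not the JParsedown implementation. */
final class TitleSketch {
    // Assumption: ATX headers only, i.e. 1-6 '#' characters, whitespace, then text.
    private static final Pattern ATX = Pattern.compile("(#{1,6})\\s+(.+?)\\s*");

    /** Returns the text of the first header with the highest level, or null. */
    static String detectTitle(String markdown) {
        String best = null;
        int bestLevel = Integer.MAX_VALUE;
        for (String line : markdown.split("\\r?\\n")) {
            Matcher m = ATX.matcher(line);
            // Strictly smaller level wins, so the first header of a level is kept.
            if (m.matches() && m.group(1).length() < bestLevel) {
                bestLevel = m.group(1).length();
                best = m.group(2); // Markdown inside the title is left untouched
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String page = "Intro\n\n## First\n\n### Details\n\n## Second";
        System.out.println(detectTitle(page)); // First
    }
}
```

A page with no headers yields null here, which mirrors the behaviour described for the title field.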
GitHub documentation may have links between MD files like [see other file](file.md#anchor).
When converting documentation to static HTML pages, it is often desirable to convert these links to the respective HTML files, i.e. [see other file](file.html#anchor).
JParsedown provides a function setMdUrlReplacement(String) that specifies the replacement for .md extensions.
For example, setMdUrlReplacement(".html") will replace .md in URL links with .html.
The conversion is applied only to relative URLs, i.e. those that do not contain a colon (:) character.
Use setMdUrlReplacement(null) to disable conversion (default behaviour).
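The effect of the replacement on a single URL can be sketched as below. This is a guess at the behaviour described above, not the library code: the class MdLinkSketch and the method convert are hypothetical, and the sketch assumes that only an .md extension at the end of the path (before an optional # or ? part) is replaced.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch of .md link conversion; not the JParsedown implementation. */
final class MdLinkSketch {
    // Assumption: ".md" counts only at the end of the path, before '#', '?' or the end.
    private static final Pattern MD_EXT = Pattern.compile("\\.md(?=$|[#?])");

    static String convert(String url, String replacement) {
        // A null replacement disables conversion; a colon marks a non-relative URL.
        if (replacement == null || url.indexOf(':') >= 0) {
            return url;
        }
        return MD_EXT.matcher(url).replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(convert("file.md#anchor", ".html"));              // file.html#anchor
        System.out.println(convert("https://example.com/file.md", ".html")); // unchanged
    }
}
```

Checking for a colon is a cheap way to skip anything with a scheme (https:, mailto:), at the cost of also skipping relative paths that happen to contain a colon.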
Benchmark results:
| test file | repeat | JParsedown | Parsedown (PHP) | flexmark-java |
|---|---|---|---|---|
| cheatsheet.md | ×100 | 4.4 ms per item | 5.5 ms per item (×1.25) | 6.2 ms per item (×1.41) |
| cheatsheet.md | ×1000 | 2.4 ms per item | 5.4 ms per item (×2.25) | 2.4 ms per item (×1.00) |
The benchmark does not include saving and loading times; only the text() function is measured.
At the moment, JParsedown is not properly performance optimised. The speedup over the original Parsedown is due to the performance difference between Java and PHP. Also note how much JIT compilation helps Java with large batches of work.
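A "ms per item" figure of this kind could be obtained with a loop like the one below. This is not the benchmark used for the table: the class BenchmarkSketch, the Converter interface, and the method msPerItem are hypothetical, and the converter in main is a placeholder rather than a real Markdown parser.

```java
/** Illustrative timing loop; not the benchmark behind the table. */
final class BenchmarkSketch {
    /** Hypothetical stand-in for the conversion call being measured. */
    interface Converter {
        String convert(String markdown);
    }

    /** Returns the average time per call in milliseconds over the given repeats. */
    static double msPerItem(Converter converter, String input, int repeat) {
        long sink = 0; // results are consumed so the calls cannot be optimised away
        long start = System.nanoTime();
        for (int i = 0; i < repeat; i++) {
            sink += converter.convert(input).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unreachable; keeps sink in use");
        }
        return elapsed / 1e6 / repeat;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would call the parser here instead.
        Converter placeholder = new Converter() {
            public String convert(String markdown) {
                return "<p>" + markdown + "</p>";
            }
        };
        System.out.println(msPerItem(placeholder, "Hello _Parsedown_!", 100));
        System.out.println(msPerItem(placeholder, "Hello _Parsedown_!", 1000));
    }
}
```

Because the input is loaded before the clock starts, file reading and writing stay outside the measurement. Larger repeat counts give the JIT compiler more time to optimise hot code, which is one plausible reason the per-item time drops between the two rows.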
MD tool is a JParsedown-based command line tool for converting Markdown files into HTML pages.
See MD Tool Readme