WIP: Parse tags starting with digits and containing spaces (close #1072)
Maarrk committed Oct 14, 2024
1 parent c1a20af commit 592fc64
Showing 4 changed files with 55 additions and 2 deletions.
2 changes: 1 addition & 1 deletion common/markdown_parser/constants.ts
@@ -1,5 +1,5 @@
export const wikiLinkRegex = /(!?\[\[)([^\]\|]+)(?:\|([^\]]+))?(\]\])/g; // [fullMatch, firstMark, url, alias, lastMark]
export const mdLinkRegex = /!?\[(?<title>[^\]]*)\]\((?<url>.+)\)/g; // [fullMatch, alias, url]
export const tagRegex =
-  /#[^\d\s!@#$%^&*(),.?":{}|<>\\][^\s!@#$%^&*(),.?":{}|<>\\]*/;
+  /#(?:(?:\d*[^\d\s!@#$%^&*(),.?":{}|<>\\][^\s!@#$%^&*(),.?":{}|<>\\]*)|(?:"[^"\n]+")|(?:<[^>\n]+>))/;
export const pWikiLinkRegex = new RegExp("^" + wikiLinkRegex.source); // Modified regex used only in parser
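
To see what the new `tagRegex` (the pattern with the quoted and angle-bracket alternatives) accepts, here is a minimal sketch that runs it over plain strings. The `findTags` helper is hypothetical and not part of the commit; it applies the raw pattern only, so it does not reproduce parser-level behaviour such as ignoring hashtags inside code spans.

```typescript
// Pattern copied from the new tagRegex definition in constants.ts.
const tagRegex =
  /#(?:(?:\d*[^\d\s!@#$%^&*(),.?":{}|<>\\][^\s!@#$%^&*(),.?":{}|<>\\]*)|(?:"[^"\n]+")|(?:<[^>\n]+>))/;

// Hypothetical helper (an assumption for illustration): collect every match
// by re-running the same pattern with the global flag.
function findTags(text: string): string[] {
  const globalRegex = new RegExp(tagRegex.source, "g");
  const found: string[] = [];
  let match: RegExpExecArray | null;
  // Every match is at least two characters long, so exec() always advances.
  while ((match = globalRegex.exec(text)) !== null) {
    found.push(match[0]);
  }
  return found;
}

console.log(findTags("#3dprint and #15-52_Trip-to-NYC."));
console.log(findTags("magazine issue #123")); // digits only: no match
console.log(findTags('#"tag with spaces" and #<tag in angle>'));
console.log(findTags("#no#spacing"));
```

The first alternative allows leading digits as long as a non-digit tag character follows; the other two accept any single-line content between double quotes or angle brackets.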
37 changes: 37 additions & 0 deletions common/markdown_parser/parser.test.ts
@@ -162,3 +162,40 @@ Deno.test("Test lua directive parser", () => {
const simpleExample = `Simple \${{a=}}`;
console.log(JSON.stringify(parseMarkdown(simpleExample), null, 2));
});

const hashtagSample = `
Hashtags, e.g. #mytag but ignore in code \`#mytag\`.
They can contain slashes like #level/beginner, single quotes, and dashes: #Mike's-idea.
Can be just #a single letter.
But no other #interpunction: #exclamation! #question?
There are two ways to make #"tag with spaces and <angle>" #<tag with spaces and "quote">
These cannot span #"multiple
lines"
#no#spacing also works.
Hashtags can start with number if there's something after it: #3dprint #15-52_Trip-to-NYC.
But magazine issue #123 is not a hashtag.
Should support other languages, like #żółć or #井号
`;

Deno.test("Test hashtag parser", () => {
const tree = parseMarkdown(hashtagSample);
const hashtags = collectNodesOfType(tree, "Hashtag");
assertEquals(hashtags.length, 15);

assertEquals(hashtags[0].children![0].text, "#mytag");
assertEquals(hashtags[1].children![0].text, "#level/beginner");
assertEquals(hashtags[2].children![0].text, "#Mike's-idea");
assertEquals(hashtags[3].children![0].text, "#a");
assertEquals(hashtags[4].children![0].text, "#interpunction");
assertEquals(hashtags[5].children![0].text, "#exclamation");
assertEquals(hashtags[6].children![0].text, "#question");
assertEquals(hashtags[7].children![0].text, '#"tag with spaces and <angle>"');
assertEquals(hashtags[8].children![0].text, '#<tag with spaces and "quote">');
// multiple lines not allowed
assertEquals(hashtags[9].children![0].text, "#no");
assertEquals(hashtags[10].children![0].text, "#spacing");
assertEquals(hashtags[11].children![0].text, "#3dprint");
assertEquals(hashtags[12].children![0].text, "#15-52_Trip-to-NYC");
assertEquals(hashtags[13].children![0].text, "#żółć");
assertEquals(hashtags[14].children![0].text, "#井号");
});
2 changes: 1 addition & 1 deletion website/Markdown/Extensions.md
@@ -10,7 +10,7 @@ In addition to supporting [[Markdown/Basics|markdown basics]] as standardized by
* [[Transclusions]] syntax
* [[Markdown/Anchors]]
* [[Markdown/Admonitions]]
-* Hashtags, e.g. `#mytag`.
+* [[Markdown/Hashtags]]
* [[Markdown/Command links]] syntax
* [Tables](https://www.markdownguide.org/extended-syntax/#tables)
* [Task lists](https://www.markdownguide.org/extended-syntax/#task-lists)
16 changes: 16 additions & 0 deletions website/Markdown/Hashtags.md
@@ -0,0 +1,16 @@
#level/beginner

Hashtags can be used in text to assign an [[Objects#tag]] #like-this. If hashtags are the only content of the first paragraph, they are applied to the entire page.

Hashtags can contain letters, dashes, underscores and other characters, but not:
- Whitespace (space, newline etc.)
- Characters from this list: `!@#$%^&*(),.?":{}|<>\`

A hashtag also cannot consist of digits only, so #123 is not recognised (but #3dprint is).

If you need your tags to contain these characters, surround the tag content with either:
- Double quotes: `#"tag in quotes"` #"tag in quotes"
- Angle brackets: `#<tag in angle>` #<tag in angle>
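
Since the quotes and angle brackets only surround the tag content, code that consumes a matched hashtag will probably want to strip them. A minimal sketch, assuming the tag name is whatever sits between the delimiters; `tagNameFromHashtag` is a hypothetical helper and not something this commit adds:

```typescript
// Hypothetical helper (an assumption for illustration, not part of the commit):
// turn the matched hashtag text into the bare tag name.
function tagNameFromHashtag(hashtagText: string): string {
  // Drop the leading '#', if present.
  const body = hashtagText.charAt(0) === "#" ? hashtagText.slice(1) : hashtagText;
  const first = body.charAt(0);
  const last = body.charAt(body.length - 1);
  const quoted = body.length >= 2 && first === '"' && last === '"';
  const angled = body.length >= 2 && first === "<" && last === ">";
  // Strip one pair of surrounding delimiters; plain tags are returned unchanged.
  return quoted || angled ? body.slice(1, -1) : body;
}

console.log(tagNameFromHashtag('#"tag in quotes"')); // tag in quotes
console.log(tagNameFromHashtag("#<tag in angle>")); // tag in angle
console.log(tagNameFromHashtag("#3dprint")); // 3dprint
```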

```query
tag where page = @page.name
```
