feat: Add option to scan and register Markdown anchors #39
@tvdboom if we go with the processor approach, I'm going to reverse commit attribution (me as author, you as co-author), I hope you don't mind 🙂 |
In your examples you show both "Markdown anchor" and "HTML anchor" side by side. So I guess you expect that both syntaxes will work equivalently. Which would definitely be nice. But actually I don't see how the HTML one could work under this implementation. If a user writes raw HTML, that HTML never becomes part of the element tree, it's just stashed. |
If I were to use custom anchors for myself, ideally I'd probably prefer raw HTML over attr_list Markdown because: MkDocs itself in the upcoming release already has an implementation of exactly this |
I appreciate the attention to detail here. I wonder if users will also be able to appreciate the benefit they get from typing
[](#install-with-package-manager){#arch-install-pkg}
## Install with package manager
rather than the simpler
[](){#arch-install-pkg}
## Install with package manager
As I understand, the effect is that |
Yes, sorry, I forgot to explain that part. If you look at the changes,
|
Damn 😂 I actually didn't test that it worked with raw HTML anchors 😫
Both options (somehow using MkDocs logic, and copying code over) sound good to me. Whatever you think is best. We can maybe start by copying code over, and once MkDocs has released the feature we can start thinking about tighter integration. |
I think I'm also fine with telling users that raw HTML is not supported (for now). There is a lot of markup that won't render correctly on GitHub, so I don't care that much about graceful degradation anymore. |
I actually didn't understand this part - how does this affect the ToC? |
I compared against the MkDocs code and of course it doesn't save the
I was also wondering, even just from a usage perspective, are there any downsides to making this
[](){#arch-install-pkg}
## Install with package manager
behave like the alias feature anyway, even though the user didn't explicitly type the associated anchor? (And then eliminate this alias-specific |
Can we look ahead to find the following heading and somehow compute/get its slug? I guess it's easy if we iterate on the element tree. |
With the |
[](){#arch-install-pkg}
## Install with package manager
I like this suggestion, so I implemented it. Support for passing the href is removed: users must use the syntax
The previous implementation could have been confusing:
This new implementation is stricter and less confusing: aliases must always appear right before their target heading. |
Going out of draft because it's working and tested, but we can still change a few things if we want 🙂 |
And I suppose abandon the idea of supporting raw HTML for now 🥲 but yea it's the only reasonable way |
The title and everything keeps mentioning HTML but that's not really accurate anymore, right? 🤔 |
Right, let me update everything to remove mentions of HTML. I'll also add docs. |
Ah, do we want to allow multiple aliases per heading?
[](){#hello}
[](){#bonjour}
## Hello world!
I believe this is feasible, we just need to record anchors in a list before registering them through the plugin. One difficulty I can see is:
[](){#bonjour}
Bonjour.
[](){#hello}
## Hello world!
Will need some testing. |
Done. Some docs added in the README 🙂 |
Looks nice :)
tests/test_references.py (outdated)
assert plugin._url_map["alias1"] == "#heading-bar"
assert plugin._url_map["alias2"] == "#heading-bar"
assert plugin._url_map["alias3"] == "#alias3"
assert plugin._url_map["alias4"] == "#heading-baz"
Would be nice to have an official specification of what is expected to happen in case of conflicts, with the most extreme case being this one:
[](){#foo}
## Bar
[](){#bar}
## Foo
With
[](){#foo}
## Bar
...
[](){#bar}
## Foo
...
[Link to foo][foo]
[Link to bar][bar]
Erm sorry about the highlighted rectangle on the left, but you get the idea.
Clicking on the "foo" link will bring you to the "Bar" heading, and inversely.
So, even though the ids of the headings themselves have been suffixed, what the user specified has been achieved 🤷 (the aliases work)
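To make the conflict case concrete, here is a minimal sketch. It assumes that explicit anchor ids are reserved first and that a heading whose slug is already taken receives a numeric suffix. The helper `unique` and the `_N` suffix format are assumptions for illustration, not something confirmed in this thread.

```python
def unique(slug, used):
    """Return slug, or slug_N if slug is already taken (suffix format is an assumption)."""
    candidate, n = slug, 0
    while candidate in used:
        n += 1
        candidate = f"{slug}_{n}"
    used.add(candidate)
    return candidate


# (alias id, slug of the heading that follows it), in document order.
pairs = [("foo", "bar"), ("bar", "foo")]

# Explicit ids written by the user are reserved before heading ids are computed.
used = {alias for alias, _ in pairs}

# Each alias points at the (possibly suffixed) id of the heading right after it.
url_map = {alias: "#" + unique(slug, used) for alias, slug in pairs}
```

Under these assumptions the heading ids are mangled, but each alias still resolves to the heading that follows it, which matches the behavior described above.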
I just noticed that anchors within docstrings are not picked up. UPDATE: the autorefs extension is correctly added back by mkdocstrings. |
OK it's simply because of the IdPrepending processor: if I add |
Interesting. Well even though the behavior is "correct", perhaps something should be done just because surely nobody wants these mangled ids for something that they wrote explicitly.. |
The aliases feature has the goal of global disambiguation. I'm sure it makes sense in docstrings as well. As for non-alias anchors, it is of course not so clear, and no longer prefixing them could be a breaking change for some users, though it seems super niche. |
2460e8a
to
5eea2e0
Compare
I have one idea for what mkdocstrings could do. Upon encountering a tag like this (in etree form), The condition to do this could be specifically an |
Also as a side note, what if autorefs were to clean up those empty |
Duplicating the anchors... such an elegant solution 🥲 Thanks a lot! Yeah we can also clean up the empty hrefs 👍 |
I think we will have to implement an iterator that yields parents and element position as well as elements, to insert |
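A possible shape for such an iterator, sketched on the standard library's `xml.etree.ElementTree`. The name `iter_with_parents` is hypothetical, and what gets inserted at the yielded position is left out because it is not specified here.

```python
import xml.etree.ElementTree as etree


def iter_with_parents(root):
    """Yield (parent, index, child) for every element below root."""
    for parent in root.iter():
        # Copy the child list so callers may insert siblings while iterating.
        for index, child in enumerate(list(parent)):
            yield parent, index, child


root = etree.fromstring('<div><p><a id="x"></a></p><h2 id="y">Y</h2></div>')
visited = [(p.tag, i, c.tag) for p, i, c in iter_with_parents(root)]
```

With the parent and index in hand, a caller could use `parent.insert(index, new_element)` to place a new element right before the current one.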
This will be useful for the soon-to-come Markdown anchors feature of mkdocs-autorefs. Related to mkdocs-autorefs#39: mkdocstrings/autorefs#39
…autorefs' Markdown anchors PR-#651: #651 Related-to-mkdocs-autorefs#39: mkdocstrings/autorefs#39 Co-authored-by: Oleh Prypin <oleh@pryp.in>
Consider merging my commit where I address my suggestions.
oprypin@2fc4e3c
My latest commit + a merge commit can be merged into this branch by doing the following:
git fetch origin 37c416947b23feee011e94901c68a57d294b63de
git merge 37c416947b23feee011e94901c68a57d294b63de
Or I could push this to the PR myself if that's OK
Definitely feel free to push directly to this PR! |
OK in my view this is ready
One last thing maybe: clearing up empty hrefs on aliases and anchors (as suggested by you earlier)? |
I don't think these empty hrefs do any harm, let's merge. |
Supersedes/closes #20.
This PR adds a tree processor to our Markdown extension in order to iterate on the element tree (converted from Markdown) and register anchors that have an id. If an anchor is directly followed by a heading, then it will target this heading rather than itself.
We pass a reference to the plugin to the tree processor, so that it can obtain the current page URL and call its `register_anchor` method. The current page URL is set by the plugin, for each page, before Markdown conversion, in the `on_page_markdown` event.

You'll see in the changes that I left some comments: an alternative implementation is using regular expressions to match anchors in the HTML. It was working well, but one issue with that is that it's not trivial to support matching ids and hrefs in any order: `<a id="hello" href="#hello">` and `<a href="#hello" id="hello">`. So instead of complicating the regular expression (it might be possible, but I'm lacking regex-fu), I went with the processor approach, which could also be more efficient.

I'm planning to run some benchmarks to test both implementations and see which one performs the best, but maybe this is obvious to you @oprypin that the tree processor is the best approach, and in this case I won't bother benchmarking.

The regex approach was dropped, as it's always difficult and never a good idea to use regular expressions on HTML. It would have been even harder to infer the slug of headings appearing right after anchors.
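As a rough illustration of the tree-walking idea (not the actual implementation in this PR), the sketch below uses the standard library's `xml.etree.ElementTree`, the same tree type Python-Markdown works with. The function name `collect_anchors` and the returned `url_map` dict are hypothetical; the real processor calls the plugin's `register_anchor` method instead. It assumes aliases live in paragraphs that contain only empty anchors, and it only looks at top-level elements.

```python
import re
import xml.etree.ElementTree as etree

HEADING = re.compile(r"h[1-6]")


def collect_anchors(root):
    """Map each anchor id to the fragment it should resolve to."""
    url_map = {}
    pending = []  # aliases still waiting for a heading

    def flush(target=None):
        # Without a heading id to target, an alias points at itself.
        for alias in pending:
            url_map[alias] = f"#{target or alias}"
        pending.clear()

    for el in root:
        if HEADING.fullmatch(el.tag):
            flush(el.get("id"))
            continue
        anchors = [a for a in el.iter("a") if a.get("id")]
        has_text = bool("".join(el.itertext()).strip())
        if el.tag == "p" and anchors and not has_text:
            # A paragraph made only of empty anchors: candidate aliases.
            pending.extend(a.get("id") for a in anchors)
        else:
            # Anything else breaks the "directly followed by a heading" rule.
            flush()
            for a in anchors:
                url_map[a.get("id")] = f"#{a.get('id')}"
    flush()
    return url_map


root = etree.fromstring(
    "<div>"
    '<p><a id="alias1"></a>\n<a id="alias2"></a></p>'
    '<h2 id="heading-bar">Bar</h2>'
    '<p><a id="alias3"></a></p>'
    "<p>Text.</p>"
    '<p><a id="alias4"></a></p>'
    '<h2 id="heading-baz">Baz</h2>'
    "</div>"
)
url_map = collect_anchors(root)
```

In this sketch, `alias1` and `alias2` both resolve to `#heading-bar`, `alias3` resolves to itself because a paragraph separates it from the next heading, and `alias4` resolves to `#heading-baz`.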
Now, what does this feature bring, and how do we use it?
Basically, the changes in this PR make it possible to:
The syntax in the above examples requires the `attr_list` Markdown extension to be enabled.

autorefs then lets us reference these headings or non-heading locations with the usual syntax: `[install from sources][install-from-sources]` and `[see paragraph 2][paragraph-2]`.

The alias feature is particularly interesting when you have several similar pages, with equal headings, as it allows you to keep these headings equal (anchor, permalink), while defining additional anchors which include the page name. For example:
Each page has:
You don't want to change headings and make them redundant, like `## Arch: Install with package manager`, `## Debian: Install with package manager`, just to be able to reference the right one with autorefs. Instead you can do this:

...replacing `arch` with `debian`, `gentoo`, etc. in the other pages.

I believe this PR also closes #25 and #35, since equal headings throughout a site no longer cause determinism issues: you can now define unique aliases to make sure to link to the right heading. No need to warn about equal headings, no need to add priority configuration.
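The determinism point can be illustrated with a toy registry. Everything in this sketch is hypothetical: the `UrlMap` class, its `register_anchor` signature, the page URLs, and the `debian-install-pkg` and `gentoo-install-pkg` alias names (extrapolated from `arch-install-pkg`) are stand-ins, not the plugin's real API.

```python
class UrlMap:
    """Toy stand-in for the plugin's anchor registry (hypothetical)."""

    def __init__(self):
        self._url_map = {}

    def register_anchor(self, page_url, identifier, anchor=None):
        # A later registration with the same identifier overwrites the earlier one.
        self._url_map[identifier] = f"{page_url}#{anchor or identifier}"

    def get_url(self, identifier):
        return self._url_map[identifier]


registry = UrlMap()
slug = "install-with-package-manager"
for distro in ("arch", "debian", "gentoo"):
    page = f"{distro}/"
    # The heading slug is identical on every page, so it is ambiguous site-wide.
    registry.register_anchor(page, slug)
    # The alias includes the page name, so it is unique.
    registry.register_anchor(page, f"{distro}-install-pkg", anchor=slug)
```

Here the bare slug ends up pointing at whichever page was processed last, while each alias keeps pointing at its own page.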
Finally, with a bit more work in mkdocstrings, it could also make it possible to shorten the anchors in the permalinks, while keeping the full path of objects in autorefs. So, instead of this redundant permalink: https://mkdocstrings.github.io/griffe/reference/griffe/git/#griffe.git.load_git, we could have https://mkdocstrings.github.io/griffe/reference/griffe/git/#load_git, while still being able to reference it with `[load_git][griffe.git.load_git]`.