Cannot extract relative reference links in Markdown #1657

wks · 2025-03-19T08:44:29Z

Test case:

Inline [link1](target1.md)

Reference [link2][link2]

[link2]: target2.md

Collapsed [link3][]

[link3]: target3.md

Shortcut [link4]

[link4]: target4.md

Shortcut [link5] with full URL

[link5]: file:///path/to/target5.md

Save this as ~/junk/lychee/baz.md and process it with lychee baz.md --dump -vv, and it prints:

file:///home/wks/junk/lychee/target1.md (baz.md)
file:///path/to/target5.md (baz.md)

It successfully extracts the link to target1.md and resolves the relative path to a URL starting with file:///....

But link2 through link4 are not extracted. link5 points to a full URL instead of a filename, and it is extracted as well.
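To make the expected output concrete, here is a minimal sketch of the resolution step for a destination that has already been extracted. The helper name `resolve` and its string-based joining are assumptions for illustration only and are not lychee's actual code. It assumes a Unix-style base directory.

```rust
/// Hypothetical helper (not part of lychee): turn an extracted link
/// destination into an absolute URL, given the directory of the input file.
fn resolve(base_dir: &str, dest: &str) -> String {
    if dest.contains("://") {
        // Already a full URL such as file:///path/to/target5.md: keep as-is.
        dest.to_string()
    } else {
        // Relative filename such as target1.md: join it onto the base directory.
        format!("file://{}/{}", base_dir.trim_end_matches('/'), dest)
    }
}
```

With the base directory `/home/wks/junk/lychee`, this maps `target1.md` to the first line of the output above and leaves the link5 destination unchanged. Under the same assumptions, target2.md through target4.md would be expected to resolve the same way once they are extracted.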

I think the problem is in the handling of links in the markdown parser.

// excerpt from lychee-lib/src/extract/markdown.rs

pub(crate) fn extract_markdown(input: &str, include_verbatim: bool) -> Vec<RawUri> {
    // ...
    match link_type {
        LinkType::Inline => {
            Some(vec![RawUri {
                text: dest_url.to_string(),
                element: Some("a".to_string()),
                attribute: Some("href".to_string()),
            }])
        }
        LinkType::Reference
        | LinkType::ReferenceUnknown
        | LinkType::Collapsed
        | LinkType::CollapsedUnknown
        | LinkType::Shortcut
        | LinkType::ShortcutUnknown
        | LinkType::Autolink
        | LinkType::Email => Some(extract_raw_uri_from_plaintext(&dest_url)),
        // ...

For inline links, it simply treats dest_url as the href. But for all other kinds of links, it invokes extract_raw_uri_from_plaintext, which uses some kind of heuristic to detect URLs. So any destination that doesn't look like a URL, such as the one in [label]: foo_bar_baz.md, is ignored.
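The difference can be sketched in a self-contained way. Everything below is a simplified stand-in: `LinkKind`, `plaintext_urls`, `extract_current`, and `extract_proposed` are hypothetical names, and the scheme check is only a crude substitute for whatever heuristic `extract_raw_uri_from_plaintext` really applies. The last function shows one possible direction for a fix, not necessarily what the maintainers would choose.

```rust
/// Simplified stand-in for the link kinds matched in the excerpt (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LinkKind {
    Inline,
    Reference,
    Collapsed,
    Shortcut,
    Autolink,
}

/// Crude stand-in for a plaintext URL heuristic: only words containing a
/// scheme separator count as URLs. This is NOT lychee's real implementation.
fn plaintext_urls(text: &str) -> Vec<String> {
    text.split_whitespace()
        .filter(|word| word.contains("://"))
        .map(str::to_string)
        .collect()
}

/// Behaviour described in the issue: only inline links trust `dest_url`.
fn extract_current(kind: LinkKind, dest_url: &str) -> Vec<String> {
    match kind {
        LinkKind::Inline => vec![dest_url.to_string()],
        _ => plaintext_urls(dest_url),
    }
}

/// One possible change (an assumption): also trust `dest_url` for
/// reference-style links, since the parser has already resolved the label.
fn extract_proposed(kind: LinkKind, dest_url: &str) -> Vec<String> {
    match kind {
        LinkKind::Inline
        | LinkKind::Reference
        | LinkKind::Collapsed
        | LinkKind::Shortcut => vec![dest_url.to_string()],
        LinkKind::Autolink => plaintext_urls(dest_url),
    }
}
```

In this sketch, `extract_current(LinkKind::Reference, "target2.md")` returns an empty list, which mirrors the missing link2 through link4 in the dump, while `extract_proposed` returns `target2.md` for the same input. A destination with a scheme, like the one for link5, passes through both versions.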


mre · 2025-03-20T20:56:23Z

Haven't looked too deeply into it, but could be a duplicate of #1574. There is #1624 if you want to give the current development branch a try.

mre added the bug label Mar 20, 2025

mre added the waiting-for-feedback label Mar 20, 2025

thomas-zahner added the triage label Mar 24, 2025
