Cannot process hljs code with spans #97

panhaoyu · 2023-10-31T14:09:34Z

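import markdownify

def test_markdownify():
    # This assertion passes today, which demonstrates the bug:
    # the underscores inside the inline code are escaped.
    result = markdownify.markdownify('''<code><span>merge_md_files</span></code>''')
    assert result == r'`merge\_md\_files`'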

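The input above is typical of HTML rendered by highlight.js (hljs), which wraps tokens inside code elements in span tags.

The output contains incorrect escapes; I expected the underscores to be left alone, i.e. merge_md_files.

This happens because markdownify only checks the name of the immediate parent in process_text, and here the text node's parent is the span, not the code element: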

    def process_text(self, el):
        text = six.text_type(el) or ''

        # dont remove any whitespace when handling pre or code in pre
        if not (el.parent.name == 'pre'
                or (el.parent.name == 'code'
                    and el.parent.parent.name == 'pre')):
            text = whitespace_re.sub(' ', text)

        if el.parent.name != 'code' and el.parent.name != 'pre':
            text = self.escape(text)

        # remove trailing whitespaces if any of the following condition is true:
        # - current text node is the last node in li
        # - current text node is followed by an embedded list
        if (el.parent.name == 'li'
                and (not el.next_sibling
                     or el.next_sibling.name in ['ul', 'ol'])):
            text = text.rstrip()

        return text
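To make the cause concrete, here is a minimal sketch that lists the chain of open tags around each text node, using only the standard library's html.parser rather than markdownify or BeautifulSoup. The names AncestorRecorder and ancestors_of_text are hypothetical helpers made up for this illustration.

```python
from html.parser import HTMLParser


class AncestorRecorder(HTMLParser):
    """Hypothetical helper: records which tags enclose each text node."""

    def __init__(self):
        super().__init__()
        self.open_tags = []    # names of currently open tags, outermost first
        self.text_nodes = []   # list of (text, tuple of ancestor tag names)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching open tag.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        self.text_nodes.append((data, tuple(self.open_tags)))


def ancestors_of_text(html):
    recorder = AncestorRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.text_nodes


nodes = ancestors_of_text('<code><span>merge_md_files</span></code>')
text, ancestors = nodes[0]
print(ancestors[-1])   # immediate parent of the text node
print(ancestors)       # full chain, outermost first
```

For the hljs-style input, the immediate parent is the span and the code element only shows up further up the chain, so a check on the parent name alone decides the text is not code.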

As a workaround, I monkey-patched process_text as follows:

import markdownify
import six


def process_text(self: markdownify.MarkdownConverter, el):
    text = six.text_type(el) or ''

    # dont remove any whitespace when handling pre or code in pre
    if not (el.parent.name == 'pre'
            or (el.parent.name == 'code'
                and el.parent.parent.name == 'pre')):
        text = markdownify.whitespace_re.sub(' ', text)

    # walk up every ancestor, not just the immediate parent, so that text
    # nested in e.g. <code><span>...</span></code> is still treated as code
    cursor = el.parent
    is_code = False
    while cursor is not None:
        if cursor.name in ('code', 'pre'):
            is_code = True
            break
        cursor = cursor.parent
    if not is_code:
        text = self.escape(text)

    # remove trailing whitespaces if any of the following condition is true:
    # - current text node is the last node in li
    # - current text node is followed by an embedded list
    if (el.parent.name == 'li'
            and (not el.next_sibling
                 or el.next_sibling.name in ['ul', 'ol'])):
        text = text.rstrip()

    return text


markdownify.MarkdownConverter.process_text = process_text

Now it works well.

If this approach looks useful, I can open a PR.
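The idea behind the patch (escape only when no ancestor is a code or pre element) can also be shown in isolation. The following is a toy sketch built on the standard library's html.parser, not markdownify's actual implementation; TinyConverter, convert, and the two-character escape pattern are assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser

# Simplified assumption: only '*' and '_' are escaped in this toy example.
_ESCAPE_RE = re.compile(r'([*_])')


class TinyConverter(HTMLParser):
    """Toy HTML-to-Markdown converter with ancestor-aware escaping."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open tags, outermost first
        self.out = []     # output fragments

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == 'code':
            self.out.append('`')

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # ignore stray end tags
        # Pop back to (and including) the matching open tag.
        while self.stack.pop() != tag:
            pass
        if tag == 'code':
            self.out.append('`')

    def handle_data(self, data):
        # Check every open ancestor, not just the innermost one.
        in_code = any(name in ('code', 'pre') for name in self.stack)
        if in_code:
            self.out.append(data)
        else:
            self.out.append(_ESCAPE_RE.sub(r'\\\1', data))


def convert(html):
    parser = TinyConverter()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(convert('<code><span>merge_md_files</span></code>'))
print(convert('<p>snake_case outside code</p>'))
```

In this sketch the text inside the nested span is left untouched because code is somewhere on the stack, while the same underscore in ordinary paragraph text is still escaped.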


panhaoyu · 2023-10-31T14:13:28Z

BTW, is this project still actively maintained?
The most recent commit seems to be about a year old.
If it is not, is there an alternative you would recommend?

chrispy-snps · 2024-01-14T14:40:17Z

@panhaoyu - issue #101 reports the same problem as this one.

I would like to reach an active maintainer of this project, as there are several fixes I would like to contribute (and more tests I would like to add).

chrispy-snps mentioned this issue Jan 14, 2024

Underscores within <span> tags inside <pre><code> blocks are incorrectly escaped in Markdown output #101

Closed

chrispy-snps mentioned this issue Jan 15, 2024

improve text normalization/escaping for preformatted/code contexts #104

Merged

chrispy-snps linked a pull request Jan 15, 2024 that will close this issue

improve text normalization/escaping for preformatted/code contexts #104

Merged

chrispy-snps closed this as completed in #104 Jan 15, 2024
