Cannot process hljs code with spans #97

Closed
panhaoyu opened this issue Oct 31, 2023 · 2 comments · Fixed by #104

panhaoyu commented Oct 31, 2023

import markdownify

def test_markdownify():
    result = markdownify.markdownify('''<code><span>merge_md_files</span></code>''')
    # current (incorrect) output: the underscores inside the code span are escaped
    assert result == r'`merge\_md\_files`'

This is the HTML that highlight.js (hljs) typically renders.

The output contains unwanted escapes; I expect to get `merge_md_files` instead.

This happens because markdownify only checks the name of the immediate parent in process_text:

    def process_text(self, el):
        text = six.text_type(el) or ''

        # dont remove any whitespace when handling pre or code in pre
        if not (el.parent.name == 'pre'
                or (el.parent.name == 'code'
                    and el.parent.parent.name == 'pre')):
            text = whitespace_re.sub(' ', text)

        if el.parent.name != 'code' and el.parent.name != 'pre':
            text = self.escape(text)

        # remove trailing whitespaces if any of the following condition is true:
        # - current text node is the last node in li
        # - current text node is followed by an embedded list
        if (el.parent.name == 'li'
                and (not el.next_sibling
                     or el.next_sibling.name in ['ul', 'ol'])):
            text = text.rstrip()

        return text
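
To make the cause concrete, here is a minimal sketch that uses only the standard library's `html.parser` rather than markdownify or BeautifulSoup. `AncestorRecorder` and `ancestors_of_text` are hypothetical names invented for this illustration. It records which tags are open around each piece of text, showing that the text's direct parent is `span` while `code` is only a more distant ancestor.

```python
from html.parser import HTMLParser


class AncestorRecorder(HTMLParser):
    """Hypothetical helper: records the stack of open tags around each text chunk."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.records = []  # (text, ancestors) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # assumes well-formed input: the closing tag matches the innermost open tag
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.records.append((data, tuple(self.stack)))


def ancestors_of_text(html):
    parser = AncestorRecorder()
    parser.feed(html)
    parser.close()
    return parser.records


records = ancestors_of_text('<code><span>merge_md_files</span></code>')
text, ancestors = records[0]

# what a parent-only check looks at: the innermost tag, which is 'span'
parent_only = ancestors[-1] in ('code', 'pre')
# what is needed for hljs output: any enclosing tag may be 'code' or 'pre'
any_ancestor = any(tag in ('code', 'pre') for tag in ancestors)
```

Here `parent_only` is false and `any_ancestor` is true, which matches the behaviour described above: a check on the direct parent alone does not notice the enclosing `code` element.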

I monkey-patched the method like this:

import markdownify
import six


def process_text(self: markdownify.MarkdownConverter, el):
    text = six.text_type(el) or ''

    # dont remove any whitespace when handling pre or code in pre
    if not (el.parent.name == 'pre'
            or (el.parent.name == 'code'
                and el.parent.parent.name == 'pre')):
        text = markdownify.whitespace_re.sub(' ', text)

    # text nested anywhere inside <code> or <pre> (e.g. within hljs <span>s)
    # must not be escaped, so check every ancestor, not just the direct parent
    is_code = any(parent.name in ('code', 'pre') for parent in el.parents)
    if not is_code:
        text = self.escape(text)

    # remove trailing whitespaces if any of the following condition is true:
    # - current text node is the last node in li
    # - current text node is followed by an embedded list
    if (el.parent.name == 'li'
            and (not el.next_sibling
                 or el.next_sibling.name in ['ul', 'ol'])):
        text = text.rstrip()

    return text


markdownify.MarkdownConverter.process_text = process_text

Now it works well.
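
The same idea can be sketched without any dependencies. In the snippet below, `Node`, `is_inside_code`, `escape_markdown` and `render_text` are hypothetical stand-ins, not markdownify's API, and the escape rule is deliberately simplified to underscores and asterisks. The ancestor walk stops when it runs out of parents, so it does not rely on a root node with a particular name.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML node with a parent pointer."""
    name: Optional[str]
    parent: Optional["Node"] = None


def is_inside_code(node):
    """Return True if any ancestor of node is a <code> or <pre> element."""
    cursor = node.parent
    while cursor is not None:
        if cursor.name in ('code', 'pre'):
            return True
        cursor = cursor.parent
    return False


def escape_markdown(text):
    """Simplified escape (an assumption): backslash-escape '_' and '*' only."""
    return re.sub(r'([_*])', r'\\\1', text)


def render_text(text, node):
    """Escape text unless it sits somewhere inside code or pre."""
    return text if is_inside_code(node) else escape_markdown(text)


# <code><span>merge_md_files</span></code>
code = Node('code')
span = Node('span', parent=code)
text_in_code = Node(None, parent=span)

# <p>merge_md_files</p>
paragraph = Node('p')
text_in_paragraph = Node(None, parent=paragraph)

in_code = render_text('merge_md_files', text_in_code)
in_paragraph = render_text('merge_md_files', text_in_paragraph)
```

With these stand-ins, text nested under `code` through a `span` is left untouched, while the same text in a paragraph is still escaped.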

If this suggestion is useful, I can open a PR.

panhaoyu (Author) commented

By the way, is this project actively maintained?
The most recent commit seems to be about a year old.
If not, is there an alternative I could use instead?

chrispy-snps (Collaborator) commented

@panhaoyu - issue #101 reports the same problem as this one.

I would like to reach an active maintainer of this project, as there are several fixes I would like to contribute (and more tests I would like to add).
