Problem Description
When converting HTML files to DOCX using Pandoc, mathematical equations inserted in the content undergo unexpected changes:
Equations in the Middle of Paragraphs: The equations are moved to the end of the paragraph, even when they should be integrated into the text.
Equations with <mo> at the Root: When the <mo> tag is at the root of the MathML structure, the elements of the equation are rearranged, resulting in changes in the order of the components or displacements.
These behaviors compromise the semantic and visual integrity of the generated document, especially in content with mathematical formulas that need to be in the context of the text.
Steps to Reproduce
Create an HTML file containing paragraphs with embedded MathML equations.
Example:
<p>This is an equation: <math><msup><mi>x</mi><mn>2</mn></msup></math> in the middle of the text.</p>
<p>Another equation: <math><mo>=</mo><mn>5</mn></math>.</p>
Convert the HTML file to DOCX using the command:
pandoc -i input.html -o output.docx --mathml
Open the generated DOCX file and notice:
The equations in the middle of the paragraphs have been moved to the end.
The equations with <mo> at the root have been rearranged.
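The steps above can also be scripted so the same input is used on every run. This is only a sketch: the helper names write_sample, pandoc_command and reproduce are made up for illustration, pandoc is assumed to be on the PATH, and --mathml is left out because it is an HTML-output math option.

```python
import shutil
import subprocess
from pathlib import Path

# Same two paragraphs as in the example, wrapped in a minimal HTML page.
SAMPLE_HTML = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Example</title></head>
<body>
<p>This is an equation: <math><msup><mi>x</mi><mn>2</mn></msup></math> in the middle of the text.</p>
<p>Another equation: <math><mo>=</mo><mn>5</mn></math>.</p>
</body>
</html>
"""


def write_sample(workdir):
    """Write the sample page to <workdir>/input.html and return its path."""
    src = Path(workdir) / "input.html"
    src.write_text(SAMPLE_HTML, encoding="utf-8")
    return src


def pandoc_command(src, dst):
    """Build the pandoc invocation (hypothetical helper, HTML in, DOCX out)."""
    return ["pandoc", "-f", "html", "-o", str(dst), str(src)]


def reproduce(workdir):
    """Write the sample and convert it; returns None if pandoc is not installed."""
    src = write_sample(workdir)
    dst = Path(workdir) / "output.docx"
    if shutil.which("pandoc") is None:
        return None
    subprocess.run(pandoc_command(src, dst), check=True)
    return dst
```

Calling reproduce(".") leaves input.html and output.docx in the current directory, ready to open in the viewer being tested.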
Expected Behavior
Equations should remain in the position they were inserted in the HTML.
The structure of MathML equations should be preserved in the DOCX document, without unexpected shifts or rearrangements.
Current Behavior
Equations in the middle of paragraphs are shifted to the end of the paragraph in the DOCX.
Equations containing the <mo> tag in the root are rearranged incorrectly, compromising the order of the elements.
Example HTML File
<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Example</title></head>
<body>
<p>This is an equation: <math><msup><mi>x</mi><mn>2</mn></msup></math> in the middle of the text.</p>
<p>Another equation: <math><mo>=</mo><mn>5</mn></math>.</p>
</body>
</html>
Environment
Pandoc Version: 3.6.1
Additional Information
The issue seems to be related to the processing of MathML tags during conversion to DOCX.
The --mathml flag was used to maintain support for equations in MathML format.
Note that --mathml only affects HTML output, so you can leave it out. Running pandoc -f html -o test.docx with the input
<p>This is an equation: <math><msup><mi>x</mi><mn>2</mn></msup></math> in the middle of the text.</p>
<p>Another equation: <math><mo>=</mo><mn>5</mn></math>.</p>
yields a Word file that looks like this:
That looks correct to me. Are you seeing something different?
What are you using to view the docx file?
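Since the result can depend on the viewer, one way to separate a pandoc problem from a viewer problem is to look at word/document.xml inside the generated file. Below is a minimal sketch, assuming inline equations are stored as m:oMath (or m:oMathPara) elements that are direct children of w:p. The function names paragraph_layout and docx_layout are made up for illustration, and runs nested inside other wrappers (hyperlinks, for example) are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and Office Math (OMML).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def paragraph_layout(document_xml):
    """For each w:p, list its direct children in order as ('text', s) or ('math', s)."""
    root = ET.fromstring(document_xml)
    layouts = []
    for para in root.iter(f"{{{W}}}p"):
        items = []
        for child in para:
            if child.tag == f"{{{W}}}r":
                # Ordinary text run: collect its w:t contents.
                text = "".join(t.text or "" for t in child.iter(f"{{{W}}}t"))
                if text:
                    items.append(("text", text))
            elif child.tag in (f"{{{M}}}oMath", f"{{{M}}}oMathPara"):
                # Equation: collect the m:t contents in document order.
                math = "".join(t.text or "" for t in child.iter(f"{{{M}}}t"))
                items.append(("math", math))
        layouts.append(items)
    return layouts


def docx_layout(path):
    """Read word/document.xml from a .docx file and return its paragraph layout."""
    with zipfile.ZipFile(path) as archive:
        return paragraph_layout(archive.read("word/document.xml"))
```

If docx_layout("output.docx") lists the math item between the two text items of the first paragraph, the equation is stored mid-paragraph in the file, and any displacement comes from the program used to display it.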