execution time is too much #53

Closed
anuragnagardeveloper opened this issue Dec 30, 2019 · 12 comments

@anuragnagardeveloper

The command runs for around 40 seconds to convert a docx file of only 10 lines. Can the execution time be reduced?

@gimsieke
Contributor

Can you send us the docx file? letexml at le-tex.de

@gimsieke
Contributor

gimsieke commented Dec 30, 2019

Some acceleration can be achieved by avoiding JVM startup time. But fundamentally, in order to be able to debug the process, we split the conversion into several transformation passes where the content will be gradually converted to the intermediate, DocBook-based XML format and from there to LaTeX, again in multiple passes. This creates some overhead since most of the time, the document won’t be changed much in each pass.
If your document contains MathType equations, execution time will grow, too.
I don’t think that 40 s is typical for a docx file with little content in it, so we are willing to investigate if you send us the file in question.

@anuragnagardeveloper
Author

anuragnagardeveloper commented Dec 30, 2019

Hi @gimsieke

Thanks for the quick reply.
4.docx

This is the file, which took around 60 seconds to convert.

@gimsieke
Contributor

About 15 seconds of execution time stem from converting the MathType equations.
I’m afraid there is not much we can do about these times in the short to mid term.
At least the times don’t scale linearly with document size. I just converted a dissertation with 180 pages and 950 MathType equations. The overall time was 17 minutes, of which the MathType conversion took 12 minutes. This is still less than a second per MathType equation and less than 2 seconds per page for the rest of the process.
A future version might run the MathType conversions in parallel threads and reduce some of the overhead of invoking a single MathType conversion, but this is not a priority.
Generally I’d recommend that you use OMML (native docx) equations instead of MathType. (We can provide a conversion service from MathType to OMML if there’s general demand.)
And as I said, the times look more favorable if you don’t extrapolate from small documents since there is always some overhead that will be diluted by larger content sizes.
If you convert larger documents, you must be prepared to increase the heap space for Java. There is currently no option for increasing the heap size but @mkraetke might eventually provide one in the d2t script.
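
As a quick check of the averages quoted in this comment, the dissertation figures can be divided out directly. The following is only illustrative arithmetic on the numbers given above; the variable names are made up for the example, and no fixed startup overhead is separated out.

```python
# Figures quoted in the comment above (dissertation example).
pages = 180
mathtype_equations = 950
total_minutes = 17
mathtype_minutes = 12

# Average MathType conversion time per equation, in seconds.
seconds_per_equation = mathtype_minutes * 60 / mathtype_equations

# Average time per page for everything except the MathType conversion.
rest_minutes = total_minutes - mathtype_minutes
seconds_per_page_rest = rest_minutes * 60 / pages

print(f"{seconds_per_equation:.2f} s per MathType equation")
print(f"{seconds_per_page_rest:.2f} s per page for the rest")
```

This works out to roughly 0.76 s per MathType equation and 1.67 s per page for the remaining passes, consistent with the "less than a second" and "less than 2 seconds" statements.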

@anuragnagardeveloper
Author

Hey @gimsieke
I understand. As a small workaround, could we put a prefix before every equation and replace the LaTeX of each equation with an image of the equation? I guess this would reduce the time drastically.

@gimsieke
Contributor

I think if you select -m no, the equation images will be used. But they will probably be .wmf files, which are not useful outside of docx files (they can’t be converted standalone since they don’t embed the document’s fonts). Maybe there is a MathType option to export all equation images?

@anuragnagardeveloper
Author

d2t.bat c:\sample-file.docx -m no

This command is not working.

@gimsieke
Contributor

This is like in #52: Only the Bash script d2t accepts these options. If you want to invoke it via d2t.bat, you need to add MTEFSOURCE=no manually in the batch file, after docx2tex.xpl.

@anuragnagardeveloper
Author

It still produces LaTeX instead of images. I have put MTEFSOURCE=no in the batch file, after docx2tex.xpl, and run the command d2t.bat c:\sample-file.docx

@mkraetke
Member

Are you talking about the Windows batch file (d2t.bat) or the Bash script (d2t)? In d2t the only permitted values are ole+wmf, ole, wmf and yes. Any other value is treated as yes. We can add the value no here and just include the fallback wmf graphic reference. Currently this issue has no priority for us, but we're open to pull requests.
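
The value handling described in this comment can be pictured roughly as follows. This is a hypothetical Python sketch, not the actual Bash code in d2t: the function name `normalize_mtef_source` and the `allow_no` switch are invented for illustration, and only the list of permitted values and the fallback to yes are taken from the comment.

```python
# Values that d2t currently accepts, according to the comment above.
PERMITTED = ("ole+wmf", "ole", "wmf", "yes")


def normalize_mtef_source(value, allow_no=False):
    """Return the value that would effectively be used.

    allow_no=False models the behaviour described above: anything
    outside PERMITTED is treated as "yes".
    allow_no=True models the proposed change of also accepting "no".
    (Hypothetical helper, not part of d2t.)
    """
    permitted = PERMITTED + ("no",) if allow_no else PERMITTED
    return value if value in permitted else "yes"


print(normalize_mtef_source("wmf"))                # wmf
print(normalize_mtef_source("no"))                 # yes
print(normalize_mtef_source("no", allow_no=True))  # no
```

Under this description, passing no to the Bash script would currently have the same effect as yes, and accepting no would be a small change to the list of permitted values.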

@gimsieke
Contributor

I have put MTEFSOURCE=no in the batch file,

Ah, sorry, I think it’s mtef-source=no, see https://github.com/transpect/docx2tex/blob/master/d2t#L199

@gimsieke
Contributor

gimsieke commented Dec 31, 2019

@mkraetke The value no for mtef-source will be passed to docx2hub’s mathtype2mml option, and there we document that no is indeed supported.
And I advised @anuragnagardeveloper in another comment above to directly add the option to their local copy of d2t.bat if they need to use the Windows batch file.
