Full page HTML export starting to work

andrescg2sj edited this page Apr 3, 2020 · 2 revisions

The current version of PDFTableToHTML in Courseminer works, at least for some files.

One example is this document, passed as res/CEPI-1-1.pdf in the command below.

It can be tested with the following command:

java -cp target/crminer-app-1.0-SNAPSHOT-jar-with-dependencies.jar org.sj.punidos.crminer.PDFTableToHTML res/CEPI-1-1.pdf -o test.htm

This exports the three tables in that PDF to a file named test.htm.

Reference

The command-line arguments and options currently available for PDFTableToHTML are:

PDFTableToHTML [OPTIONS] [PDF-filename-in]
usage: utility-name
 -c,--clip <arg>        format: x,y,width,height
 -o,--output <arg>      output file
 -p,--proximity <arg>   minimum distance between tables
 -t,--thickness <arg>   maximum line thickness
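
As a rough illustration of how these options might be combined, the sketch below builds a single call that uses -c, -t, -p and -o together. The numeric values and the output name clipped.htm are placeholders chosen for the example, not recommended settings, and the units are not stated on this page. The valid_clip helper is hypothetical: it only checks that the clip value looks like four comma-separated non-negative numbers, which may be stricter or looser than what the tool itself accepts. The script prints the command by default and runs it only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch: combine several PDFTableToHTML options in one call.
# All numeric values below are placeholders, not recommended settings.

JAR=target/crminer-app-1.0-SNAPSHOT-jar-with-dependencies.jar
MAIN=org.sj.punidos.crminer.PDFTableToHTML

CLIP="50,100,400,300"   # x,y,width,height (illustrative region)
THICKNESS=2             # maximum line thickness (illustrative)
PROXIMITY=10            # minimum distance between tables (illustrative)

# Hypothetical helper: succeed only if the value is four comma-separated
# non-negative numbers. This format check is an assumption of this sketch.
valid_clip() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?(,[0-9]+(\.[0-9]+)?){3}$'
}

if ! valid_clip "$CLIP"; then
    echo "clip must look like x,y,width,height" >&2
    exit 1
fi

# Build the argument list, print it, and execute it only when RUN=1.
set -- java -cp "$JAR" "$MAIN" res/CEPI-1-1.pdf \
    -c "$CLIP" -t "$THICKNESS" -p "$PROXIMITY" -o clipped.htm
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Running the script with RUN=1 from the project root, after building the jar, would invoke the tool. Without RUN=1 it only shows the command that would be run.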

More examples

You can check out more examples of what Courseminer is able to do so far in this folder.

The pdf folder contains example documents with tables, and the html folder contains the files generated by ExampleGenerator.

You can run ExampleGenerator with this command:

java -cp target/crminer-app-1.0-SNAPSHOT-jar-with-dependencies.jar org.sj.punidos.crminer.ExampleGenerator