-
Summary of your issue
When extracting a PDF, the first character of the last column is cut off. All other columns are fine. The PDF only has horizontal lines to separate each row of data. Rows have line breaks. There are no vertical lines to separate columns.
Example:
Tried 'Lattice = True' as well. It made the results worse.
Check list before submit
If not possible to execute
What did you do when you faced the problem?
I checked the FAQs. Tried limiting the page number to 1 page. Same results.
Code:
import tabula
Expected behavior: See above
Actual behavior: See above
Related Issues: n/a
Replies: 5 comments
-
Thanks for reporting. I saw similar stuff when the vertical line was too close to the character. One thing I can come up with is to use the
-
Hi @chezou,
-
Ugh, pressed the wrong button, sorry @chezou. My follow-up question is whether my syntax is correct. Using "columns=[10.1, 20.2, 30.3]" worked, but for future PDFs I want to fully understand how to use the option. I know it's supposed to be "X coordinates of column boundaries". Can you please explain it, or point me to a better explanation in the documentation? Thanks!
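For anyone else puzzling over "X coordinates of column boundaries", here is a rough mental model in plain Python. It is only a sketch of the idea, not tabula-java's actual algorithm: each boundary is treated as an invisible vertical line, and a glyph lands in whichever gap its left edge falls into. The glyph positions and boundary values are made up for illustration.

```python
from bisect import bisect_right


def column_index(x, boundaries):
    """Return the 0-based column for a glyph whose left edge is at x."""
    return bisect_right(sorted(boundaries), x)


def split_by_columns(glyphs, boundaries):
    """Group (character, x) pairs into cells; N boundaries give N + 1 cells."""
    cells = [""] * (len(boundaries) + 1)
    for ch, x in glyphs:
        cells[column_index(x, boundaries)] += ch
    return cells


# Hypothetical glyph positions for the word "Total" in the last column.
glyphs = [("T", 300.0), ("o", 306.0), ("t", 311.0), ("a", 315.0), ("l", 320.0)]

# Last boundary sits just right of the "T": the first character is cut off.
too_far_right = split_by_columns(glyphs, [100.0, 200.0, 303.0])

# Moving the last boundary a few units left keeps the word together.
shifted_left = split_by_columns(glyphs, [100.0, 200.0, 297.0])

print(too_far_right)  # ['', '', 'T', 'otal']
print(shifted_left)   # ['', '', '', 'Total']
```

Under this model, a first character that ends up in the neighbouring column suggests the boundary for the last column is slightly too far to the right.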
-
The description comes from tabula-java's one. I'm not sure what your point is.
https://github.com/tabulapdf/tabula-java#commandline-usage-examples
Example code can be found in this article by @tdpetrou: https://www.dunderdata.com/blog/read-trapped-tables-within-pdfs-as-pandas-dataframes
If you want to reuse the same columns option across different PDFs, that is not possible. You need to set the columns option per table.
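One way to keep per-table column settings manageable could look like the sketch below. The file names, page numbers, and coordinates are hypothetical, and `TABLE_SETTINGS`, `read_options`, and `read_table` are illustrative names, not part of tabula-py. Passing `guess=False` and `stream=True` alongside `columns` is an assumption here, chosen so that auto-detection does not compete with the explicit boundaries.

```python
# Hypothetical per-table settings: every value here is made up for illustration.
TABLE_SETTINGS = {
    "report_a.pdf": {"pages": 1, "columns": [10.1, 20.2, 30.3]},
    "report_b.pdf": {"pages": "all", "columns": [72.0, 210.5, 388.0]},
}


def read_options(pdf_name):
    """Build keyword arguments for tabula.read_pdf for one known table layout."""
    opts = TABLE_SETTINGS[pdf_name]
    return {
        "pages": opts["pages"],
        "columns": opts["columns"],  # X coordinates of column boundaries
        "stream": True,   # assumption: no ruling lines between columns
        "guess": False,   # assumption: rely on the explicit boundaries
    }


def read_table(pdf_path, pdf_name):
    """Read one PDF using the settings registered for its layout."""
    import tabula  # imported lazily; needs tabula-py and a Java runtime

    return tabula.read_pdf(pdf_path, **read_options(pdf_name))
```

A call such as `read_table("downloads/report_a.pdf", "report_a.pdf")` would then use the boundaries registered for that layout; a new PDF layout needs its own entry.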
-
Hi,