0.9m06 Pt 2 - FGbz 2; Table OCRs; Text Pages 2 #18
-
Adding FGbz chunks using the patched DjVuLibre Win64
Sjbz chunk sizes: any2djvu = 2999 bytes; v0.9m06 = 1159 bytes
Coordinates were chosen using the advanced feature in DjView 4 that lets one select an area on a page and save it as maparea coordinates.
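For anyone who wants to collect many saved zones at once, here is a minimal sketch of pulling the rectangles out of maparea text. It assumes the coordinates are in the standard DjVu annotation form `(maparea url comment (rect xmin ymin width height) ...)`. The function name `maparea_rects` and the sample values are made up for illustration and do not come from the test files in this thread.

```python
import re

# Matches the (rect xmin ymin width height) part of a maparea annotation.
RECT_RE = re.compile(r"\(rect\s+(-?\d+)\s+(-?\d+)\s+(\d+)\s+(\d+)\s*\)")


def maparea_rects(annotation_text):
    """Return a list of (xmin, ymin, width, height) tuples found in the text."""
    return [
        tuple(int(value) for value in match.groups())
        for match in RECT_RE.finditer(annotation_text)
    ]


# Hypothetical sample: two thin horizontal table lines on a 600 dpi page.
sample = """
(maparea "" "" (rect 300 4200 4500 12) (none))
(maparea "" "" (rect 300 3900 4500 12) (none))
"""

if __name__ == "__main__":
    for xmin, ymin, width, height in maparea_rects(sample):
        print(xmin, ymin, width, height)
```

How these rectangles are then handed to a particular encoder is left out on purpose, since that depends on the tool and version in use.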
-
Here are two 20-page 600 dpi DjVus with the same-looking content, as two sets of colorized tables.
File sizes: any2djvu 85.3 KB; minidjvu-mod 55.0 KB
COLORIZED_TABLES_any2djvu_minidjvu-mod0.9m06djvu_and_Script.zip
In almost all instances, once horizontal-line mapareas had been used to colorize the tables, it was not necessary to also include the mapareas of the vertical lines.
My experience with any2djvu is that colorizing the table lines and some of the characters does not affect the segmentation of the page: the Sjbz stays the same as it would be for a black-and-white file. If, on the other hand, a table cell contains a solid block of color, that color may be segmented, with part of it placed in the foreground as Sjbz and part in the background as BG44, or it may be kept entirely in the Sjbz. This is one case where the Threshold option in DjVuDigital can be useful when working with PDFs in Linux, although that program lacks shared dictionaries.
-
OCR - Any2DjVu vs Minidjvu-mod0.9m06 DjVu Table Pages - Use the pdf2djvu OCR instead
-
Basic Text Pages - Any2djvu vs MiniDjVu-mod0.9m06
-
This is an apology to Alex. I had found earlier that the Win version of minidjvu-mod would not process much beyond 460 text-based pages at a time. I went back and checked the 903 pbms I was attempting to process, and among the pages numbered in the 460s there are several pgms instead of pbms, although they are named with the extension .pbm. I had batch converted the 903 pages and should have checked each converted page afterwards.
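A quick way to catch this kind of mix-up before encoding is to read the Netpbm magic number at the start of each file rather than trusting the extension. The sketch below assumes the pages sit in one folder. The function names and the folder argument are placeholders of mine, not part of minidjvu-mod.

```python
import sys
from pathlib import Path

# Netpbm magic numbers: the first two bytes identify the real format.
NETPBM_KINDS = {
    b"P1": "pbm", b"P4": "pbm",  # bitonal (ASCII / binary)
    b"P2": "pgm", b"P5": "pgm",  # grayscale (ASCII / binary)
    b"P3": "ppm", b"P6": "ppm",  # color (ASCII / binary)
}


def netpbm_kind(path):
    """Return 'pbm', 'pgm', 'ppm' or 'unknown' based on the file's magic number."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    return NETPBM_KINDS.get(magic, "unknown")


def find_mislabeled(folder, extension=".pbm"):
    """List (file name, actual kind) for files whose content does not match the extension."""
    expected = extension.lstrip(".").lower()
    mislabeled = []
    for path in sorted(Path(folder).glob("*" + extension)):
        kind = netpbm_kind(path)
        if kind != expected:
            mislabeled.append((path.name, kind))
    return mislabeled


if __name__ == "__main__":
    # Usage: python check_pbm.py <folder with the page images>
    for name, kind in find_mislabeled(sys.argv[1]):
        print(f"{name}: content is {kind}, not pbm")
```

Running it on the folder of 903 pages would list any file that is named .pbm but actually holds grayscale or color data.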
-
This discussion is a fresh extension of v0.9m06 Part 1.
It begins with testing Alex's patched test binaries for DjVuLibre Win32 and Win64.
The patch is to improve upon the integration and description of FGbz chunks in minidjvu-mod.
This post to be updated.
Update: Alex's binaries and my test results:
https://github.com/trufanov-nok/minidjvu-mod/files/7363837/patched_djvumake3.5.28_x64_vs2013.zip
https://github.com/trufanov-nok/minidjvu-mod/files/7363838/patched_djvumake3.5.28_x32_vs2013.zip
For any problems with older machines, Alex suggests this link to Visual C++ Redistributable Packages for Visual Studio 2013:
https://www.microsoft.com/en-us/download/details.aspx?id=40784
Results -- Excellent!
The newly patched DjVuLibre builds fix the FGbz chunk problems described in Part 1 of this discussion. I tested the x64 build in Win 10 and the x32 build in Win 7, and the patched FGbz function worked perfectly in both: for a table test page colorized with 26 color zones of #blue, the FGbz chunk is now 25 bytes with 2 colors instead of 139 bytes with 27 colors. Thanks, Alex, once again.