0.9m06 Pt 1 - Tables 2; Shared Graphics; FGbz 1 #15
Replies: 12 comments 20 replies
-
Hi, Every encoder detects pieces of black on a b/w page and decides, based on their size, whether each one is a letter, just dirt, or a graphic. Objects that are too small are removed as trash if The problem is that in real scanned images you'll almost never find two pieces of graphics that are pixel-for-pixel equal, and encoders don't bother to look for such opportunities for graphics compression. They rely on refinement encoding, which is applied page by page and doesn't involve the shared dictionary (which may also leave room for further compression improvement). But in generated images like yours, the pieces of graphics are exactly the same across all the pages (though not within a single page, as the encoder can't rotate black corners etc.; I guess encoding with rotations isn't even part of the DjVu specification). For such pages it's feasible to try to encode graphics into the shared dictionary, keeping in mind that they have to be classified losslessly even if the user selects lossy compression. In other words, minidjvu-mod is now tweaked to achieve better compression in a very rare case: exactly the same graphic objects repeated on several pages. That is very unlikely to happen with scans, but may be found on generated pages.
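To make the idea above concrete, here is a minimal sketch of moving pixel-identical graphics into a shared dictionary. It is not minidjvu-mod's actual code: the function name `split_shared` and the representation of a bitmap as a tuple of row strings are assumptions made for illustration only.

```python
from collections import defaultdict

def split_shared(pages):
    """Split shapes into a shared dictionary and per-page leftovers.

    pages: list of pages, each a list of bitmaps. A bitmap is a tuple of
    row strings here (an illustrative stand-in for a real bitmap).
    A shape is shared only if a pixel-identical copy appears on at least
    two different pages; this is an exact (lossless) comparison.
    """
    pages_seen = defaultdict(set)
    for page_no, shapes in enumerate(pages):
        for shape in shapes:
            pages_seen[shape].add(page_no)

    # Keep first-seen order so the result is deterministic.
    shared = [s for s, where in pages_seen.items() if len(where) > 1]
    shared_set = set(shared)
    per_page = [[s for s in shapes if s not in shared_set] for shapes in pages]
    return shared, per_page

corner = ("##", "#.")
line = ("####",)
blot = ("#",)
pages = [[corner, line, blot], [corner, line], [line, ("#.",)]]

shared, per_page = split_shared(pages)
print(len(shared), [len(p) for p in per_page])  # prints: 2 [1, 0, 1]
```

With scanned pages the exact-equality test would almost never match, which is why this only pays off for generated pages.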
-
Thanks, Alex!
-
Unfortunately, no. Technically, object detection is performed in regions of at most 500x500. So the table is cut into such regions, and then connected components are searched for within them. The cutting doesn't take the structure of the objects into account. The pieces are encoded separately. I'm pretty sure that any2djvu uses the same object detection mechanism, because its dictionary doesn't contain pieces bigger than 500x500. I don't know why this is done. Perhaps if I joined lines into one piece I might end up with a box or a corner instead of a long line, and this would take more disk space to encode because its area is bigger. Don't know. It's probably related to character encoding limitations of the Z-coder used in DjVu for that purpose.
Btw, you may use https://github.com/jsbien/djview4shapes - it's a DjView4 fork made by Polish linguists. When you open a DjVu document in it, you can click on any character in the document, and it will be highlighted in yellow. The right panel displays the dictionaries (I guess Djbz, then Sjbz and all other characters). The character you highlight will be selected in this list. Characters that are refinement-encoded based on the selected character will be highlighted too. The left panel, which usually contains the document pages and is used for navigation, is turned into a filter that displays only the pages containing the same dictionary character. The PgUp/PgDown keys are used for navigation. There are some letter usage statistics and a context menu for exporting bitmaps from the dictionary. It's a scientific project made for research, so it's not user friendly and is sometimes buggy.
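As a rough illustration of the 500x500 cutting described above, the sketch below tiles a page with a regular grid and counts how many pieces a long horizontal table line would be split into. The regular grid, the function names and the example line coordinates are assumptions; the encoder's real region layout may differ.

```python
def tile_regions(width, height, max_side=500):
    """Cut a page into regions of at most max_side x max_side pixels.

    Returns (left, top, region_width, region_height) tuples.
    Assumes a regular grid starting at the top-left corner.
    """
    regions = []
    for top in range(0, height, max_side):
        for left in range(0, width, max_side):
            regions.append((left, top,
                            min(max_side, width - left),
                            min(max_side, height - top)))
    return regions

def pieces_of_hline(x0, x1, max_side=500):
    """Number of regions a horizontal line spanning pixels x0..x1 touches."""
    return x1 // max_side - x0 // max_side + 1

regions = tile_regions(5100, 6600)   # page size from the test in this thread
print(len(regions))                  # prints: 154
print(pieces_of_hline(100, 4999))    # prints: 10
```

Under these assumptions a single ruling line running across most of the page ends up as ten separately encoded pieces, regardless of the table's structure.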
-
That's right. I guess it displays the number of color zones, not the number of different colors. And the 27th is a default zone with color
How exactly are you moving FGbz chunks? Could you provide 2 DjVu pages, one colored and one not, and instructions on how to move the FGbz chunk from one to the other that end up with the wrong result?
-
Hi Steven,
When I re-read the thread I saw there was a technical explanation for having exactly the same color multiple times. The hints I gave were for colors that are visually the same but slightly different in a scanned source picture, or even in the resulting DjVu when processed with DjVu Solo 3.1, which might contain a 20-color FGbz. As my focus is on getting the same result in a PDF, where translating an FGbz with that many colors can cost 2k per page, I thought of reducing it to the number of colors that really make a difference.
From: maple7-7-7 ***@***.***>
Sent: Wednesday, October 13, 2021 1:33:44 AM
To: trufanov-nok/minidjvu-mod ***@***.***>
Cc: rmast ***@***.***>; Comment ***@***.***>
Subject: Re: [trufanov-nok/minidjvu-mod] Version 0.9m6 (Discussion #15)
Thanks, rmast. I know nothing about the programs you discussed or how you processed the document, but I will check out what these programs can do as a possible workaround.
Thanks again.
-
Discussion Titles Adjusted
-
Regarding the 27 colors - I've checked the sources of the djvumake tool from the DjVuLibre package. They are pretty straightforward - djvumake just writes all color zones from the command line into the color table without checking whether such a color is already in the color table. It then also checks whether the "FGbz=" argument string contains no color without a zone, and if so, adds black to the end of the color table as the default zone color. I've made a patch for DjVuLibre's Linux users might
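The behaviour described above can be sketched as follows. This is an illustrative model only, not djvumake's code or the actual patch: `build_color_table`, its parameters, and the example colors are hypothetical, and whether the patch also deduplicates the default black entry is not stated in the thread (this sketch always appends it).

```python
def build_color_table(zone_colors, has_default_color, dedupe):
    """Model of building an FGbz color table from per-zone colors.

    zone_colors: one color string per zone, in command-line order.
    has_default_color: True if a color without a zone was given.
    dedupe: False models the behaviour described above (one entry per zone),
            True models the idea of reusing existing entries.
    Returns (table, zone_index), where zone_index[i] is the table entry
    used by zone i.
    """
    table = []
    zone_index = []
    for color in zone_colors:
        if dedupe and color in table:
            zone_index.append(table.index(color))
        else:
            table.append(color)
            zone_index.append(len(table) - 1)
    if not has_default_color:
        table.append("#000000")  # black as the default zone color
    return table, zone_index

# Made-up example: 26 zones that use only 2 distinct colors.
zones = ["#ff0000", "#0000ff"] * 13

naive_table, _ = build_color_table(zones, has_default_color=False, dedupe=False)
small_table, index = build_color_table(zones, has_default_color=False, dedupe=True)
print(len(naive_table), len(small_table))  # prints: 27 3
```

In this model, 26 zones plus the default black give 27 table entries without deduplication, which matches the count of 27 discussed in this thread.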
-
git apply diff
From: Alexander Trufanov ***@***.***>
Sent: Thursday, October 14, 2021 8:03:02 PM
To: trufanov-nok/minidjvu-mod ***@***.***>
Cc: rmast ***@***.***>; Comment ***@***.***>
Subject: Re: [trufanov-nok/minidjvu-mod] v0.9m06 - Tables; Shared Graphics (Discussion #15)
Great, try this one:
1. Get the build dependencies:
sudo apt install libjpeg-dev libtiff-dev imagemagick
2. Try to build the library as is:
cd /tmp
git clone https://github.com/barak/djvulibre.git
cd djvulibre
autoreconf --install
./configure --prefix /usr
make
If configure shows errors (missing library dependencies?), show me the output.
Same if make fails.
3. Try to build it again with the patch (while in /tmp/djvulibre/):
wget https://sourceforge.net/p/djvu/discussion/103286/thread/f41cfdc5ae/b263/attachment/djvumake.diff
git apply djvumake.diff
make
4. Install the patched version over the system one (requires admin privileges):
sudo make install
-
PATCH TEST RESULTS - FGBZ So for the FGbz chunk we now have 25 bytes instead of 139 bytes! Thanks again!
-
Hi Alex, Thanks.
-
Hi,
-
I've managed to build a patched In case the exe's don't launch on an old PC, one may try installing the Visual C++ Redistributable Packages for Visual Studio 2013
-
Hi Alex,
I have started this new discussion for the newest version of minidjvu-mod.
I have tested your new version on 20 table pages (5100x6600 at 600 dpi) with a 20-page dictionary: any2djvu vs. your new minidjvu-mod0.9m6
Results are impressive! The two djvus look the same, but . . .
Any2djvu = 77.2 KB
minidjvu-mod0.9m6 = 48.6 KB (!)
Yours is about 63% of the size of the any2djvu version!
This is the kind of compression we had hoped for early on in the project.
Actually even better than that!
Here is a jpg with the file sizes summary:
Left side any2djvu Right side Yours!
Here are the two DjVus as a zip file.
TABLES_any2djvu_vs_mini_0.9m6_djvus.zip
Please explain how you got such an improvement. I know you were predicting possibly better compression. But wow, you really nailed it this time!
Some of your pages are less than half the file size of corresponding any2djvu pages, and the any2djvu's are supposed to be at or close to commercial quality.
Thanks again,
Stephen