Today I was wondering about converting a pdf made from a scan of a book
into djvu, hopefully to reduce the size, without too much loss of
quality. My initial experiments with pdf2djvu were a bit
discouraging, so I invested some time building gsdjvu in order to be
able to run djvudigital.
Watching the messages from djvudigital, I realized that the reason it
was achieving so much better compression was that it was using black
and white for the foreground layer by default. I also figured out that
the default 300dpi looks crappy since my source document is apparently
600dpi.
I then went back and compared djvudigital to pdf2djvu a bit more
carefully. My not-very-scientific conclusions:
- monochrome at higher resolution is better than coloured foreground
- higher resolution and (a little) lossy beats lower resolution
- at the same resolution, djvudigital gives nicer output, but at the same bit rate, comparable results are achievable with pdf2djvu.
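
The pdf2djvu side of such a comparison could be scripted roughly as follows. This is only a sketch: the helper `pdf2djvu_command` and the file names are hypothetical, and it only assembles command lines using the `--dpi`, `--monochrome`, `--loss-level` and `-o` options rather than running anything.

```python
import shlex


def pdf2djvu_command(pdf, djvu, dpi, monochrome=False, loss_level=None):
    """Build an argv list for pdf2djvu (hypothetical helper, not part of pdf2djvu)."""
    cmd = ["pdf2djvu", f"--dpi={dpi}"]
    if monochrome:
        cmd.append("--monochrome")
    if loss_level is not None:
        # Lossy compression only makes sense together with --monochrome.
        if not monochrome:
            raise ValueError("loss_level requires monochrome=True")
        cmd.append(f"--loss-level={loss_level}")
    cmd += ["-o", djvu, pdf]
    return cmd


# Placeholder file names; the settings mirror the variants compared in this post.
variants = [
    ("colour-450.djvu", dict(dpi=450)),
    ("mono-450.djvu", dict(dpi=450, monochrome=True)),
    ("mono-600-lossy.djvu", dict(dpi=600, monochrome=True, loss_level=50)),
]

for output, options in variants:
    print(shlex.join(pdf2djvu_command("book.pdf", output, **options)))
```

Passing each list to `subprocess.run(cmd, check=True)` would actually perform the conversion, assuming pdf2djvu is installed and on the path.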
Perhaps most compellingly, the output from pdf2djvu has sensible
metadata and is searchable in evince. Even with the --words option,
the output from djvudigital is not. This is possibly related to the
error messages like
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .
It could well be my fault, because building gsdjvu involved guessing
at corrections for several errors:
- comparing GS_VERSION to 900 doesn't work well when GS_VERSION is a 5 digit number. GS_REVISION seems to be what's wanted there.
- extra declaration of struct timeval deleted
- -lz added to command to build mkromfs
Some of these issues have to do with building software from 2009 (the
instructions suggest building with ghostscript 8.64) with a modern
toolchain; others I'm not sure about. There was an upload of gsdjvu in
February of 2015, somewhat to my surprise. AT&T has more or less
crippled the project by licensing it under the CPL, which means
binaries are not distributable, hence motivation to fix all the rough
edges is minimal.
Version | kilobytes per page | position in figure |
---|---|---|
Original PDF | 80.9 | top |
pdf2djvu --dpi=450 | 92.0 | not shown |
pdf2djvu --monochrome --dpi=450 | 27.5 | second from top |
pdf2djvu --monochrome --dpi=600 --loss-level=50 | 21.3 | second from bottom |
djvudigital --dpi=450 | 29.4 | bottom |
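
To make the table easier to compare at a glance, the sizes can be expressed relative to the original PDF. The sketch below only does arithmetic on the kilobytes-per-page figures copied from the table; the variable names are mine.

```python
# Kilobytes per page, copied from the table above.
kb_per_page = {
    "Original PDF": 80.9,
    "pdf2djvu --dpi=450": 92.0,
    "pdf2djvu --monochrome --dpi=450": 27.5,
    "pdf2djvu --monochrome --dpi=600 --loss-level=50": 21.3,
    "djvudigital --dpi=450": 29.4,
}

baseline = kb_per_page["Original PDF"]

# Size of each variant as a fraction of the original PDF.
relative = {name: size / baseline for name, size in kb_per_page.items()}

# Print smallest first.
for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:50s} {ratio:6.0%} of original")
```

By this measure the monochrome variants and djvudigital all land at roughly a quarter to a third of the original size, while the colour pdf2djvu output is actually larger than the PDF.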