Tag: pdfleo

I recently bought a Nook. I did not expect much at first and planned to use it only for ebooks in .epub format. To my surprise, it also reads PDF files. It displays them differently from what people might imagine: it extracts the text and graphics and reflows the paragraphs. I am impressed. It has some flaws, especially with tables and program code, but it is good enough for reading text-heavy PDFs.

After loading several PDF files onto the Nook, I found that some books show “Unknown Author” and strange titles in the “My Documents” directory, which lists all ebooks. The title and author are taken from the PDF metadata, and evidently the software some publishers used did not set them properly.

Nook Unknown Author

Nook Document List

For example, the book “The Facebook Era” by Clara Shih shows 013715318X.pdf as its title and Unknown Author as its author.

With the help of PDFLeo, released by rockpdf.com, I was able to determine the cause: incorrect metadata. The following is the output:

J:\My Documents>pdfleo -i facebook.pdf
Morovia (R) pdfleo Server 32-bit Version 1.0 (build 4162)
File: facebook.pdf
Title: 013715318X.pdf
Author:
Subject:
Keywords:
Created: 03/24/2009 03:49:24 PM
Modified: 02/02/2010 08:20:45 PM
Application: Acrobat: pictwpstops filter 1.0
PDF Producer: PDFKit.NET 2.0.28.0
PDF Version: 1.6 (Acrobat 7.x)
Number of Pages: 254
Tagged PDF: No
Linearized: No
Page Size: 6.01x9.25 in
....

The publisher's software used the book’s ISBN as the title and set Author to an empty string (if a field does not exist in the file, PDFLeo displays N/A). This is more obvious when you look inside the file:

Incorrect MetaData Fields
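The same check can be automated. The following is a minimal Python sketch, assuming the `pdfleo -i` output keeps the "Key: value" layout shown above; the function names `parse_info` and `metadata_problems` are hypothetical helpers, not part of PDFLeo.

```python
# Sketch: flag suspicious metadata in captured `pdfleo -i` output.
# Assumes the "Key: value" line layout shown in the transcript above.

def parse_info(output: str) -> dict:
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def metadata_problems(fields: dict) -> list:
    """Return human-readable problems with the Title and Author fields."""
    problems = []
    title = fields.get("Title", "")
    author = fields.get("Author", "")
    # PDFLeo prints N/A when a field does not exist in the file.
    if not title or title == "N/A":
        problems.append("missing title")
    elif title.lower().endswith(".pdf"):
        problems.append("title looks like a file name")
    if not author or author == "N/A":
        problems.append("missing author")
    return problems


# Excerpt of the output shown above.
SAMPLE = """File: facebook.pdf
Title: 013715318X.pdf
Author:
Subject:
"""

print(metadata_problems(parse_info(SAMPLE)))
# prints ['title looks like a file name', 'missing author']
```

The ".pdf" suffix test is only a heuristic for titles that were filled in with a file name; a real title ending in ".pdf" would be flagged too.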

Metadata fields can also be viewed in Adobe Reader:

With the problem pinpointed, the next step is to fix it. We need to set the metadata in the PDF file so that the title and author display correctly on the Nook (or Kindle, Sony Reader, etc.). Fortunately, PDFLeo can modify metadata with the --info-dict switch.


J:\My Documents>pdfleo --info-dict="Title=The Facebook Era;Author=Clara Shih" facebook.pdf facebook.new.pdf

Now the new file has the metadata corrected, which can be verified by loading the file in Adobe Reader:

Corrected PDF Metadata

After overwriting the existing file, Nook shows the title and author correctly.

Although I can fix files one by one, I certainly hope that publishers get this right in the first place. Here is my call: publishers, check your metadata before shipping the PDF version! You can use PDFLeo for that purpose.
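For readers with more than a handful of files to fix, here is a rough Python sketch that builds the command shown earlier for each book. It uses only the --info-dict syntax that appears in this post ("Key=Value" pairs joined by semicolons). The helper name `build_command` is hypothetical, and because I do not know how PDFLeo escapes ";" inside a value, the sketch simply refuses such values.

```python
# Sketch: build `pdfleo --info-dict=...` commands for several files.
# Only the switch syntax shown in this post is assumed.

def build_command(src, dst, fields, exe="pdfleo"):
    """Return the argument list for one pdfleo invocation."""
    for key, value in fields.items():
        # Escaping rules for ';' are unknown to this sketch, so reject it.
        if ";" in key or ";" in value:
            raise ValueError(f"cannot encode {key!r}: contains ';'")
    info = ";".join(f"{key}={value}" for key, value in fields.items())
    return [exe, f"--info-dict={info}", src, dst]


# Map each source file to the metadata it should carry.
BOOKS = {
    "facebook.pdf": {"Title": "The Facebook Era", "Author": "Clara Shih"},
}

for src, fields in BOOKS.items():
    dst = src[:-4] + ".new.pdf"  # facebook.pdf -> facebook.new.pdf
    command = build_command(src, dst, fields)
    print(command)
    # To actually run it, with pdfleo on the PATH:
    # import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list leaves the quoting to Python's subprocess module, so the quotation marks typed at the command prompt above are not needed here.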

Morovia launches PDF products

View from Morovia Office

Our first PDF product, PDFLeo, was launched today. See http://rockpdf.com for more information. PDFLeo supports:

  • Encryption. Encrypt a PDF document with either password security or public key security. Remove PDF encryption if you are able to open the file. Retain PDF encryption and permission settings in the new PDF when other parts of the document are modified.
  • Linearization. Optimize PDF documents for viewing over a slow connection from a capable web server.
  • Size Optimization. Reduce the file size by removing redundant contents, compressing streams and moving objects to streams.
  • Query document information, such as metadata, security settings, document permissions and font information.
  • Insert and modify predefined or custom document information entries.
  • Insert, view and modify XMP metadata.