I recently bought a Nook. I did not expect much at first and planned to use it only for reading ebooks in .epub format. To my surprise, it also reads PDF files. It displays them differently from what most people would expect: it extracts the text and graphics and reflows the paragraphs. I am impressed. It has some flaws, especially with tables and program code, but it is good enough for reading text-heavy PDFs.
After loading several PDF files onto the Nook, I found that some books show “Unknown Author” and strange titles in the “My Documents” directory, which lists all ebooks. The title and author are taken from the PDF metadata, and evidently the software some publishers used did not set them properly.
For example, the book “The Facebook Era” by Clara Shih shows 013715318X.pdf as the title and Unknown Author as the author.
J:\My Documents>pdfleo -i facebook.pdf
Morovia (R) pdfleo Server 32-bit Version 1.0 (build 4162)
Created: 03/24/2009 03:49:24 PM
Modified: 02/02/2010 08:20:45 PM
Application: Acrobat: pictwpstops filter 1.0
PDF Producer: PDFKit.NET 188.8.131.52
PDF Version: 1.6 (Acrobat 7.x)
Number of Pages: 254
Tagged PDF: No
Page Size: 6.01x9.25 in
The software used the book’s ISBN as the title and set the author to an empty string (if the field does not exist in the file, PDFLeo displays N/A). This is more obvious when you look inside the file.
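For readers who want a rough look inside a file without any PDF tool, here is a minimal Python sketch using only the standard library. It is not how PDFLeo works; it simply follows the /Info reference to the document information dictionary and pulls out /Title and /Author. It assumes the dictionary is stored uncompressed and unencrypted and that the values are plain literal strings; hex strings, UTF-16 values and nested parentheses are not handled. The function name peek_info_fields and the sample bytes are made up for illustration.

```python
import re

# Matches a simple literal string value such as "/Title (Some text)".
# Backslash escapes are tolerated; nested parentheses are not.
_FIELD = rb"/%s\s*\(((?:\\.|[^\\()])*)\)"


def peek_info_fields(data, names=("Title", "Author")):
    """Return {name: value-or-None} from the Info dictionary, or None if not found."""
    # The trailer (or cross-reference stream) points at the Info object: /Info 12 0 R
    refs = re.findall(rb"/Info\s+(\d+)\s+(\d+)\s+R", data)
    if not refs:
        return None
    num, gen = refs[-1]  # after incremental updates, the last reference wins
    bodies = re.findall(
        rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\b(.*?)endobj",
        data,
        re.DOTALL,
    )
    if not bodies:
        return None  # the object is probably inside a compressed object stream
    body = bodies[-1]
    found = {}
    for name in names:
        match = re.search(_FIELD % name.encode("ascii"), body, re.DOTALL)
        # None means the field is absent; "" means it is present but empty.
        found[name] = match.group(1).decode("latin-1") if match else None
    return found


# Hypothetical, hand-written bytes that mimic the situation described above.
sample = (
    b"%PDF-1.6\n"
    b"1 0 obj\n<< /Title (013715318X.pdf) /Author () >>\nendobj\n"
    b"trailer\n<< /Info 1 0 R >>\n%%EOF\n"
)
print(peek_info_fields(sample))  # {'Title': '013715318X.pdf', 'Author': ''}
```

For a real file, pass the result of open("facebook.pdf", "rb").read(). An empty string and a missing field are reported differently, which mirrors the distinction between an empty Author and N/A.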
Metadata fields can also be viewed in Adobe Reader:
With the problem pinpointed, the next step is to fix it. We need to set the metadata in the PDF file so that the title and author display correctly on the Nook (or a Kindle, Sony Reader, etc.). Fortunately, PDFLeo can modify metadata with the --info-dict switch.
J:\My Documents>pdfleo --info-dict="Title=The Facebook Era;Author=Clara Shih" facebook.pdf facebook.new.pdf
Now the new file has the metadata corrected, which can be verified by loading the file in Adobe Reader:
After overwriting the existing file, Nook shows the title and author correctly.
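With more than a handful of files, the same command can be scripted. The sketch below only builds the argument list for each book, using the --info-dict syntax from the command shown earlier. The helper name build_pdfleo_command is made up, and because I do not know how PDFLeo escapes ";" or "=" inside a value, the helper simply refuses such values rather than guessing.

```python
def build_pdfleo_command(src, dst, title, author, exe="pdfleo"):
    """Build the argument list for one pdfleo --info-dict call."""
    for value in (title, author):
        # Assumption: ';' separates fields and '=' separates key from value,
        # so values containing them are rejected instead of being escaped.
        if ";" in value or "=" in value:
            raise ValueError("unsupported character in value: %r" % value)
    info = "Title={};Author={}".format(title, author)
    return [exe, "--info-dict=" + info, src, dst]


# (source file, output file, title, author); add more entries as needed.
books = [
    ("facebook.pdf", "facebook.new.pdf", "The Facebook Era", "Clara Shih"),
]

for src, dst, title, author in books:
    command = build_pdfleo_command(src, dst, title, author)
    print(command)
```

Each list can be handed to subprocess.run(command, check=True) once the printed commands look right. Passing a list avoids shell quoting problems with spaces in titles.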
Although I can fix the files one by one, I certainly hope that publishers get this right in the first place. Here is my call: publishers, check your metadata before shipping the PDF version! You can use the PDFLeo tool for that purpose.