Converting Word docs to Kindle

What a pain finding out how to get your book onto Kindle. Nearly as painful as doing the actual reformatting. Kindle is like reading your book through a toilet-paper tube with a black-and-white filter in it. Here's how I did it:

You could convert files to .epub format using Stanza, except that:

  • Word docs retain all your fancy index entries, cross-references, etc. as hideous-looking text.
  • PDFs retain all the page numbers and other page headings embedded in the text - go figure.
  • RTFs just never finish loading.
  • You could save from Word as raw text, but that will lose all your special characters.
  • Unless you save as UTF-7, but that will crash MS-Word.

Eventually I loaded the Word doc into OpenOffice and saved it as HTML (don't save directly from Word! Microsoft writes rubbish HTML).

Next thing I tried was to open it in Stanza and save it as .epub. Then I manually cleaned up all the HTML using Sigil (which only works on HTML within an epub file), deleting references to images that were no longer part of the file and deleting most of the CSS formats... and manually reinserted all the images one by one. Sigil is massively CPU-intensive, so unless you have a heavy-duty PC, be prepared to wait every time you change to the book view, and for goodness' sake don't do an edit "undo" or Ctrl-Z. It also messes with the HTML a lot.
After all that, I eagerly defined my new book, uploaded my .epub and got the user-friendly message:
/var/tmp/dtfc/s3Get/ticket_6539921/dtp_150887_USER_CONTENT_0 inputfile is of a filetype that is unsupported, or undeterminable. inputFiletype=null
which makes a lie of Amazon's bulls**t claim to support epub format.
So then I tried to use Stanza to convert all my hard work from .epub back into Kindle or HTML or anything, but it stripped out all the images and all of the HTML tags. Are we having fun yet?
Finally I manually copied and pasted the HTML from Sigil into a simple .html file in a new directory.
Then I renamed the .epub to .zip, opened it with Windows Explorer and copied all the images into the same directory as my HTML file.
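
If you do end up with images stuck inside an .epub, the rename-and-copy step can be scripted, because an .epub is just a zip archive. This is only a rough sketch: the function name, the list of image extensions and the choice to flatten everything into one directory are my assumptions, not anything Sigil or Stanza prescribes.

```python
import zipfile
from pathlib import Path

# Assumed set of image types; adjust to whatever your book actually uses.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def extract_images(epub_path, dest_dir):
    """Copy every image inside the .epub (a zip archive) into dest_dir.

    Sub-directories inside the archive are flattened, so two images with
    the same file name in different folders would overwrite each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if Path(name).suffix.lower() in IMAGE_EXTENSIONS:
                target = dest / Path(name).name
                target.write_bytes(archive.read(name))
                copied.append(target.name)
    return copied
```

Calling `extract_images("mybook.epub", "kindle_book")` (both names hypothetical) would drop the images next to the .html file, no renaming to .zip required.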

So scrap Stanza, Sigil and epub.

As I said, use OpenOffice and save it as HTML.

Don't forget to remove any references to "see page 99" etc., because there aren't page numbers any more. Might as well delete the original index and table of contents too.
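
Hunting for leftover page references by eye is tedious, so here is a small sketch that lists them for you to fix by hand. The pattern is my own guess at what such references look like ("page 12", "see page 99"); it will miss other wordings such as "p. 12" and it deliberately does not delete anything.

```python
import re

# Assumed wording: an optional "see", then "page" and a number.
PAGE_REF = re.compile(r"\b(?:see\s+)?page\s+\d+", re.IGNORECASE)


def find_page_refs(text):
    """Return (line_number, matched_text) for every page reference found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for match in PAGE_REF.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Feed it the contents of the saved HTML file and work through the reported line numbers in your editor.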

To edit raw HTML I can't recommend Nvu, which is CPU-heavy and so terribly slow at times, is occasionally flaky, and messes with the HTML too much. But if you want to add an HTML Table of Contents automatically from the h1 and h2 etc. headings, you'll need to use Nvu, as it is the only tool I have found so far to do it. Actually Nvu is so destructive to HTML, I'd consider manually building the ToC instead.
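
Building the ToC "manually" can still be semi-automated with a few lines of script. The following is a sketch only, not what Nvu does: it uses a regular expression rather than a real HTML parser, assumes the headings are simple, un-nested h1/h2 elements, and the anchor names (`toc1`, `toc2`, ...) and CSS class names are made up for illustration.

```python
import re

# Matches <h1 ...>...</h1> and <h2 ...>...</h2>, in upper or lower case.
HEADING = re.compile(r"<(h[12])\b([^>]*)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def add_toc(html_text):
    """Return (toc_html, body_html) with a named anchor in each heading."""
    entries = []

    def tag_heading(match):
        tag, attrs, inner = match.groups()
        anchor = "toc%d" % (len(entries) + 1)
        # Strip inline tags (<i>, <font>, ...) to get a plain title.
        title = " ".join(re.sub(r"<[^>]+>", "", inner).split())
        entries.append((tag.lower(), anchor, title))
        return '<%s%s><a name="%s"></a>%s</%s>' % (tag, attrs, anchor, inner, tag)

    body = HEADING.sub(tag_heading, html_text)
    items = ['<li class="toc-%s"><a href="#%s">%s</a></li>' % e for e in entries]
    toc = "<ul>\n" + "\n".join(items) + "\n</ul>"
    return toc, body
```

Paste the returned `toc` wherever you want the contents list and save `body` as the new HTML file. Check the result in your browser before trusting it.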

In the end, I used ConTEXT, an excellent free editor for all sorts of file formats. No WYSIWYG, but I just used my browser for that.

Then I used 7-Zip (or the zip utility of your choice) to zip all the files (HTML and images) in the directory into a .zip file and FINALLY DTP accepted it.
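
If you rebuild the zip often, the same step can be scripted instead of clicking through 7-Zip. A minimal sketch, assuming all the files sit flat in one directory as described above; the function name is hypothetical, and nothing here is specific to DTP.

```python
import zipfile
from pathlib import Path


def zip_book(source_dir, zip_path):
    """Zip every file directly inside source_dir into zip_path (no sub-folders)."""
    source = Path(source_dir)
    zip_target = Path(zip_path).resolve()
    added = []
    with zipfile.ZipFile(zip_target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.iterdir()):
            # Skip directories, and skip the zip itself if it lives in source_dir.
            if path.is_file() and path.resolve() != zip_target:
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

The files are stored without any directory prefix, so the HTML and its images stay side by side inside the archive.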

Some of my nicely detailed images lost all of that nice detail when reduced to Kindle size (max 64kB, max 450px x 550px), but by going back to the original Visio files and re-saving them as JPEGs 450px wide at 600dpi with 100% image quality, I got good images.
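
To work out what size to re-save an image at, the arithmetic looks roughly like this. It is a sketch using the limits quoted above; treating "64kB" as 64 x 1024 bytes is my assumption, and the helper names are made up. It only does the sums and checks the file size, it does not resize anything.

```python
import os

MAX_WIDTH, MAX_HEIGHT = 450, 550   # pixel limits quoted above
MAX_BYTES = 64 * 1024              # assumption: "64kB" means 64 * 1024 bytes


def fit_within(width, height, max_width=MAX_WIDTH, max_height=MAX_HEIGHT):
    """Scale (width, height) down to fit the limits, keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # never enlarge
    return max(1, round(width * scale)), max(1, round(height * scale))


def small_enough(path, max_bytes=MAX_BYTES):
    """True if the image file on disk is within the size limit."""
    return os.path.getsize(path) <= max_bytes
```

For example, a 900 x 600 diagram would need to be saved at 450 x 300, while a 300 x 200 one can stay as it is.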

Don't let anyone tell you this crap is easy. Amazon DTP is a pain in the butt.