The Temptation of the Reader

I was an early adopter of e-reader technology. As a heavy business traveler, and an even heavier reader filling the gaps of time on those trips, I would often hit the used bookstore to pick up inexpensive SciFi tomes to read and leave behind while I was on the road.

The first touchscreen reader, the Sony PRS 700

The e-reader ended that wasteful practice. My first reader was a Sony PRS-700, the first reader with a touchscreen, and I never looked back. Around the time I bought it, Amazon introduced their Kindle.

Fast forward past a stolen Sony and its replacement, and I finally caved in and bought the 2013 edition of the Kindle Paperwhite. I have switched my allegiance to Amazon (grudgingly; I will admit that they have created a far better experience and ecosystem than Sony or the others could), and I read on it every day.


eBook Evolution

I have written about ebooks a few times in the past. I started in 2008 with the Sony E-Reader, moved on to an iPad in 2011, and then to the Kindle in 2014. As a lifelong, heavy reader, books have always been a significant part of my life. The eBook and the reader have been a godsend. Yet all is not perfect in the reader world.

As a long-time e-reader user and early adopter who has passed through several technology generations along the way, my challenge is that I have books from multiple vendors, in multiple formats, and that complicates life.

Sony was the first stop on the path to an e-reader. It started out using proprietary, Sony-only formats. Yet as the technology evolved and Amazon became a powerful player, Sony books ended up in protected ePub format. Moving them was trivial using Calibre, and they remain in my library.

I later bought a second-generation iPad, about the time that Apple launched their bookstore. I have to admit that the reading experience on the iPad with the Apple application was, and is, outstanding. However, the protection that Apple uses for its books is not removable, so you are limited to reading them on the iPad or, now, in the iBooks application on the Mac. That would be OK if I always used my iPad to read, but alas, I prefer an e-ink reader (no distractions, and a more immersive environment).

All was well until the second Sony reader began to die. Its battery always sucked, and I ended up replacing it less than 18 months after buying it. However, even with the new battery, it really never lived up to the quality or performance of the original reader I had from Sony. Bummer.

I could have turned 100% to the iPad, but at my core, I still prefer the e-ink based readers. However, by this time, late 2013, the battle was over. There were some also-rans, but the real choice was the iPad or the Kindle.

So I took the plunge and bought a Kindle Paperwhite (Wi-Fi only, without the ads). As much as it pains me to say it, it is a damn good reader, and the Amazon book ecosystem is solid. Huge selection, reasonable prices, and a painless purchase/access process. It really just works.

Of course, the Amazon format files are protected (again, it is trivial to remove this protection).

The integration with Calibre is excellent, and converting my extensive collection of ePub books to .mobi format for the Kindle is trivial.

One thing is for certain: the only loser here is the printed book. It has to be a special book indeed for me to buy a dead tree version.

So, like much of my digital life, I have many epochs of detritus, with collections spanning multiple technologies. Don’t get me started about my music collection (Amazon, Apple, Google Play, and my ripped CDs).

eBook Fun – Fixing fouled up books

As I have mentioned many times, I have been a long-time, satisfied user of my reader and ebooks. It is certainly better than hauling around a lot of dead trees when I travel.

All good. I have been building a collection for more than 5 years now, from a variety of sources: many commercial, but also many of the free sources (Project Gutenberg), as well as some other sources for out of print books that are (ahem) less than legit.

Most of the commercial options are DRM encumbered, so that I can’t peek inside with impunity. But all the others are open books, so to speak, mostly ePub format. There are some great tools to work with.

Sigil – a WYSIWYG ePub Editor

Sigil is free, open source, and pretty solid. It will help you put together a book, and fix minor errors.

It is a good place to start to figure out the ePub format.

ePub files are pretty straightforward HTML with some special attributes. You can do just about anything that you can put on a web page (within reason; no JavaScript or animations).

But you can tweak up the look and feel of the book with stylesheets, inserted graphical elements, and all the other tricks that you can use with web pages.
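If you want to peek inside without an editor, it helps to know that an ePub is just a ZIP archive holding the HTML, stylesheets, and images. Here is a minimal Python sketch that lists the HTML files in one. The function names and the demo file names are my own inventions for illustration, and the demo archive is only laid out like an ePub; it is missing the manifest files a real one carries.

```python
import io
import zipfile


def list_epub_chapters(epub_file):
    """Return the names of the HTML/XHTML files inside an ePub (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        ]


def build_demo_archive():
    """Build a tiny in-memory ZIP laid out like an ePub.

    The file names are hypothetical, and this is NOT a valid ePub: a real one
    also needs META-INF/container.xml and a package (.opf) file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr(
            "OEBPS/chapter01.xhtml",
            "<html><body><h1>Chapter 1</h1></body></html>",
        )
        archive.writestr("OEBPS/style.css", "p { text-indent: 1em; }")
    buffer.seek(0)
    return buffer


print(list_epub_chapters(build_demo_archive()))
# prints ['OEBPS/chapter01.xhtml']
```

Pointing `list_epub_chapters` at the path of a DRM-free ePub on disk should work the same way, since `zipfile.ZipFile` accepts either a path or a file object.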

Calibre – An open source library manager

Of course, your reader probably comes with software to manage its files. You will find that it is pretty limited. Perhaps you have some old files in one of the dead or dying formats (.lit, .lrf, BBeB, etc.). Additionally, there are a lot of eBooks in plain text or Microsoft Word format.

It is helpful to be able to shift formats, and to clean up some of the glitches.

Enter Calibre: an open source, multi-platform (Mac, Windows, Linux) environment for managing your library. It groks all the standard formats, and converts between them seamlessly. It is extensible with plugins, and it can help you clean up books as well as transcode them. Additionally, it connects with several sources to get covers, metadata, and other extras that improve the user experience.

It can be used to take HTML files or word processing files (RTF or .DOCX) and turn them into eBooks in any format.

Being a powerful package, to get the most out of it, you really need to understand what it is doing, and how to optimize the settings. By default it does an OK job, but as in many cases, garbage in equals garbage out.

Some issues

Why is this a problem? Well, it is because a lot of the free or community books are poorly formatted to begin with. Also, some sources just suck in general. Often, I will find an out of print book that was scanned and OCR’d, and then turned into an MS Word file. Until recently, you needed to save that file as an HTML file and run it through Calibre.

Calibre uses some pretty heavy stylesheets that mostly look OK. The ambitious person can customize them easily, provided they know what they are doing. Of course, not every reader can handle all stylesheet features, so it can be a trial and error process.

Of course, there are some things that really foul up any book. Anything output by Microsoft Word uses a class structure that is insane. If you see class="msonormalxx", you know that you are going to have an ugly book.
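To give a feel for how that Word cruft can be scrubbed, here is a small Python sketch that deletes any class attribute whose value starts with "Mso" from a chunk of HTML. This is my own illustration, not something Calibre or Sigil does for you; the function name and the sample markup are made up, and a regular expression like this is only a rough tool that assumes reasonably tidy attribute quoting.

```python
import re

# Matches a class attribute whose value starts with "mso" (any letter case),
# quoted with straight double or single quotes. Assumption: attributes are
# written in the plain form class="...", as Word's HTML export tends to do.
MSO_CLASS = re.compile(r"""\s+class\s*=\s*(["'])mso[^"']*\1""", re.IGNORECASE)


def strip_word_classes(html):
    """Remove Word-generated class attributes, leaving other classes alone."""
    return MSO_CLASS.sub("", html)


sample = '<p class="MsoNormal">It was a dark night.</p><p class="intro">Kept.</p>'
print(strip_word_classes(sample))
# prints <p>It was a dark night.</p><p class="intro">Kept.</p>
```

It only removes the attribute; the inline styles and nested spans that Word also produces would still need separate attention.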

RTF files are not much better. They typically have far fewer funky classes tossed in, but the conversion does glitch in some spectacular ways.

ePub versus other formats

I have a pretty large collection of books in the Microsoft ebook format (.lit) and the old Sony reader format (.lrf) that I convert in order to read. Both of these formats can be problematic.

The Sony format leads to ePubs with some really wacky XHTML coding in them, which is really ugly to try to clean up. Additionally, they have odd chapter breaks and pretty non-functional tables of contents.

Fortunately, it isn’t too difficult to clean them up, but it is time consuming. You need a few tools.

  1. An HTML stripper. There are several options, but I use a simple app for my Mac, HTML Stripper, a reasonably priced utility. There are some free ones, but I like to support small vendors, and $15 is a good price for this tool.
  2. A Markdown editor. The HTML stripper will give you good plain text, which you will need to reformat into clean HTML. Fortunately, Markdown is a fabulous way to do this. I use Mou for the Mac (free, but do donate to them), and MarkdownPad on my PC. It is also free, but the pro version has some nice extensions, so it might be worth spending the $15 to buy it (I have).

The clean up workflow

First I extract the raw HTML. I do this chapter by chapter. It is best to create an ePub with one source file per chapter. That makes for clean chapter breaks, and a well functioning table of contents.

Then I run it through my HTML stripper. That gives me a clean text file. It will likely have odd numbers of line breaks between paragraphs, and some other interesting things. Fortunately, that doesn’t matter.
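I don’t know how the HTML Stripper app works internally, but the idea is easy to sketch with Python’s standard library. The class and function names below are my own, the sample markup is invented, and the list of tags treated as paragraph breaks is just a guess at what a converted ePub chapter typically contains.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, starting a new paragraph at block-level tags."""

    # Assumption: these tags mark paragraph boundaries in the chapter markup.
    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return plain text with one blank line between paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace inside each paragraph and drop empty paragraphs.
    paragraphs = [" ".join(chunk.split()) for chunk in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)


sample = (
    '<div class="calibre1"><p class="calibre2">Doc Savage '
    '<span class="bold">stared</span> at the map.</p>'
    "<p>He said nothing.</p></div>"
)
print(strip_html(sample))
```

One limitation to be aware of: the contents of any style or script elements would come through as text, so this sketch is best fed the body of a chapter rather than a whole page.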

I then import that text into my Markdown editor. Add a chapter title as an h1, and you have a nice complete chapter to drop back into the ePub. (Every Markdown editor has a “copy to HTML” function. Works great.)
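For anyone who would rather script this step, the sketch below is a bare-bones stand-in for the Markdown round trip, not a Markdown implementation: it puts the chapter title in an h1 and wraps each paragraph in p tags. The function name is hypothetical. It also shows why the odd numbers of line breaks don’t matter, since any run of blank lines is treated as a single paragraph break.

```python
import re
from html import escape


def chapter_to_html(title, text):
    """Wrap the title in <h1> and each blank-line-separated paragraph in <p>."""
    lines = [f"<h1>{escape(title)}</h1>"]
    # Any run of blank lines counts as one paragraph break.
    for block in re.split(r"\n\s*\n", text):
        paragraph = " ".join(block.split())  # rejoin hard-wrapped lines
        if paragraph:
            lines.append(f"<p>{escape(paragraph)}</p>")
    return "\n".join(lines)


raw = "First paragraph\nwrapped.\n\n\n\nSecond & last."
print(chapter_to_html("Chapter 1", raw))
# prints:
# <h1>Chapter 1</h1>
# <p>First paragraph wrapped.</p>
# <p>Second &amp; last.</p>
```

A real Markdown editor also handles emphasis, block quotes, and the like, which this sketch ignores completely.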

Lastly, I build a new ePub using Sigil. Add metadata, a cover, and a table of contents, and you have a nice book.

But what if you want to read it on your Kindle?

Of course, the Amazon Kindle doesn’t support the ePub format, so you need to convert the book into either an .AZW3 or a .mobi file.

Calibre to the rescue again. Trivial, and the defaults are pretty good for conversion.
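I do the conversion in the Calibre GUI, but Calibre also ships a command-line converter, ebook-convert, which picks the input and output formats from the file extensions. Here is a hedged Python sketch of wrapping it for batch use. The helper names and the file name are hypothetical, and it assumes ebook-convert is on your PATH and is run with default settings only.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="azw3"):
    """Build the ebook-convert command line for one book.

    The output file sits next to the source, with the extension swapped.
    """
    source = Path(source)
    target = source.with_suffix("." + target_format.lower().lstrip("."))
    return ["ebook-convert", str(source), str(target)]


def convert(source, target_format="azw3"):
    """Run the conversion and return the output path (requires Calibre)."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    command = build_convert_command(source, target_format)
    subprocess.run(command, check=True)
    return command[-1]


# Hypothetical file name, shown only to illustrate the command that gets built.
print(build_convert_command("doc_savage_001.epub", "mobi"))
# prints ['ebook-convert', 'doc_savage_001.epub', 'doc_savage_001.mobi']
```

Keeping the command building separate from the actual run makes it easy to check what would be executed before letting it loose on a whole library.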

And naturally, you use Calibre to transfer or manage your library on the Kindle (this is only for files you didn’t buy from Amazon). Works like a charm.

Coda

I got into cleaning up ebooks because of my collection of old Doc Savage books. Circa 2008 I found a repository of them in Sony format (I had a PRS-700 reader then), and the 181 original Doc Savage stories were a joy to read.

But they convert poorly into ePub. When I lost my PRS-700 and replaced it with the PRS-600, the support for .lrf files had been removed. My only option was to convert them. Calibre converted them, but it did a lousy job.

Over the last few days, I have been using the workflow above to clean up some of these books. It takes me about 35 minutes to create a crisp, clean, and standards compliant ePub from a completely ugly converted ePub.

A labor of love.

Having a new Kindle is giving me the motivation to fix some of my titles.