Kindle, EPUB, And Amazon’s Love Of Reinventing Wheels

Late last month, a post from the relatively obscure Good e-Reader claimed that Amazon would finally allow the Kindle to read EPUB files. The story was picked up by all the major tech sites, and for a time, there was much rejoicing. After all, it was a feature that owners have been asking for since the Kindle was first released in 2007. But rather than supporting the open eBook format, Amazon had always insisted on coming up with their own proprietary formats to use on their readers. Accordingly, many users have turned to third-party programs which can reliably convert their personal libraries over to whatever Amazon format their particular Kindle is most compatible with.

Native support for EPUB would make using the Kindle a lot less of a hassle for many folks, but alas, it was not to be. It wasn’t long before the original post was updated to clarify that Amazon had simply added support for EPUB to their Send to Kindle service. Granted, this is still an improvement, as it represents a relatively low-effort way to get the open format files on your personal device; but files sent through the service are converted to Amazon’s KF8/AZW3 format, and the result may not always be what you expected. At the same time, the Send to Kindle documentation noted that support for AZW and MOBI files would be removed later this year, as the older formats weren’t compatible with all the features of the latest Kindle models.

If you think this is a lot of unnecessary confusion just to get plain-text files to display on the world’s most popular ereader, you aren’t alone. Users shouldn’t have to wade through an alphabet soup of oddball file formats when there’s already an accepted industry standard in EPUB. But given that it’s the reality when using one of Amazon’s readers, this seems as good a time as any for a brief rundown of the different ebook formats, and a look at how we got into this mess in the first place.

EPUB

The history of the EPUB format can be traced back to 1999, with the version 1.0 release of the Open eBook Publication Structure (OEBPS). Used by some of the very first dedicated electronic readers from the likes of Sony and Intel, it essentially consisted of a ZIP archive with a manifest, containing pages written in a form of XHTML, with CSS used for styling. OEBPS went through several revisions over the years, and in 2007 it became the official technical standard of the International Digital Publishing Forum (IDPF). At that point it was renamed to EPUB, short for Electronic Publication.

EPUB continued to evolve over the years, and in 2016 the IDPF merged with the World Wide Web Consortium (W3C) in an attempt to bring the publishing industry in line with the latest in web development. The current version of the EPUB format (3.2) was released in May of 2019, and offers features such as the ability for Internet-connected devices to load fonts and other content from outside the container file itself.

While the 3.x branch has introduced some fairly large changes in the core format to better handle multimedia content, EPUB can still ultimately be thought of as a relatively simple web page contained in a ZIP file. As these files are exceptionally easy to parse and render, you can find EPUB reader applications on even very low-end devices.
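To make the "web page in a ZIP file" idea concrete, here is a minimal Python sketch that builds a toy EPUB-style container in memory and then looks inside it. It leans on two conventions from the EPUB container spec: the archive starts with an uncompressed `mimetype` entry, and `META-INF/container.xml` points at the package document. The function names, the file names under `OEBPS/`, and the chapter contents are made up for illustration, and the toy archive leaves out the package document itself, so it is not a valid publication.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="{CONTAINER_NS}">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_toy_epub() -> bytes:
    """Return the bytes of a toy EPUB-like ZIP (illustrative, not valid EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # Hypothetical content files: XHTML for the text, CSS for styling.
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html xmlns='http://www.w3.org/1999/xhtml'>"
                    "<body><p>Call me Ishmael.</p></body></html>")
        zf.writestr("OEBPS/style.css", "p { text-indent: 1em; }")
    return buf.getvalue()


def find_package_document(epub_bytes: bytes) -> str:
    """Return the path of the package document named in container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError(f"unexpected mimetype: {mimetype!r}")
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", {"c": CONTAINER_NS})
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.attrib["full-path"]
```

Calling `find_package_document(build_toy_epub())` returns the `full-path` value from the container file. The same function should work on a real, DRM-free EPUB read from disk, since nothing beyond the standard library's ZIP and XML modules is involved.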

It’s also worth noting that, while the EPUB format does allow for Digital Rights Management (DRM), it is not part of the standard. That means if a vendor wants to implement DRM in EPUB, they have to figure out how to do it themselves. In theory this could lead to incompatibility issues between vendor-specific solutions, but in practice, most people who are using EPUBs are doing so specifically because they are DRM-free.

MOBI/AZW

Even older than EPUB, MOBI has its origins in the PalmDOC format from 1996. Originally conceived as a way of storing large text files on the Palm Pilot, the format offered little in the way of formatting outside the ability to mark the start and end points of paragraphs. It did however offer basic bookmarking capability, which in some cases was used to offer a rudimentary table of contents. Being that PalmDOC was a variation of the standard “Palm Database” file, it also featured the ability to store various bits of metadata in a standardized header, such as the author name, book title, and current reading position.
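For the curious, the fixed-size header that opens a Palm Database file can be read with a few lines of Python. This is only a sketch: the 78-byte big-endian layout below follows community documentation of the format as I understand it, the field labels are my own, and only a handful of fields are decoded. The `make_sample_header` helper is a hypothetical stand-in for the first bytes of a real file.

```python
import struct

# Assumed Palm Database header layout (big-endian, 78 bytes):
# name[32], attributes, version, created, modified, backed_up,
# modification_number, app_info_offset, sort_info_offset,
# type[4], creator[4], unique_id_seed, next_record_list, record_count
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")


def parse_pdb_header(data: bytes) -> dict:
    """Decode a few fields from the start of a Palm Database file."""
    if len(data) < PDB_HEADER.size:
        raise ValueError("not enough data for a Palm Database header")
    fields = PDB_HEADER.unpack_from(data)
    # The database name is NUL-padded to 32 bytes.
    name = fields[0].split(b"\x00", 1)[0].decode("latin-1")
    return {
        "name": name,
        "type": fields[9].decode("ascii", "replace"),
        "creator": fields[10].decode("ascii", "replace"),
        "record_count": fields[13],
    }


def make_sample_header() -> bytes:
    """Build a synthetic header (illustrative values only)."""
    return PDB_HEADER.pack(
        b"Moby Dick", 0, 0, 0, 0, 0, 0, 0, 0,
        b"TEXt", b"REAd",  # type/creator pair associated with PalmDOC files
        0, 0, 3,
    )
```

Running `parse_pdb_header(make_sample_header())` gives back the name, the `TEXt`/`REAd` type and creator codes, and a record count of 3. MOBI files are generally reported to use `BOOK`/`MOBI` in the same two slots, which makes this header a quick way to tell the formats apart.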

MobiPocket Reader on Palm OS

While suitable enough for the low-resolution displays of the early Palm Pilots, the lack of any real formatting support in PalmDOC became a liability as the hardware improved. In 2000 MobiPocket, developers of ebook reader applications on Palm, Symbian, and later BlackBerry devices, decided to take matters into their own hands and expand PalmDOC. They added an HTML-like markup language, improved support for images, and as it was an open format, even borrowed a bit from OEBPS. Since they didn’t have the authority to call it an update to the original PalmDOC, they dubbed their creation MOBI.

The story might have stopped here if it weren’t for the fact that in 2005, Amazon purchased MobiPocket, and in turn the rights to MOBI. But rather than use the format as-is for the Kindle, they added a new DRM scheme and cranked the format’s LZ77 compression to the maximum. As the first-gen Kindle only offered a relatively meager 250 MB of onboard storage and was limited to downloading new titles over a 3G cellular connection, they wanted to shave off as many bytes as possible.
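To give a feel for what LZ77-style compression means here, below is a small Python sketch of a decompressor for the PalmDOC variant, based on how the scheme is commonly described in community documentation. It is an illustration under that assumption, not Amazon's or MobiPocket's code, and it says nothing about whatever tuning Amazon applied. The function name is my own.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress PalmDOC-style LZ77 data (sketch of the documented scheme)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        i += 1
        if b == 0x00 or 0x09 <= b <= 0x7F:
            # Plain literal byte.
            out.append(b)
        elif 0x01 <= b <= 0x08:
            # The next b bytes are copied through unchanged.
            out += data[i:i + b]
            i += b
        elif 0x80 <= b <= 0xBF:
            # Two-byte back-reference: 11-bit distance, 3-bit length.
            if i >= n:
                raise ValueError("truncated back-reference")
            pair = ((b << 8) | data[i]) & 0x3FFF
            i += 1
            distance = pair >> 3
            length = (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("back-reference points outside the output")
            # Copy byte by byte so overlapping references repeat correctly.
            for _ in range(length):
                out.append(out[-distance])
        else:
            # 0xC0-0xFF: a space followed by the byte with its top bit cleared.
            out.append(0x20)
            out.append(b ^ 0x80)
    return bytes(out)
```

As a quick check, the five bytes `b"abc\x80\x18"` expand to `b"abcabc"`: three literals, then a back-reference that says "go back 3 bytes and copy 3". The savings come from replacing repeated phrases, and the common space-plus-letter pairs, with these shorter codes.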

This tweaked version of MOBI, which became the standard format for Amazon’s ebook empire, was dubbed AZW. From then on, Amazon essentially started using AZW as a blanket term for their ebook containers, and the actual formats underneath started getting a bit blurry. In the early days, it was possible to come across other similarly named file types:

AZW1

Known officially as Topaz, this proprietary Amazon format has little relation to MOBI/AZW beyond a shared DRM scheme and similar metadata header. In addition to supporting larger images compared to the earlier formats, it was unique in that each title could include its own fonts and glyphs rather than relying on what was built into the Kindle itself. This made it well suited for old books or non-English works, as it could better retain the original text and style.

AZW2

This actually isn’t an ebook format at all, so don’t be surprised if you’ve never run across one. Rather, this is a container file for executable Kindle applications and games.

KF8/AZW3

With the release of the first Kindle Fire tablet in 2011, Amazon needed a new format that could handle multimedia content. The answer was KF8, which is essentially a combination of EPUB and MOBI. In fact, it specifically picks up some of the EPUB 3.x features such as support for HTML5 and CSS3. New support for both fixed-layout pages and SVG images makes this format well suited for comic books, which was a big selling point for the large color display of the Kindle Fire.

Rather than maintaining two different file formats, Amazon decided to move all of their readers over to AZW3 and make it the new standard for the marketplace. While the electronic paper Kindles may not necessarily benefit from the features offered by the new format, all of them beyond the first and second generation are able to read it thanks to redundant MOBI header information which is kept specifically for backwards compatibility.

KFX/AZW8

With the release of the Kindle Paperwhite 3 in 2015, Amazon rolled out their latest format, KFX. Technical information about KFX is a bit hard to come by, as it appears Amazon developed it in-house to be their “ultimate” book format. Some of the new improvements include an enhanced typesetting engine, additional fonts, and support for JPEG XR images. It also rolls in support for video and interactivity, theoretically allowing the same format to be used for both books and software applications.

But perhaps the most obvious change was the enhanced DRM, which has caused plenty of headaches for users who wish to read Amazon-purchased ebooks on other devices. At this point the format and DRM are understood well enough that they can be handled by third-party software, but it takes additional steps and intermediary tools that aren’t required for AZW3 content.

It’s generally recommended that anyone who wishes to maintain their own local library of ebook files should avoid this format altogether — though as more and more of Amazon’s library switches over, that may mean you need to purchase your books elsewhere.

Alexandria On Your Hard Drive

If all you ever do is read Amazon-purchased books on your Kindle, then you’ve probably never had to worry about any of this. To their credit, Amazon has largely perfected the experience of buying and consuming electronic books; there is, after all, a reason the Kindle has become the de facto ereader. All this technical shuffling about is hidden from view, and for the most part, you just tap the book you want to read and get on with your life.

But for those of us who want to source our books from multiple marketplaces, keep an offline copy of our purchased books, or read our Amazon books on a non-Amazon reader, things can get a bit messy. The best advice I can give you, if you’ve managed to get this far without hearing it already, is to grab a copy of Kovid Goyal’s phenomenal Calibre.

This cross-platform GPLv3 program lets you build a format-agnostic virtual library that lives on your local computer, and seamlessly performs device-specific file conversion when uploading to your reader. It might not be quite as easy as spending your days in Amazon’s walled garden, but for users who demand a bit more control over their digital content, it’s a price worth paying.
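If you’d rather script conversions than click through the GUI, Calibre also ships a command-line tool, `ebook-convert`, which takes an input file and an output file and picks the target format from the output extension. Here is a minimal Python sketch that wraps it. The helper names are my own, it assumes `ebook-convert` is on your PATH, and it only handles DRM-free files.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, target_format: str) -> List[str]:
    """Build an ebook-convert command that writes next to the source file."""
    src = Path(source)
    dst = src.with_suffix("." + target_format.lower().lstrip("."))
    return ["ebook-convert", str(src), str(dst)]


def convert(source: str, target_format: str) -> Path:
    """Run the conversion and return the path of the converted file."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    cmd = build_convert_command(source, target_format)
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(cmd, check=True)
    return Path(cmd[2])
```

For example, `convert("book.epub", "azw3")` would run `ebook-convert book.epub book.azw3`. Building the command separately from running it keeps the wrapper easy to test on a machine without Calibre installed.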

48 thoughts on “Kindle, EPUB, And Amazon’s Love Of Reinventing Wheels”

  1. When I got my first smartphone, a Nokia E50, I used a program called Mobipocket Reader, and the PC software Mobipocket Creator to create some .prc ebooks. I also used this on my next phone, an HTC S740, but after switching to Android I started using FBReader, which supports ePubs, Mobi, .prc, .txt and a few other formats.

    The one format not mentioned here is DAISY, which can combine text with audio, creating (in theory) an audiobook that can be searched via text and navigated sentence by sentence. In practice many DAISY books are limited to chapter by chapter navigation, as they don’t contain the full text – too much work, I suppose.

  2. I don’t like Amazon, don’t own a Kindle and don’t want to. For me Pocketbook readers are almost perfect (earlier it was a Nook with koreader but I had to find something with better resolution). When I want to buy an ebook and the only option is Amazon with their Kindle-only format, I look for a pirate copy or give up. I don’t understand publishers who don’t sell ebooks on their own but only through Amazon. Fortunately, in my country Amazon doesn’t have as strong a position as in the US.

        1. USB C is about to become the standard, and with the ‘Apple loophole’ thoroughly quashed this time around. The USB C connector has scaled from 5Gb/s (USB 3) to 80Gb/s (DP 2.0 alt mode) without breaking a sweat, so is likely to stick around for quite some time.

          1. But can you tell by looking at it whether a given usb c port or cable supports a particular speed or feature? Having several incompatible devices and/or cables using the same plug isn’t really an improvement, imo…

          2. I’m with MoTLD. USB C is perhaps the worst connector to exist in history for its massive and hidden incompatibility problems. Unlike previous USB, even across generation changes, with USB C the cable will plug in and just not work, whereas with older generations everything could still run at USB1 speeds and spec if that is needed between devices.

    1. It’s also just plain and simple creating a standard that suits their needs well – which is something that happens without DRM, open/closed source etc – it’s partly just the nature of being global and huge that means you have the resources and want something a little different than the defaults (which does somewhat include DRM – which, much as we often like to hate it, especially when it’s abused, does have some merits – like the rights holders/producers etc actually getting paid more closely in line with the actual, in this case, readership).

  3. “It’s generally recommended that anyone who wishes to maintain their own local library of ebook files should avoid this format altogether — though as more and more of Amazon’s library switches over, that may mean you need to purchase your books elsewhere.”

    I think Barnes & Noble pretty much has the same problem. Even worse, some content can’t be moved to external storage.

    1. To be honest, I don’t think B&N is really on anyone’s radar in terms of ebooks. I’ve had good luck getting DRM-free EPUB from Google Books in the past, plus of course there are indie ebook marketplaces out there.

    1. Ah, yes…holding and reading a real “in the flesh” book. What a strange concept to so many! While I enjoy my Kindle, the “real” books on my bookshelf are treasures I have read and re-read often over the years. There is a sense of satisfaction in physically turning pages, “smelling” the fragrance of old paper, and being able to easily underline my favorite passages.

    2. It has some advantages, like the simple ownership model, lack of intrusive DRM, and lack of electricity requirements. But you can’t easily search a physical book, make backups, or adjust the formatting for your comfort, and if you want to have multiple you just need to carry them all around.

      1. Have you heard of these things called indices and copy machines? They’re pretty nifty. And then there is always the trusty notebook and pen for you to rely on while you’re reading.

        Ctrl-F is nice and all but it’s nothing you can’t live without. If anything it’s better for your brain if you approach things analog. I print most of the PDFs and eBooks I actually read or use. Grepping with your eyes has a clear and definite mental advantage. And the workout from lugging books around means you don’t need that gym membership so win-win!

  4. I can’t understand why people are buying locked ereaders and locked ebooks.
    I hate that I am not the owner of what I have paid for (remember the deleted books a few years ago).
    I hate that if tomorrow Amazon decides that the Kindle is obsolete I won’t be able to read my ebooks on another ereader

    1. That is simple, because it’s convenient – the Kindle for all its flaws is tied to a cheap and enduring book retailer, that works really effortlessly at relieving you of money for reading matter…

      Plus how many actually viable alternatives are there? If you want to read the latest from your favoured famous authors and their great series or the stupidly expensive textbooks you are likely not going to find them (all and ‘legally’ anyway) available without getting drm somewhere along the lines.

      1. Not saying I like it, but there are few alternatives and many things in life to dislike – something that works rather well at what it’s for and isn’t a major ripoff just isn’t worth the effort to get hot under the collar about with so many bigger fish to fry.

      2. I bought a Boox Note Air. It is very fast and it runs Android. Even the software is good, and these days you can seldom say that! And because it is Android you can install the Amazon reader and use it if you like to buy books at Amazon. Yes, it is not the cheapest reader in the world, but it is a good one!

    2. “I hate that if tomorrow Amazon decides that the Kindle is obsolete I won’t be able to read my ebooks on another ereader”
      Google ‘Apprentice Alf’, combine that with Calibre, and all your ebooks will now be in a DRM free format that you can read anywhere.
      The same solution will also work on the DRM from other stores as well.

      1. To expand on this slightly: most users are happy to just use whatever platform they bought it on to read it, and there haven’t been that many incidents where DRM has affected standard users on big platforms yet. People complaining about abstruse technical details affecting their weird use-cases are irrelevant.

        1. “People complaining about abstruse technical details affecting their weird use-cases are irrelevant.”

          Point taken, however…

          “It’s hard to notice what isn’t there. We’re aiming to fix that, with this work of “design fiction” — a collection of devices, services, products, and tools. These things could have been, and should have been, but never were.”

          Most users don’t know what they’re missing. That is from the EFF’s “Catalog of Missing Devices,” which showcases hypothetical innovations that aren’t technically difficult, but are made impossible by the DMCA. Things like simply changing the font on your ereader, using text-to-speech, or translation, all depend on the support of the owner of the ecosystem you’ve been walled into. One reader’s weird use-cases are another’s accessibility necessities.

          How weird a use case is buying an ereader from a different company and actually being able to keep the books one paid for but from a different company? If people knew it was easily technically possible to do so but they simply aren’t allowed to, would they want to? Might we have meaningful competition among ebook and ereader sellers?

    3. I keep my kindle completely offline for just this reason. Plus, they don’t need to know what I read. When I get any books through amazon, I download them, run them through calibre, de-DRM them, and convert them to epub. Only then do I transfer them to the kindle. This way, there is always a local, unlocked copy in my own storage drive. amazon doesn’t know about anything that doesn’t come through them and they can’t decide that I can’t read something.

  5. If you are savvy enough to tinker around with your Kindle’s software (aren’t we all?), it is possible to jailbreak the Kindle and install a 3rd party ereader software such as KOReader (https://github.com/koreader/koreader). I did it on my Kindle PW3 and used it to read PDF and EPUB ebooks. The feature set is quite broad, and they offer integration to Calibre allowing remote transfer of ebooks over to the Kindle.

  6. >That means if a vendor wants to implement DRM in EPUB, they have to figure out how to do it themselves. [… ] most people who are using EPUBs are doing so specifically because they are DRM-free.

    Not really, most places selling ebooks provide downloads as epub with Adobe Digital Editions DRM. That’s the de-facto standard, and is supported by e.g. Kobo. No one I know of made their own DRM format, and only indie books are sold DRM free. When I buy one, I’ll of course strip the DRM with Calibre DeDRM.

    1. people need to remember that BOOKS ARE DRM-FREE,
      with no way possible to add it.

      a license?
      are you kidding me?
      to read a book?
      do you hear yourself?
      what next, i need a special 1-day permit to look out the window?
      people will laugh at you, this IS a PUBLIC forum…
      lol … a license to read a book.
      what a joke.

      and i need to FUND this kind of “system” ?
      wow, so all i need to do to prevent 451 is not fund it?
      talk about a weakness!

      IM LAUGHING SO MUCH RIGHT NOW

      1. The only reason physical books don’t have DRM at this point is that their physicality manages certain rights automatically without you noticing — the big example is that if you give or sell the book to someone else, you no longer have the book. DRM is, at its heart, an attempt to make data behave like a physical object. The fact that this led to overreach, anti-user behavior by content producers, and all manner of silliness was always inherent in the attempt.

        (My personal ereader is a second-hand Kindle Paperwhite of some vintage or other; I run everything through Calibre, and only haven’t jailbroken it because I’m lazy.)

  7. Words is words, and I don’t give a crap if the printed font can’t be rendered, or if the lines wrap a bit different. In fact I find print derived line spacing a little excessive for the small screen and tighten it up in my readers.

    However, that’s all about “recreational consumption media” i.e. fiction, novels, maybe readable non-fiction…. like Longitude, The Perfect Storm, Soul of a New Machine etc etc.

    What keeps me from buying more eBooks isn’t the format per se, but the low standard of execution in some technical, hobby and other non-fiction that have detailed, or complex photographs, illustrations and figures that have been scanned at maybe 320×240, and have lost all the frigging detail. In many cases these have been meant to be a reference for actually building or making something by the author, and should also be exportable as well as high quality. I’ve seen some so bad they look like they were a second or 3rd gen photocopy of the thing. Additionally, the quality of OCR (Why are they using OCR? Should they have digital originals in this day and age???) in some is only equal to what I saw out of a Genius hand scanner and bundled software in 1993… they should have clean crisp originals available right? They should have 21st century OCR tech right? so even if they have to go to OCR for the eBook, it should come out near perfect, and they should be able to afford the services of someone who passed 8th grade English to be able to pick out the obvious mistakes right? Apparently not. Very low quality some commercial and licensed eBooks are.

  8. when someone goes out of their way and spends millions of dollars to prevent me from using something that they want to sell to me; i question what is it exactly that they want me to do with such a device?

    i always thought kindle was an ereader?
    i guess not? …
    an ereader would read ebooks, no?
    the file format that has existed for eons…
    and is still used.

    so i pay for it and it doesnt open ebooks?
    then what DOES it open? some format thats clearly not an ebook file,
    last time i got a free ebook file it was not an amazon file format,
    as only amazon members could read, such a narrow-market.
    so now i need to carry around 10 different ereaders?

    WASNT THAT THE WHOLE POINT OF EBOOKS?
    NOT NEEDING TO CARRY AROUND ALL THOSE BOOKS!?!?!?!!?
    so now we are carrying around a stack of ebooks,
    real smart, from the “smart-people, for smart-people.”
    i suppose theres no books in the amazon store that can teach me to make my own competing ereader and ebookstore?

    are there ereaders that actually open ebooks???
    im confused,
    maybe i’ll just stick to making tables of values for electronics projects and storing them in ascii text files.
    that and reading real books…

    1. The Kindles all open a wide variety of not drm locked formats, I think the same is true of basically all the e-readers, the hard part is getting an e-book without some form of drm.

      Having your own format that may not be so widely used by others because it suits something you are doing, doesn’t mean the devices won’t read other stuff – my kindle DX is loaded with heaps of PDF and a few other book formats, and arguably the amazon book formats are superior at creating the right book like experience. And as far as I know Amazon do not object to you publishing or being able to display files in their format (but I’ve not looked into that so could be wrong there – but some other e-reader do read kindle books so some sort of deal must exist)

  9. I spent a lot of time in hospital during Covid-19 and being an avid reader I latched on to Kindle on a laptop. The only problem was the free offerings were crap. Later I found Open Library which is a free library online, though they periodically ask for donations. They appear to be limited to discarded library books which they can lend. The quality of the authors is much better though the scanning is variable. Diagrams and pictures are pretty poor.

  10. Interesting to read this comparison. Here in Canada, Kindle is not very popular — it has a tiny fraction of the e-reader market share, and Kobo completely dominates. Kobo is completely integrated with the public library systems in the major Canadian cities, it’s basically effortless. Pocket integration is an added bonus.

    Amazon has a really big selection of e-books (including some great discounts) but Kindle is a hassle here unless you are buying everything from Amazon. For everything else, Kobo just works.

  11. Points for the Palm mention.

    I was at 3Com when they bought US Robotics. Got a nice 56kb modem out of the merger. And a discounted Palm Pilot. Then, I learned to code on the thing. Created a fairly useful app to help people locate conference rooms, which were scattered across 3 buildings and named under a scheme that was only marginally related to their physical location.

    There’s a good book, called “Piloting Palm” by David Pogue. It’s worth reading & it’s currently on Mobilism.

  12. Worst case scenario to get rid of that Digital Restrictions Management is to put your E-reader on a flatbed scanner and then write some script to alternately scan a page, and go to the next page, and put everything through OCR again.
    I’ve never done this myself, because I simply never ever buy a book with DRM. For the same reason I will never buy (or even accept for free) a kindle.
