
A Hacker’s Guide to Habitat: Part 1

When we set out to reinvent the book three years ago, we realized that we’d also have to reinvent the printing press. Then we found that, if books are going to be built with web technologies like ePub and HTML5, their next printing press might look a lot like something in a browser.

Inkling Habitat is a tool for building interactive and reflowable books for native mobile platforms and the web, with content that can be found directly through search engines. For example, check out these samples from O’Reilly’s JavaScript: The Definitive Guide, Frommer’s Japan Day by Day, and Open Air Publishing’s Wine Simplified. We built Habitat using such bleeding-edge HTML5 APIs as the local file system API and Mutation Observers, and, in addition, we’re storing all of our content in the cloud with source control.

With Habitat, we’re trying to answer two questions. One, what happens when you approach books seriously as maintainable software, rather than just as physical objects stuck in a file format? Two, given the current world of books as either ten-dollar text files or one-off apps, can we find a way to make building interactive books even easier than building print ones?

Over the next few months, we’ll use this blog to provide a technical introduction to Habitat. In this first series, we’ll be talking about the software that makes these interactive books for Inkling. And we’ll start to see how those books themselves are software, too.

Prototyping the web

Habitat is a hackathon project made good. It started off as a quick hack on our content production process — turning paper proofreading into on-screen bug reporting — and grew into a full printing press running on Chrome and caffeine.

Some days it feels as if we’re prototyping a future version of the web. For example, we’ve gotten our hands into pretty much every single HTML5 API, as well as early versions of the ECMAScript 5, CSS3, and DOM4 specs. We’ve also been writing native JavaScript in a functional way for years now, and we’ve been able to use new features like Mutation Observers for rich text editing and the FileSystem APIs for storing books locally and then searching them like databases. As we’ve grown, we’ve also found ourselves using canvas, binary blobs, native drag-and-drop, video/audio tags, the history API, Web Workers, and CSS filters such as blur.

Of course, we’ve seen glitches at the edge, and we’ve been tracking them. When we run into a bug or regression in a browser, we work to file reduced test cases against it. Sometimes we’ll find it fixed a week or two later.

We targeted Chrome two years ago because, compared to Firefox 3.6, Chrome was already doing quick, auto-updating releases; because we loved what we were seeing in the HTML5 drafts and Chrome was landing this stuff faster; and because we were simply pushing the browser, any browser, to its limits, and we could only work around so many quirks before we would have begun to slow down.

Some of these quirks were right on the surface. Firefox 3.6 just didn’t support some of the features that would let us behave as a desktop app: XMLHttpRequest2, history.pushState, or CSS transitions. Firebug had gone stale compared to the Chrome/WebKit development tools. And Chrome just felt faster: the one-process-per-tab model was much more forgiving while debugging a long-lived, data-heavy web app. Happily, as we built Habitat, many of these differences started to disappear.

Case study: Hacking validation

Other quirks went much deeper. As a content creation tool, a few of the core operations in Habitat are related to XML/XHTML processing. We found we could lean on the native DOMParser class to construct documents and fragments from content strings, like this:
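As a minimal sketch of that idea (not Habitat's actual code), a string can be handed to the browser's DOMParser and a Document comes back. The function name `parseMarkup` and the optional `parser` argument are assumptions made for illustration; the argument only exists so the function can be exercised outside a browser.

```javascript
// Parse an XML/XHTML string into a Document.
// `parser` is optional: any object with a DOMParser-style
// parseFromString(source, mimeType) method. In a browser it
// defaults to the native DOMParser.
// (parseMarkup is a hypothetical name, not part of any library.)
function parseMarkup(source, mimeType, parser) {
  parser = parser || new DOMParser();
  return parser.parseFromString(source, mimeType || 'application/xhtml+xml');
}

// Browser usage:
//   var doc = parseMarkup('<p xmlns="http://www.w3.org/1999/xhtml">Hello</p>');
//   var paragraph = doc.documentElement;
```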

This meant that we could skip the work of writing a parser, use the intelligence already in the browser, and hop up to the next level of abstraction. We also found that a DOMParser provides useful error messages when it is passed a string of invalid markup. In Chrome, this message is wrapped in a <parsererror> element:
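One way to read that message back out, sketched under the assumption that the error element is inserted somewhere inside the returned document, is to look it up by tag name. The helper name `getChromeParseError` is hypothetical, and the exact wording of the message is up to the browser.

```javascript
// Return the parser's error text, or null if the document parsed cleanly.
// Assumes Chrome/WebKit behavior: on invalid XML, an error element named
// "parsererror" is inserted into the document that parseFromString returns.
// (getChromeParseError is a hypothetical helper name.)
function getChromeParseError(doc) {
  var errors = doc.getElementsByTagName('parsererror');
  if (errors.length === 0) {
    return null;
  }
  // Collapse runs of whitespace so the message fits on one line in a UI.
  return errors[0].textContent.replace(/\s+/g, ' ').trim();
}

// Browser usage:
//   var doc = new DOMParser().parseFromString('<p><b>oops</p>', 'application/xml');
//   var message = getChromeParseError(doc);
```

Note that a valid document containing its own element named "parsererror" would be misreported, so a stricter check may be needed for arbitrary content.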

We wanted good validation messages in our editor, but it wasn’t worth the time to roll our own validator. By leaning on the browser again, we were able to get, almost for free, something that we would otherwise have punted on. (We still had to clean up the errors and make them more readable.)

But the behavior changed depending on the environment. Our headless WebKit environment, PhantomJS, indicated a parse error by returning a null document element. Firefox was more useful, but it wrapped its error message differently:
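In Firefox, a failed parse still returns a document, but its root element is itself the error element, placed in a Mozilla-specific namespace. A minimal check for that shape might look like the sketch below; the helper name `getFirefoxParseError` is an assumption for illustration.

```javascript
// Namespace Firefox uses for the root element of a failed XML parse.
var MOZ_PARSER_ERROR_NS = 'http://www.mozilla.org/newlayout/xml/parsererror.xml';

// Return Firefox's error text, or null if the root element is not
// Firefox's error element.
// (getFirefoxParseError is a hypothetical helper name.)
function getFirefoxParseError(doc) {
  var root = doc.documentElement;
  if (root &&
      root.localName === 'parsererror' &&
      root.namespaceURI === MOZ_PARSER_ERROR_NS) {
    return root.textContent;
  }
  return null;
}
```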

If we had been starting off cross-browser, we would have had to stop and write code to normalize the output between browsers, write a general validator with our own messages, or just leave the feature out.
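To give a sense of what such a normalization layer involves, here is a rough sketch covering the three behaviors described above. It is not code from Habitat: the function name `validateMarkup`, the shape of the result object, and the fallback message are all assumptions.

```javascript
// Parse `source` and report the outcome in one shape, whatever the engine.
// Returns { valid: true, message: null } or { valid: false, message: '...' }.
// `parser` is optional and defaults to the browser's DOMParser.
// (validateMarkup and its result shape are hypothetical.)
function validateMarkup(source, parser) {
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(source, 'application/xhtml+xml');

  // PhantomJS-style failure: no document element and no message at all.
  if (!doc || !doc.documentElement) {
    return { valid: false, message: 'The markup could not be parsed.' };
  }

  // Chrome nests the error element inside the document; Firefox makes it
  // the root. A tag-name lookup on the document finds it in both cases.
  var errors = doc.getElementsByTagName('parsererror');
  if (errors.length > 0) {
    var text = errors[0].textContent.replace(/\s+/g, ' ').trim();
    return { valid: false, message: text };
  }

  return { valid: true, message: null };
}
```

Even this small version leaves the message wording browser-specific, which is the part a general validator with its own messages would have to replace.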

Staying focused

The first of us working on Habitat had to choose: we could slow down and write to where the web was, with normalization layers for things like XML parsing inconsistencies, XHR bugs specific to Safari, and generally incomplete APIs, or we could keep our test coverage high, build to living standards, and write to where we saw the web going.

We want the web to win. We’ll open up Habitat to other browsers as we grow. But for a small application team with big ambitions, Chrome has been rocket fuel.