In my previous post on the Travel Guide of the Future, I glibly dismissed the possibility of an augmented reality interface as a form factor, because “we haven’t managed to figure out a decent portable interface for actually controlling the display … it’s looking pretty unlikely until we get around to implanting electrodes in our skulls.”
Two weeks later, word leaked out about what was cooking at Google X, and last week Google officially announced Project Glass. Oops! Time to eat my words and revise that assumption in light of the single most exciting announcement in travel tech since, um, ever.
As it happens, augmented reality displays are a topic I have more than a passing familiarity with: for my master’s thesis back in 2001, I built a prototype wearable translation system dubbed the Yak-2, using a heads-up display. At the time, the MicroOptical CO-7 heads-up display (pictured above) was state-of-the-art military hardware reluctantly lent to researchers for $5000 a pop; it’s almost surprising that, in the ten years that have passed, it’s not much different from what Google is using today, which the smart money seems to think is the Lumus OE-31.
Credentials established? Let’s talk about what challenges Google faces today.
User interface: actually using the darn thing
The absolute Achilles heel of wearable computing for me, for Google and for everybody who has ever tried to popularize the darn things and failed is the user interface. Every mainstream human interface device used for computing devices — keyboards, touchscreens, mice, trackballs, touchpads, you name it — is intended to be operated by hand pressing against a surface, and that’s the one thing you cannot sensibly do while operating a wearable computer. A lot of research has gone into developing ways around this, but none have gained traction as they all suffer from severe drawbacks: handheld chording keyboards (extremely steep learning curve), gesture recognition (limited scope and looks strange), etc. My Yak prototypes used a handheld mouse-pointer thingy, which was borderline functional but still intolerably clunky, and speech recognition, which worked tolerably well in lab conditions with a trained user, but fell flat in noisy outdoor environments.
Based on the Project Glass concept video, Google is trying their luck with speech recognition, a tilt sensor for head gestures, plus, apparently, an entirely different interface: eye tracking, so you can just look at an icon for a second to “push” it. (Or so it seems; the other possibility is that the user is making gestures off-camera, although the bit where he replies to a message while holding a sandwich makes this unlikely. While easier to implement technically, off-camera gestures would be far inferior as an interface, so for the rest of this post I’m going to optimistically assume they do indeed use eye tracking.)
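To make the “look at an icon for a second” idea concrete, here is a minimal sketch of dwell-based selection in Python. Everything in it is my own assumption for illustration, not anything Google has described: the `Icon` class, the `dwell_selections` function, the one-second threshold and the format of the gaze samples (timestamp in seconds plus an x/y position on the display) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence, Tuple

# Hypothetical gaze sample: (timestamp in seconds, x, y) in display coordinates.
GazeSample = Tuple[float, float, float]


@dataclass(frozen=True)
class Icon:
    """A rectangular on-screen target (illustrative only)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, gx: float, gy: float) -> bool:
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)


def icon_at(icons: Sequence[Icon], gx: float, gy: float) -> Optional[Icon]:
    """Return the first icon under the gaze point, or None."""
    for icon in icons:
        if icon.contains(gx, gy):
            return icon
    return None


def dwell_selections(samples: Iterable[GazeSample],
                     icons: Sequence[Icon],
                     dwell_seconds: float = 1.0) -> Iterator[str]:
    """Yield an icon's name once the gaze has rested on it for dwell_seconds.

    An icon fires at most once per uninterrupted look; glancing away
    resets the timer.
    """
    current: Optional[Icon] = None
    since = 0.0
    fired = False
    for t, gx, gy in samples:
        target = icon_at(icons, gx, gy)
        if target != current:
            # Gaze moved to a different icon (or off all icons): restart timer.
            current, since, fired = target, t, False
        elif current is not None and not fired and t - since >= dwell_seconds:
            fired = True
            yield current.name


icons = [Icon("reply", 0, 0, 10, 10), Icon("close", 20, 0, 10, 10)]
samples = [
    (0.0, 5, 5), (0.5, 5, 5), (1.0, 5, 5), (1.5, 5, 5),  # steady look at "reply"
    (2.0, 50, 50),                                       # glance away
    (2.2, 25, 5), (2.8, 25, 5),                          # brief look at "close"
    (3.0, 50, 50),
]
print(list(dwell_selections(samples, icons)))
```

With these samples only “reply” is selected: the look at “close” lasts 0.6 seconds and is discarded. Even this toy version exposes the design tension: a shorter threshold feels snappier but fires on idle glances, while a longer one makes every “click” feel like a staring contest.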
The radical-seeming concept is actually not new, as eye tracking is a natural fit for a heads-up display. IBM was studying this back around 2000 and ETH presented a working prototype of the two in combination in 2009, but Google’s prototype looks far more polished and will be the first real-world system deploying the two simultaneously that I’m aware of. Problem solved?
Not quite. The biggest of Google’s user interface problems is that they now need to develop the world’s first usable consumer-grade UI for actually using this thing. As the numerous painfully funny parodies attest, it’s actually very hard to get this right, and Google’s video glosses over many of the hard decisions that need to be made to provide an augmented reality UI that’s always accessible, but never in the way. How does voice recognition know to differentiate when it’s supposed to be listening for commands, and when you’re just talking to a buddy? How does the software figure out that moving the head down while stretching should pop up the toolbar, but moving it down to pour coffee should not? You can only presume there are modes available (“full UI”, “notifications only” or “completely off”), but without physical buttons to toggle it’s difficult even to figure out a solid mechanism for switching between these.
And that’s just for user-driven “pull” control of the system. For “push” notifications, like the subway closure alert, Google has to be able to intelligently parse the user’s location, expected course and a million other things to guess what kinds of things they might be interested in at any given moment — and, yes, resist the temptation to spam them with 5% off coupons for Bob’s Carpet Warehouse. Fortunately, this kind of massive data number-crunching is the kind of thing Google excels at, and the glasses will presumably come with a limited set of in-built general-use notifications that can be extended by downloading apps.
As a reference point, it’s taken Android several years to get most of the kinks worked out from something as simple as message notifications on a mobile screen, and even UI gurus Apple didn’t get it right the first time around. It’s pretty much a given that the first iterations of Project Glass will be very clunky indeed.
Incidentally, while the video might lead you to believe the contrary, one problem Google won’t have is the display blocking the entire field of view: the Lumus display covers only a part of one eye, with your brain helpfully merging it in with what the other eye sees.
Hardware: what Google isn’t showing you
Take a careful look at Google’s five publicity photos. What’s missing? Any clue of what lies at the other end of the earpieces, artfully concealed with a shock of hair or angled face in every single shot. Indeed, Lumus’s current displays are all wired to battery packs to serve that energy-hungry display (just like my CO-7 back in 2001), although apparently wireless models with enough capacity to operate for a day are on the horizon and Sergey Brin was “caught” (ha!) wearing one recently.
Display aside, though, the computing power to drive the thing still has to reside somewhere, and even with today’s miracles of miniaturization that somewhere cannot be inside that thin aluminum frame. Thus somewhere in your pocket or bag there will be a phone-sized lump of silicon that does the heavy lifting and talks to the Internet. The sensible and obvious thing to do would be to use an actual phone, in which case the glasses just become an accessory. This kills two birds with one stone: it conveniently cuts down what would otherwise be a steep price tag of $1000+ into two more manageable chunks of $500 or so each (assuming Google initially sells the Lumus more or less at cost), and it provides extra interfaces in the form of a touch screen and microphone that can be used for mode control and speech recognition (e.g. press button and hold phone up to mouth to voice commands).
Killer app: travel guide or Babel Fish?
Google is quite clearly thinking about Project Glass as just another way to consume Google services: socialize on Google Plus, find your way with Google Maps, follow your friends with Latitude, etc. While some of this obviously has the potential to be very handy, and almost all of it certainly qualifies as “cool”, without anything entirely new the device runs the risk of becoming the next generation of Bluetooth headset, a niche accessory worn only by devoted techheads. The question is thus: what sort of killer apps could this device enable as a platform? Obviously, my interest lies in travel!
So far, most augmented reality travel apps have assumed that reality + pins = win, but this doesn’t work for augmented reality for precisely the same reason it doesn’t work for web apps:
As a rule, people do not wander down streets randomly, hoping that a magical travel app (or printed guidebook) will reveal that they have serendipitously stumbled into a fascinating sight. No, they browse through the guide before they leave, or on the plane, or in the hotel room the day before, building up a rough itinerary of where to go, what to see and what to beware of. A travel guide is thus, first and foremost, a planning tool.
Which is not to say Project Glass won’t have its uses. Even out of the box, turn-by-turn navigation in an unfamiliar city, without having to browse maps or poke around on a phone, is by itself pretty darn close to a killer app for the traveller, and being able to search on the fly for points of interest is also obviously useful.
But probably the single most powerful new concept to explore is what I poked around with in 2001, namely translation. Word Lens/Google Goggles type translation of written text is obvious, but the real potential and challenges lie in translation of the spoken word. Using the tethered phone’s microphone and speaker, it should be possible to parse what the user says, have them confirm it on screen, and either have them try to read it out or simply output the translated phrase via the speaker. Depending on how good the speech recognition is (and this is pushing the limits today), it could even be possible to hand the phone over to the other person, have them speak, and have the glasses translate that instantly. And if both parties are wearing the glasses, with a microphone and an earphone, could we finally implement the Babel Fish and have unobtrusive simultaneous translation, with the speech of one rendered on the screen of the other? This may not be science fiction any more!
Project Glass has immense potential, but like most revolutions in technology, people are likely to overestimate the short-term impact and underestimate the long-term impact. The first iteration is likely to prove a disappointment, but in a few years’ time this or something much like it may indeed finally supplant the printed book as the traveler’s tool of choice on the road, and create a few new billion-dollar markets in the process.