



And me? I’m joining Google’s Geo team in Sydney, where I’ll be working with the world’s most popular travel application, Google Maps.
TL;DR: Google is trying to position its Google Glass headset as a consumer device with the cool factor of an iPhone. But its initial users are likely to be businesses, and they will need to be convinced by the value it delivers, not by its appearance.
Hardware revolution
We fling about the word “revolutionary” with wild abandon these days. The primary hardware innovation of the Apple iPhone, for example, was really just an evolutionary step. Replacing a keypad with a touchscreen meant that, instead of holding your phone in one hand and watching its screen as you tap the keys with the other, you could now hold your phone in one hand and watch its screen as you tap the screen with the other. As we know, this seemingly subtle change proved to radically enhance the usability of the phone and set the benchmark for today’s smartphones — but they’re still smartphones.
Google Glass, on the other hand, is a genuinely revolutionary piece of kit. As the first real consumer-grade attempt at an augmented reality computer, it completely dispenses with the screen, the keypad and even the entire “holdable” device itself. This means throwing out every user interface paradigm developed since the 1970s, when computers started to look like today’s computers, and building something entirely new to replace them. Gulp?
Yet Google appears to be petrified of something different: that the device will be perceived as “dorky”. As you can see from the picture to the right, I can personally attest that this fear is not entirely misguided: real-life wearable computers (and their wearers) do tend to fall more on the side of “geeky” than “cyberpunky”. Google’s marketing to date has thus consisted nearly entirely of increasingly odd antics to make it “cool”: stunt cyclists performing tricks on the roof of a convention centre, skydivers leaping out of airplanes and an entire fashion show with slinky models strutting their stuff.
But let’s step back in time. Imagine being offered the chance to clip an unwieldy, heavy plastic box to the waistband of your bell-bottomed pants, bolt two bright-orange foam sponges over your ears with a shiny metal hairband, and string these bits together with wire. Would you pay good money for this fashion disaster?
If it’s the 1970s, hell yeah: the Sony Walkman was a runaway hit. Never mind the clunky appearance; the mere fact that, for the first time, it let you listen to music anywhere was worth the sartorial price of admission. And without that ability, the minor miracles of miniaturization and ruggedization that turned the unwieldy tape decks of yore into the Walkman would have gone to waste.
Software evolution
But Google isn’t talking, at all, about what you can or, more importantly, could do with the Glass: their famous promotional video shows various existing Google apps doing precisely what they do now, only on a heads-up display. Sure, the user interface has changed radically, but the capabilities have not.
So will those existing apps on Glass be slick enough to make it a must-buy? Despite Google’s all-star developer team, their track record for customer-facing products is distinctly spotty, and the sheer challenge of designing an entirely new way to interact would perplex even Apple. The little we know of the hardware also indicates that some technologies considered key to heads-up interaction, notably eye tracking, are not going to be a part of the package. It’s thus exceedingly unlikely that the first iteration of Glass’s UI will nail it, and Google’s reluctance to reveal anything about the interface’s actual appearance and behavior strongly hints that they have their doubts as well.
Odds are, then, that Google Glass will be a dorky-looking product that offers an inferior interface for the kind of things you can do easily with a modern mobile phone, which has, after all, evolved for 20-plus years in the marketplace. This is not a recipe for success in the consumer marketplace.
The solution? Sell the Glass on what it can do that nothing else can.
Five things you can do with Glass that you can’t with a mobile phone
1) Simultaneous interpretation. Hook up two Glasses so they can translate each user’s speech and beam it over to the other, where it is displayed as subtitles. Presto: you can now hold a natural conversation and track all the nonverbal communication that would be lost if you had to glance at your smartphone all the time.
(Not coincidentally, I wrote my master’s thesis on this back in 2001. My prototype was a miserable failure because computer miniaturization, speech recognition and my hardware hacking skills weren’t up to snuff, but I think Glass provides an excellent platform for producing something usable.)
2) Tactical awareness. A mobile phone app that shows the location of alerts and/or other security guards would be rather useless: what are you going to do, pull out your phone and start browsing your app directory when the robbers strike? The same application for an always-on Glass, on the other hand, is a natural fit.
(This, too, is by no means a new idea. MicroOptical’s heads-up display, the direct predecessor of the optics behind Google Glass, was the result of a DARPA grant for the US Army’s Land Warrior project. The pathetic fate of that project, which ran from 1994 before being cancelled in 2007 and kicked off again in 2008 without ever accomplishing anything of note, also hints at why Google is, probably wisely, steering far clear of the bureaucratic morass of military procurement.)
3) Virtual signage. Imagine an enormous warehouse filled with a variety of ever-changing goods, along the lines of an Amazon or UPS logistics center. Right now, to find a given package in there, you’d have to “look it up” on a PC or smartphone, get a result like “Aisle C, Section 17, Shelf 5” and match that to signage scattered all over the place. What if your Glass could just direct you there with visual and voice prompts, and show you the item number as well so you don’t have to print out and carry slips of paper? The difference sounds almost trivial, but suddenly you’ve freed up a hand and reduced the risk of getting run over by a forklift as you squint at your printout.
(Back in 2004, commercial wearable computing pioneers Xybernaut sold pretty much exactly this idea to UK grocery chain Tesco, but their machines were clunky battery hogs so it didn’t pan out too well. Xybernaut’s subsequent implosion after its founders were indicted for securities fraud and money laundering didn’t help.)
4) Surgery. Surgical operating theatres are filled with machines that regulate and monitor and display a thousand things on a hundred little screens, with dozens of bleeps and bloops for various alerts and events. What if the surgeon could see all that information during a complex procedure, without ever having to take their eyes off their actual work?
(Once again, some products that do this already exist, but Glass has the potential to take this from an expensive, obscure niche to an everyday medical tool — once the FDA gets around to certifying it sometime around 2078, that is.)
5) Games set in reality. Mashing up reality and gaming is hard: countless companies have taken a crack at it over the past decade, and all foundered on the basic problem of having to use a tiny little mobile display as the only window into the game world. As Layar’s lack of success indicates, running around holding a phone in front of your face isn’t much fun, and relying on location alone to convey that there’s an invisible virtual treasure chest or tentacle monster in a physical alleyway stretches the imagination too much. But with an augmented reality display, this will suddenly change, and Valve is already taking a big punt on it, although Michael Abrash rightly cautions against setting your expectations too high.
What next?
Notice one thing about the first four ideas? They’re all business applications, whose customers will willingly tolerate a clunky, somewhat beta interface as long as they can still get real dollars-and-cents value out of it. This is how both PCs and mobile phones got started, and once the nuts and bolts are worked out, the more mature versions can be rolled out to general consumers.
And once Glass (or something like it) reaches critical mass, we’ll suddenly have streets full of people with network-enabled, always-on video cameras, and a rather scary world of possibilities opens up. Add object recognition, and you can find litter, vandalism, free street parking spots. Add data mining, and you can spot the suddenly crowded new cafe or restaurant, or catch the latest fashion trend as it happens. Add face recognition, and you can find missing persons, criminals and crime suspects.
To Google’s credit, they are partnering with other developers almost from day one, and there will undoubtedly be even better ideas than these largely unoriginal off-the-cuff thoughts. We can only hope that one of them is spotted and executed well enough to become Glass’s killer app… but if Google keeps on being awfully coy about Glass’s capabilities, limiting access to dinky two-day hackathons and envisioning Google+ as the main use case, that day may still be some way away.
By buying a travel guidebook publisher solely to bolster its local search content, Google risks both saddling itself with an unprofitable albatross and missing out on a way to differentiate itself from its rivals.
Google’s recent acquisition of Frommer’s has given rise to much comment about the “real” intentions of the Big G and what this means for other travel publishers. While it’s less entertaining than some of the theories floating around, for the time being I’m willing to accept their stated rationale at face value: just another stepping stone to “provide a review for every relevant place in the world”, and thus a tactical move to bolster local coverage for the ailing Google+.
There are, however, two fundamental problems with the purchase and this goal that do not seem to have garnered much attention.
The first is the problem of content creation. Frommer’s claims “4,500 destinations, 50,000 images and 300,000 events”, but they leave unsaid the source of every one of those bits of data: their own printed guidebooks. Google thus has an unpalatable array of choices:
If Google goes with the 3rd or 4th option, and I have a hard time seeing them not do so, their second problem (or, rather, missed opportunity) will be the lack of content curation. By treating guidebooks as no more than a database in print form, turning them into a homogenous soup of atomic points of interest, Google is effectively conceding that it will compete on a level playing field with local search rivals like Facebook and Foursquare. All three now assume that users are searching for individual points, easily filtered on individual axes: “best five-star hotel in New York by user ratings”, “cheap Japanese restaurant in Melbourne CBD open for lunch” etc.
But a guidebook is not the same as a phone book: it’s supposed to contain a careful selection of the best places to go, arranged in a sensible way. Neither Facebook nor Foursquare can offer a sensible answer to real travel questions like “Funkiest bars in Brussels”, “Romantic day in Paris”, “Three-day hike in New Zealand”, whereas any guidebook about those places that is worth its salt can. As an engineering-driven company, Google has given things like this little thought simply because they are hard problems for artificial intelligence to solve — but using Frommer’s team of authors, it would be possible to augment the automated results produced by things like the Knowledge Graph with hand-curated content as well.
If Google goes ahead and does this, then the Guidebook of the Future will be that much closer to reality and travel publishers will have a real problem on their hands. But I doubt it, and that’s why those publishers are breathing a sigh of temporary relief: one competitor less means a bigger slice of the shrinking pie for the rest.
In my previous post on the Travel Guide of the Future, I glibly dismissed the possibility of an augmented reality interface as a form factor, because “we haven’t managed to figure out a decent portable interface for actually controlling the display … it’s looking pretty unlikely until we get around to implanting electrodes in our skulls.”
Two weeks later, word leaked out about what was cooking at Google X, and last week Google officially announced Project Glass. Oops! Time to eat my words and revise that assumption in light of the single most exciting announcement in travel tech since, um, ever.
As it happens, augmented reality displays are a topic I have more than a passing familiarity with: for my master’s thesis back in 2001, I built a prototype wearable translation system dubbed the Yak-2, using a heads-up display. At the time, the MicroOptical CO-7 heads-up display (pictured above) was state-of-the-art military hardware reluctantly lent to researchers for $5000 a pop; it’s almost surprising that, in the ten years that have passed, the state of the art is not much different from what Google is using today, which the smart money seems to think is the Lumus OE-31.
Credentials established? Let’s talk about the challenges Google faces today.
User interface: actually using the darn thing
Hardware
The absolute Achilles heel of wearable computing for me, for Google and for everybody who has ever tried to popularize the darn things and failed is the user interface. Every mainstream human interface device used for computing devices — keyboards, touchscreens, mice, trackballs, touchpads, you name it — is intended to be operated by a hand pressing against a surface, and that’s the one thing you cannot sensibly do while operating a wearable computer. A lot of research has gone into developing ways around this, but none have gained traction as they all suffer from severe drawbacks: handheld chording keyboards (extremely steep learning curve), gesture recognition (limited scope and looks strange), etc. My Yak prototypes used a handheld mouse-pointer thingy, which was borderline functional but still intolerably clunky, and speech recognition, which worked tolerably well in lab conditions with a trained user, but fell flat in noisy outdoor environments.
Based on the Project Glass concept video, Google is trying their luck with speech recognition, a tilt sensor for head gestures, plus — apparently — an entirely different interface: eye tracking, so you can just look at an icon for a second to “push” it. (Or so it seems; the other possibility is that the user is making gestures off-camera, although the bit where he replies to a message while holding a sandwich makes this unlikely. While easier to implement technically, this would be far inferior as an interface, so for the rest of this post I’m going to optimistically assume they do indeed use eye tracking.)
The radical-seeming concept is actually not new, as eye tracking is a natural fit for a heads-up display. IBM was studying this back around 2000 and ETH presented a working prototype of the two in combination in 2009, but Google’s prototype looks far more polished and will be the first real-world system deploying the two simultaneously that I’m aware of. Problem solved?
Software
Not quite. The biggest of Google’s user interface problems is that they now need to develop the world’s first usable consumer-grade UI for actually using this thing. As the numerous painfully funny parodies attest, it’s actually very hard to get this right, and Google’s video glosses over many of the hard decisions that need to be made to provide an augmented reality UI that’s always accessible, but never in the way. How does voice recognition know when it’s supposed to be listening for commands, and when you’re just talking to a buddy? How does the software figure out that tilting your head down as you stretch should pop up the toolbar, but tilting it down to pour a cup of coffee should not? You can presume there are modes like “full UI”, “notifications only” and “completely off”, but without physical buttons to toggle them, it’s difficult even to figure out a solid mechanism for switching between these.
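To make that mode-switching headache concrete, here is a minimal sketch of how an interface might arbitrate between modes with no buttons at all: a wake word arms the system for a few seconds, and only a deliberately held head tilt inside that window counts as a command. Everything in it (the wake phrase, the thresholds, the mode names) is my own invention for illustration, not anything Google has announced.

```python
# Hypothetical sketch: switching between "full UI", "notifications only"
# and "off" without physical buttons. None of this reflects a real Glass
# API; it only illustrates how deliberate gestures might be told apart
# from incidental head movement.

from enum import Enum


class Mode(Enum):
    OFF = 0
    NOTIFICATIONS_ONLY = 1
    FULL_UI = 2


class ModeController:
    # A head tilt only counts as a command if it is held for a while and
    # preceded by the wake word, so pouring coffee doesn't open the toolbar.
    HOLD_SECONDS = 1.0
    WAKE_WORD = "ok glass"          # assumed wake phrase, purely illustrative

    def __init__(self):
        self.mode = Mode.NOTIFICATIONS_ONLY
        self._armed_until = 0.0     # window during which gestures are accepted
        self._tilt_started = None

    def on_speech(self, utterance: str, now: float) -> None:
        # Speech is ignored unless it starts with the wake word, which is
        # how chatting with a buddy avoids being parsed as commands.
        if utterance.lower().startswith(self.WAKE_WORD):
            self._armed_until = now + 5.0

    def on_head_pitch(self, pitch_degrees: float, now: float) -> None:
        # Only react to a sustained downward tilt inside the armed window.
        if now > self._armed_until:
            self._tilt_started = None
            return
        if pitch_degrees < -30:
            if self._tilt_started is None:
                self._tilt_started = now
            elif now - self._tilt_started >= self.HOLD_SECONDS:
                self._cycle_mode()
                self._tilt_started = None
        else:
            self._tilt_started = None

    def _cycle_mode(self) -> None:
        order = [Mode.OFF, Mode.NOTIFICATIONS_ONLY, Mode.FULL_UI]
        self.mode = order[(order.index(self.mode) + 1) % len(order)]
```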
And that’s just for user-driven “pull” control of the system. For “push” notifications, like the subway closure alert, Google has to be able to intelligently parse the user’s location, expected course and a million other things to guess what kinds of things they might be interested in at any given moment — and, yes, resist the temptation to spam them with 5% off coupons for Bob’s Carpet Warehouse. Fortunately, this kind of massive data number-crunching is the kind of thing Google excels at, and the glasses will presumably come with a limited set of in-built general-use notifications that can be extended by downloading apps.
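As a toy illustration of that “push” filtering, the kind of logic needed is roughly this: score every candidate notification against the wearer’s location, course and declared interests, and only surface the few that clear a threshold. The signals and weights below are invented purely for the sake of the example.

```python
# Toy illustration of "push" notification filtering. Signals and weights
# are made up; a real system would crunch far more context.

from dataclasses import dataclass


@dataclass
class Context:
    lat: float
    lon: float
    heading_degrees: float      # expected course
    interests: set              # e.g. {"transit", "coffee"}


@dataclass
class Notification:
    lat: float
    lon: float
    topic: str
    is_paid_promotion: bool


def relevance(ctx: Context, n: Notification) -> float:
    score = 0.0
    # Crude distance penalty: things nearby matter more.
    dist = abs(ctx.lat - n.lat) + abs(ctx.lon - n.lon)
    score += max(0.0, 1.0 - dist * 100)
    # Topics the user has opted into get a boost...
    if n.topic in ctx.interests:
        score += 1.0
    # ...and unsolicited coupons for Bob's Carpet Warehouse get buried.
    if n.is_paid_promotion and n.topic not in ctx.interests:
        score -= 2.0
    return score


def surface(ctx: Context, candidates: list, threshold: float = 1.0) -> list:
    # Only notifications that clear the threshold ever reach the display.
    return [n for n in candidates if relevance(ctx, n) >= threshold]
```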
As a reference point, it has taken Android years to get most of the kinks worked out of something as simple as message notifications on a mobile screen, and even UI gurus Apple didn’t get it right the first time around. It’s pretty much a given that the first iterations of Project Glass will be very clunky indeed.
Incidentally, while the video might lead you to believe the contrary, one problem Google won’t have is the display blocking the entire field of view: the Lumus display covers only a part of one eye, with your brain helpfully merging it in with what the other eye sees.
Hardware: what Google isn’t showing you
Take a careful look at Google’s five publicity photos. What’s missing? Any clue of what lies at the other end of the earpieces, artfully concealed with a shock of hair or an angled face in every single shot. Indeed, Lumus’s current models are all wired to battery packs to serve that energy-hungry display (just like my CO-7 back in 2001), although apparently wireless models with enough capacity to operate for a day are on the horizon, and Sergey Brin was “caught” (ha!) wearing one recently.
Display aside, though, the computing power to drive the thing still has to reside somewhere, and even with today’s miracles of miniaturization that somewhere cannot be inside that thin aluminum frame. Thus somewhere in your pocket or bag there will be a phone-sized lump of silicon that does the heavy lifting and talks to the Internet. The sensible and obvious thing to do would be to use an actual phone, in which case the glasses just become an accessory. This kills two birds with one stone: it conveniently cuts what would otherwise be a steep price tag of $1000+ into two more manageable chunks of $500 or so each (assuming Google initially sells the Lumus more or less at cost), and it provides extra interfaces in the form of a touchscreen and microphone that can be used for mode control and speech recognition (e.g. press a button and hold the phone up to your mouth to voice commands).
Killer app: travel guide or Babel Fish?
Google is quite clearly thinking about Project Glass as just another way to consume Google services: socialize on Google Plus, find your way with Google Maps, follow your friends with Latitude, etc. While some of this obviously has the potential to be very handy, and almost all of it certainly qualifies as “cool”, without anything entirely new the device runs the risk of becoming the next generation of Bluetooth headset, a niche accessory worn only by devoted techheads. The question is thus: what sort of killer apps could this device enable as a platform? Obviously, my interest lies in travel!
So far, most augmented reality travel apps have assumed that reality + pins = win, but this doesn’t work for augmented reality for precisely the same reason it doesn’t work for web apps:
As a rule, people do not wander down streets randomly, hoping that a magical travel app (or printed guidebook) will reveal that they have serendipitously stumbled into a fascinating sight. No, they browse through the guide before they leave, or on the plane, or in the hotel room the day before, building up a rough itinerary of where to go, what to see and what to beware of. A travel guide is thus, first and foremost, a planning tool.
Which is not to say Project Glass won’t have its uses. Even out of the box, turn-by-turn navigation in an unfamiliar city, without having to browse maps or poke around on a phone, is by itself pretty darn close to a killer app for the traveller, and being able to search on the fly for points of interest is also obviously useful.
But probably the single most powerful new concept to explore is what I poked around with in 2001, namely translation. Word Lens/Google Goggles type translation of written text is obvious, but the real potential and challenges lie in translation of the spoken word. Using the tethered phone’s microphone and speaker, it should be possible to parse what the user says, have them confirm it on screen, and either have them try to read it out or simply output the translated phrase via the speaker. Depending on how good the speech recognition is (and this is pushing the limits today), it could even be possible to hand the phone over to the other person, have them speak, and have the glasses translate that instantly. And if both parties are wearing the glasses, with a microphone and an earphone, could we finally implement the Babel Fish and have unobtrusive simultaneous translation, with the speech of one rendered on the screen of the other? This may not be science fiction any more!
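For the curious, here is a rough sketch of that spoken-translation loop. The recognition, translation and speech-synthesis engines are left as stand-ins (any real system would plug in actual services there); the interesting part is the flow: listen on the phone, confirm the recognition on the heads-up display, then either speak the translation through the phone’s speaker or subtitle it for the other party.

```python
# Rough sketch of the spoken-translation loop described above. The engines
# are deliberate stand-ins, not real APIs; only the flow matters here.

def recognise_speech(audio_bytes: bytes, language: str) -> str:
    raise NotImplementedError("plug in a speech recogniser here")


def translate(text: str, source: str, target: str) -> str:
    raise NotImplementedError("plug in a translation engine here")


def speak(text: str, language: str) -> None:
    raise NotImplementedError("plug in text-to-speech here")


def show_on_display(text: str) -> None:
    print(f"[HUD] {text}")            # stand-in for the heads-up display


def confirm_on_display(text: str) -> bool:
    # Stand-in for a nod or a spoken "yes"; always confirms in this sketch.
    show_on_display(f"Did you say: {text!r}?")
    return True


def translate_utterance(audio: bytes, source: str = "en",
                        target: str = "ja", speak_out: bool = True) -> None:
    heard = recognise_speech(audio, source)
    # Let the wearer confirm the recognition before translating, so a
    # misheard phrase never reaches the other party.
    if not confirm_on_display(heard):
        return
    rendered = translate(heard, source, target)
    if speak_out:
        speak(rendered, target)       # play through the phone's speaker
    else:
        show_on_display(rendered)     # or subtitle it on the other Glass
```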
Conclusion
Project Glass has immense potential, but like most revolutions in technology, people are likely to overestimate the short-term impact and underestimate the long-term impact. The first iteration is likely to prove a disappointment, but in a few years’ time this or something much like it may indeed finally supplant the printed book as the traveler’s tool of choice on the road, and create a few new billion-dollar markets in the process.