LOOKING @ THE FUTURE
Talking to machines, and having them respond, is still a disruptive idea. But it could change the way we think about tech.
VLAD Sejnoha is talking to the TV again. OK, maybe you’ve done that, too. But here’s the weird thing: His TV is listening. “Dragon TV,” Sejnoha says to the screen, “find movies with Meryl Streep.” Up pops a list of films like “Out of Africa” and “It’s Complicated.”
“Dragon TV, change to CNN,” he says. Presto: the channel flips to CNN. Sejnoha is sitting in what looks like a living room but is, in fact, a sort of laboratory inside Nuance Communications, the leading force in voice technology, and the speech-recognition engine behind Siri, the virtual personal assistant on the Apple iPhone 4S.
Here, Sejnoha, the company’s chief technology officer, and other executives are plotting a voice-enabled future where human speech brings responses from not only smartphones and televisions, cars and computers, but also coffee makers, refrigerators, thermostats, alarm systems and other smart devices and appliances. It is a wildly disruptive idea. But such systems are beginning to change the way we interact with the world and, for better and worse, how we think about technology. Until now, after all, we’ve talked only to one another. What if we begin talking to all sorts of machines, too — and, like Siri, those machines respond as if they were human?
Granted, people have been talking into machines and at machines since the days of Edison’s phonograph. By the 1980s, commercial speech recognition systems had become sophisticated enough to transcribe spoken words into text. Today, voice technology is a fixture of many companies’ customer-service operations, albeit an occasionally maddening one.
But now the race is on to make the voice the sought-after new interface between us and our technology. The results could rival innovations like the computer mouse and the graphic icon and, some experts say, eventually pose challenges for giants like Google by bypassing their search engines.
No player is bigger in voice technology than Nuance, an industry pioneer that has acquired more than 40 companies in the field and today employs 7,300 people. It is one of the companies that helped make a big technological leap from programs that take dictation to systems that actually extract meaning from words and respond to them. Now it wants to push far beyond that.
“They are the equivalent of Microsoft, Google or Amazon in a very niche technological space,” says Andrew Rosenberg, an assistant professor of computer science at Queens College. Like many new technologies, sophisticated voice systems have potential drawbacks. Some experts worry about privacy invasions, others about our ever-deepening attachment to devices like smartphones.
Humans are wired for speech and tend to respond to talking devices as if they were kindred spirits, says Sherry Turkle, a professor of the social studies of science and technology at the Massachusetts Institute of Technology. “I’m not saying voice recognition is bad,” Professor Turkle says. “I’m saying it’s part of a package of attachments to objects where we should tread carefully because we are pushing a lot of Darwinian buttons.”
Only a decade ago, voice-enabled virtual assistants seemed more science fiction than business fact. But in 2000, Paul Ricci, a former executive at Xerox, concluded that voice software could one day disrupt the marketplace the way the mouse and the icon had in the 1980s.
“We had to decide early on where there were markets where we could successfully deploy the technology,” says Ricci, Nuance’s CEO. Nuance, then known as ScanSoft, went on an aggressive acquisition spree. Its most significant acquisition was Nuance, a rival that had been spun off from SRI International of California. The combined company took the Nuance name. SRI later developed and spun off Siri, which was acquired by Apple in 2010.
Nuance reported revenue of $1.3 billion for 2011, with $515 million of that coming from its health care technology business.
Not everyone is as enamoured of voice technology. Some privacy advocates worry that it adds an audio track to the digital trail that people leave behind when they use the Web or apps, potentially exposing them to more data mining.
Voice recognition software works by sending speech to processors that break down spoken words into sound waves and use algorithms to identify the most likely words formed by the sounds. The system typically records and stores speech so it can teach itself to become more accurate over time. Nuance, for example, believes that, aside from the federal government, it has amassed the largest archive of recorded speech in the US.
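To make the recognition step described above concrete, here is a toy sketch of how software might pick "the most likely word" for a sound. Everything in it is an assumption for illustration: the candidate words, the scores, and the idea of combining a sound-match score with a context score are not taken from Nuance's system, which the article does not describe at this level.

```python
import math

# Hypothetical scores: how closely each candidate word matches the sound.
ACOUSTIC = {"streep": 0.40, "street": 0.45, "strip": 0.15}

# Hypothetical scores: how plausible each word is right after "Meryl".
CONTEXT = {"streep": 0.90, "street": 0.05, "strip": 0.05}

def most_likely_word(acoustic, context):
    """Return the candidate with the highest combined log score."""
    def score(word):
        return math.log(acoustic[word]) + math.log(context[word])
    return max(acoustic, key=score)

best = most_likely_word(ACOUSTIC, CONTEXT)
print(best)
```

In this sketch "street" matches the sound slightly better, but the context score tips the choice to "streep". Stored recordings would, in principle, be used to refine scores like these over time.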
Nuance says it is impossible to identify consumers from the recordings, because the company’s system recognizes people’s voices only by unique codes on their devices, rather than by their names. The company’s privacy policy says it uses the voice data of consumers only to improve its own internal systems.
“We have no idea who you are today,” says Peter Mahoney, the company’s chief marketing officer. Such assurances aside, voice recognition software could conceivably pose enough of a risk to people’s privacy that regulators in Washington are watching.
Dragon Go, Nuance’s first direct-to-consumer app, is part of a push to build the brand’s visibility and demonstrate Nuance’s technological advances to business customers. Its real goal is even bigger: to disrupt the role of search engines as gatekeepers to the Web.
For the most common queries, Dragon Go usually bypasses search engines by taking users directly to websites of companies like Amazon, Expedia and OpenTable, which are Nuance partners on the app. If people don’t find what they’re looking for there, Dragon Go offers traditional Web search.
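The routing idea in the paragraph above can be sketched in a few lines. This is a simplified illustration, not Dragon Go's actual logic: the keyword table and the `route_query` function are hypothetical, and the sketch falls back to Web search whenever no partner matches, whereas the app is described as offering that fallback when users don't find what they want.

```python
# Hypothetical keyword-to-partner table. The article names the partners
# but does not say how queries are matched to them.
PARTNER_ROUTES = {
    "buy": "Amazon",
    "flight": "Expedia",
    "hotel": "Expedia",
    "reservation": "OpenTable",
    "table": "OpenTable",
}

def route_query(query):
    """Return a partner site for the query, or fall back to Web search."""
    for word in query.lower().split():
        if word in PARTNER_ROUTES:
            return PARTNER_ROUTES[word]
    return "web search"

print(route_query("Book a table for two tonight"))
print(route_query("Who directed Out of Africa"))
```

The first query contains a partner keyword and goes straight to that site; the second matches nothing and is handed to ordinary search.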
The benefit for consumers, Nuance executives say, is faster answers in fewer steps. In many cases, Nuance collects a small fee from partner sites when people make restaurant reservations or complete purchases. The app could be construed as a challenge to the likes of Google and Microsoft, which have their own voice products — such as Google Voice Actions and Microsoft Tellme — as well as search engines.
Christopher Katsaros, a Google spokesman, declined to comment. The company has recently updated Google Voice Actions, its voice-command system for Android phones, with a feature that continuously converts people’s speech to text, making it faster and smoother to dictate and send text messages, search Google aloud, or ask for directions.
Lezli Goheen, a spokeswoman for Microsoft, said that the company had addressed consumers’ expectations for easier access to information through several means. In addition to Tellme, a program included in all new Windows products that lets people dictate text messages and commands to applications like calendars, she said, the company has introduced Bing Voice Search, a program that lets people speak their Bing searches.
Nuance, meanwhile, has similarly ambitious plans for its health care business. In collaboration with IBM, the company is developing analytics to scour the medical notes that doctors dictate after they see patients. The idea is to search the text for common red flags — like medicines that interact dangerously — and automatically alert doctors, hopefully reducing problems and health care costs.
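As a rough sketch of the red-flag idea above, the snippet below scans a dictated note for pairs of medicines that appear on a watch list. The drug list, the single risky pair, and the `find_red_flags` function are all hypothetical and for illustration only; the article does not describe how the Nuance and IBM analytics actually work, and this is not medical guidance.

```python
from itertools import combinations

# Hypothetical data for illustration only.
KNOWN_DRUGS = {"warfarin", "aspirin", "metformin"}
RISKY_PAIRS = {frozenset({"warfarin", "aspirin"})}

def find_red_flags(note):
    """Return pairs of mentioned drugs that appear in RISKY_PAIRS."""
    # Normalize words: strip trailing punctuation and lowercase.
    words = {w.strip(".,;:").lower() for w in note.split()}
    mentioned = sorted(words & KNOWN_DRUGS)
    return [pair for pair in combinations(mentioned, 2)
            if frozenset(pair) in RISKY_PAIRS]

note = "Patient continues warfarin. Started aspirin for headaches; metformin unchanged."
flags = find_red_flags(note)
print(flags)
```

A real system would need to handle brand names, dosages, and negations ("stopped aspirin"), which simple word matching like this cannot do.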
Members of US Airways’ frequent-flier program who have registered their mobile phone numbers are greeted by name by “Wally,” an interactive voice system that Nuance created for the airline.
US Airways introduced Wally last summer, as part of a relocation of its offshore customer service call-in operations back to the United States. Nuance designed the system to anticipate callers’ requests. Wally, for example, can automatically tell frequent-flier members their seat assignments or report whether they have received upgrades. It also converts people’s speech to text, so that, should customers ask to speak to a live operator, they don’t have to repeat their original requests.
VOICE CODE
Future Sounds Bright ...
THE RACE is on to make the voice the sought-after new interface between us and our technology. The results could rival innovations like the computer mouse and the graphic icons.
THE TECHNOLOGY, when evolved, can eventually pose challenges for giants like Google by bypassing their traditional search engines.
...But the Past Wasn’t Perfect
ONLY A decade ago, voice-enabled virtual assistants seemed more science fiction than business fact.
BUT IN 2000, Paul Ricci, a former executive at Xerox, concluded that voice software could one day disrupt the marketplaces.
How Does It Work?
VOICE RECOGNITION software works by sending speech to processors that break down spoken words into sound waves and use algorithms to identify the most likely words formed by the sounds.
THE SYSTEM typically records and stores speech so it can teach itself to become more accurate over time.
NATASHA SINGER NYT ET120403