Microsoft Gets Documents Talking

May 28th, 2008

I just read about Microsoft’s new DAISY XML plug-in that will allow users to save text files created in Microsoft Word into DAISY XML, which is short for the Digital Accessible Information SYstem eXtensible Markup Language. DAISY XML tags and maps the text documents so they can be converted to eBooks and digital talking books later on. It’s designed for Microsoft Office Word 2007, Word 2003, and Word XP.

This seems like a move in the right direction, but I don’t get why it’s taken so long. I’ve been putting talking animated avatars into my PowerPoint presentations for about 8 years, using a Sensory avatar/lipsync technology. It makes presentations very engaging to have an avatar pop up and start talking to the audience. Seems like Microsoft should add something like this to PowerPoint…if any Microsoft people read this, I’ll give it to you for almost free.

Hey – Do you get the humor in Microsoft’s naming? The first singing computer was credited to IBM and it sang “A Bicycle Built for Two” with the lyrics “Daisy, Daisy…”. Arthur C. Clarke saw a demo of this and had his computer HAL from 2001: A Space Odyssey sing it when it was being shut down.

Internet Search via Cell Phones

April 9th, 2008

I made it to a couple of interesting tradeshows over the last month. The Voice Search Conference was held at the Marriott hotel in San Diego, which provided a really nice setting; the show was very well organized and well attended. Voice Search is all about bringing the power (and revenues) from internet search engines to the cell phone market. Google, Microsoft, Yahoo, and others are getting into this in a big way.

On the first morning I accidentally went into the wrong room for breakfast and sat down with a bunch of people from another industry. They were really negative about phone-based speech recognition, and offered these opinions:

  • “Oh like when you call somewhere and the phone says ‘Press or Say One’”
  • “Yeah I tried it but it never works with my voice”
  • “I hate that stuff, I just want to talk to a live operator”

I pointed out that it’s gotten a lot better than this! Directory services like 1800Goog411 (Google), 1800Call411 (Microsoft), and 1800Free411 (Jingle) actually work quite well and do save time. Most of them are based on Nuance engines, which are very powerful server-based technologies. Nuance is the 800-pound gorilla in the speech space, because they’ve acquired pretty much every player in speech recognition (well, other than Sensory of course, but they certainly cleared away all our competitors in the embedded area). Microsoft, Google, and Yahoo have pretty large speech R&D teams, but I’d guess they all use Nuance IP in some fashion, probably to expand their language coverage, if not more.

I found it humorous when someone quoted a woman from Nuance who said “My boss told me never to give live demos at shows because they never work.” Novauris gave some of the best demos at the show, but sure enough they pushed the envelope until some stopped working. I do commend them for being willing to demonstrate technologically challenging concepts in front of a live audience. It can be something of a crapshoot showing off cutting-edge technologies.

I spoke on a panel at the Voice Search Conference, and one of the other speakers was from IBM India. He gave a presentation about a telecom web that they are deploying so that users can use their phones to find and hear about service providers in India, basically through short audio messages like “Hello, I’m Pradeep the plumber. I have 12 years of experience doing all types of plumbing…” This is similar to searching the web and reading the short blurbs about different businesses, but instead hearing the entries from a telephone.

At CTIA Wireless 2008, the big cell phone show in Las Vegas, I had a chance to try the Vlingo voice search engine. Yahoo has licensed it already, and it is simply AMAZING! It is the closest thing to “natural language” and “context independence” speech recognition that I have ever seen. Vlingo provides a speech-to-text service that uses a thin-client/server model to provide recognition in cell phone apps.

Bluetooth headsets were prominently on display at the show. Plantronics introduced a comfortable and cool-looking headset that included a case which provides a 5-hour recharge – Great Concept!

The high point of the CTIA conference for me was BlueAnt Wireless winning an award for Best of Show in the peripherals category for their V1 Bluetooth Headset. BlueAnt is a smart and aggressive company that is making rapid inroads and finding a lot of success in the Bluetooth headset and carkit markets. The V1 is billed as the first voice-controlled headset, and it is based on Sensory’s BlueGenie Voice Interface, which gives the user the ability to control common functions like answering or rejecting calls and pairing devices vocally. It even has 1800Goog411 as a built-in command, meaning you’ll never have to press buttons to place a call to any business across the US. Now that’s what I call useful!

Todd
sensoryblog@sensoryinc.com

Power Outages

January 7th, 2008

2008 is here, and in Silicon Valley it comes with a series of powerful storms, winds up to 60 miles per hour and rain, rain, rain. Of course, what this means is power outages are upon us; a short one and the house will probably stay cold enough to not worry about the food going bad. We’ll build fires, light candles, load the flashlights with batteries, and when the power comes on, spend way too much time resetting our clocks.

Yeah, that’s my pet peeve. No one ever created a standard way to reset the time on clocks, so it always takes a bit of systematic experimentation to figure out exactly how to reset clocks and appliances like VCRs.

But wait…Have you seen Sensory’s new time-set technology? This inconvenience could be a thing of the past if the clock uses a Sensory chip. Check out the YouTube video.

Customers have been asking us for this for years and years. The accuracy was never quite there, but we kept working on it. I’m happy to say, we’re there! Sensory now has a chip that sells in volumes for under $2 that can be integrated into clocks and uses voice recognition to set the alarm time with natural phrases like “Five thirty-five AM”. Recognizing digits in a natural context is one of the Holy Grails in speech recognition, and I’m proud to say ours works very accurately. Of course, shutting off alarms by voice commands or creating hands-free requests like “What time is it?” can be done as well.
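For the curious, here is a minimal sketch of the step that comes after recognition: turning the words of a phrase like “Five thirty-five AM” into an alarm time. It is purely illustrative and is not Sensory’s implementation; the vocabulary tables and the function names (`parse_spoken_time`, `parse_minutes`) are assumptions made up for this example, and it assumes the recognizer has already produced the words as text.

```python
# Illustrative sketch only: maps already-recognized words to a 24-hour time.
# The vocabulary and function names are hypothetical, not a real Sensory API.
ONES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}
HOURS = {**ONES, "ten": 10, "eleven": 11, "twelve": 12}


def parse_minutes(words):
    """Convert the minute words (possibly empty) into an integer 0-59."""
    if not words or words == ["o'clock"]:
        return 0
    if len(words) == 1:
        word = words[0]
        if word in TEENS:
            return TEENS[word]
        if word in TENS:
            return TENS[word]
    if len(words) == 2:
        first, second = words
        if first == "oh" and second in ONES:      # "oh five" -> 5
            return ONES[second]
        if first in TENS and second in ONES:      # "thirty five" -> 35
            return TENS[first] + ONES[second]
    raise ValueError(f"unrecognized minutes: {words}")


def parse_spoken_time(phrase):
    """Return (hour, minute) in 24-hour form for '<hour> [minutes] AM|PM'."""
    words = phrase.lower().replace("-", " ").split()
    if len(words) < 2 or words[-1] not in ("am", "pm"):
        raise ValueError("phrase must end in AM or PM")
    if words[0] not in HOURS:
        raise ValueError(f"unrecognized hour: {words[0]}")
    hour = HOURS[words[0]] % 12                   # "twelve" wraps to 0
    if words[-1] == "pm":
        hour += 12
    return hour, parse_minutes(words[1:-1])


print(parse_spoken_time("Five thirty-five AM"))
```

The hard part, of course, is the recognition itself; once the words are known, a small fixed grammar like this one is enough to set the clock.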

I hope to see low-cost clocks for under $30 hit the market by the end of the year that incorporate Sensory’s chips featuring this awesome new technology. It’s REALLY COOL, and I’m REALLY EXCITED about it!

Todd
sensoryblog@sensoryinc.com

Robotic Speech

October 31st, 2007

Last weekend I helped my daughter Samantha create a Halloween costume. Actually it was 2 costumes, because she wanted one for her friend also. They wanted to be robots this year. I took a couple of old cardboard boxes, cut out holes for arms and legs, attached old circuit boards and switches to the sides, and put pieces of dryer vent hose into the arm holes. Then I painted the whole thing silver.

It looked pretty good…so good that my 4-year-old son Sam put it on. His arms didn’t make it to the end of the makeshift sleeves and his head barely popped out the top, but he came walking into the kitchen wearing it and said in a monotone ‘robot voice’: “I am a robot. I will destroy you.”

We all had a good laugh over that, but I wondered how he had learned what a robot sounds like and what they say. I guess that’s the power of the media. Interestingly though, the media has it all wrong. Speech output technologies even in their infancy never sounded like monotone robots.

Speech compression schemes digitize a real waveform and compress the data, which makes it increasingly unnatural and distorted as the bit rates drop, but it never becomes monotone as the inflections are still maintained. Likewise, approaches to TTS (text-to-speech) have never been robotic and monotone. The early DECtalk and formant synthesis approaches sounded more like someone with an intoxicated Swedish accent than the traditional bot talk, and today, TTS and speech compression techniques sound close to perfect.

On the other hand, where the media has made speech output worse in robots, they have done the opposite for speech recognition. The media portrays robotic recognition as flawless. The Star Trek computer or the Lost in Space Robot never said “What did you say? I can’t understand, please repeat. Take me to a quieter environment.”

Speaking of robots…I just spoke at Robo Development 2007 and kicked off my speech by telling the story above. My favorite part of the show, however, wasn’t all the interesting people I met during my talk; it was walking through the exhibit space. I was very impressed with Hanson Robotics’ Zeno Robot. As I spoke with David Hanson, he looked over at my name badge and said “Oh Sensory, we’re using both your FluentSoft and your FluentChip technologies!”

It’s always fun to stumble unexpectedly on a cool new application that uses Sensory technology.

Todd
sensoryblog@sensoryinc.com

Birds and Bots

October 19th, 2007

Oh, another Sensory-based product that I love is Hasbro’s Squawkers Macaw. Actually, it’s not just the speech recognition that makes it so cool, it’s the way Hasbro combined a variety of speech technologies to create a fun user experience. My 4-year-old son Sam loves to have the bird record his voice and morph it back to sound like a parrot. Now Squawkers always says “You smell like a fart.” The other night my wife couldn’t sleep and she went out to the living room to read. She walked by Squawkers and he woke up (lots of built-in sensors) and said something like “Hey what’s cooking?” She about had a heart attack. Sam put the batteries back in when she wasn’t looking.

I talk a lot about my kids in these blogs. They have been invaluable as Beta testers. They’re probably the first kids to grow up controlling household products by voice. I knew we had to work on improving performance in noise when several years back my son Max said “OK everyone be quiet, I want to use Radar”. Radar was a Fisher Price Robot product released back around 1996. Radar taught Max to say “I can’t hear you, what did you say?” I’m glad Sensory’s performance has improved over the years and my kids have gotten more sophisticated too. They ask “what’s the active vocabulary?” when I bring home a new toy…much easier than reading the instructions.

Todd
sensoryblog@sensoryinc.com

The First Voice Recognition Application for Bluetooth Devices

September 18th, 2007

I’ve been using my voice recognition Bluetooth headset. It’s nothing short of amazing. Sensory started porting its speech technology over to the CSR BC-5 Bluetooth chip around 10 months ago, and today we’re formally announcing this industry first.

CSR is the leader in the Bluetooth chip space, and there have been close to 100 million CSR chips sold in 2007 into Bluetooth headsets. Implementing speech on the chip has not been easy as the platform is quite resource constrained, but that made it a perfect fit for Sensory’s expertise. We are the first company to run a speech recognizer on a Bluetooth platform, but I expect a lot of other companies will quickly follow in our footsteps. Sensory has developed not just the recognition technology, but a sample application too. It can do things like place calls, check battery level, enable pair mode, check connection status, and lots more. A voice prompt verbally confirms things back to the user, so it’s really easy. Watch this video to see how it works.
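To give a feel for how a voice-driven headset application like this hangs together, here is a rough sketch of a command dispatcher: a recognized phrase is looked up in a small command table, an action runs, and a spoken confirmation comes back. This is only an illustration under my own assumptions and not the actual Sensory application or any CSR API; the class name `Headset`, the command phrases, and the prompt wording are all hypothetical, and the “voice prompt” is simply returned as a string.

```python
# Illustrative sketch only: a tiny command table for a voice-controlled headset.
# All names, phrases, and prompts here are hypothetical.
class Headset:
    def __init__(self, battery_percent=80, connected=True):
        self.battery_percent = battery_percent
        self.connected = connected
        self.pairing = False
        # The active vocabulary: recognized phrase -> action to run.
        self.commands = {
            "check battery": self.check_battery,
            "pair mode": self.enable_pair_mode,
            "am i connected": self.check_connection,
        }

    def check_battery(self):
        return f"Battery level is {self.battery_percent} percent."

    def enable_pair_mode(self):
        self.pairing = True
        return "Pair mode enabled."

    def check_connection(self):
        if self.connected:
            return "Your phone is connected."
        return "No phone is connected."

    def handle(self, recognized_phrase):
        """Run the action for a recognized phrase and return the voice prompt."""
        action = self.commands.get(recognized_phrase.strip().lower())
        if action is None:
            return "Sorry, I did not understand."
        return action()


headset = Headset(battery_percent=42)
print(headset.handle("Check battery"))
print(headset.handle("Pair mode"))
```

Keeping the vocabulary this small is part of what makes recognition practical on a resource-constrained chip: the recognizer only has to tell a handful of phrases apart.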

I don’t know how people have gotten by without this! The current generation of headsets use beeps and flashing lights to provide feedback, and input instructions are given by holding buttons down varying lengths of time…how clumsy and difficult! This is the perfect place for speech I/O since there’s no room for more buttons or a display AND it’s got a mic and speaker built in already.

Here are some of my interesting usage experiences as a consumer:

1) While testing out the “call home” feature, my son didn’t get to the phone in time, so he called me back, and I happened to have my Sensory VR headset on when he called. A voice asked me “incoming call from 650 xxx-xxxx would you like to accept?” I said “yes”, and suddenly, quite magically I was connected to my eleven-year-old son Max. I couldn’t stop raving to him how cool it was, and that he had made the first real call received by a VR headset. I had never tried that feature before! He said “Sensory’s voice recognition headset technology sounds really cool! I can’t wait for you to get me a cell phone so I can have a headset like that.” Nice try, Max.

2) I dropped the kids off at school the other day. Max’s backpack (with his homework) didn’t make it into the car. As I was driving through the campus I was able to call my wife and ask her to bring it over to the school. Without my voice dialing Bluetooth headset (eyes-free and phone-free usage), I wouldn’t have dared to reach over for the phone while I was driving around little kids; the call would have been delayed until I was off the campus. Safety is such a wonderful benefit!

3) I hooked up Goog411 into my voice dialing headset (if you haven’t tried it, try calling 1-800-goog-411). All I need to do is say “call Goog411” and it puts me into the Google Voice Server, from where I can quickly call any business by voice. As I was heading home from work I decided to look for an 18” kid’s bike tube. Rather than just stopping by Toys “R” Us, I used Goog411 to call in to see if they had them in stock. They didn’t. I then tried Orchard Supply Hardware. They didn’t, so I then called Target (they did). I never could have placed all these calls in time before. NOTE - It kind of seems silly that to use Goog411 you need to pick up a phone and hit 10 digits, and only then get to control things by voice. With a VR Bluetooth headset it’s so easy, and all voice controlled.

I really love this product.

Todd
sensoryblog@sensoryinc.com

Thanks, Bill & Steve!

August 29th, 2007

What better way to start my first blog than by complaining about the user interface in consumer electronics products? That’s actually why I started Sensory 12 years ago…to allow people to communicate with products the same way we communicate with each other.

Moore’s law has given us more power, more features, more data, in a smaller and smaller box. The problem is that consumer electronics have exponentially gotten more feature rich and capable during their relatively short lives, but the buttons, knobs, and switches we’ve used to access and control the data have barely changed…until recently.

The amazing success of products like the iPod and Wii has shown the world that consumers DO in fact want a new user interface for consumer electronics. These are great examples of how companies can hit it rich, not by going head to head in quality or cost, but by changing the game by changing the user experience.

Bill Gates knows this too. Surface computing definitely is a recent and huge move in this direction, but Bill has had the right idea forever. He’s been preaching about humanizing the user interface and the potential merits of speech recognition since Sensory was started in 1994. In fact my very first business plan had a Bill Gates quote from the 4/92 Upside:

“…if you take anything that’s a human skill – speech, listening, handwriting, touch – it’s totally predictable that those are key technologies…that people should invest millions and millions of dollars in.”

Well, I guess I’ve taken Bill’s advice and so far it’s paying off. Sensory is doing quite well. Voice Signal Technologies, which was a fellow player in the embedded speech market, also took Bill’s advice, and their acquisition by Nuance should close over the next couple of weeks, paying off quite dearly for their employees and investors (by my calculations the $293M purchase price was around 8-9 times current year revenues). One last thought on Mr. Gates…he’s taken his own advice too and has built and acquired (Entropic and more recently TellMe) a sizeable speech recognition team.

It’s too bad that whenever Microsoft demos speech recognition it doesn’t work! If you don’t know what I’m talking about then run a video search for “Vista Speech Recognition”…and that wasn’t their first public humiliation over speech recognition! Microsoft actually does have an EXCELLENT speech engine and speech recognition team, and I certainly believe that there was a non-speech related bug during this demo, but nevertheless…OUCH!
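As a quick back-of-the-envelope check on that multiple, the snippet below simply divides the $293M purchase price by 8 and by 9 to see what current-year revenue the estimate implies. The only inputs are the figures quoted above; the resulting revenue range is derived arithmetic, not a reported number, and the function name is just for illustration.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the post.
PURCHASE_PRICE_MUSD = 293.0  # purchase price in millions of US dollars


def implied_revenue(price_musd, multiple):
    """Revenue (in $M) implied by a given price-to-revenue multiple."""
    return price_musd / multiple


for multiple in (8, 9):
    revenue = implied_revenue(PURCHASE_PRICE_MUSD, multiple)
    print(f"{multiple}x revenue -> about ${revenue:.1f}M")
```

In other words, an 8-9x multiple on $293M corresponds to current-year revenues somewhere in the low-to-mid $30M range.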

OK…I’ve spent enough time on my first blog for now. Drop me an email or comment and let me know what you think (that’ll be my main data point as to whether I’m wasting my time or not).

Todd
sensoryblog@sensoryinc.com

Weapons for Christmas

August 29th, 2007

My seven-year-old daughter Sydney recently asked, “Daddy, when is Christmas?” I asked “Why?” She said (or more accurately, I heard) “Because I know what I want. I want weapons.”

I was a bit taken aback. But after some clarification I realized that I hadn’t heard her properly (more on her intent if you read on.) It’s always interesting to me when people hear things wrong. Humans have so many great clues about intent and context, yet we still occasionally get the wrong message. The best speech recognition systems actually try to take contextual probabilities into account. Dictation systems don’t just perform speech recognition, but get into “meaning” recognition as well.
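Here is a toy sketch of what “taking contextual probabilities into account” can look like. It combines how well each candidate transcription matches the audio with how likely that phrase is in the current context, and picks the best combined score. Everything in it is an assumption for illustration: the function name `rescore`, the two candidate phrases (a classic textbook pair of sound-alikes), and all of the probability values are made up, and real systems use far richer acoustic and language models.

```python
import math

# Illustrative sketch only: rescoring sound-alike hypotheses with context.
# All phrases and probabilities below are invented for the example.


def rescore(acoustic_scores, context_prior, floor=1e-6):
    """Return the hypothesis maximizing log P(audio match) + log P(context).

    Phrases missing from the context prior get a small floor probability
    so that the logarithm is always defined.
    """
    def combined(phrase):
        prior = context_prior.get(phrase, floor)
        return math.log(acoustic_scores[phrase]) + math.log(prior)

    return max(acoustic_scores, key=combined)


# How well each candidate matches the audio on its own.
acoustic_scores = {"wreck a nice beach": 0.55, "recognize speech": 0.45}

# How plausible each phrase is in a conversation about voice technology.
context_prior = {"wreck a nice beach": 0.1, "recognize speech": 0.9}

audio_only = max(acoustic_scores, key=acoustic_scores.get)
with_context = rescore(acoustic_scores, context_prior)
print("audio only:  ", audio_only)
print("with context:", with_context)
```

On audio alone the slightly better acoustic match wins; once context is factored in, the more plausible phrase comes out on top, which is roughly what a listener does when something sounds wrong and they ask for clarification.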

I remember one system I read about from Bell Labs that included a camera to help improve accuracy by watching the speaker’s mouth and performing lip-reading. Humans utilize this approach too; I used to find it mildly amusing (back before I had my eyes lasered) to realize that when I took my contacts out I couldn’t always understand what people were saying. Too many years of playing in loud rock bands have damaged my hearing and I have learned to compensate by watching lips while I listen. The makers of the Jawbone Bluetooth headset have employed an interesting approach to noise reduction by “listening” in on the jawbone movements to help isolate the person speaking from the background noises.

Okay…so what does my daughter want for Christmas? Webkinz, not weapons. Webkinz are the latest Virtual Pet toy craze. Virtual Pets have been around for a long time, but really exploded with Bandai’s 1997 hit Tamagotchi, which sold something like 40-50 million units. Tiger’s 1998 phenomenal hit Furby (which used a Sensory/TI SC chip in its original introduction and a Sensory RSC chip in its 2005 re-introduction) was a big enough sensation that Hasbro bought the company for over $300 million.

The original Tamagotchi was a simple little virtual pet contained in a watch-like device with a small display. A few buttons enabled feeding, sleeping and other activities like exercise. The first Furby added mechanics to the mix by making a virtual creature that could move around and speak “Furbish”, while products like Sony’s Aibo and Furby II added more complex mechanics along with speech recognition. Webkinz use the Internet to take one step forward in technology. Users can log onto their accounts and do various things to and with their pets, but the “pets” themselves are really a step backwards in simplicity. No mechanics, no speech recognizers, not much really but a ball of plush!

Nevertheless, the idea of products that interact with the Internet is big today and it will just get bigger. Even my four-year-old son goes onto the Internet to play games. Kids are growing up with big monitors, big memories, and powerful processors, and toy companies can make their products more powerful by taking advantage of this. I think more and more toy products will have online personas and the ability to download new gameplays, voices and recognition sets in the near future. In fact, watch out for a new chip from Sensory in 2008 that includes a USB port to make this kind of communication really easy. This is not a new idea for Sensory…some of our early patents made claims for this kind of stuff, and it’s really fun and exciting to see it all coming to life!

Todd
sensoryblog@sensoryinc.com