Here We Go Again (Say What?)


Voice recognition is getting a lot of attention right now, and I'm beginning to wonder if the folks marketing this technology really know what they're doing. IBM's gala introduction of OS/2 Warp 4, for example, started with someone striding onto the stage and speaking to the computer. Even in this highly scripted and (I hope) carefully rehearsed scenario, the recognition was patchy -- some commands were ignored, others misinterpreted.

A few years ago, I had an opportunity to develop software for some pen-based "tablet" computers. It was fun and challenging work, and the experience was most instructive. I learned some important new programming ideas; multithreaded, object-oriented systems were the rule, not the exception. But most importantly, I learned not to rely too heavily on machine recognition -- and I don't think I'm the only one who learned this lesson.

Even as the pen-computing market was crumbling, a second generation of pen-based software was being released. The user interface differences were impressive. First-generation software relied heavily on handwriting recognition; many programs required you to set numeric options by writing them in. Messaging and e-mail applications attempted to translate handwriting into text before it was sent. Second-generation software took a different approach, relying on direct interaction (such as drag-and-drop) and context-sensitive menus rather than gestures. It also minimized the number of options and generally simplified the UI, a change today's desktop software could well stand to make. Most importantly, programmers made extensive use of ink as a data type. Let's face it: humans read handwriting much better than computers ever will. If you can, you should simply store and manipulate the ink directly rather than slowly and incorrectly translate it into text.

The same concern applies to voice. During his demonstration, the IBM spokesperson dictated an e-mail message, which was dutifully converted into text, then sent. As I watched, I wondered why the e-mail application didn't simply record the speech and send it directly. MIME e-mail is widespread and includes a standard audio format (audio/basic) that's well suited to voice. As a user, I find "voice e-mail" much more attractive than the prospect of wrestling with a voice-recognition system that requires you to speak...slowly...and...distinctly, yet even then makes mistakes.

Voice is a tremendously versatile data type, and voice recognition certainly has its place. But people who promote voice recognition as the primary feature of a new operating system or application may be doing themselves a disservice. Like the people who were marketing pen systems a few years ago, they just might be drawing attention to the weakest part of their product.

--Tim Kientzle