Dragon NaturallySpeaking

I am attempting this post with the Dragon NaturallySpeaking voice recognition software. The software is designed for use with word processing packages and so doesn’t work as well with the rich text editor in WordPress under the browser.

I didn’t actually buy this software myself, but instead received it as a gift. I’ve seen the TV ads, but I always considered it unlikely that it would work as well as the ads show, so I would not have purchased it myself. But I am quite surprised and quite impressed by how well it works.

I actually have some background in this. For a few years I worked for a company called Votan in Fremont, California. The voice recognition software in that era required a large amount of extra hardware based on digital signal processor chips. The interface to the early IBM PC was quite crude. The voice recognition was entirely speaker-dependent, which meant you had to train every word you intended to use by speaking the word multiple times, from which a matching template was created. Furthermore, the technology only supported isolated word recognition, which meant you had to speak each word with a pause after it, which is certainly unnatural. So frankly, even though the Votan technology was the best, it was awful.
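For readers who want a feel for what "training a template" means, here is a toy sketch in Python. It is only an illustration of the general idea, not Votan's actual method: the feature numbers are invented, the function names are hypothetical, and it ignores the hard part of real recognizers, which had to cope with the same word being spoken at different speeds.

```python
import math

def make_template(samples):
    """Average several equal-length feature vectors into one template."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]

def recognize(features, templates):
    """Return the trained word whose template is nearest to the features."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))

# Invented "feature vectors": each word spoken three times by one speaker.
training = {
    "save": [[0.9, 0.1, 0.2], [1.0, 0.2, 0.1], [0.8, 0.0, 0.3]],
    "print": [[0.1, 0.9, 0.7], [0.2, 1.0, 0.8], [0.0, 0.8, 0.6]],
}

# One template per trained word; untrained words cannot be recognized at all.
templates = {word: make_template(samples) for word, samples in training.items()}
```

A new utterance is matched against every stored template and the closest one wins, which is why a different speaker, or a word that was never trained, gave such poor results.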

My job was to manage the software on the PC side, the interface to the external voice recognition hardware. Previous attempts at this software had produced complex and hard-to-use solutions. I learned about a new technique for the original PC known as terminate-and-stay-resident (TSR), a crude form of multitasking. It was a trick we could use to run our code in the background of some other application running on the PC. The next trick was to intercept the call DOS made to the BIOS to get the next character from the keyboard. Thus, after our speech recognition box detected a word, it was sent to our TSR software and then translated into some combination of keystrokes, including the control key. Thus we could train words and associate them with commands used in some software package running at the same time. Primarily our focus was WordStar, since it was heavily driven by control key sequences. Of course the actual text input required using the keyboard, since the vocabulary was so limited. So frankly it was pretty useless.
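The keystroke trick can be simulated in a few lines of Python. This is only a rough model of the idea, not the original code (which was resident DOS software hooked into the BIOS keyboard call): the class and function names are hypothetical, and the vocabulary is a made-up example using a few WordStar control sequences.

```python
from collections import deque

def ctrl(letter):
    """Return the control character for a letter (Ctrl-A is 0x01)."""
    return chr(ord(letter.upper()) - ord("A") + 1)

# Hypothetical vocabulary: trained word -> keystrokes to inject.
VOCABULARY = {
    "save": ctrl("K") + "s",   # WordStar ^KS: save and keep editing
    "delete line": ctrl("Y"),  # WordStar ^Y
    "delete word": ctrl("T"),  # WordStar ^T
}

class KeyboardHook:
    """Stands in front of the real 'get next key' routine."""

    def __init__(self, original_read_key):
        self._original = original_read_key  # the routine we intercepted
        self._pending = deque()             # injected keystrokes not yet read

    def word_recognized(self, word):
        """Called when the recognizer reports a word; queue its keystrokes."""
        self._pending.extend(VOCABULARY.get(word, ""))

    def read_key(self):
        """Return an injected keystroke if one is waiting, else a real one."""
        if self._pending:
            return self._pending.popleft()
        return self._original()

# Simulate a user who has typed "hi" on the real keyboard.
typed = deque("hi")
hook = KeyboardHook(typed.popleft)
hook.word_recognized("save")
keys = [hook.read_key() for _ in range(3)]
```

The application never knows the difference: it asks for the next key as usual and receives the injected Ctrl-K and "s" before the keys that were really typed.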

During the time Votan existed, Dragon Systems was its primary competitor. Dragon’s speech recognition was inferior but their integration with the PC was better. As a result Dragon was able to get a contract with IBM and thus stayed in business while Votan struggled.

It’s interesting, though, that the Dragon software actually comes from a company called Nuance. In searching their website I find no evidence of a connection to the original Dragon Systems, but I assume at some point the intellectual property and trademark were transferred to a later company. I did also find a reference to an attempted IPO by Votan in 1997. The IPO must have failed and I believe Votan no longer exists, but it would appear neither does Dragon.

Using the NaturallySpeaking software properly does require quite a bit of learning. For instance, how do you distinguish commands from the text input? How do you enter things that are not easily recognizable, such as proper names or unusual characters? Then how do you apply formatting such as bold? Nuance has designed their software to work better with Microsoft Word, so, for instance, I was unable to apply the bold formatting (above), but I imagine there is some way that would require me to study the software more thoroughly.

I also found that initial dictation is far easier than editing. For me this is somewhat unnatural, because in writing blog posts I am so used to making changes via the keyboard that I don’t think through precisely what I’m going to say and often change my mind about what to say after I see the words on the screen. That style of composing doesn’t work well with this software, but if you can think ahead, especially in phrases, this software works amazingly well. It’s not the completely natural thing you see in science fiction shows, but it’s far better than I would’ve imagined based on my early experience with this technology. So I salute all of those who have advanced this technology substantially in just a few decades, as well as reducing it to merely software so that no specialized hardware is required.

I’ll finish now and merely point out that I did have to use the keyboard a few times to complete this post. I also carefully scanned what the recognition software inserted as text and tried to do immediate corrections. The reason one speaks in phrases is not a requirement to have pauses in your speech, but instead to provide a smaller unit of text to correct. So the easiest way to correct is simply to delete the last spoken phrase and repeat the phrase. This is done simply by saying “delete last”. There are many commands the software understands, and learning those commands significantly improves the usability. The other thing is that practice makes perfect, or, put another way, practice improves the recognition and the speed with which one can compose a post.

In the best case, and with enough practice to perfect using this software, I believe it is possible to create a post more quickly by speaking than by typing. But it’s very close, and if many mistakes occur then speaking is probably slower. An interesting point is how fatiguing speaking is in contrast to typing, so I’m not really certain I would want to do a long post with this technology. But for people who have poor typing skills or actual challenges with using a keyboard this would be fantastic.

About dmill96

old fat (but now getting trim and fit) guy, who used to create software in Silicon Valley (almost before it was called that), who used to go backpacking and bicycling and cross-country skiing and now geodashes, drives AWD in Wyoming, takes pictures, and writes long blog posts and does xizquvjyk.

12 Responses to Dragon naturally speaking

  1. Keyboarding is essentially a silent activity allowing thoughts to flow; therefore, I can imagine that speaking well enough for the software to ‘record’ properly would be a bit exhausting. Some days I feel all I do is talk so at night, it’s so very nice doing something in silence. Thanks for sharing your experience!

    • dmill96 says:

      And as keyboarding is silent it is compatible with writing when one is not alone.

      In fact, studies were done back when I was working in voice recognition that indicated it would be an ear-popping cacophony with everyone speaking to their computers in the workplace.

      It’s also interesting how fatiguing it is but it definitely requires more mindful attention (I can type on semi-autopilot but the voice requires constant attention).

  2. Better keep to your keyboard when you’re at Starbucks…

    I don’t know the sequence of events, but Nuance became dominant around 2000 when voice portals started to appear. TellMe, founded by ex-Netscape developers, HeyAnita, BeVocal, and others all popped up at about the same time, and I think they were all powered by Nuance. Many of these were clients of the mapping company (oops, Location-based Services) I worked for then. TellMe was probably the best known, and was acquired by Microsoft. What they did with the technology is hard to fathom.

    Do you remember that AT&T’s Bell Labs (the real AT&T, not the Texas version) took the entire ClariNet newsfeed? They were using it for language research, and one of the offshoots should have been speech recognition. It’s also surprising that Google hasn’t played a stronger role in this, since they analyze emails and voice mails (on Google Voice – but their transcriptions are horrible).

    What’s so surprising about all this is that these things have gotten so accurate. Siri is ungodly – I find its voice recognition almost totally accurate, even in a noisy background. But for all of us who thought that voice recognition would never work, it has become practical, useful, and may one day replace keyboards. You and I might have an easier time typing, but CPAs also had an easier time with paper spreadsheets. I guess the next step is to figure out what we want to say without even requiring speech.

    • dmill96 says:

      Don’t get me started with AT&T. I wasted several days there while at Votan trying to debug their stupid PC-AT clone that violated a number of IBM’s specs. Votan was curious why AT&T, with Bell Labs as a supposed leader in voice recognition, would do a deal with Votan. The general belief was that they were either trying to suppress us or reverse engineer the technology. AT&T never accomplished anything in this area.

      Siri is actually an easier problem. Decades ago voice recognition worked much better with simple query and/or command environments (it’s not even necessary to recognize all the words, just the important words). It’s a bit like the language translation that works well for simple declarative sentences. I think Dragon is way more impressive because it has to get every word (it can’t just guess or ignore) and it has to work with a huge vocabulary (impressive how large it is) and it has to work with complex prose sentences. Decades ago Siri-type products were “easy” but viewed as having a limited market. The “voice typewriter” was the holy grail that drove everyone, and everybody’s products were measured relative to this imaginary standard.

      And to your first point I can’t imagine using this anywhere except completely alone. It would really irritate people at Starbucks if I dictated posts and it would expose too much in other places with people listening.

      • You know, I suppose this might be true, but wouldn’t they be just as ‘rude’ by talking on their cell phones and conversing loudly themselves? I have to admit, I don’t like coffee so I don’t really know what goes on at a Starbucks! Just call me an ‘ignorant upper Midwesterner’… 😀

        • dmill96 says:

          That was the subject of some of my “Impact of wired world” posts where I defend texting. I admit I find it irritating when people are texting in my presence, BUT, it’s better than the alternative. Just at the time I started working again and thus flying a lot texting was still relatively uncommon but various smarter (not yet smart) phones were commonplace esp. with wireless headsets. This drove me crazy, in the waiting routine for the flight, constant buzz of one side of a conversation, on the plane until devices off, even more buzz, closer (imagine the middle seat) and the trivia that people were talking (like they had to call tons of people to tell them any time something happened on the flight process, “Just wanted to tell you I’m on the plane now” literally). Then I didn’t even know airlines had changed the rules, literally the second wheels hit runway phones were allowed. Now everyone is calling everyone they called before leaving and telling them they’d arrive. Then when the doors finally open and we’re getting out, they called again. It drove me crazy.

          So they still do all this, but now it’s texting instead so at least I don’t have to hear it.

          Relatively few people take voice calls while in Starbucks but lots of people are texting. Once I saw this very cute and precocious little girl, playing games on an iPad, texting on some phone, but also had a smaller phone where she talked several times. In this case it was the iPad game that was irritating because she was talking away to the game and it was talking back somehow. She was having a great time but I was glad when her parents decided she’d hit her limit and made her turn it off, which she did, but then put in her earbuds and continued to play with three different devices – talk about wired.

        • dmill96 says:

          I see a smile (duh, obvious) Is this what you’re talking about? [still mystified]

  3. dmill96 says:

    btw: The OOTB Dragon comes with no swear words so you’ll have to custom train those. The list of known proper names is amazingly large.

  4. Pingback: Dragon Naturally Speaking CRASHES Often! With Data Loss | dailydouq

  5. Pingback: What is the point of blogging? | dailydouq

  6. I was watching old episodes of Computer Chronicles and came across this episode:
