I am attempting this post with the Dragon NaturallySpeaking voice recognition software. The software is designed for use with word processing packages and so doesn’t work as well with the rich text editor in WordPress running in the browser.
I didn’t actually buy this software myself, but instead received it as a gift. I’ve seen the TV ads, but I always considered it unlikely that it would work as well as the ads show, so I would not have purchased it myself. But I am quite surprised and quite impressed by how well it works.
I actually have some background in this. For a few years I worked for a company called Votan in Fremont, California. The voice recognition software in that era required a large amount of extra hardware based on digital signal processor chips. The interface to the early IBM PC was quite crude. The voice recognition was entirely speaker-dependent, which meant you had to train every word you intended to use by speaking the word multiple times, from which a matching template was created. Furthermore, the technology only supported isolated word recognition, which meant you had to speak each word followed by a pause, which is certainly unnatural. So frankly, even though the Votan technology was the best available, it was awful.
My job was to manage the software on the PC side, the interface to the external voice recognition hardware. Previous attempts at the software had produced complex and hard-to-use solutions. I learned about a new technique for the original PC known as terminate-and-stay-resident (TSR), a crude form of multitasking. It was a trick we could use to run our code in the background of some other application running on the PC. The next trick was to intercept the call DOS made to the BIOS to get the next character from the keyboard. Thus, after our speech recognition box detected a word, it was sent to our TSR software and then translated into some combination of keystrokes, including the control key. In this way we could train words and associate them with commands used in some software package running at the same time. Our primary focus was WordStar, since it was heavily driven by control key sequences. Of course, the actual text input still required using the keyboard since the vocabulary was so limited. So frankly it was pretty useless.
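To make the keystroke trick a little more concrete, here is a small Python simulation of the idea. It is only a sketch under my own assumptions, not the original code (which was a DOS TSR hooking a BIOS call): the class name `KeyboardHook`, the `VOCABULARY` table, and the particular word-to-command mappings are all made up for illustration. A recognized word is looked up, its keystrokes are queued, and the "get next key" routine hands out queued keystrokes before falling through to the real keyboard.

```python
from collections import deque


def ctrl(letter):
    """Return the control character for a letter, e.g. ctrl('K') is Ctrl-K."""
    return chr(ord(letter.upper()) - ord("A") + 1)


# Hypothetical trained vocabulary: spoken word -> keystrokes to inject.
# The WordStar-style sequences here are illustrative examples only.
VOCABULARY = {
    "save": ctrl("K") + "S",
    "delete line": ctrl("Y"),
    "delete word": ctrl("T"),
}


class KeyboardHook:
    """Stands in for the intercepted 'get next character' keyboard call."""

    def __init__(self, real_keyboard):
        # real_keyboard is any callable that returns the next typed character.
        self.real_keyboard = real_keyboard
        self.pending = deque()

    def word_recognized(self, word):
        """Called when the recognizer reports a word; queue its keystrokes."""
        keys = VOCABULARY.get(word)
        if keys is None:
            return False  # untrained word: nothing to inject
        self.pending.extend(keys)
        return True

    def next_key(self):
        """Serve injected keystrokes first, then fall back to the keyboard."""
        if self.pending:
            return self.pending.popleft()
        return self.real_keyboard()
```

The application never knows the difference: it just asks for the next key, and sometimes that key came from a spoken word instead of a finger.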
During the time Votan existed, Dragon Systems was its primary competitor. Dragon’s speech recognition was inferior, but their integration with the PC was better. As a result Dragon was able to get a contract with IBM and thus stayed in business while Votan struggled.
It’s interesting, though, that the Dragon software actually comes from a company called Nuance. In searching their website I find no evidence of a connection to the original Dragon Systems, but I assume at some point the intellectual property and trademark were transferred to a later company. I did also find a reference to an attempted IPO by Votan in 1997. The IPO must have failed and I believe Votan no longer exists, but it would appear neither does Dragon.
Using the NaturallySpeaking software properly does require quite a bit of learning. For instance, how do you distinguish commands from text input? How do you enter things that are not easily recognizable, such as proper names or unusual characters? And how do you apply formatting such as bold? Nuance has designed their software to work better with Microsoft Word, so, for instance, I was unable to apply bold formatting above, but I imagine there is some way to do it that would require me to study the software more thoroughly.
I also found that initial dictation is far easier than editing. For me this is somewhat unnatural, because in writing blog posts I am so used to making changes via the keyboard that I don’t think through precisely what I’m going to say, and I often change my mind about what to say after I see the words on the screen. That style of composing doesn’t work well with this software, but if you can think ahead, especially in phrases, this software works amazingly well. It’s not the completely natural thing you see in science fiction shows, but it’s far better than I would’ve imagined based on my early experience with this technology. So I salute all of those who have advanced this technology substantially in just a few decades, as well as reducing it to mere software so that no specialized hardware is required.
I’ll finish now and merely point out that I did have to use the keyboard a few times to complete this post. I also carefully scanned what the recognition software inserted as text and tried to make immediate corrections. The reason one speaks in phrases is not that pauses are required in your speech, but that a phrase provides a smaller unit of text to correct. The easiest way to correct is simply to delete the last spoken phrase and repeat it, which is done by saying “delete last.” There are many commands the software understands, and learning those commands significantly improves the usability. The other thing is that practice makes perfect, or, put another way, practice improves both the recognition and the speed with which one can compose a post.
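The phrase-at-a-time correction habit can be pictured with a toy model in Python. This is purely my own illustration and says nothing about how the NaturallySpeaking software works internally; the class `DictationBuffer` and its methods are invented names. Each spoken phrase is kept as a separate unit, so saying “delete last” removes exactly the most recent one.

```python
class DictationBuffer:
    """Toy model: keep dictated text as a list of spoken phrases."""

    def __init__(self):
        self.phrases = []

    def hear(self, utterance):
        """Treat 'delete last' as a command; anything else is text."""
        cleaned = utterance.strip()
        if cleaned.lower() == "delete last":
            if self.phrases:
                self.phrases.pop()  # drop only the most recent phrase
        elif cleaned:
            self.phrases.append(cleaned)

    def text(self):
        """Join the surviving phrases into the post text."""
        return " ".join(self.phrases)


post = DictationBuffer()
post.hear("I am attempting this post")
post.hear("with the dragon software")  # misrecognized capitalization
post.hear("delete last")
post.hear("with the Dragon software")
print(post.text())
```

The shorter the phrase, the less you throw away and repeat when the recognition goes wrong.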
In the best case, and with enough practice to get good at using this software, I believe it is possible to create a post more quickly by speaking than by typing. But it’s very close, and if many mistakes occur then speaking is probably slower. An interesting point, though, is how fatiguing speaking is in contrast to typing, so I’m not really certain I would want to do a long post with this technology. But for people who have poor typing skills or actual challenges with using a keyboard this would be fantastic.