Speech-to-Text

by | Aug 28, 2012 | Technology, Web Resources, Web/Tech

No…not Text-to-Speech.

Speech-to-Text.

In the newsroom, the most hated, vilified, time-wasting, and tedious duty is manually logging interviews done in the field. We have desks with playback machines, a computer, and a video monitor that displays a time-code... but a human being still has to do the actual transcribing.

The reporter (or hapless intern) puts on headphones, listens to the interview subject, and types the words verbatim into the computer. Some people still use pen and paper. I’m amazed that no one has yet figured out a way to automate this task. Whoever does could make a mint.

Now, mind you… Dragon NaturallySpeaking (DNS), as good as it has become, still struggles with the dialects, slang, intonation, pacing, and nuances of human speech, especially with the noisy ambient sound that comes with most field interviews. DNS is basically a highly advanced and relatively inexpensive application of artificial intelligence. But much as with the CAPTCHA images websites use to verify that a commenter is human, computers still can’t quite grasp the unbelievably complex verbal cues that make up our language.

As a voice artist, you should count this as a plus. We’re darned hard to replace. Computers are still miles away from interpreting an author’s intent.

I’ve had limited success sitting, listening to the interview, and speaking it back into Dragon NaturallySpeaking to produce a printed transcript… but that’s an acquired taste.

And the bottom line is… that all-important time-code MUST accompany each sentence. Kinda like the numbered chapters and verses in the Bible. The editor HAS to know where each sound bite is when putting the story together. Dragon NaturallySpeaking doesn’t do any of that.
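To make the logging requirement concrete, here is a minimal Python sketch (not a feature of any product mentioned here) of a time-coded interview log: each transcribed sentence is paired with its offset from the start of the tape and printed as HH:MM:SS:FF, the hours:minutes:seconds:frames format a time-code display shows. It assumes non-drop-frame time-code at a whole-number frame rate (30 fps by default); NTSC broadcast video often uses 29.97 fps drop-frame time-code, which this sketch does not handle. The names `LogEntry`, `to_timecode`, and `format_log` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    start_seconds: float  # offset of the sentence from the start of the tape
    text: str             # verbatim words spoken


def to_timecode(seconds: float, fps: int = 30) -> str:
    """Convert seconds to non-drop-frame HH:MM:SS:FF (assumes integer fps)."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_secs = total_frames // fps
    hours, rem = divmod(total_secs, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def format_log(entries: list[LogEntry], fps: int = 30) -> str:
    """One line per sentence: time-code, two spaces, then the words."""
    return "\n".join(
        f"{to_timecode(e.start_seconds, fps)}  {e.text}" for e in entries
    )


if __name__ == "__main__":
    log = [
        LogEntry(83.5, "We knew the river was rising."),
        LogEntry(91.0, "By midnight it was in the kitchen."),
    ]
    print(format_log(log))
```

Like the chapter-and-verse numbering in the Bible, the time-code on each line lets an editor jump straight to the sound bite on the tape.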

But programmers keep trying. The latest iPad, iPhone, and Android devices and operating systems all offer a microphone icon on most screens where input is required, and their speech recognition is quite good.

One of the more recent web offerings is quite handy and innovative, too.

Online Dictation uses x-webkit-speech, a nonstandard, Chrome-specific attribute for HTML input fields that no other browser implements. I’m not sure of the technical details, but in practice it means this online offering only works with the Google Chrome browser.

The website explains it pretty well, and so does this blog article:  http://paulhami.edublogs.org/2012/08/16/online-dictation-another-effective-voice-recognition-option-in-google-chrome/

As with Dragon NaturallySpeaking (and other speech-to-text programs), the drawback is that your colleagues, spouse, or nearby “eavesdroppers” may tire of hearing you dictate messages they don’t necessarily want to hear.

CourVO
