Reaction to my blog post about new advances in synthetic speech (Synthetic Voice: Revolutionary or Repugnant) was mixed, but almost everyone agreed the sample voice on the Loquendo site was better than any they had heard before.
Before I get into some of the responses, let me refer you to yet another site of this sort I’ve since become aware of: Lessac Technologies.
Click HERE to go to their FTP site with a long list of sample audio files.
Right. I didn’t think they were as good as the Loquendo samples either, but they were certainly understandable. As these technologies improve, a myriad of creative, market, and technology questions arise.
My friend Brett Bumeter, the man who helped me build this WordPress blog you’re reading right now and a self-avowed student of artificial speech, had this to say about how acceptance of this level of quality relates to price point:
“I think the pricing of the platform is the thing that essentially secures a temporary safe place for voice actors. The price points for using an emotional voice are not all that different from lower end price points for voice actors. Voice actors on the higher end . . . well . . . they are on the higher end sometimes for a reason.
That said, I think the writing is on the wall, or at least on my computer screen, this will definitely be a force to contend with in the next 5 years.
If that price point comes down by half or if the quality goes up by another 20% or the application to convert text to voice becomes even easier (thus saving time for the ‘producer’ and saving money that way), then voice acting price arbitrage will open up to synthesized voices for sure.”
Fellow voice-actor Peter Drew made these observations:
“Audiobook producers will certainly hold out longer against using synthetic voices. The more immediate concern is the industrial/corporation market and retail marketing on the Web via video. Why pay someone to read an already dull script to accompany a human resources video on the latest changes to the company’s benefits package? With hi-def hand-held cameras and desktop video production, many small retailers can make their own videos or contract a local agency to crank out a video for little money, saving even more by using a “voice in a box.”
It’s not a matter of if but when. Technology marches on and the human voice will be synthesized to a relatively high degree of realism and natural character. Will a synthetic voice ever match the artistry and subtlety of a well-trained and experienced actor? We’ll just have to wait and see…”
Peter also referred me to a previous article he had written about this.
Voice over artist Daniel Wallace said:
“After listening to several of the samples I was shocked at how close the voices were to the real thing. The artificial voices are as close as I have ever heard. It also concerns me as a narrator. As the artificial voices get closer and closer to human voices, will publishers for the sake of expediency turn to this technology in place of a human narrator? I do believe no matter how close a computer generated voice gets one cannot replace artistry.”
“While artificial intelligence and computer technology may eventually be able to mimic the human voice flawlessly, it will never be able to mimic the human imagination or insight that comes from living one’s life and experiencing our varied environments and relationships. No software will ever be able to adequately convey the emotional connection acquired when holding a crying newborn peeing all over your shirtfront, or the emotional letdown earned when failing to revive a drunken auto accident victim using CPR. All that said, non-fiction or text/reference books, as well as automated style text, may end up relying on computer voices.”
My friend Steve Hammill offered yet another take on why synthesized voices may or may not succeed:
“There’ll still be a market for live V/O, but synthetic voices will be a real threat to lesser talent very soon. And I have a theory about which markets synthetic voices will hit hard. IMO, short form work will get it first. My reasoning for this is that the human ear/brain will be annoyed by a computer voice in long form. My only “proof” of this is in testing mic preamps. In :30 second A:B comparisons, the differences between preamps were nearly impossible to hear; in long form the differences in preamps became dramatic. My theory is that the same will hold true for synthetic voices. Commercial tags, :10 VOs and other vocal bits (…dare I say imaging which is pretty synthetic already…) will be food for bottom feeders because there won’t be enough synthetic voice to offend the ears of most people.”
Finally, my VO friend Bobbin Beam did not mince her words when she commented:
“I can’t believe that someday someone would pay somebody to take the time to try and articulate just the right nuance of read that can out of one person, which captures heart, brain and vocal instrument for every given piece of copy, and marketplace trend. Hell they can hire an actual voice actor for that, and most probably invest a lot less in the long run!”
Please feel free to chime in on the conversation by commenting below.
It’s clear software engineers and audio creators are not going to rest on what they’ve done so far. As artificial intelligence algorithms improve, my guess is we’ll hear synthetic voices that rise to the level of chess-playing software in their ability to innovate, learn, and approximate human nuances. That’s when market forces will determine whether it’s worth the cost to customers.