The Robots are Coming.

Nic Redman and Leah Marks, the formidable team behind The VO Social, spend hours of their free time working to keep the voiceover community connected from our individual voice booths, as well as presenting a brilliant podcast. In their latest episode, they discuss the rise of AI voices, and how we – as living, breathing, human voice artists – may soon find ourselves surplus to requirements.

We won’t. Here’s why.

Nic and Leah spoke to the artist behind Faith, the first AI voice that can simulate emotion. While the technology is quite impressive (and certainly a step forward from the disjointed delivery of other text-to-speech voices on the market) it was interesting to hear how much work had gone into developing a single voice, and for it still to sound – as Leah put it – “computery”. I agree. The artist had recorded hours of material, with varying emotions and inflexions. Of course, now that it’s recorded she may well have voiced herself out of a job – while those VOs who don’t voice for AI companies remain unique – but no doubt the company plans to roll out this technology with a huge range of vocal styles and accents.

What made this episode interesting and engaging was that they were speaking to an actual person about the work she had done. Had they interviewed the technology, we’d have all switched off. It’s fine to get Alexa to read back your shopping list, but she’s never been in your kitchen for company. She serves a purpose (or interrupts your arguments, if, like me, you have a belligerent teenager called Alex). Interviewing Alexa or Siri would make for an incredibly boring half hour… and that engagement is exactly why eLearning has to, and will, remain human.

Different generations speak in different ways. Pathé-style newsreading has gone, as has the fashion (which began about 15 years ago) for cool female voiceovers to sound uninterested and almost pissed off. (Thank God. I never quite nailed that one.) The key to a successful career in voiceover is not to corner the market in one particular trend… it’s actually to keep adapting, while always retaining the voice which makes you unique. AI would have to follow these trends, and adapt to them as readily as the humans already creating them. Teenagers, in particular, play an important role in the evolution of speech and language (I should know – I have a houseful of them) and you only need to ask one how they feel about the A Level and GCSE grading debacle to understand why algorithms are not always our friend.

The companies we work for value a personal relationship with us. Since lockdown, we’ve been busier than ever, because people still need to learn – but it’s more important than ever to help them do so from home in an imaginative way. Our clients value the creative process that goes into the work they do; they value the time they take with us in discussing who would be right for that particular narration. They value that these real voices, in a variety of accents and styles, can impart really quite important information, and they value the fact that there’s someone here overseeing the whole production process. Some of this information really is a matter of life and death. An AI voice can’t engage a user for long – and it certainly can’t correct any mistakes in the script. I don’t think I’ve ever seen a script which doesn’t have a small mistake in it somewhere. An AI voice wouldn’t correct it. An intelligent human voice would offer an alternative and explain why. I’ve had a smartphone for years and it STILL doesn’t get autocorrect, er, correct – so why on earth would an AI voice know the right alternative words to say in a highly specialised medical or technical narration? We would, because we know this stuff inside out. Or we’d ask for clarification. In nearly 20 years of recording audio, nobody has ever asked us for a computer-generated voice. They’ve asked us several times to replace one, though.

Voiceovers have put people out of a job. My husband has been dead for four years but he’s still announcing trains on the London Underground. I’ve recorded countless on-hold announcements which negate the need for a receptionist. I’ve even recorded sentences like “I’m sorry, I didn’t catch that” for an automated phone line. Of course I didn’t catch anything – I’m not actually there. Every generation, every industry, has been responsible for moving forward by automating certain processes – but no industry thrives by having nobody at all behind the scenes. There are still jobs… just different ones. And with eLearning, we could be putting teachers out of a job. But we’re not. We’re enhancing their ability to teach.

AI will not take away everyone’s job. I do believe that there will be far less work for people who don’t work in voiceover full time, and those are the people who will need to adapt and potentially reskill. The real victims of AI will be the bit-part players and part-timers. Voices working from poor-quality studios at low rates are struggling to make a living NOW. In a year or two, technology will make them surplus to requirements. This is not meant to be disrespectful to those people who DO work for low rates from poor-quality studios – there’s nothing to stop them from doing what they’re doing and I wish them the best of luck. But they’re not competing with me, and they never have done; they’re competing in a very different market. I’m not convinced that their market will be around in the next two or three years, though, because the robots will take their jobs.

The eLearning market is growing. I believe there’s a place for AI voices to work alongside real voices; to add a little colour to different scenarios here and there, as visuals become more elaborate. The ongoing gamification of eLearning is bringing far more exciting learning material to market, and new technology makes this possible – but the role of the narrator is still essential, and totally human. I don’t think we can (or should) try to stop AI voices from entering our workplace – we just need to know where they fit and how to use them effectively and efficiently. Artificial Intelligence is here to stay, but it’s not going to replace human intelligence. Nor will it replace human contact, human emotion, or the ability of a real, live, educated, engaging human to convey a really important message. Have faith.