Learning AI #18

Is voice cloning for real?

Voice cloning is exactly what it sounds like. You feed an AI a few minutes of someone's voice and it learns to generate new speech that sounds like that person. Give it a script and it will read it in the person’s cadences and tone. The output is often indistinguishable from the real thing (more on this below).

You can do this right now with off-the-shelf tools like ElevenLabs, Murf, and Descript. Some require you to agree to terms prohibiting impersonation. Some don't ask many questions.

Podcasters use AI voices to fix verbal stumbles without re-recording. Authors narrate their own audiobooks without sitting in a studio for 20 hours. (I am using it for my soon-to-be-released book on AI.)

People who have lost their voice can still communicate in something that sounds like them. A woman named Val Patterson, whose ALS was taking her speech, used a company called Lyrebird to preserve her voice before she lost it entirely. That's a good use.

Voice cloning has been around for a while, but recent AI advances have made it better and more affordable.

A bad use? In January 2024, voters in New Hampshire received a robocall in the days before the Democratic primary. The voice sounded like Joe Biden's. It told them their vote would be wasted in the primary and they should save it for November. It was fabricated. A political consultant named Steve Kramer later claimed responsibility, describing it as an effort to "educate" people about AI.

The FCC fined Kramer $6 million, which, spread over the several thousand calls reported, works out to at least a few hundred dollars per call. Several states passed new laws within months. Congress held hearings, but not much has changed.

The Biden voice was close enough that many people believed it was real.

Scammers are using the same approach on a smaller scale, and with more precision. There are documented cases of people receiving calls that sound like a family member in an emergency asking for money, with details specific enough to be convincing. The voices were generated from audio scraped off social media. Thirty seconds of source material is enough to work with.

The ethical question is not complicated to state. Clone your own voice: fine. Clone someone's voice with their permission: fine. Clone someone's voice to deceive anyone: fraud. The tools are cheap, the barrier is low, and most platforms are not moving fast to police misuse.

A couple of things worth doing now. Set a verbal safe word with close family members to use in emergencies; a cloned voice won't know it. And assume your voice is already out there: if you've posted videos or audio publicly, it is available to anyone who wants it.

Voice cloning is a real technology with real applications. It's also in the hands of people who will use it badly. In some cases, they already have.

Now I have a test for you. At the bottom of this post are two samples of my voice. One is me and one is my AI-cloned voice. I spent about a half hour with the voice-cloning technology to produce my synthetic voice, which can be used for thousands of hours of recordings for the next 100+ years.

I spent almost as much time on my live recording, trying to get the 17-second clip clean, without verbal stumbles or, as they say in the audio trade, “mouth sounds.”

Listen to both samples ONCE each. Then vote on which you think is the AI-generated voice. After you vote, listen as many times as you like. If you can’t get enough of my voice, there will be four hours of it in the audiobook version of my AI book coming out in the next few weeks. Don’t worry, you will hear about it relentlessly in these pages.

Coming soon!

Take the test and vote below

Which recording is my AI voice clone?

Things I think about

A tsunami can travel across the ocean at the speed of a jet airplane, up to 500 mph.
