Yes, Deepfake Audio is Now a Thing

Well, it's official. I am throwing in the towel. My worldview has been shaken, stirred, crumbled, kneaded, blown apart, and reduced to its elemental atoms. Actually, at this point I'm down to the subatomic level. I think I just saw a quark go by.

One of my favorite childhood toys was an analog computer that taught Boolean Algebra. Starting with that toy, I began the heavy lifting of learning to think like an engineer. Then I thought like that for 40 years. Mine is a world of square roots and right angles. Add a pinch of OCD and you know how I butter my bread (precisely, with the appropriate amount of physics). In my mind, the world should be taken at face value. Except when it shouldn't. Which brings us to deepfake audio.

We are familiar with deepfake video. Somehow I accepted that, and regarded those images as really clever parlor tricks. Now along come the voices of deepfake audio, and they have shaken me. Probably because audio is my passion, and in my mind, audio technology is a machine perfectly constructed to always strive toward fidelity. I just can't compute that it is now striving for, and achieving, deception.

For example, the Vocal Synthesis YouTube channel has a recording of six U.S. Presidents introducing the channel. Of course, it's not the Presidents. It's deepfake audio synthesis of them saying things they never said. Need more convincing? The channel has lots of examples. Listen to the recitative duet by Donald Trump and Alexandria Ocasio-Cortez.

Now, these examples are purposefully outrageous. No one would believe them. But what if the program were used to produce audio that was just outrageous enough to be believed, designed to go viral? Going viral can be a fast and powerful thing. I refer you to the case of Justine Sacco. She tweeted to her 170 followers, then hopped on a plane for an 11-hour flight. Her tweet went viral, trending to #1. By the time her plane landed, unbeknownst to her, the social media world was tracking her flight, and she had been fired from her job. Deepfake audio would only accelerate that kind of spread.

With a little luck, one could sow chaos, and maybe even bring down a government. But even with lesser intentions, deepfake audio has plenty of nefarious potential. Consider musicians. You can copyright recordings of your voice, but I don't think you can copyright your voice itself. For example, Adele owns her name, likeness, and recordings, but does she own her voice? If the synthesis software is trained on hours of copyrighted recordings of her voice, would the resulting product violate that copyright, or would it be a novel work?

Even if voice impersonation of public figures is illegal, crooks could use deepfake audio to construct songs apparently sung by Adele, marketing them as “private” or “lost” recordings. If those songs started trending, they could rake in millions of bucks before anyone was the wiser. Once detected, they could simply pull down the website and stock another with a fresh trove of famous-name recordings.

Also, the voice is now a potential vector for a data breach. My bank used a voice-authentication feature; I could access my account via telephone by saying the phrase “my voice is my password.” It's easy to gather enough samples of Mr. Trump's voice to train a synthesis program, and much more difficult to get samples of mine, but I'm sure not using a voice log-on anymore. I guess we'll need to develop equally powerful software on the listening end that can differentiate fake voices from real ones. Honestly, I don't know what to believe anymore. Is all content, sound and vision, now suspect?
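I'm no machine-learning researcher, but the bones of such a detector are not exotic. Here's a minimal sketch, assuming Python with the librosa and scikit-learn libraries and a handful of hypothetical clips labeled genuine or synthesized; real anti-spoofing systems use far richer features and vastly more data.

    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def mfcc_features(path):
        # Summarize a clip as its mean MFCCs: a crude spectral fingerprint.
        audio, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
        return mfcc.mean(axis=1)

    # Hypothetical labeled training clips: 1 = genuine speech, 0 = synthesized.
    real_clips = ["real_01.wav", "real_02.wav"]
    fake_clips = ["fake_01.wav", "fake_02.wav"]

    X = np.array([mfcc_features(p) for p in real_clips + fake_clips])
    y = np.array([1] * len(real_clips) + [0] * len(fake_clips))

    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Score an unknown clip; anything well below 1.0 deserves suspicion.
    print("P(genuine):", clf.predict_proba([mfcc_features("unknown.wav")])[0, 1])

In practice it becomes an arms race: as the synthesis models improve, the detectors have to be retrained to keep up.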

When you start with Boolean Algebra and develop it sufficiently, you're eventually able to write AI software that takes text as input and outputs the voice of Barack Obama. There's nothing illogical about that; in fact, the deception is a triumph of logic.
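The barrier to entry is now remarkably low. As a rough illustration, here is a minimal sketch assuming the open-source Coqui TTS package and one of its stock English voices; the package, model name, and file paths are illustrative and vary by version. Cloning a specific person's voice would additionally require training or fine-tuning on hours of that person's recordings.

    # pip install TTS   (Coqui TTS; model identifiers vary by release)
    from TTS.api import TTS

    # Load a pretrained English text-to-speech model.
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")

    # Type a sentence in, get a spoken .wav file out.
    tts.tts_to_file(
        text="There's nothing illogical about that.",
        file_path="synthesized_speech.wav",
    )

Text goes in; a convincing human voice comes out. Logic doing exactly what we built it to do.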

I'll need to think about that. But in my world of right angles, I'm not sure I'll be able to wrap my mind around it.

COMMENTS
Bosshog7_2000

Deepfake audio/video is the beginning of the end of modern civilization. The past few years have already shown that a huge percentage of Americans lack the discernment to distinguish fact from fiction, hence the rise of sites like Infowars that prey on people's stupidity. Add in AI technology such as deepfake audio and the possible ramifications are alarming. This technology needs to be regulated and those who abuse it should be harshly punished.

larrymartin

The seismic shift from analog to deepfake audio is a surreal journey. As we grapple with the implications for everything from artistry to security, the line between genuine and fabricated voices blurs, leaving us to navigate a disorienting landscape of sonic manipulation.
