In a disturbing development, researchers have devised a new technique for producing AI deepfakes that requires nothing more than typing in the text you want the person to say.
Disturbing New Technology Just Keeps Getting Better for No Justifiable Reason
Researchers have developed new AI deepfake software that can edit video to make a person say whatever you want simply by typing in what you want them to say. This is the next step forward in a technology that has no real positive social utility except to help film studios save money but is guaranteed to be misused by authoritarian governments around the world to discredit dissenting voices, generate revenge porn, and foment social conflict.
Scientists at Stanford University, Princeton University, the Max Planck Institute for Informatics, and Adobe Research have gone ahead and demonstrated a prima facie case for why people distrust technology by developing a new software platform that lets you edit the text transcript of someone's recorded speech, altering the video so the person appears to say something they never actually said.
The technology works by matching a phoneme, the natural language processing term for a distinct unit of sound, with a viseme, the shape a speaker's face makes while producing that sound, and then using an AI to generate new video based on a transcript of what the person said. Editing the transcript then edits the video of the person to match the altered text.
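To make the phoneme-to-viseme idea concrete, here is a toy sketch of that lookup step. The phoneme labels, viseme names, and mapping below are simplified illustrative assumptions, not the researchers' actual model; the real system also blends facial geometry and renders photorealistic frames, which this sketch does not attempt.

```python
# Many distinct phonemes collapse onto the same mouth shape (viseme),
# which is why lip movements alone under-determine what was said.
# This table is a hypothetical, heavily simplified mapping.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_teeth",   "v": "lip_teeth",
    "aa": "open_wide",  "ae": "open_wide",
    "iy": "spread",     "ih": "spread",
    "uw": "rounded",    "ow": "rounded",
}

def visemes_for(phonemes):
    """Translate a phoneme sequence into the viseme (mouth-shape)
    sequence a video renderer would need to synthesize."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# Editing the transcript changes the phoneme sequence, which in turn
# changes the viseme track driving the generated video.
original = visemes_for(["p", "aa", "m"])  # phonemes for "palm"
edited = visemes_for(["b", "ae", "f"])    # a hypothetical text edit
print(original)  # ['lips_closed', 'open_wide', 'lips_closed']
print(edited)    # ['lips_closed', 'open_wide', 'lip_teeth']
```

The point of the sketch is only to show why typing a new word is enough: once text maps to phonemes and phonemes map to mouth shapes, the rest is a rendering problem.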
According to the researchers, "Our text-based editing approach lays the foundation for better editing tools for movie post production. Filmed dialogue scenes often require re-timing or editing based on small script changes, which currently requires tedious manual work. Our editing technique also enables easy adaptation of audio-visual video content to specific target audiences: e.g., instruction videos can be fine-tuned to audiences of different backgrounds, or a storyteller video can be adapted to children of different age groups purely based on textual script edits. In short, our work was developed for storytelling purposes."
The researchers do acknowledge the potential for bad actors to use their technology for nefarious purposes and offer some potential solutions to this inevitable problem.
"Although methods for image and video manipulation are as old as the media themselves," the researchers write [PDF], "the risks of abuse are heightened when applied to a mode of communication that is sometimes considered to be authoritative evidence of thoughts and intents. We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals. We are concerned about such deception and misuse.
"Therefore," they continue, "we believe it is critical that video synthesized using our tool clearly presents itself as synthetic. The fact that the video is synthesized may be obvious by context (e.g. if the audience understands they are watching a fictional movie), directly stated in the video or signaled via watermarking. We also believe that it is essential to obtain permission from the performers for any alteration before sharing a resulting video with a broad audience. Finally, it is important that we as a community continue to develop forensics, fingerprinting and verification techniques (digital and non-digital) to identify manipulated video. Such safeguarding measures would reduce the potential for misuse while allowing creative uses of video editing technologies like ours."
Let's take these one at a time.
First, suggesting that deepfakes should clearly present themselves as such shows a gobsmackingly naive understanding of propaganda. Stalin would have found such admonitions quaint and so will every authoritarian government or political movement in the future that takes this technology to target dissidents, political opponents, and ethnic or religious minorities.
Second, if an AI can generate fake videos, an AI network can remove a watermark even more easily. This is not a solution.
Third, consent to have your speech altered as a performer is certainly a positive in the film industry, but it means nothing if someone sets out to create an illicit forgery. Bad actors don't seek consent to create fake celebrity or revenge porn, or to slander and tarnish others for an agenda.
Last, creating tools to detect forgeries is absolutely useless when it comes to propaganda. The Protocols of the Elders of Zion is a Tsarist forgery over a century old and has been discredited as a forgery for just as long, yet it is still used to successfully foment worldwide anti-semitism to this day. How long before clips start circulating online of George Soros talking about using the blood of Christian children for Jewish pastries? Are we actually supposed to believe that proving these videos are deepfakes is all that's needed to prevent these videos from causing unbelievable harm?
This Technology is Going to Get Entire Populations Killed
The tech industry is used to being worshipped as the savior of our time and money, priding itself on an entrepreneurial ethos that borders on an almost Bioshock-level of objectivist narcissism, but the cracks in its public image are starting to spread. Uber and Lyft drivers are nearing the point of open revolt over poverty wages. Facebook is desperately trying, laughably, to rebrand itself as a "privacy-focused network" even as it continues to spill out personal data into the world like a drunk at the bar trying to hold onto that one last pint of beer. Now, researchers working on AI deepfakes can say with a straight face that the answer to the propagandistic misuse of their technology is a watermark, or trusting other technology to save us from the dangers of the technology they are creating.
As the 2016 US Presidential election showed the world, low relative-cost information technology can have an outsized impact, and the propagandistic value of deepfakes throughout the world is horrifying. We don't have to wait to see this play out either; it may already be happening. A recent New Year's Eve video from the President of Gabon, Ali Bongo Ondimba, prompted a political crisis after some critics--including the country's military--challenged the video as a deepfake meant to hide the current condition of the president.
Whether the video is a deepfake or not is unclear and is honestly beside the point. The fact that this technology exists does damage either way. Fake videos can be used to discredit opposition figures, prop up failing leaders, or spread other misinformation; or, people can now dismiss video documentation of things they don't like as "deepfakes" even if they are real. Determining the authenticity of video will be as useful as fact-checkers have been in combatting the spread of fake news.
The researchers who developed this new software technology call for all kinds of safeguards that should be put in place to protect the world from the consequences of their technology, but while they're rushing to develop the technology to produce fake videos, no one seems to be all that interested in developing those safeguards.
Instead, the researchers behind this latest software are essentially trusting that the population of the world--most of which has various levels of control imposed upon their information sources--will discern for themselves whether the video they just saw on government television was a deepfake or not. They can't be bothered to do this themselves, apparently.
The consequences of this technology cannot be overstated. In 1994, an airplane carrying the President of Rwanda was shot down, killing all onboard. The Hutu government of Rwanda blamed a Tutsi rebel army for the attack, precipitating a hundred-day-long genocide of Tutsi civilians that claimed nearly a million lives. The various conflicts in the Balkans have cost over a hundred thousand lives since the break-up of Yugoslavia, and governments have long used misinformation and outright fabrications to incite pogroms against Jews and other persecuted minorities throughout human history.
Tensions such as these exist all over the world right now, waiting for a flashpoint to ignite them into war and genocide. A deepfake of a prominent religious- or ethnic-minority leader "confessing" to some horrific crime would provide more solid "justification" for genocide than many past genocides have relied on. To this day, no one knows who shot down the Rwandan President's plane, but all it took was the forceful accusation to get the slaughter rolling.
Not all situations are 1994 Rwanda; sometimes an accusation isn't enough--but who needs an accusation when you can have a "confession"? There is nothing stopping a deepfake from becoming the 21st century's Protocols of the Elders of Zion--in fact, it's guaranteed that such a forgery is now inevitable, and people will surely die because of it. Lots of people.
In his Mother Jones story about the Gabonese deepfake controversy, Ali Breland writes: "[w]hile most media coverage of deepfakes has focused on horror scenarios of them being used against the U.S. and other western countries, experts warn that deepfakes could wreak the most havoc in developing countries, which are often home to fragile governments and populations with nascent digital literacy. Rather than, perhaps, a fake video of Amazon CEO Jeff Bezos announcing retirement, triggering a stock dive, misinformation in some countries could lead to coups, gender or ethnically motivated violence, threatening the stability of entire states."
In a recent piece in the New York Times, Dr. Regina Rini, a philosophy instructor at York University in Toronto, argues that it is already too late to turn back and that we must prepare ourselves to treat video more like witness testimony than documentary evidence: "[I]t’s clear that current arguments about fake news are only a taste of what will happen when sounds and images, not just words, are open to manipulation by anyone with a decent computer."
Had the people involved in producing this technology received proper instruction in ethics--or even just history--this technology would never have left the proposal stage, and anyone who pushed for it would have been fired by managers who had received a proper ethical education. Since no one in authority at the companies and institutions that developed this technology seems to think such considerations were concerning enough to prevent it from being built, it might be time for society to step in and rein these people in with regulation.
The researchers in question seem to appreciate this point themselves, when they write "we believe that a robust public conversation is necessary to create a set of appropriate regulations and laws that would balance the risks of misuse of these tools against the importance of creative, consensual use cases." It should never have gotten this far, however, since the "creative, consensual use cases" diminish in significance next to the enormous damage that the "misuse" of their technology will cause. Weighing one against the other, any reasonable person should have been able to conclude that under no circumstance does a benefit of this technology to industry justify the harm this technology will cause to society. It seems the tech industry is bereft of reasonable people, however, since they clearly cannot help themselves whenever they see some other thread of the social fabric they can "disrupt".
Already, this technology means that we can no longer trust what we see with our own eyes. One could argue that we should always be vigilant against misinformation, but this too is naive. Who wants to spend every waking moment reading, viewing, or digesting information with that kind of hard-nosed skepticism? It requires a level of education and mental energy that not everyone possesses and that nobody can sustain forever. Society is built, ultimately, on a certain level of trust. This technology simply destroys some of that hard-won trust in each other, and it cannot be easily repaired, if at all.
But hey, if you've got a minor script edit and you need to reshoot 15 seconds of dialogue, this technology is totally worth the social cost.