{"id":14693,"date":"2024-05-01T10:03:34","date_gmt":"2024-05-01T13:03:34","guid":{"rendered":"https:\/\/www.fie.undef.edu.ar\/ceptm\/?p=14693"},"modified":"2024-05-01T10:03:34","modified_gmt":"2024-05-01T13:03:34","slug":"una-startup-de-inteligencia-artificial-hizo-un-deepfake-hiperrealista-que-es-tan-bueno-que-da-miedo","status":"publish","type":"post","link":"https:\/\/www.fie.undef.edu.ar\/ceptm\/?p=14693","title":{"rendered":"An artificial intelligence startup made a hyperrealistic deepfake that is so good it\u2019s scary"},"content":{"rendered":"<p>Synthesia\u2019s new technology is impressive, but it raises big questions about a world in which we increasingly can\u2019t tell what is real.<\/p>\n<hr \/>\n<p><strong>I\u2019m stressed and running late, because what do you wear for the rest of eternity?\u00a0<\/strong><\/p>\n<p>This makes it sound like I\u2019m dying, but it\u2019s the opposite. I am, in a way, about to live forever, thanks to the AI video startup Synthesia. For the past several years, the company has produced AI-generated avatars, but today it\u00a0<a href=\"https:\/\/www.synthesia.io\/post\/expressive-avatars-powered-by-synthesias-new-express1-model-are-here\" target=\"_blank\" rel=\"noopener\">launches<\/a>\u00a0a new generation, its first to take advantage of the latest advancements in generative AI, and they are more realistic and expressive than anything I\u2019ve ever seen. While today\u2019s release means almost anyone will now be able to make a digital double, on this early April afternoon, before the technology goes public, they\u2019ve agreed to make one of me.<\/p>\n<p>When I finally arrive at the company\u2019s stylish studio in East London, I am greeted by Tosin Oshinyemi, the company\u2019s production lead.
He is going to guide and direct me through the data collection process\u2014and by \u201cdata collection,\u201d I mean the capture of my facial features, mannerisms, and more\u2014much like he normally does for actors and Synthesia\u2019s customers.<\/p>\n<div style=\"width: 790px;\" class=\"wp-video\"><!--[if lt IE 9]><script>document.createElement('video');<\/script><![endif]-->\n<video class=\"wp-video-shortcode\" id=\"video-14693-1\" width=\"790\" height=\"444\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-test-2.mp4?_=1\" \/><a href=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-test-2.mp4\">https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-test-2.mp4<\/a><\/video><\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_2\">\n<p>He introduces me to a waiting stylist and a makeup artist, and I curse myself for wasting so much time getting ready. Their job is to ensure that people have the kind of clothes that look good on camera and that they look consistent from one shot to the next. The stylist tells me my outfit is\u00a0<em>fine<\/em>\u00a0(phew), and the makeup artist touches up my face and tidies my baby hairs. 
The dressing room is decorated with hundreds of smiling Polaroids of people who have been digitally cloned before me.<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_4\">\n<p>Apart from the small supercomputer whirring in the corridor, which processes the data generated at the studio, this feels more like going into a news studio than entering a deepfake factory.<\/p>\n<p>I joke that Oshinyemi has what\u00a0<em>MIT Technology Review<\/em>\u00a0might call a\u00a0<a href=\"https:\/\/www.technologyreview.com\/2024\/04\/24\/1091125\/ai-prompt-engineer-generative-ai-job-titles\/\" target=\"_blank\" rel=\"noopener\">job title of the future<\/a>: \u201cdeepfake creation director.\u201d<\/p>\n<p>\u201cWe like the term \u2018synthetic media\u2019 as opposed to \u2018deepfake,\u2019\u201d he says.<\/p>\n<p>It\u2019s a subtle but, some would argue, notable difference in semantics. Both mean AI-generated videos or audio recordings of people doing or saying something that didn\u2019t necessarily happen in real life. But deepfakes have a bad reputation. Since their inception nearly a decade ago, the term has come to signal something unethical, says Alexandru Voica, Synthesia\u2019s head of corporate affairs and policy. Think of\u00a0<a href=\"https:\/\/www.technologyreview.com\/2024\/01\/29\/1087325\/three-ways-we-can-fight-deepfake-porn-taylors-version\/\" target=\"_blank\" rel=\"noopener\">sexual content produced without consent<\/a>, or political campaigns that spread disinformation or propaganda.<\/p>\n<p>\u201cSynthetic media is the more benign, productive version of that,\u201d he argues. And Synthesia wants to offer the best version of that version.<\/p>\n<p>Until now, all AI-generated videos of people have tended to have some stiffness, glitchiness, or other unnatural elements that make them pretty easy to differentiate from reality. 
Because they\u2019re so close to the real thing but\u00a0<em>not quite it<\/em>, these videos can make people feel annoyed or uneasy or icky\u2014a phenomenon commonly known as the uncanny valley. Synthesia claims its new technology will finally lead us out of the valley.<\/p>\n<p>Thanks to rapid advancements in generative AI and a glut of training data created by human actors that has been fed into its AI model, Synthesia has been able to produce avatars that are indeed more humanlike and more expressive than their predecessors. The digital clones are better able to match their reactions and intonation to the sentiment of their scripts\u2014acting more upbeat when talking about happy things, for instance, and more serious or sad when talking about unpleasant things. They also do a better job matching facial expressions\u2014the tiny movements that can speak for us without words.<\/p>\n<p>But this technological progress also signals a much larger social and cultural shift. Increasingly, so much of what we see on our screens is generated (or at least tinkered with) by AI, and it is becoming more and more difficult to distinguish what is real from what is not. This threatens our trust in everything we see, which could have very real, very dangerous consequences.<\/p>\n<p>\u201cI think we might just have to say goodbye to finding out about the truth in a quick way,\u201d says Sandra Wachter, a professor at the Oxford Internet Institute, who researches the legal and ethical implications of AI.
\u201cThe idea that you can just quickly Google something and know what\u2019s fact and what\u2019s fiction\u2014I don\u2019t think it works like that anymore.\u201d<\/p>\n<figure id=\"attachment_14695\" aria-describedby=\"caption-attachment-14695\" style=\"width: 2560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-14695\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-scaled.webp\" alt=\"\" width=\"2560\" height=\"1977\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-scaled.webp 2560w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-300x232.webp 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-1024x791.webp 1024w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-768x593.webp 768w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-1536x1186.webp 1536w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0749-2048x1582.webp 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/><figcaption id=\"caption-attachment-14695\" class=\"wp-caption-text\">Tosin Oshinyemi, the company\u2019s production lead, guides and directs actors and customers through the data collection process. DAVID VINTINER<\/figcaption><\/figure>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_7\">\n<p>So while I was excited for Synthesia to make my digital double, I also wondered if the distinction between synthetic media and deepfakes is fundamentally meaningless. Even if the former centers a creator\u2019s intent and, critically, a subject\u2019s consent, is there really a way to make AI avatars safely if the end result is the same? 
And do we really want to get out of the uncanny valley if it means we can no longer grasp the truth?<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_9\">\n<p>But more urgently, it was time to find out what it\u2019s like to see a post-truth version of yourself.<\/p>\n<p class=\"wp-block-heading\"><strong>Almost the real thing<\/strong><\/p>\n<p>A month before my trip to the studio, I visited Synthesia CEO Victor Riparbelli at his office near Oxford Circus. As Riparbelli tells it, Synthesia\u2019s origin story stems from his experiences exploring avant-garde, geeky techno music while growing up in Denmark. The internet allowed him to download software and produce his own songs without buying expensive synthesizers.<\/p>\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group__inner-container\">\n<p>\u201cI\u2019m a huge believer in giving people the ability to express themselves in the way that they can, because I think that that provides for a more meritocratic world,\u201d he tells me.<\/p>\n<p>He saw the possibility of doing something similar with video when he came across\u00a0<a href=\"https:\/\/niessnerlab.org\/projects\/thies2016face.html\" target=\"_blank\" rel=\"noopener\">research<\/a>\u00a0on using deep learning to transfer expressions from one human face to another on screen.<\/p>\n<p>\u201cWhat that showcased was the first time a deep-learning network could produce video frames that looked and felt real,\u201d he says.<\/p>\n<p>That research was conducted by Matthias Niessner, a professor at the Technical University of Munich, who cofounded Synthesia with Riparbelli in 2017, alongside University College London professor Lourdes Agapito and Steffen Tjerrild, whom Riparbelli had previously worked with on a cryptocurrency project.<\/p>\n<p>Initially the company built lip-synching and dubbing tools for the entertainment industry, but 
it found that the bar for this technology\u2019s quality was very high and there wasn\u2019t much demand for it. Synthesia changed direction in 2020 and launched its first generation of AI avatars for corporate clients. That pivot paid off. In 2023, Synthesia achieved unicorn status, meaning it was valued at over $1 billion\u2014making it one of the relatively few European AI companies to do so.<\/p>\n<p>That first generation of avatars looked clunky, with looped movements and little variation. Subsequent iterations started looking more human, but they still struggled to say complicated words, and things were slightly out of sync.<\/p>\n<p>The challenge is that people are used to looking at other people\u2019s faces. \u201cWe as humans know what real humans do,\u201d says Jonathan Starck, Synthesia\u2019s CTO. Since infancy, \u201cyou\u2019re really tuned in to people and faces. You know what\u2019s right, so anything that\u2019s not quite right really jumps out a mile.\u201d<\/p>\n<p>These earlier AI-generated videos, like deepfakes more broadly, were made using generative adversarial networks, or\u00a0<a href=\"https:\/\/www.technologyreview.com\/2021\/10\/12\/1036844\/ai-gan-fake-faces-data-privacy-security-leak\/\" target=\"_blank\" rel=\"noopener\">GANs<\/a>\u2014an older technique for generating images and videos that uses two neural networks that play off one another. It was a laborious and complicated process, and the technology was unstable.<\/p>\n<p>But in the generative AI boom of the last year or so, the company has found it can create much better avatars using generative neural networks that produce higher quality more consistently. The more data these models are fed, the better they learn. 
Synthesia uses both large language models and diffusion models to do this; the former help the avatars react to the script, and the latter generate the pixels.<\/p>\n<p>Despite the leap in quality, the company is still not pitching itself to the entertainment industry. Synthesia continues to see itself as a platform for businesses. Its bet is this: As people spend more time watching videos on YouTube and TikTok, there will be more demand for video content. Young people are already skipping traditional search and defaulting to TikTok for information presented in video form. Riparbelli argues that Synthesia\u2019s tech could help companies convert their boring corporate comms and reports and training materials into content people will actually watch and engage with. He also suggests it could be used to make marketing materials.<\/p>\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group__inner-container\">\n<p>He claims Synthesia\u2019s technology is used by 56% of the Fortune 100, with the vast majority of those companies using it for internal communication. The company lists Zoom, Xerox, Microsoft, and Reuters as clients. Services start at $22 a month.<\/p>\n<p>This, the company hopes, will be a cheaper and more efficient alternative to video from a professional production company\u2014and one that may be nearly indistinguishable from it. Riparbelli tells me its newest avatars could easily fool a person into thinking they are real.<\/p>\n<p>\u201cI think we\u2019re 98% there,\u201d he says.<\/p>\n<p>For better or worse, I am about to see it for myself.<\/p>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_15\">\n<p class=\"wp-block-heading\"><strong>Don\u2019t be garbage<\/strong><\/p>\n<p>In AI research, there is a saying: Garbage in, garbage out.
If the data that went into training an AI model is trash, that will be reflected in the outputs of the model. The more data points the AI model has captured of my facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be.<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_17\">\n<p>Back in the studio, I\u2019m trying really hard not to be garbage.<\/p>\n<p>I am standing in front of a green screen, and Oshinyemi guides me through the initial calibration process, where I have to move my head and then eyes in a circular motion. Apparently, this will allow the system to understand my natural colors and facial features. I am then asked to say the sentence \u201cAll the boys ate a fish,\u201d which will capture all the mouth movements needed to form vowels and consonants. We also film footage of me \u201cidling\u201d in silence.<\/p>\n<figure id=\"attachment_14696\" aria-describedby=\"caption-attachment-14696\" style=\"width: 1795px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-14696\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-greenscreen.webp\" alt=\"\" width=\"1795\" height=\"1688\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-greenscreen.webp 1795w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-greenscreen-300x282.webp 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-greenscreen-1024x963.webp 1024w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-greenscreen-768x722.webp 768w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-greenscreen-1536x1444.webp 1536w\" sizes=\"(max-width: 1795px) 100vw, 1795px\" \/><figcaption id=\"caption-attachment-14696\" class=\"wp-caption-text\">The more data points the AI system 
has on facial movements, microexpressions, head tilts, blinks, shrugs, and hand waves, the more realistic the avatar will be. DAVID VINTINER<\/figcaption><\/figure>\n<p>He then asks me to read a script for a fictitious YouTuber in different tones, directing me on the spectrum of emotions I should convey. First I\u2019m supposed to read it in a neutral, informative way, then in an encouraging way, an annoyed and complain-y way, and finally an excited, convincing way.<\/p>\n<p>\u201cHey, everyone\u2014welcome back to\u00a0<em>Elevate Her\u00a0<\/em>with your host, Jess Mars. It\u2019s great to have you here. We\u2019re about to take on a topic that\u2019s pretty delicate and honestly hits close to home\u2014dealing with criticism in our spiritual journey,\u201d I read off the teleprompter, simultaneously trying to visualize ranting about something to my partner during the complain-y version. \u201cNo matter where you look, it feels like there\u2019s always a critical voice ready to chime in, doesn\u2019t it?\u201d<\/p>\n<p>\u201cThat was really good. I was watching it and I was like, \u2018Well, this is true. She\u2019s definitely complaining,\u2019\u201d Oshinyemi says, encouragingly. Next time, maybe add some judgment, he suggests.<\/p>\n<p>We film several takes featuring different variations of the script. In some versions I\u2019m allowed to move my hands around. In others, Oshinyemi asks me to hold a metal pin between my fingers as I do. This is to test the \u201cedges\u201d of the technology\u2019s capabilities when it comes to communicating with hands, Oshinyemi says.<\/p>\n<p>Historically, making AI avatars look natural and matching mouth movements to speech has been a very difficult challenge, says David Barber, a professor of machine learning at University College London who is not involved in Synthesia\u2019s work. 
That is because the problem goes far beyond mouth movements; you have to think about eyebrows, all the muscles in the face, shoulder shrugs, and the numerous different small movements that humans use to express themselves.<\/p>\n<figure id=\"attachment_14697\" aria-describedby=\"caption-attachment-14697\" style=\"width: 2560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-14697\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-scaled.webp\" alt=\"\" width=\"2560\" height=\"1597\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-scaled.webp 2560w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-300x187.webp 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-1024x639.webp 1024w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-768x479.webp 768w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-1536x958.webp 1536w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0820-2048x1277.webp 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/><figcaption id=\"caption-attachment-14697\" class=\"wp-caption-text\">The motion capture process uses reference patterns to help align footage captured from multiple angles around the subject. DAVID VINTINER<\/figcaption><\/figure>\n<p>Synthesia has worked with actors to train its models since 2020, and their doubles make up the 225 stock avatars that are available for customers to animate with their own scripts. But to train its latest generation of avatars, Synthesia needed more data; it has spent the past year working with around 1,000 professional actors in London and New York. 
(Synthesia says it does not sell the data it collects, although it does release some of it for\u00a0<a href=\"https:\/\/www.actors-hq.com\/\" target=\"_blank\" rel=\"noopener\">academic research purposes<\/a>.)<\/p>\n<p>The actors previously got paid each time their avatar was used, but now the company pays them an up-front fee to train the AI model. Synthesia uses their avatars for three years, at which point actors are asked if they want to renew their contracts. If so, they come into the studio to make a new avatar. If not, the company will delete their data. Synthesia\u2019s enterprise customers can also generate their own custom avatars by sending someone into the studio to do much of what I\u2019m doing.<\/p>\n<figure id=\"attachment_14698\" aria-describedby=\"caption-attachment-14698\" style=\"width: 2560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-14698\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-scaled.webp\" alt=\"\" width=\"2560\" height=\"1977\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-scaled.webp 2560w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-300x232.webp 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-1024x791.webp 1024w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-768x593.webp 768w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-1536x1186.webp 1536w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0695-2048x1582.webp 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/><figcaption id=\"caption-attachment-14698\" class=\"wp-caption-text\">The initial calibration process allows the system to understand the subject&#8217;s natural colors and facial 
features. DAVID VINTINER<\/figcaption><\/figure>\n<figure id=\"attachment_14699\" aria-describedby=\"caption-attachment-14699\" style=\"width: 2560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-14699\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-scaled.webp\" alt=\"\" width=\"2560\" height=\"1977\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-scaled.webp 2560w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-300x232.webp 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-1024x791.webp 1024w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-768x593.webp 768w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-1536x1186.webp 1536w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/David_Vintiner__A7A0902-2048x1582.webp 2048w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/><figcaption id=\"caption-attachment-14699\" class=\"wp-caption-text\">Synthesia also collects voice samples. In the studio, I read a passage indicating that I explicitly consent to having my voice cloned. DAVID VINTINER<\/figcaption><\/figure>\n<p>Between takes, the makeup artist comes in and does some touch-ups to make sure I look the same in every shot. I can feel myself blushing because of the lights in the studio, but also because of the acting. 
After the team has collected all the shots it needs to capture my facial expressions, I go downstairs to read more text aloud for voice samples.<\/p>\n<p>This process requires me to read a passage indicating that I explicitly consent to having my voice cloned, and that it can be used on Voica\u2019s account on the Synthesia platform to generate videos and speech.<\/p>\n<p class=\"wp-block-heading\"><strong>Consent is key<\/strong><\/p>\n<p>This process is very different from the way many AI avatars, deepfakes, or synthetic media\u2014whatever you want to call them\u2014are created.<\/p>\n<p>Most deepfakes aren\u2019t created in a studio.\u00a0<a href=\"https:\/\/www.technologyreview.com\/2021\/02\/12\/1018222\/deepfake-revenge-porn-coming-ban\/\" target=\"_blank\" rel=\"noopener\">Studies<\/a>\u00a0have shown that the vast majority of deepfakes online are nonconsensual sexual content, usually using images stolen from social media. Generative AI has made the creation of these deepfakes easy and cheap, and there have been several\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/12\/04\/1084271\/meet-the-15-year-old-deepfake-porn-victim-pushing-congress\/\" target=\"_blank\" rel=\"noopener\">high-profile cases in the US<\/a>\u00a0and Europe of children and women being abused in this way. Experts have also raised alarms that the technology can be used to spread political disinformation, a particularly acute threat given the\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/12\/15\/1085441\/eric-schmidt-plan-for-fighting-election-misinformation\/\" target=\"_blank\" rel=\"noopener\">record number of elections<\/a> happening around the world this year.<\/p>\n<p>Synthesia\u2019s policy is to not create avatars of people without their explicit consent. But it hasn\u2019t been immune from abuse. 
Last year,\u00a0<a href=\"https:\/\/www.nytimes.com\/2023\/02\/07\/technology\/artificial-intelligence-training-deepfake.html\" target=\"_blank\" rel=\"noopener\">researchers found<\/a>\u00a0pro-China misinformation that was created using Synthesia\u2019s avatars and packaged as news, which the company said violated its terms of service.<\/p>\n<p>Since then, the company has put more rigorous verification and content moderation systems in place. It applies a\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/07\/28\/1076843\/cryptography-ai-labeling-problem-c2pa-provenance\/\" target=\"_blank\" rel=\"noopener\">watermark<\/a>\u00a0with information on where and how the AI avatar videos were created. Where it once had four in-house content moderators, people doing this work now make up 10% of its 300-person staff. It also hired an engineer to build better AI-powered content moderation systems. These filters help Synthesia vet every single thing its customers try to generate. Anything suspicious or ambiguous, such as content about cryptocurrencies or sexual health, gets forwarded to the human content moderators. Synthesia also keeps a record of all the videos its system creates.<\/p>\n<p>And while anyone can join the platform, many features aren\u2019t available until people go through an extensive vetting system similar to that used by the banking industry, which includes talking to the sales team, signing legal contracts, and submitting to security auditing, says Voica. Entry-level customers are limited to producing strictly factual content, and only enterprise customers using custom avatars can generate content that contains opinions. On top of this, only accredited news organizations are allowed to create content on current affairs.<\/p>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_23\">\n<p>\u201cWe can\u2019t claim to be perfect. 
If people report things to us, we take quick action, [such as] banning or limiting individuals or organizations,\u201d Voica says. But he believes these measures work as a deterrent, which means most bad actors will turn to freely available open-source tools instead.<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_25\">\n<p>I put some of these limits to the test when I head to Synthesia\u2019s office for the next step in my avatar generation process. In order to create the videos that will feature my avatar, I have to write a script. Using Voica\u2019s account, I decide to use passages from\u00a0<em>Hamlet,\u00a0<\/em>as well as previous articles I have written. I also use a new feature on the Synthesia platform, which is an AI assistant that transforms any web link or document into a ready-made script. I try to get my avatar to read news about the European Union\u2019s new sanctions against Iran.<\/p>\n<p>Voica immediately texts me: \u201cYou got me in trouble!\u201d<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_27\">\n<p>The system has flagged his account for trying to generate content that is restricted.<\/p>\n<figure id=\"attachment_14700\" aria-describedby=\"caption-attachment-14700\" style=\"width: 1280px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" class=\"size-full wp-image-14700\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/WhatsApp-Image-2024-04-19-at-10.29.54.webp\" alt=\"\" width=\"1280\" height=\"717\" srcset=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/WhatsApp-Image-2024-04-19-at-10.29.54.webp 1280w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/WhatsApp-Image-2024-04-19-at-10.29.54-300x168.webp 300w, https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/WhatsApp-Image-2024-04-19-at-10.29.54-1024x574.webp 1024w, 
https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/WhatsApp-Image-2024-04-19-at-10.29.54-768x430.webp 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><figcaption id=\"caption-attachment-14700\" class=\"wp-caption-text\">AI-powered content filters help Synthesia vet every single thing its customers try to generate. Only accredited news organizations are allowed to create content on current affairs. COURTESY OF SYNTHESIA<\/figcaption><\/figure>\n<p>Offering services without these restrictions would be \u201ca great growth strategy,\u201d Riparbelli grumbles. But \u201cultimately, we have very strict rules on what you can create and what you cannot create. We think the right way to roll out these technologies in society is to be a little bit over-restrictive at the beginning.\u201d<\/p>\n<p>Still, even if these guardrails operated perfectly, the ultimate result would nevertheless be an internet where everything is fake. And my experiment makes me wonder how we could possibly prepare ourselves.<\/p>\n<p>Our information landscape already feels very murky. On the one hand, there is heightened public awareness that AI-generated content is flourishing and could be a powerful tool for misinformation. But on the other, it is still unclear whether deepfakes are used for misinformation at scale and whether they\u2019re\u00a0<a href=\"https:\/\/misinforeview.hks.harvard.edu\/article\/misinformation-reloaded-fears-about-the-impact-of-generative-ai-on-misinformation-are-overblown\/\" target=\"_blank\" rel=\"noopener\">broadly moving the needle<\/a>\u00a0to change people\u2019s beliefs and behaviors.<\/p>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_29\">\n<p>If people become too skeptical about the content they see, they might stop believing in anything at all, which could enable bad actors to take advantage of this trust vacuum and lie about the authenticity of real content. 
Researchers have called this the \u201c<a href=\"https:\/\/www.brennancenter.org\/our-work\/research-reports\/deepfakes-elections-and-shrinking-liars-dividend\" target=\"_blank\" rel=\"noopener\">liar\u2019s dividend<\/a>.\u201d They warn that politicians, for example, could claim that genuinely incriminating information was fake or created using AI.<\/p>\n<p>Claire Leibowicz, the head of AI and media integrity at the nonprofit Partnership on AI, says she worries that growing awareness of this gap will make it easier to \u201cplausibly deny and cast doubt on real material or media as evidence in many different contexts, not only in the news, [but] also in the courts, in the financial services industry, and in many of our institutions.\u201d She tells me she\u2019s heartened by the resources Synthesia has devoted to content moderation and consent but says that process is never flawless.<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_31\">\n<p>Even Riparbelli admits that in the short term, the proliferation of AI-generated content will probably cause trouble. While people have been trained not to believe everything they read, they still tend to trust images and videos, he adds. He says people now need to test AI products for themselves to see what is possible, and should not trust anything they see online unless they have verified it.<\/p>\n<p>Never mind that AI regulation is still patchy, and the tech sector\u2019s efforts to verify\u00a0<a href=\"https:\/\/www.technologyreview.com\/2023\/07\/28\/1076843\/cryptography-ai-labeling-problem-c2pa-provenance\/\" target=\"_blank\" rel=\"noopener\">content provenance<\/a>\u00a0are still in their early stages.
Can consumers, with their varying degrees of media literacy, really fight the growing wave of harmful AI-generated content through individual action?<\/p>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_31\">\n<p class=\"wp-block-heading\"><strong>Watch out, PowerPoint<\/strong><\/p>\n<p>The day after my final visit, Voica emails me the videos with my avatar. When the first one starts playing, I am taken aback. It\u2019s as painful as seeing yourself on camera or hearing a recording of your voice. Then I catch myself. At first I thought the avatar\u00a0<em>was<\/em>\u00a0me.<\/p>\n<p>The more I watch videos of \u201cmyself,\u201d the more I spiral. Do I really squint that much? Blink that much? And move my jaw like that?\u00a0<em>Jesus.\u00a0<\/em><\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_33\">\n<p>It\u2019s good. It\u2019s really good. But it\u2019s not perfect. \u201cWeirdly good animation,\u201d my partner texts me.<\/p>\n<p>\u201cBut the voice sometimes sounds exactly like you, and at other times like a generic American and with a weird tone,\u201d he adds. \u201cWeird AF.\u201d<\/p>\n<p>He\u2019s right. The voice is sometimes me, but in real life I\u00a0<em>umm<\/em>\u00a0and\u00a0<em>ahh<\/em>\u00a0more. What\u2019s remarkable is that it picked up on an irregularity in the way I talk. My accent is a transatlantic mess, confused by years spent living in the UK, watching American TV, and attending international school. My avatar sometimes says the word \u201crobot\u201d in a British accent and other times in an American accent. It\u2019s something that probably nobody else would notice. But the AI did.<\/p>\n<p>My avatar\u2019s range of emotions is also limited. It delivers Shakespeare\u2019s \u201cTo be or not to be\u201d speech very matter-of-factly. 
I had guided it to be furious when reading a story I wrote about\u00a0<a href=\"https:\/\/www.technologyreview.com\/2024\/01\/29\/1087376\/dear-taylor-swift-were-sorry-about-those-explicit-deepfakes\/\" target=\"_blank\" rel=\"noopener\">Taylor Swift\u2019s nonconsensual nude deepfakes<\/a>; the avatar is complain-y and judgy, for sure, but not angry.<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_35\">\n<p>This isn\u2019t the first time I\u2019ve made myself a test\u00a0<a href=\"https:\/\/www.technologyreview.com\/2022\/08\/31\/1058800\/what-does-gpt-3-know-about-me\/\" target=\"_blank\" rel=\"noopener\">subject<\/a>\u00a0for new AI. Not too long ago, I tried generating AI avatar images of myself, only to get a\u00a0<a href=\"https:\/\/www.technologyreview.com\/2022\/12\/12\/1064751\/the-viral-ai-avatar-app-lensa-undressed-me-without-my-consent\/\" target=\"_blank\" rel=\"noopener\">bunch of nudes<\/a>. That experience was a jarring example of just how biased AI systems can be. But this experience\u2014and this particular way of being immortalized\u2014was definitely on a different level.<\/p>\n<p>Carl \u00d6hman, an assistant professor at Uppsala University who has studied digital remains and is the author of a new book,\u00a0<em>The Afterlife of Data<\/em>, calls avatars like the ones I made \u201cdigital corpses.\u201d<\/p>\n<p>\u201cIt looks exactly like you, but no one\u2019s home,\u201d he says. \u201cIt would be the equivalent of cloning you, but your clone is dead. And then you\u2019re animating the corpse, so that it moves and talks, with electrical impulses.\u201d<\/p>\n<p>That\u2019s kind of how it feels. The little, nuanced ways I don\u2019t recognize myself are enough to put me off. Then again, the avatar could quite possibly fool anyone who doesn\u2019t know me very well. 
It really shines when presenting a story I wrote about how the field of\u00a0<a href=\"https:\/\/www.technologyreview.com\/2024\/04\/11\/1090718\/household-robots-ai-data-robotics\/\" target=\"_blank\" rel=\"noopener\">robotics could be getting its own ChatGPT moment<\/a>; the virtual AI assistant summarizes the long read into a decent short video, which my avatar narrates. It is not Shakespeare, but it\u2019s better than many of the corporate presentations I\u2019ve had to sit through. I think if I were using this to deliver an end-of-year report to my colleagues, maybe that level of authenticity would be enough.<\/p>\n<p>And\u00a0<em>that<\/em>\u00a0is the sell, according to Riparbelli: \u201cWhat we\u2019re doing is more like PowerPoint than it is like Hollywood.\u201d<\/p>\n<div style=\"width: 790px;\" class=\"wp-video\"><video class=\"wp-video-shortcode\" id=\"video-14693-2\" width=\"790\" height=\"444\" preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-test-6.mp4?_=2\" \/><a href=\"https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-test-6.mp4\">https:\/\/www.fie.undef.edu.ar\/ceptm\/wp-content\/uploads\/2024\/05\/Melissa-test-6.mp4<\/a><\/video><\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_35\">\n<p>The newest generation of avatars certainly isn\u2019t ready for the silver screen. They\u2019re still stuck in portrait mode, only showing the avatar front-facing and from the waist up. 
But in the not-too-distant future, Riparbelli says, the company hopes to create avatars that can communicate with their hands\u00a0and\u00a0have conversations with one another.\u00a0It is also planning for\u00a0full-body avatars that can walk and move around in a space that a person has generated.\u00a0(The rig to enable this technology already exists; in fact it&#8217;s where I am in the image at the top of this piece.)<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_37\">\n<p>But do we\u00a0<em>really<\/em>\u00a0want that? It feels like a bleak future where humans are consuming AI-generated content presented to them by AI-generated avatars and using AI to repackage that into more content, which will likely be scraped to generate more AI. If nothing else, this experiment made clear to me that the technology sector urgently needs to step up its content moderation practices and ensure that content provenance techniques such as\u00a0<a href=\"https:\/\/www.technologyreview.com\/2024\/03\/29\/1090310\/its-easy-to-tamper-with-watermarks-from-ai-generated-text\/#:~:text=Watermarks%20for%20AI%2Dgenerated%20text,trusting%20text%20they%20shouldn't.\" target=\"_blank\" rel=\"noopener\">watermarking<\/a> are robust.<\/p>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_37\">\n<p>Even if Synthesia\u2019s technology and content moderation aren\u2019t yet perfect, they\u2019re significantly better than anything I have seen in the field before, and this is after only a year or so of the current boom in generative AI. AI development moves at breakneck speed, and it is both exciting and daunting to consider what AI avatars will look like in just a few years. 
Maybe in the future we will have to adopt safewords to indicate that you are in fact communicating with a real human, not an AI.<\/p>\n<\/div>\n<\/div>\n<div>\n<div class=\"gutenbergContent__content--109b03a769a11e8ae3acbab352a64269 html_39\">\n<p>But that day is not today.<\/p>\n<p>I found it weirdly comforting that in one of the videos, my avatar rants about nonconsensual deepfakes and says, in a sociopathically happy voice, \u201cThe tech giants? Oh! They\u2019re making a killing!\u201d<\/p>\n<p>I would never.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><strong>Fuente:<\/strong> <a href=\"https:\/\/www.technologyreview.com\/2024\/04\/25\/1091772\/new-generative-ai-avatar-deepfake-synthesia\/\" target=\"_blank\" rel=\"noopener\"><em>https:\/\/www.technologyreview.com<\/em><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>La nueva tecnolog\u00eda de Synthesia es impresionante, pero plantea grandes preguntas sobre un mundo en el que cada vez m\u00e1s no podemos saber qu\u00e9 es&hellip; 
<\/p>\n","protected":false},"author":1,"featured_media":14702,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[23],"tags":[],"_links":{"self":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/14693"}],"collection":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14693"}],"version-history":[{"count":1,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/14693\/revisions"}],"predecessor-version":[{"id":14703,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/posts\/14693\/revisions\/14703"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=\/wp\/v2\/media\/14702"}],"wp:attachment":[{"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14693"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14693"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fie.undef.edu.ar\/ceptm\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14693"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}