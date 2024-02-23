



Remember when AI art generators became widely available in 2022 and suddenly the internet was full of creepy pictures that were super cool but didn't look right when you looked closely? Get ready for that to happen again, this time with video.

Last week, OpenAI unveiled Sora, a generative AI model that produces videos from simple prompts. It's not yet available to the public, but CEO Sam Altman showed it off by taking requests on X (formerly Twitter). Users replied with short prompts, such as monkeys playing chess in a park or a bicycle race on the ocean with various animals as the athletes. The results are eerie, enchanting, strange, and beautiful, and they have set off the usual cycle of commentary.

Some have warned loudly about Sora's negative effects, predicting a wave of disinformation. I (and experts) believe that future powerful AI systems pose very serious risks, but claims that a particular model will bring a wave of disinformation have not held up so far.

Others have pointed to Sora's many flaws as evidence of a fundamental limitation of the technology. That was the mistake people made with image generation models, and I suspect it will turn out to be wrong again. As my colleague AW Ohlheiser pointed out, just as DALL-E and ChatGPT have improved over time, Sora is likely to improve as well.

Both bullish and bearish predictions may still come true, but the conversation around Sora and generative AI will be more productive if people on all sides take more seriously all the ways we've been proven wrong in recent years.

What DALL-E 2 and Midjourney can teach us about Sora

Two years ago, OpenAI introduced DALL-E 2, a model that can generate still images from text prompts. The high-resolution, fantastical images it produced quickly spread on social media, along with a set of questions: Is this real art? Fake art? A threat to artists? A tool for artists? A disinformation machine? Two years later, it's worth a brief look back if we want our views on Sora to age better.

The release of DALL-E 2 came just a few months ahead of two popular competitors, Midjourney and Stable Diffusion. Each had its strengths and weaknesses. DALL-E 2 produced more photorealistic images and followed prompts a little more closely. Midjourney was more artistic. Together, they made AI art available to millions of people at the click of a button.

Much of the social impact of generative AI at the time came not directly from DALL-E 2 but from the wave of image models that followed it. Similarly, one might expect that the important question about Sora is not just what Sora can do, but what Sora's imitators and competitors will be able to do.

Many believed that DALL-E and its competitors heralded a flood of deepfake propaganda and fraud that would threaten our democracy. Such an effect may someday emerge, but those warnings appear to have been premature. Analyst Peter Carlyon said in December that the impact of deepfakes on democracy always seems to be on the horizon, and that most propaganda remains of the boring variety: for example, people taking statements out of context, or sharing images of one conflict and mislabeling them as coming from another.

Perhaps at some point this will change, but anyone claiming that Sora will be what changes it should do so with some humility. You don't need deepfakes to lie to people, and deepfakes are still an expensive way to do it. (AI generation itself is relatively cheap, but producing something specific and convincing costs much more. A tsunami of deepfakes implies a scale that most spammers can't afford at the moment.)

But where I find it most important to remember the past two years of AI history is when I read criticism that Sora's images are clumsy, stiff, inhuman, or clearly flawed. It's true, they are. OpenAI's research release acknowledges that Sora does not accurately model the physics of many basic interactions, adding that it has problems with cause and effect, left-right confusion, and trajectory tracking.

Of course, much the same criticism was leveled at DALL-E 2 and Midjourney, at least initially. Early coverage of DALL-E 2 highlighted its shortcomings, such as producing terrifying monsters whenever a scene required more than one character, and giving people claws instead of hands. AI experts argued that the models' inability to handle composition, meaning instructions about how to arrange the elements of a scene, reflected a fundamental flaw in the technology.

But in reality, models have gotten better at executing very specific prompts, and users have gotten better at writing prompts as well, so it is now possible to create images of complex and detailed scenes. Almost all of those notable flaws were fixed in the latest DALL-E 3 and Midjourney updates released last year. Current image generators handle hands and crowd scenes well.

In the period between DALL-E 2 and Sora, AI image generation has grown from a party trick into a large-scale industry. Many things you couldn't do with DALL-E 2, you can do with DALL-E 3. And where DALL-E 3 can't do something, its competitors often can. This is an important perspective to keep in mind when reading predictions about Sora. You're probably looking at the early stages of a major new capability, one that could be used for good or for malicious purposes. It's possible to oversell it, but it's also very easy to sell it short.

Rather than getting too committed to a particular view of what Sora and its successors can or cannot do, it's worth acknowledging some uncertainty about what lies ahead. It's much easier to say, “This technology will continue to advance by leaps and bounds,” than to predict exactly what will happen.

