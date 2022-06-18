



DALL-E Mini, software from a group of open-source developers, isn’t perfect, but it can sometimes produce pictures that effectively match people’s text descriptions.

Scroll through social media feeds these days and you are likely to notice illustrations accompanied by captions. They are popular right now.

The pictures you are seeing are probably made possible by a text-to-image program called DALL-E. Before posting an illustration, people type in a few words, and an artificial intelligence model converts them into an image.

For example, a Twitter user posted a tweet with the text “Should I live or die, Rabbi with avocado, Marble sculpture”. The attached photo is very elegant and shows a marble statue of a bearded man in a robe and bowler hat holding an avocado.

The AI models come from Google’s Imagen software and from OpenAI, a Microsoft-backed startup that developed DALL-E 2. On its website, OpenAI calls DALL-E 2 “a new AI system that can create realistic images and art from a description in natural language.”

But most of the activity in this area comes from a relatively small number of people sharing their pictures and, in some cases, generating high engagement. That is because Google and OpenAI have not made the technology widely available to the public.

Many of OpenAI’s early users are friends and relatives of employees. Anyone who wants access has to join a waiting list and indicate whether they are a professional artist, developer, academic researcher, journalist, or online creator.

“We are working hard to accelerate access, but it can take some time to reach everyone. As of June 15, we have invited 10,217 people to DALL-E,” Joanne Jang of OpenAI wrote on a help page on the company’s website.

One system that is publicly available is DALL-E Mini. It draws on open-source code from a loosely organized team of developers and is often overloaded with demand. Attempts to use it can be met with a dialog box that says “Too much traffic. Please try again.”

This is a bit reminiscent of Google’s Gmail service, which lured people with unlimited email storage space in 2004. At first, early adopters could get in only by invitation, which left millions of people waiting. Today, Gmail is one of the most popular email services in the world.

Creating images from text may never be as ubiquitous as email. But the technology is having a moment, and part of its appeal lies in its exclusivity.

Midjourney, a private research lab, requires people to fill out a form if they want to try its image-generation bot from a channel on the Discord chat app. Only a select group of people are using Imagen and posting pictures from it.

The text-to-image services are sophisticated, identifying the most important parts of a user’s prompt and then guessing the best way to illustrate those terms. Google trained its Imagen model on hundreds of its in-house AI chips, using 460 million internal image-text pairs in addition to external data.

The interfaces are simple. There is usually a text box, a button to start the generation process, and an area below it to display the images. To indicate the source, Google and OpenAI add watermarks in the lower right corner of images from DALL-E 2 and Imagen.

The companies and groups building the software are understandably concerned about everyone storming the gates at once. Handling web requests to execute queries with these AI models can get expensive. More importantly, the models are not perfect and do not always produce results that accurately represent the world.

Engineers trained their models on an extensive collection of words and photos from the web, including photos posted on Flickr.

San Francisco-based OpenAI recognizes the potential for harm from a model that learned how to create images by essentially scouring the web. To address the risk, employees removed violent content from the training data. There are also filters that stop DALL-E 2 from generating images if users submit prompts that might violate the company’s policy against nudity, violence, conspiracies, or political content.

“There is an ongoing process to improve the safety of these systems,” said Prafulla Dhariwal, a research scientist at OpenAI.

Biased results are also important to understand, and they reflect a broader concern about AI. Texas developer Boris Dayma and others who worked on DALL-E Mini spelled out the problem in a description of their software.

“Occupations that show a higher level of education (engineers, doctors, scientists, etc.) or higher physical labor (construction, etc.) are primarily represented by white men,” they write. “In contrast, nurses, secretaries, or assistants are usually women, often white.”

Google described similar shortcomings of its Imagen model in an academic paper.

Despite the risks, OpenAI is excited about the kinds of things the technology can enable. Dhariwal said it could open up creative opportunities for individuals and may be useful for commercial applications such as interior design and dressing up websites.

Results should continue to improve over time. DALL-E 2, introduced in April, spits out more realistic images than the initial version OpenAI announced last year, and the company’s text-generation model, GPT, has become more sophisticated with each generation.

“We can expect that to happen with many of these systems,” Dhariwal said.

