



Popular artificial intelligence (AI)-powered image generators can run up to 30 times faster thanks to a technique that condenses the entire 100-step process into a single step, new research has revealed.

Scientists have devised a technique called “distribution matching distillation” (DMD) that teaches new AI models to mimic established image generators known as diffusion models, such as DALL·E 3, Midjourney, and Stable Diffusion.

This framework enables smaller, leaner AI models that can generate images faster while maintaining final image quality. The scientists detailed their findings in a study uploaded to the preprint server arXiv on December 5, 2023.

“Our work is a new way to accelerate current diffusion models such as Stable Diffusion and DALL·E 3 by a factor of 30,” study co-lead author Tianwei Yin, a doctoral student in electrical engineering and computer science at MIT, said in a statement. “This advancement not only significantly reduces computation time, but also maintains, if not exceeds, the quality of the visual content produced.”

Diffusion models generate images through a multistep process. They are trained on images paired with descriptive text captions and other metadata, which helps the AI better understand the context and meaning behind the images so that it can respond accurately to text prompts.

In practice, these models work by taking an image and encoding it with a field of random noise until it is destroyed, AI scientist Jay Alammar explained in a blog post. This is called “forward diffusion” and is a key step in the training process. The image is then denoised over up to 100 steps, a process known as “reverse diffusion” that produces a clear image based on a text prompt.
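To make the two phases concrete, here is a minimal Python sketch on a four-number stand-in for an image. Everything in it is an illustrative assumption, not code from the study or from Alammar's post: the function names are hypothetical, the linear noise schedule (100 steps, noise rate rising from 0.001 to 0.2) is one common choice, the deterministic DDIM-style update is only one of several sampling rules, and `oracle_noise` is a cheating stand-in that already knows the clean image, whereas a real model uses a trained network to predict the noise. What the sketch does show is the cost structure: the reverse loop calls the denoiser once per step, so 100 steps means 100 network calls.

```python
import math
import random


def linear_alpha_bars(num_steps=100, beta_start=1e-3, beta_end=0.2):
    """Fraction of the original signal left at each step (alpha_bar_t) for a linear noise schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Forward diffusion: jump straight to step t by blending the image with Gaussian noise."""
    ab = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * n for v, n in zip(x0, noise)]
    return xt, noise


def reverse_diffuse(xt, t, alpha_bars, predict_noise):
    """Reverse diffusion: deterministic (DDIM-style) denoising from step t back to step 0.

    predict_noise(x, s) stands in for the trained network and is called once per step.
    """
    x = xt
    for s in range(t, -1, -1):
        ab = alpha_bars[s]
        eps = predict_noise(x, s)
        # Clean-image estimate implied by the current noise prediction.
        x0_hat = [(v - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for v, e in zip(x, eps)]
        if s == 0:
            break
        ab_prev = alpha_bars[s - 1]
        # Move to the slightly less noisy level s - 1.
        x = [math.sqrt(ab_prev) * c + math.sqrt(1.0 - ab_prev) * e for c, e in zip(x0_hat, eps)]
    return x0_hat


rng = random.Random(0)
alpha_bars = linear_alpha_bars()
image = [0.2, -0.5, 0.9, 0.0]  # a tiny stand-in for an image
noisy, _ = forward_diffuse(image, 99, alpha_bars, rng)

calls = []


def oracle_noise(x, s):
    # Hypothetical "perfect" denoiser that cheats by knowing the clean image.
    calls.append(s)
    ab = alpha_bars[s]
    return [(v - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for v, c in zip(x, image)]


restored = reverse_diffuse(noisy, 99, alpha_bars, oracle_noise)
print(len(calls))  # 100 denoiser calls for 100 steps
```

Because the stand-in denoiser is perfect, the restored values match the original. A trained network only approximates the noise, so real outputs are not recovered exactly, but the loop structure and the one-call-per-step cost are the same.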

By applying the framework to a new model and cutting these denoising steps to one, the scientists reduced the average time it takes to generate an image. In one test, their model cut image generation time from about 2,590 milliseconds (2.59 seconds) with Stable Diffusion v1.5 to 90 milliseconds, making it 28.8 times faster.
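The speed-up figure is simple arithmetic on the two reported timings. The short Python check below (the `speedup` helper is a hypothetical name) reproduces it, and the result is close to the roughly 30-fold figure quoted by the researchers.

```python
def speedup(baseline_ms: float, new_ms: float) -> float:
    """How many times faster the new model is than the baseline."""
    return baseline_ms / new_ms


# Timings reported in the article: Stable Diffusion v1.5 versus the one-step model.
baseline_ms, one_step_ms = 2590.0, 90.0
factor = speedup(baseline_ms, one_step_ms)
print(f"{factor:.1f}x faster")  # 28.8x faster
print(f"{baseline_ms - one_step_ms:.0f} ms saved per image")  # 2500 ms saved per image
```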

DMD has two components that work together to reduce the number of iterations a model needs before producing a usable image. The first, called “regression loss,” organizes images based on similarity during training, which speeds up the AI's learning. The second, called “distribution matching loss,” ensures that, for example, the probability of depicting an apple with a bite taken out of it matches how often you are likely to encounter one in the real world. Together, these techniques minimize how strange the images generated by the new AI model look.
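As we understand the DMD paper, the regression loss compares the one-step model's output with the output the multi-step teacher produces from the same starting noise, while the distribution matching loss nudges each generated sample by the difference between two "score" functions (gradients of log-probability): one describing real images, supplied by the teacher diffusion model, and one describing the generator's own outputs, learned by a second network. The Python sketch below is a one-dimensional toy under loudly stated assumptions: all names are hypothetical, simple Gaussians with exact formulas replace both score networks, plain mean squared error replaces the perceptual similarity measure used in the paper, and the added-noise levels at which the real method evaluates the scores are omitted.

```python
import random


def regression_loss(student_out, teacher_out):
    """Mean squared error between paired outputs (a stand-in for the paper's perceptual loss)."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


def train_toy_dmd(mu_real=3.0, sigma_real=0.5, steps=1000, batch=256, lr=0.01, seed=0):
    """Fit a one-step generator x = a*z + b to N(mu_real, sigma_real^2) using distribution matching."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # the generator starts out producing N(0, 1)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            x = a * z + b  # a single generator step, no iterative denoising
            score_fake = -(x - b) / a ** 2  # d/dx log p_fake(x): the generator's own N(b, a^2)
            score_real = -(x - mu_real) / sigma_real ** 2  # d/dx log p_real(x)
            g = score_fake - score_real  # gradient of KL(p_fake || p_real) with respect to x
            grad_a += g * z  # chain rule: dx/da = z
            grad_b += g  # chain rule: dx/db = 1
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch
    return a, b


print(regression_loss([0.0, 1.0], [0.0, 0.5]))  # 0.125
a, b = train_toy_dmd()
print(round(a, 2), round(b, 2))  # close to sigma_real = 0.5 and mu_real = 3.0
```

The distribution matching update never needs a paired target image: it only compares where samples sit under the two distributions, which is what lets it pull the generator's overall output statistics toward the real ones.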

“Reducing the number of iterations has been the holy grail of diffusion models since their inception,” co-lead author Fredo Durand, a professor of electrical engineering and computer science at the Massachusetts Institute of Technology, said in a statement. “We are very excited to finally be able to perform single-step image generation. This significantly reduces computing costs and accelerates the process.”

While the original diffusion models required “hundreds of iterative refinement steps,” the new approach needs only a single step to generate an image, a significant reduction in computation, Yin said. The scientists added that the model could also benefit industries where ultra-fast, efficient generation is important, enabling faster content creation.

