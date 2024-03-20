



While the tech industry has gone all in on generative artificial intelligence, one giant has held back: Apple. The company has yet to introduce so much as an AI-generated emoji, and it is in preliminary talks with Google about adding the search company's Gemini AI model to iPhones, according to a New York Times report today and an earlier report from Bloomberg.

But a research paper quietly posted online by Apple engineers last Friday suggests that the company's significant new investments in AI are already bearing fruit. It details the development of a new generative AI model called MM1 that can work with text and images. The researchers show it answering questions about photos and displaying the kind of general knowledge skills shown by chatbots like ChatGPT. The model's name is not explained, but it may stand for MultiModal 1.

MM1 appears to be similar in design and sophistication to various recent AI models from other big tech companies, including Meta's open source Llama 2 and Google's Gemini. Work by Apple's rivals and by academics has shown that models of this type can be used to power capable chatbots or to build agents that solve tasks by writing code and taking actions such as using computer interfaces or websites. This suggests that MM1 may yet make its way into Apple products.

"The fact that they're doing this shows they have the ability to understand how to train and build these models," says Ruslan Salakhutdinov, a professor at Carnegie Mellon University who led AI research at Apple several years ago. "A certain level of expertise is required."

MM1 is a multimodal large language model (MLLM), meaning it is trained on images as well as text. This allows the model to respond to text prompts and also to answer complex questions about specific images.

One example from Apple's research paper shows what happened when MM1 was given a photo of a sun-dappled restaurant table with two beer bottles, along with an image of the menu. When asked how much someone would expect to pay for all the beer on the table, the model correctly reads the prices from the menu and adds up the cost.

"This is just the beginning. The team is already hard at work developing the next-generation model."

Brandon McKinzie, Apple Researcher

When ChatGPT was released in November 2022, it could only ingest and produce text, but more recently its creator, OpenAI, and others have worked to extend the underlying large language model technology to handle other types of data. When Google announced Gemini (the model that now powers its answer to ChatGPT) last December, the company touted its multimodal nature as the beginning of an important new direction in AI. According to Apple's paper, after the rise of LLMs, MLLMs are emerging as the next frontier in foundation models.

MM1 is a relatively small model as measured by its number of parameters, the internal variables that are adjusted as a model is trained. Kate Saenko, a professor at Boston University who specializes in computer vision and machine learning, says this could make it easier for Apple's engineers to experiment with different training methods and refinements before scaling up when they see something promising.

Saenko says the MM1 paper provides a surprising amount of detail about how the model was trained for a corporate publication. For example, the engineers behind MM1 describe tricks for improving the model's performance, such as increasing image resolution and mixing text and image data. Apple is famously secretive, but it has previously been unusually open about its AI research as it has sought to attract the talent it needs to compete in a key technology.

