The world is watching to see what Apple will do to counter Microsoft's and Google's dominance in generative AI. Many assume the tech giant's innovation will take the form of neural nets on the iPhone and other Apple devices. Small clues are surfacing here and there that hint at what Apple is working on.

Apple last week introduced OpenELM, an “embedded” large language model (LLM) that runs on mobile devices. It is essentially a compilation of breakthroughs from several research institutions, including deep learning scholars at Google and researchers at Stanford University.

All OpenELM code is posted on GitHub along with extensive documentation about its training approach. Apple also details its efforts in a paper by Sachin Mehta and his team, “OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework,” posted on the arXiv preprint server.

Apple researchers used a neural net with just 1.3 billion neural weights (parameters), suggesting the company is focusing on mobile devices. This number is far below the hundreds of billions of parameters used in models such as OpenAI's GPT-4 and Google's Gemini. Increasing the number of parameters directly increases the amount of memory required. Smaller neural networks may be easier to adapt to mobile devices.
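The link between parameter count and memory can be made concrete with some rough arithmetic. The sketch below is illustrative only: it assumes weight storage is simply the number of parameters multiplied by the bytes used per parameter, and it ignores activations, caches, and runtime overhead. The function name, the precision table, and the 100-billion-parameter comparison model are hypothetical and do not come from Apple's paper.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only the weights are counted; activations, caches,
# and runtime overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    models = [
        ("1.3B-parameter model", 1.3e9),
        ("hypothetical 100B-parameter model", 100e9),
    ]
    for label, n in models:
        for precision in ("fp16", "int4"):
            gb = weight_memory_gb(n, precision)
            print(f"{label} @ {precision}: {gb:.2f} GB")
```

Under these assumptions, a model in the 1-billion-parameter range needs a few gigabytes at most for its weights, while a model with hundreds of billions of parameters needs far more memory than a phone provides.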

OpenELM might have flown under the radar were it not for its key contribution: efficiency. The researchers tuned the layers of the deep neural network so that the AI model is more efficient than previous models in terms of the amount of data that needs to be computed when training the neural network. Specifically, it can meet or exceed the results of many neural networks for mobile computing “while requiring 2× fewer pre-training tokens.” Here, a token is an individual character, word, or sentence fragment in the training data.

Apple starts with the same approach as many LLMs: the transformer. Transformers are the signature neural network for language understanding, introduced by Google scientists in 2017. Since then, every major language model has employed transformers, including Google's BERT and OpenAI's GPT family of models.

Apple achieves high efficiency by fusing transformers with a technology called DeLighT, introduced in 2021 by researchers at the University of Washington, Facebook AI Research, and the Allen Institute for AI. That research departs from the traditional approach in which every “layer” of the network, meaning each of the successive mathematical computations the data passes through, has the same number of neural weights.

Instead, the researchers selectively tuned each layer to have a different number of parameters. Because some layers have relatively few parameters, they called their approach a “deep and lightweight transformer,” hence the name DeLighT.

“DeLighT matches or improves the performance of baseline transformers with 2-3 times fewer parameters on average,” the researchers said. Apple used DeLighT to create OpenELM. In OpenELM, the number of neural parameters differs from layer to layer, a non-uniform approach to allocating parameters.

“Existing LLMs use the same configuration for each transformer layer in the model, resulting in uniform assignment of parameters across layers,” Mehta and his team wrote. “Unlike these models, each transformer layer in OpenELM has a different configuration (such as the number of heads and the dimensions of the feedforward network), and as a result, the number of parameters in each layer of the model varies.”

They write that the nonuniform approach “allows OpenELM to better utilize the available parameter budget and achieve higher accuracy.”
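To illustrate what a non-uniform allocation can look like, here is a minimal sketch that varies the number of attention heads and the feed-forward width from layer to layer. It assumes a simple linear interpolation across depth and a plain two-matrix feed-forward block; the function names, scaling ranges, and dimensions are hypothetical choices for illustration, not values taken from Apple's paper or the OpenELM code.

```python
def layerwise_config(num_layers, d_model, head_dim,
                     alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a list of (num_heads, ffn_dim), one entry per layer.

    alpha scales the attention width and beta is the feed-forward
    multiplier; both are linearly interpolated from the first layer
    to the last (an assumption made for this sketch).
    """
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


def layer_params(d_model, head_dim, num_heads, ffn_dim):
    """Approximate weights in one layer (biases and norms ignored)."""
    attn = 4 * d_model * num_heads * head_dim  # Q, K, V, output projections
    ffn = 2 * d_model * ffn_dim                # up and down projections
    return attn + ffn


if __name__ == "__main__":
    d_model, head_dim = 512, 64
    for i, (h, f) in enumerate(layerwise_config(4, d_model, head_dim)):
        n = layer_params(d_model, head_dim, h, f)
        print(f"layer {i}: heads={h}, ffn_dim={f}, params={n:,}")
```

In this toy configuration the early layers are narrow and the later layers are wide, so the same total parameter budget is spread unevenly rather than split equally across layers.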

The competitors Apple measures itself against use similarly small neural nets. They include MobiLlama, from the Mohamed bin Zayed University of AI and its collaborators, and OLMo, introduced in February 2024 by researchers at the Allen Institute for Artificial Intelligence and academics at the University of Washington, Yale University, New York University, and Carnegie Mellon University.

Apple's experiments are not conducted on mobile devices. Instead, the company uses Intel-based Ubuntu Linux workstations with a single Nvidia GPU.

In numerous benchmark tests, OpenELM achieves better scores despite being smaller or using fewer training tokens. For example, in 6 out of 7 tests, OpenELM outperforms OLMo despite having fewer parameters (1.08 billion vs. 1.18 billion). It was also trained on only 1.5 trillion tokens, compared with OLMo's 3 trillion.

Although OpenELM can produce more accurate results more efficiently, the authors note one area for further research: in some cases, OpenELM takes longer to generate its predictions.

Apple may license AI technology for iOS 18 integration from Google, OpenAI, or other major AI companies, according to reports. Apple's investments in open source software raise the interesting possibility that the company may be looking to strengthen the open ecosystem that its devices benefit from.

