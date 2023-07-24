



Last week, Meta announced Llama 2, a new large language model with up to 70 billion parameters. The new generative AI system is a bold shot across the bow of OpenAI, although few details have been shared about most AI models, including GPT-3/3.5 and GPT-4. According to Wikipedia, Llama 2 has about 40% of the parameters of ChatGPT 3.5. The release also includes a prominent partnership with Microsoft, and Redmond isn’t just a nominal partner: it recently announced support for Llama 2 on Azure and Windows. Qualcomm, meanwhile, has said it is now entering the LLM race with Llama 2, revealing plans to bring the model to smartphones.

A little more controversial is Meta’s claim that Llama 2 is open source. Meta and Microsoft are indeed promoting the new Llama’s open-source credentials. (Some open-source developers, on the other hand, argue otherwise.)

Llama 2’s licensing acts as a force multiplier, giving developers and researchers the opportunity to tailor the model to their specific needs.

Either way, last week’s developments dramatically expanded the capabilities and reach of open-source AI models.

“Oh, this is so much better,” says Aravind Srinivas, co-founder and CEO of Perplexity.ai. “Whether it matches GPT 3.5 or not, it’s just a matter of time.”

Llama 2: Fine-tuned and ready to chat

Perplexity.ai offers an impressive free online demo of several Llama 2 models. The results compete with today’s top chatbots, such as ChatGPT and Google Bard. Llama 2 is unlikely to win any awards for its writing, but it quickly produces clean, natural text that’s easy to read and understand. Llama 2 can also state commonly understood facts, generate code, and solve mathematical equations.

Llama 2, like other LLMs, can sometimes produce inaccurate or unusable answers, but Meta’s paper introducing Llama 2 claims that it is roughly on par with OpenAI’s GPT 3.5 on academic benchmarks such as MMLU (which measures LLM knowledge across 57 subjects, spanning STEM, the humanities, and the social sciences) and GSM8K (which measures LLM math comprehension using grade-school word problems).

Most of the smaller models outperforming Llama 2 on the Open LLM Leaderboard are themselves based on Meta’s previous model, Llama.

Meta’s researchers achieved this in part through the sheer size of the model, but that’s only half the story. Llama 2 uses supervised fine-tuning, reinforcement learning from human feedback, and a new technique called ghost attention (GAtt) that, according to Meta’s paper, enables dialogue control over multiple turns. Put more simply, GAtt helps Llama 2 produce desirable results when it is asked to work within a constraint that persists through a conversation, such as acting as a historical figure or keeping its responses within the context of a particular topic such as architecture.
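To make the idea concrete, here is a minimal sketch of how GAtt training data might be assembled, based on the general recipe in Meta’s paper: the instruction is attached to every user message while replies are sampled, then kept only in the first turn for training, with the loss switched off for everything except the final reply. The function names, the newline separator, and the segment-level (rather than token-level) loss mask are illustrative assumptions, not Meta’s code.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (user_message, assistant_message)


def sampling_prompts(instruction: str, user_messages: List[str]) -> List[str]:
    """Step 1 (assumed form): attach the instruction to every user message
    so that sampled replies respect it at every turn."""
    return [f"{instruction}\n{u}" for u in user_messages]


def training_example(instruction: str, dialogue: List[Turn]):
    """Step 2 (assumed form): keep the instruction only in the first turn.

    Returns (segments, loss_mask). loss_mask[i] is 1 if segment i contributes
    to the training loss and 0 otherwise. Only the final assistant reply is
    trained on; all earlier turns are masked out.
    """
    segments: List[str] = []
    loss_mask: List[int] = []
    last = len(dialogue) - 1
    for i, (user, assistant) in enumerate(dialogue):
        user_text = f"{instruction}\n{user}" if i == 0 else user
        segments += [user_text, assistant]
        loss_mask += [0, 1 if i == last else 0]
    return segments, loss_mask


if __name__ == "__main__":
    dialogue = [("Hi", "Greetings."), ("Who are you?", "I am Napoleon.")]
    segments, mask = training_example("Act as Napoleon.", dialogue)
    for text, m in zip(segments, mask):
        print(m, repr(text))
```

In a real pipeline the mask would be applied per token after tokenization; the segment-level version here only shows which parts of the conversation the model is asked to learn from.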

Llama 2’s ghost attention helps the model deliver conversational results that fit user-defined constraints, according to Llama’s proponents. (Meta)

These techniques help Llama 2 offer a range of models with solid benchmark performance for their size. The largest model, Llama 2 70B (70 billion parameters), has the best performance on all benchmarks, but Meta also offers Llama 2 7B and Llama 2 13B.

The smaller variants do not perform as well as Llama 2 70B, but they are compact enough to run locally on less powerful devices such as smartphones. Qualcomm, a leading manufacturer of smartphone systems-on-chip (SoCs), has announced a partnership with Meta to run Llama 2 locally on Qualcomm-powered smartphones starting in 2024.
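A back-of-the-envelope calculation shows why model size matters so much on a phone. The sketch below estimates the memory needed for the weights alone as parameters times bytes per parameter. The parameter counts are the nominal sizes named above; the precisions are illustrative assumptions, not settings published by Meta or Qualcomm, and the estimate ignores activations and other runtime overhead.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# The precisions below are illustrative assumptions for comparison only.
PARAMS = {"Llama 2 7B": 7e9, "Llama 2 13B": 13e9, "Llama 2 70B": 70e9}
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gigabytes(num_params: float, bits: int) -> float:
    """Return the storage needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits / 8 / 1e9


def footprint_table() -> dict:
    """Map each model name to its estimated weight size at each precision."""
    return {
        model: {name: weight_gigabytes(n, bits) for name, bits in BITS_PER_PARAM.items()}
        for model, n in PARAMS.items()
    }


if __name__ == "__main__":
    for model, row in footprint_table().items():
        cells = ", ".join(f"{name}: {gb:.1f} GB" for name, gb in row.items())
        print(f"{model}: {cells}")
```

By this estimate, the 7B model needs about 14 GB at 16 bits per weight but about 3.5 GB at 4 bits, while the 70B model needs about 140 GB at 16 bits, far beyond what a handset can hold.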

Qualcomm marketing communications specialist Rodrigo Caruso Neves do Amaral said software tools can be used to compile and optimize models to run specifically on Qualcomm’s Hexagon processors. The energy saved by running models on-device has a significant impact on both the businesses running these models and the consumers who must pay to access these applications.

Open source fits where closed models fail

Running a large language model offline on a smartphone is a use case that closed AI models (such as OpenAI’s GPT 3.5 or Google’s PaLM 2) cannot serve. This is not necessarily due to technical limitations (OpenAI and Google could perhaps offer models suitable for smartphones), but rather to philosophical differences. OpenAI and Google offer their LLMs through an API. Internet connectivity is required to access the API, and customers are billed for usage.

Llama 2, by contrast, was released with a license allowing largely unrestricted, free commercial and academic use. The license does contain a clause requiring Meta’s permission to use Llama 2 in products or services with more than 700 million monthly active users, so it does not meet all the criteria set by the Open Source Initiative. However, that clause affects only Meta’s biggest competitors, such as OpenAI and Google.

Meta’s Llama 2 model is already on Hugging Face’s Open LLM Leaderboard, with llama-2-70b-chat-hf showing the third-best performance in latency and throughput benchmarks as of the end of the day on Monday, July 24. (AI developers are rapidly exploiting the potential of Llama 2. Stability AI’s FreeWilly2, the top model at the time of writing, is itself based on Llama 2, though it is fine-tuned on a different dataset.)

As of July 21, AI aggregator Hugging Face’s Open LLM Leaderboard showed llama-2-70b-chat-hf as the second-best performer among all open LLMs on performance and latency metrics. (Hugging Face)

Srinivas sees Llama 2’s open-source license as a force multiplier that gives developers and researchers the opportunity to tailor the model to their specific needs. “One person can start their own fork of Llama 2 that focuses on quantization, and another can start a fork of Llama that focuses on low-rank fine-tuning. [...] Another person can do the work of distilling the larger model into smaller models,” he says. “Progress will only accelerate.”
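For readers unfamiliar with the first of those techniques, quantization stores a model’s weights as small integers plus a scale factor instead of full-precision floats. The following is a minimal sketch of one common variant, symmetric 8-bit quantization of a single list of weights. It is a toy illustration under stated assumptions (one scale per list, a non-empty input, hypothetical function names), not the method used by any particular Llama 2 fork.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127].

    Assumes a non-empty list and uses a single scale for the whole list.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    print("quantized:", q)
    print("max error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight then occupies one byte instead of two or four, at the cost of a rounding error of at most half the scale per weight.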

This is especially important for developers targeting edge devices such as smartphones. Given the model’s size, the fact that Llama 2 70B performs well isn’t all that surprising. But the smaller Llama 2 models also rank high relative to their size. And most of the smaller models that perform better than Llama 2 on the Open LLM Leaderboard are themselves based on Meta’s previous model, Llama. This suggests that Llama 2 will continue to climb the charts as developers in the open-source community apply their talents to it.

“I think [Llama 2 7B and Llama 2 13B] are exciting already. … This is just the beginning. [Meta] publishing it allows people to improve it,” Srinivas said. “You can build other frameworks and other layers of engineering, which gives everyone more power.”

