Google's new Infini-attention technique lets LLMs take in unlimited text

Current large language models (LLMs) are limited in how much information they can take in before producing a result. Google has announced a way to change this, allowing LLMs to accept an unbounded amount of text. The technique, called Infini-attention, works without increasing memory and compute requirements, which could make LLMs both more efficient and more capable.

“Effective memory systems are important not only for understanding the long-term context of LLM, but also for reasoning, planning, continuously adapting new knowledge, and even learning how to learn,” the authors wrote in their research paper.

Context windows play a central role in how LLMs work, and as of this writing, all popular AI models, including OpenAI's GPT-4 and Anthropic's Claude 3, have finite context windows. For example, Claude 3 allows up to 200,000 tokens, the word fragments a model reads, in a single query. GPT-4's context window allows 128,000 tokens.

The size of the context window matters. The more tokens it allows, the more data a user can supply to get the desired result. LLM developers therefore try to increase the token limit with each new iteration so that their models can understand more input and deliver better results.

To do so, however, technology companies must address memory and computing requirements. Each time an LLM's context window doubles, its memory and computational requirements increase by a factor of four, the Google researchers wrote. Every such increase is not only resource intensive but also very expensive.
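The fourfold figure follows from quadratic growth: if cost is proportional to the square of the context length, doubling the length multiplies the cost by four. The short sketch below illustrates that arithmetic. It assumes cost is proportional to the number of token-pair scores in a full attention matrix; the function name and the 128,000-token baseline are chosen here for illustration and do not come from the researchers.

```python
def attention_score_entries(context_tokens: int) -> int:
    """Number of token-pair scores in a full attention matrix.

    Assumption for illustration: cost is proportional to this count,
    i.e. it grows with the square of the context length.
    """
    return context_tokens * context_tokens


def relative_cost(context_tokens: int, baseline_tokens: int) -> float:
    """Cost of a context window relative to a baseline window."""
    return attention_score_entries(context_tokens) / attention_score_entries(baseline_tokens)


if __name__ == "__main__":
    baseline = 128_000  # illustrative baseline window
    for tokens in (baseline, 2 * baseline, 4 * baseline):
        print(f"{tokens:,} tokens: {relative_cost(tokens, baseline):.0f}x the baseline cost")
```

Under this assumption, doubling the window costs 4 times as much and quadrupling it costs 16 times as much, which is why simply enlarging the window quickly becomes expensive.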

Google's Infini-attention addresses this problem without increasing memory and compute requirements. When the researchers fed the model more input than its context window could hold, everything up to the limit was moved into a so-called “compressed memory” and removed from active memory, freeing that active memory for additional context. Once all the data had been input, the model combined what was held in compressed memory with what was in active memory to return a response. The technology allows for “natural extension of existing LLMs to infinitely long contexts through continuous pre-training and fine-tuning,” the researchers wrote.
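The idea of a memory that absorbs older input without growing can be sketched in a few lines. The example below is a simplified illustration, not the researchers' implementation: it assumes a linear-attention-style associative memory (a fixed-size matrix updated with key/value products) and an ELU + 1 feature map, and it leaves out the step that blends the memory readout with attention over the active segment. The class and method names are hypothetical.

```python
import math


def elu_plus_one(x: float) -> float:
    """Positive feature map (ELU + 1); an assumed choice for this sketch."""
    return x + 1.0 if x > 0 else math.exp(x)


class CompressiveMemory:
    """Fixed-size store: a d_k x d_v matrix plus a d_k normalizer vector."""

    def __init__(self, d_k: int, d_v: int) -> None:
        self.matrix = [[0.0] * d_v for _ in range(d_k)]
        self.norm = [0.0] * d_k

    def absorb(self, keys, values) -> None:
        """Fold one segment's key/value pairs into the memory."""
        for key, value in zip(keys, values):
            phi = [elu_plus_one(x) for x in key]
            for i, p in enumerate(phi):
                self.norm[i] += p
                for j, v in enumerate(value):
                    self.matrix[i][j] += p * v

    def retrieve(self, query):
        """Read back a weighted average of the stored values for one query."""
        phi = [elu_plus_one(x) for x in query]
        d_v = len(self.matrix[0])
        denom = sum(p * n for p, n in zip(phi, self.norm))
        if denom == 0.0:  # nothing has been stored yet
            return [0.0] * d_v
        return [
            sum(phi[i] * self.matrix[i][j] for i in range(len(phi))) / denom
            for j in range(d_v)
        ]

    def size(self) -> int:
        """Number of floats held, independent of how much was absorbed."""
        return len(self.matrix) * len(self.matrix[0]) + len(self.norm)


# Two segments of made-up keys and values; the memory size never changes.
memory = CompressiveMemory(d_k=2, d_v=2)
segments = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]),
    ([[0.5, -0.5]], [[5.0, 6.0]]),
]
sizes = []
for keys, values in segments:
    memory.absorb(keys, values)
    sizes.append(memory.size())

readout = memory.retrieve([1.0, 0.0])
```

However many segments are absorbed, `size()` stays the same, which is the property that lets older context be released from active memory. The trade-off is that the readout is a lossy summary of the stored values rather than an exact copy of them.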

With the ability to feed as much context into the model as desired, the researchers compared the Infini-attention technique with existing LLMs and found it to be superior. “Our approach naturally scales to the 1 million length domain of input sequences, while outperforming baselines on long context language modeling benchmarks and book summarization tasks,” they wrote.

The researchers did not share data or prove that their method actually performs better than existing models. However, if the limits of the context window can be eliminated, it stands to reason that models equipped with this technique should perform better than those bound by such limits.

Google's technology could pave the way to dramatically better LLM performance, enabling companies to create new applications and generate additional insights. For now, though, Infini-attention is purely research. It is unclear whether the technique will be introduced into widely available LLMs.




