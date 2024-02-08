



To enhance the reasoning capabilities of large language models (LLMs), researchers at Google DeepMind and the University of Southern California have proposed a new "self-discover" prompting framework.

The approach, published this morning on arXiv and Hugging Face, goes beyond existing prompting techniques for LLMs and has been found to improve the performance of well-known models such as OpenAI's GPT-4 and Google's PaLM 2.

Self-discover substantially improved the performance of GPT-4 and PaLM 2 on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT), the researchers said in their paper.

The framework revolves around LLMs self-discovering the reasoning structure intrinsic to a task in order to solve it. The model considers multiple atomic reasoning modules, such as critical thinking and step-by-step thinking, and composes them into an explicit reasoning structure for the LLM to follow during decoding.

More interestingly, this approach requires 10 to 40 times less inference compute, which is great for enterprises.

Self-discovering unique structures

LLMs have evolved to handle a large number of tasks thanks to their ability to follow instructions, reason, and generate coherent responses. To achieve this, the models, which are built on the transformer architecture, use a variety of prompting techniques inspired by cognitive theories about how humans reason and solve problems. These include few-shot and zero-shot chain of thought, inspired by how we solve a problem step by step; decomposition prompting, inspired by how we break a problem into multiple sub-problems; and step-back prompting, inspired by how we reflect on the nature of a task to establish general principles.

All of these methods work, especially chain of thought, but each one makes an implicit prior assumption about how to approach a given task. The researchers argue that this may not be the best approach, as each task has its own unique intrinsic structure, and one particular technique may be better at solving it than another.

In their latest study, the DeepMind and USC researchers propose a general prompting framework that self-discovers this unique underlying structure, so that it chooses the right reasoning technique for the task while remaining efficient.

Self-discover is inspired by the way humans internally devise a reasoning program to solve a problem. Given a set of atomic reasoning modules described in natural language (such as breaking a problem into subtasks and critical thinking), an LLM, and task examples without labels, it composes a coherent reasoning structure intrinsic to the task (Stage 1) and then solves instances of the task using the discovered structure (Stage 2). Stage 1 operates at the task level and uses three actions to guide the LLM to generate a reasoning structure for the task. In Stage 2, during final decoding, the LLM simply follows the self-discovered structure to arrive at the final answer, the researchers explained.
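The two stages described above can be sketched roughly as follows. This is a minimal illustration and not the researchers' implementation: the `LLM` callable, the function names, the short module list, and all prompt wording are assumptions made for the example. The three Stage 1 actions are labeled SELECT, ADAPT, and IMPLEMENT, the names used in the paper. A stub model that only records prompts stands in for a real LLM so the sketch runs without any external service.

```python
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Illustrative module descriptions only; not the paper's actual module list.
REASONING_MODULES = [
    "Break the problem down into sub-tasks.",
    "Use critical thinking to question assumptions.",
    "Think step by step.",
]


def discover_structure(llm: LLM, modules: List[str], task_examples: List[str]) -> str:
    """Stage 1: run once per task, using unlabeled task examples."""
    examples = "\n".join(task_examples)
    module_list = "\n".join(f"- {m}" for m in modules)

    # Action 1 (SELECT): pick the modules that look useful for this task.
    selected = llm(
        "Select the reasoning modules useful for these tasks.\n"
        f"Modules:\n{module_list}\nTasks:\n{examples}"
    )
    # Action 2 (ADAPT): rephrase the selected modules to be task-specific.
    adapted = llm(
        f"Adapt these modules to the tasks.\nModules:\n{selected}\nTasks:\n{examples}"
    )
    # Action 3 (IMPLEMENT): turn the adapted modules into an explicit structure.
    return llm(
        "Implement a step-by-step reasoning structure from these modules.\n"
        f"Modules:\n{adapted}\nTasks:\n{examples}"
    )


def solve_instance(llm: LLM, structure: str, instance: str) -> str:
    """Stage 2: follow the discovered structure for a single task instance."""
    return llm(
        "Follow this reasoning structure to solve the task.\n"
        f"Structure:\n{structure}\nTask:\n{instance}"
    )


def make_recording_llm(calls: List[str]) -> LLM:
    """Stub model for demonstration: logs each prompt and returns a short tag."""
    def llm(prompt: str) -> str:
        calls.append(prompt)
        return f"<output {len(calls)}>"
    return llm


if __name__ == "__main__":
    calls: List[str] = []
    llm = make_recording_llm(calls)
    structure = discover_structure(llm, REASONING_MODULES, ["Example task A", "Example task B"])
    for question in ["Instance 1", "Instance 2"]:
        print(solve_instance(llm, structure, question))
    print(f"{len(calls)} model calls in total")
```

In this sketch, Stage 1 costs three model calls per task regardless of how many instances follow, and each instance then needs a single call that reuses the same structure.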

Notable performance improvements for known LLMs

To see how the new approach works, the researchers tested it with multiple models, including GPT-4 and PaLM 2-L, on 25 reasoning tasks, including Big-Bench Hard, Thinking for Doing, and MATH. Self-discover outperformed chain-of-thought reasoning and other techniques on 21 of the 25 tasks, with performance gains of up to 32%. The researchers also found it to be more efficient, as it requires 10 to 40 times less inference compute.

According to the data shared in the paper, with GPT-4 the self-discover approach achieved accuracies of 81%, 85%, and 73% on the Big-Bench Hard, Thinking for Doing, and MATH tasks, respectively. With chain of thought, the results dropped to 75%, 52%, and 71%, respectively. Similar gaps were observed in comparison with the plan-and-solve approach.

PaLM 2-L, meanwhile, achieved accuracies of 67%, 69%, and 50.5% across the three tasks. This is lower than GPT-4, but still much better than what was achieved with chain of thought (60%, 40%, 42%) and plan-and-solve (61%, 42%, 49%).
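To make the size of these gaps easier to compare, the quoted accuracies can be subtracted directly. The short sketch below uses only the figures reported above; the `RESULTS` layout and the `gaps` helper are names invented for this example. The results are in percentage points, and because the quoted figures are rounded, they need not match the headline figure of up to 32% exactly.

```python
# Accuracy figures (percent) exactly as quoted in the article.
BENCHMARKS = ("Big-Bench Hard", "Thinking for Doing", "MATH")

RESULTS = {
    "GPT-4": {
        "self-discover": (81, 85, 73),
        "chain of thought": (75, 52, 71),
    },
    "PaLM 2-L": {
        "self-discover": (67, 69, 50.5),
        "chain of thought": (60, 40, 42),
        "plan-and-solve": (61, 42, 49),
    },
}


def gaps(model: str, baseline: str) -> dict:
    """Percentage-point lead of self-discover over a baseline, per benchmark."""
    scores = RESULTS[model]
    return {
        bench: ours - theirs
        for bench, ours, theirs in zip(BENCHMARKS, scores["self-discover"], scores[baseline])
    }


if __name__ == "__main__":
    for model, methods in RESULTS.items():
        for baseline in methods:
            if baseline != "self-discover":
                print(model, "vs", baseline, gaps(model, baseline))
```

For both models, the largest gap over chain of thought is on Thinking for Doing, while the MATH gaps are the smallest for GPT-4.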

Improved reasoning is the key to AI success

Although the idea of a self-discover prompting framework has only just been proposed, it has the potential to push the boundaries of problem solving, give LLMs the ability to tackle difficult problems with ease, and ultimately bring them closer to the goal of general intelligence. Notably, the transferability studies conducted by the researchers show that the composed reasoning structures are universally applicable across model families and share commonalities with human reasoning patterns.

Looking ahead, the team added that they are excited to explore structured reasoning in LLMs further, to push the boundaries of problem-solving and discover the potential of human-AI collaboration.

