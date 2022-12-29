



Researchers at Google and DeepMind are experimenting with large language models to answer common medical questions. Med-PaLM produces scientifically correct answers at the level of human experts.

The research team builds on PaLM, Google's large language model with 540 billion parameters, about three times as many as GPT-3. According to Google, PaLM outperforms GPT-3 on difficult language and code tasks and forms the language component of the company's Pathways vision. PaLM stands for Pathways Language Model.

Instruction prompt tuning for a medical language model

For the medical variant of PaLM, the research team developed a new prompting method to adapt Flan-PaLM to the medical domain. Flan-PaLM is a variant of PaLM fine-tuned on instructions for a range of tasks (dialog, FAQ, reasoning, etc.), which Google Brain introduced in October.

Instead of fine-tuning PaLM on extensive medical data, the research team combined soft prompts, learned through prompt tuning on a small amount of medical data, with a human-written prompt tailored to specific medical answers. For the latter, the team collaborated with four clinicians in the US and the UK.

The research team adapted Flan-PaLM to the medical domain using instructions and examples from a panel of qualified clinicians. | Image: Google

The researchers call this combination of learned and hand-written prompts "instruction prompt tuning." The new method is "data- and parameter-efficient," the team writes.

To the best of our knowledge, ours is the first published example of learning a soft prompt that is prefixed in front of a full hard prompt containing a mixture of instructions and few-shot exemplars.

from the paper
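To make the idea concrete, here is a toy Python sketch of how an instruction-prompt-tuned input might be assembled, assuming a frozen model that consumes a sequence of embedding vectors. It is not Google's implementation: the vocabulary, vector sizes, soft-prompt length and gradients are all hypothetical placeholders. A few trainable vectors (the soft prompt) are prefixed to the embedded hard prompt of instructions and exemplars, and only those vectors are updated during tuning.

```python
import copy
import random
from typing import Dict, List

Vector = List[float]

# Hypothetical toy sizes; real models use far larger values.
EMBED_DIM = 4
NUM_SOFT_TOKENS = 3


def embed(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Look up the frozen base-model embeddings for hard-prompt tokens."""
    return [table[t] for t in tokens]


def build_model_input(soft_prompt: List[Vector], hard_prompt: List[str],
                      table: Dict[str, Vector]) -> List[Vector]:
    """Prefix the learned soft-prompt vectors to the embedded hard prompt
    (instructions, few-shot exemplars and the question)."""
    return soft_prompt + embed(hard_prompt, table)


def update_soft_prompt(soft_prompt: List[Vector], grads: List[Vector],
                       lr: float = 0.1) -> List[Vector]:
    """One gradient-descent step on the soft prompt only; the model's
    weights and embedding table are never modified."""
    return [[w - lr * g for w, g in zip(vec, gvec)]
            for vec, gvec in zip(soft_prompt, grads)]


random.seed(0)
# Hypothetical hard prompt; a real one would hold clinician-written
# instructions and exemplar answers.
hard_prompt = ["Answer", "the", "medical", "question", ":"]
table = {t: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in hard_prompt}
frozen_table = copy.deepcopy(table)

soft_prompt = [[0.0] * EMBED_DIM for _ in range(NUM_SOFT_TOKENS)]
model_input = build_model_input(soft_prompt, hard_prompt, table)

# Placeholder gradients; in practice they come from backpropagating a loss
# through the frozen language model.
grads = [[1.0] * EMBED_DIM for _ in range(NUM_SOFT_TOKENS)]
soft_prompt = update_soft_prompt(soft_prompt, grads)

trainable_params = NUM_SOFT_TOKENS * EMBED_DIM
print(len(model_input), trainable_params, table == frozen_table)  # prints: 8 12 True
```

Because only `NUM_SOFT_TOKENS * EMBED_DIM` values are trained while the base model stays untouched, the approach is parameter-efficient in the sense the team describes.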

According to the research team, Med-PaLM, the model obtained through instruction prompt tuning, gives significantly better medical answers than the unadapted Flan-PaLM and achieves "promising results," but still falls short of clinicians' performance.

The results support this conclusion, though it seems something of an understatement: Med-PaLM performs on par with professionals on almost all tests. Answer quality was also assessed by clinicians.

Unlike Flan-PaLM, which is not optimized on medical data, Med-PaLM achieves expert-level results in answering medical questions. | Image: Google

Med-PaLM also sharply reduced potentially harmful answers. For Flan-PaLM, 29.7% of answers were rated as potentially harmful to health; for Med-PaLM the figure was only 5.9%, compared with 5.7% for human experts. Here, too, the medical language model performs on par with humans.

Examples of answers generated by Med-PaLM to medical questions from the public. | Image: Google

When rated by laypeople, Med-PaLM significantly outperformed Flan-PaLM, although answers from human experts were still rated as more helpful. Both language models did address the questions asked.


When rated by laypeople, the medically optimized Med-PaLM provides much more helpful answers than Flan-PaLM, which was not optimized for medicine. | Image: Google

Language models can help medical professionals

The strong performance of Med-PaLM on medical questions may be an emergent ability of the language model, the researchers conclude, since performance improved with scale across PaLM models of different sizes (8 to 540 billion parameters).

However, scaling alone is not enough to produce reliable answers, as the comparatively weak performance of Flan-PaLM shows. This is where the newly introduced instruction prompt tuning comes into play.

The research team found that 92.6% of Med-PaLM's answers were in line with scientific consensus, compared with 92.9% of clinicians' answers and only 61.9% of Flan-PaLM's. This suggests that instruction prompt tuning is a suitable alignment technique for generating scientifically correct answers, the team writes.

The Med-PaLM results show that instruction prompt tuning is a data- and parameter-efficient alignment technique useful for improving factors related to accuracy, factuality, consistency, safety, harm, and bias, helping to close the gap with clinical experts and bring these models closer to real-world clinical applications.

from the paper

The rise of foundation models is a "significant opportunity" to rethink how medical AI is developed and to make it "easier, safer and fairer" to use, the researchers write. They see their work as an impetus for further discussion.

Alongside Med-PaLM, the research team introduces MultiMedQA, a benchmark that combines six existing open question-answering datasets spanning professional medical exams, research, and consumer queries with HealthSearchQA, a new dataset of medical questions searched online.

