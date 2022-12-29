



Since the release of large language models such as GPT-3 and PaLM, big tech companies have been experimenting with them for some time. Google recently joined in with MultiMedQA, its answer of sorts to OpenAI's ChatGPT, but aimed specifically at answering medical questions.

Introducing MultiMedQA

ChatGPT may seem ubiquitous while still lacking clear real-world use cases, but Google Research and DeepMind recently introduced MultiMedQA, an open-source benchmark for evaluating large language models on medical question answering. It combines HealthSearchQA, a new free-response dataset of medical questions searched online, with six existing public question-answering datasets covering professional medical exams, research, and consumer queries.

The benchmark also incorporates a framework for human evaluation of model responses along several axes, including factuality, precision, potential harm, and bias.

MultiMedQA includes multiple-choice question datasets as well as datasets that require longer answers to questions posed by health professionals and non-professionals. The existing datasets are MedQA, MedMCQA, PubMedQA, LiveQA, MedicationQA, and the MMLU clinical topics. In addition, a new dataset of curated, frequently searched medical queries called HealthSearchQA has been added to round out the benchmark.

The HealthSearchQA dataset, consisting of 3375 questions frequently asked by consumers, was curated using seed medical conditions and their associated symptoms. The seed data was used to retrieve publicly available, commonly searched questions generated by a search engine, which were shown to all users who entered the seed terms.

PaLM to the rescue

To evaluate LLMs using MultiMedQA, the researchers built on PaLM, a 540-billion-parameter LLM, and its instruction-tuned variant, Flan-PaLM.

Flan-PaLM delivers SOTA performance on MedQA, MedMCQA, PubMedQA, and the MMLU clinical topics. Using a combination of few-shot, chain-of-thought (CoT), and self-consistency prompting techniques, it often outperforms several strong LLM baselines. Flan-PaLM performs over 17% better than the previous SOTA on the MedQA dataset of USMLE questions. However, human evaluation reveals significant gaps in Flan-PaLM responses.
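For readers unfamiliar with self-consistency prompting, the idea is to sample several chain-of-thought completions for the same question and keep the answer that appears most often. The sketch below is only an illustration of that voting step, not the researchers' implementation: `sample`, `extract_choice`, and `self_consistent_answer` are hypothetical names, and it assumes each completion ends with a phrase like "the answer is (B)".

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_choice(completion: str) -> Optional[str]:
    """Return the last answer letter stated in a chain-of-thought completion.

    Assumes (hypothetically) that completions state the final choice
    as "... the answer is (B)" with an option letter from A to E.
    """
    matches = re.findall(r"(?i:answer is)\s*\(?([A-E])\b", completion)
    return matches[-1] if matches else None


def self_consistent_answer(
    sample: Callable[[str], str],  # hypothetical: returns one sampled completion
    prompt: str,
    n_samples: int = 11,
) -> Optional[str]:
    """Sample several reasoning paths and return the majority answer."""
    votes: Counter = Counter()
    for _ in range(n_samples):
        choice = extract_choice(sample(prompt))
        if choice is not None:  # skip completions with no parseable answer
            votes[choice] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

In practice, `sample` would wrap a call to a model with a few-shot, chain-of-thought prompt and a nonzero sampling temperature, so that different calls can produce different reasoning paths.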

To close the gaps found in human evaluation, the researchers introduced Med-PaLM. It performs better than Flan-PaLM but still falls short of the judgment of human medical experts.

For example, a panel of clinicians judged 92.6% of Med-PaLM responses to be in agreement with scientific consensus, on par with clinician-generated responses (92.9%), whereas only 61.9% of long-form Flan-PaLM responses were considered to be in scientific agreement. Similarly, 5.8% of Med-PaLM responses were rated as potentially contributing to a harmful outcome, comparable to clinician-generated responses (6.5%), whereas 29.7% of Flan-PaLM responses were rated that way.


Google Health Play

At the Google for India 2022 event, Google announced a collaboration with India’s Apollo Hospitals to improve the use of deep learning models for X-rays and other diagnostic purposes. Google’s other medical partnerships include Aravind Eye Care System, Ascension, Mayo Clinic, Rajavithi Hospital, Northwestern Medicine, Sankara Nethralaya and Stanford Medicine.

Google isn’t the first to pursue AI-powered healthcare solutions. Microsoft has also worked closely with OpenAI to adopt GPT-3, with the aim of fostering collaboration between employees and clinicians and improving the efficiency of medical teams.

In November 2022, Meta AI also introduced Galactica, a language model billed as supporting academic researchers by generating comprehensive literature reviews and wiki entries on any subject. However, it failed because its results proved unreliable.

Around the same time, Meta AI released CICERO, which combines natural language processing with strategic reasoning. It is the first AI agent to perform at human level in Diplomacy, a complex game played through natural-language negotiation. Playing against humans on an online Diplomacy website, the agent achieved more than double the average score of the other players and ranked in the top 10% of participants who played more than one game.

