Microsoft's new safety system can catch hallucinations in customers' AI apps

Sarah Bird, Microsoft's chief product officer of responsible AI, said in an interview with The Verge that her team has designed several new safety features that are easy to use for Azure customers who don't employ groups of red teamers to test the AI services they build. According to Microsoft, these LLM-powered tools can detect potential vulnerabilities, monitor for hallucinations that are plausible but unsupported, and block malicious prompts in real time for Azure AI customers working with models hosted on the platform.

Bird says Microsoft recognizes that not all of its customers have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the results.

Three features are now available in preview on Azure AI: Prompt Shields, which blocks prompt injections and malicious prompts from external documents that instruct a model to go against its training; Groundedness Detection, which finds and blocks hallucinations; and safety evaluations, which assess model vulnerabilities. Two other features are coming soon: one for directing models toward safe outputs, and one for tracking prompts to flag potentially problematic users.

Whether the user is typing in a prompt or the model is processing third-party data, the monitoring system evaluates the input to see if it triggers any banned words or contains hidden prompts before deciding whether to send it to the model to answer. The system then looks at the model's response and checks whether the model hallucinated information that is not in the document or the prompt.
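The two-stage flow described here (screen the input, call the model, then check the response against its sources) can be illustrated with a minimal Python sketch. This is not Microsoft's implementation or the Azure AI API: the real tools are described as LLM-powered, while this sketch swaps in simple keyword, regex, and word-overlap heuristics. Every name in it (BANNED_WORDS, HIDDEN_PROMPT_PATTERNS, screen_input, is_grounded, guarded_call) and the 0.6 threshold are hypothetical, chosen only for illustration.

```python
import re
from typing import Callable, List

# Hypothetical illustration values; not taken from any Microsoft product.
BANNED_WORDS = {"bannedword"}
HIDDEN_PROMPT_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?system prompt", re.IGNORECASE),
]


def screen_input(text: str) -> bool:
    """Return True if a user prompt or third-party document passes screening."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & BANNED_WORDS:
        return False
    return not any(p.search(text) for p in HIDDEN_PROMPT_PATTERNS)


def is_grounded(response: str, sources: List[str], threshold: float = 0.6) -> bool:
    """Crude groundedness proxy: share of response words found in the sources."""
    source_words = set(re.findall(r"[a-z0-9']+", " ".join(sources).lower()))
    response_words = re.findall(r"[a-z0-9']+", response.lower())
    if not response_words:
        return True
    supported = sum(1 for w in response_words if w in source_words)
    return supported / len(response_words) >= threshold


def guarded_call(prompt: str, document: str, model: Callable[[str, str], str]) -> str:
    """Screen both inputs, call the model, then check the answer against them."""
    if not screen_input(prompt) or not screen_input(document):
        return "[blocked: input failed screening]"
    response = model(prompt, document)
    if not is_grounded(response, [prompt, document]):
        return "[blocked: response not supported by the prompt or document]"
    return response
```

Here `model` is any callable that takes the prompt and the document and returns text, so a stub function can stand in for a hosted model when trying the sketch out. Word overlap is a very weak stand-in for hallucination detection and is used only to keep the example self-contained.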

In the case of Google Gemini's images, filters created to reduce bias had unintended effects, an area where Microsoft says its Azure AI tools will allow for more customized control. Bird acknowledges there are concerns that Microsoft and other companies could be deciding what is or isn't appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech and violence that the model recognizes and blocks.

In the future, Azure users will also be able to get reports of users who attempt to trigger unsafe outputs. Bird says this allows system administrators to figure out which users belong to their company's own red team and which may have more malicious intent.

Bird says the safety features will soon be added to popular models such as GPT-4 and Llama 2. However, because Azure's model garden contains many AI models, users of smaller, less-used open-source systems may need to manually point the safety features to those models.




