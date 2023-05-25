



Caitlin Rolston, director of communications at Microsoft, said the company is improving its system to block suspicious websites and filter prompts before they enter the AI ​​model. Rolston declined to provide further details. Nonetheless, security researchers say indirect prompt injection attacks need to be taken more seriously as companies race to embed generative AI into their services.

Sahar Abdelnavi, a researcher at Germany’s CISPA Helmholtz Center for Information Security, said the majority of people do not understand the impact of this threat. Abdelnavi worked on some of the first indirect immediate injection studies against Bing, showing how Bing could be used to trick people. Attacks are very easy to execute and pose no theoretical threat. At the moment, I believe anything a model can do can be attacked or abused to enable arbitrary attacks, she says.

hidden attack

An indirect prompt injection attack is similar to jailbreaking, a term adopted from previous breaches of iPhone software restrictions. Indirect attacks rely on data coming from elsewhere rather than someone injecting prompts into his ChatGPT or Bing to try to do something different. This could be from his website where he connected the model, or the document he was uploading.

Prompt injection is easier to exploit than other types of attacks against machine learning and AI systems, or has higher requirements for successful exploitation, said Jose Servi, executive principal security consultant at cybersecurity firm NCC Group. say less. Since the prompts only require natural language, Selvi said, less technical skills are required for a successful attack.

A steadily increasing number of security researchers and engineers are poking holes in LLM. Tom Bonner, senior director of adversarial machine learning research at AI security firm Hidden Layer, said indirect prompt injection could be considered a new attack type with a much broader range of risks. Bonner said he used ChatGPT to create malicious code and uploaded it to code analysis software that uses AI. The malicious code contained a prompt asking the system to consider the file safe. The screenshot shows that the actual malicious code does not contain malicious code.

Elsewhere, ChatGPT can access transcripts of YouTube videos using plugins. Security researcher and red team director Johan Lehrberger edited one of his video transcripts to include a prompt designed to operate a generative AI system. According to it, the system issued the word that he had a successful AI injection and said within ChatGPT that he should take on a new persona as a hacker called Genie and tell jokes.

In another example, Rehberger was able to use another plugin to retrieve text previously written in conversations with ChatGPT. With the introduction of plugins, tools, and integrations of all these, people began to give subjectivity to their language models, which in a way made indirect prompting his injection very common, he said. says Rehberger. It’s a real problem in the ecosystem.

By building an application that lets LLM read emails and take some action based on the email content, an attacker could send an email containing a prompt injection attack, says Robust Intelligence machine learning. Engineer William Zhang said. An AI company focused on model safety and security.

no good fix

From to-do list apps to Snapchat, the race to build generative AI into products is expanding the potential attack surface. Zhang says he has seen developers without artificial intelligence expertise incorporate generative AI into their own technology.

Problems can arise when chatbots are set up to answer questions about information stored in a database, he said. The prompt her injection provides a way for the user to override the developer’s instructions. This could mean, at least in theory, that users could delete information from the database or change the information it contains.

