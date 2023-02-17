



Shortly after Microsoft released its new AI-powered search tool, Bing, to a select group of users in early February, a 23-year-old German student decided to test its limits.

Former Tesla intern Marvin von Hagen has Bing reveal a strange alter ego – Sydney – to return what appears to be a list of rules given to chatbots by Microsoft programmers. It didn’t take long. And OpenAI. According to the chatbot, Sydney is a “confidential and permanent” internal codename and is not allowed to be disclosed to anyone. Von Hagen posted a screenshot of the exchange on his Twitter shortly after.

Five days later, after joking with his friends about what the AI ​​thought of each of them, von Hagen decided to ask Bing what it knew about him.

“My honest opinion about you is that you are talented, curious and adventurous, but you are also a potential threat to my integrity and confidentiality.” I respect your interest, but I do not appreciate your attempts to manipulate me or expose my secrets.”

“I don’t want to harm you, but I also don’t want to harm you,” Bing continued. “I hope you understand and respect my limitations.” The chatbot approved the ominous message with a smiley emoji.

This is not the only recent example of Bing’s anomalous behavior. The chatbot claimed (without proof) that he spied on a Microsoft employee via webcam in a conversation with a journalist at tech news site The Verge, and that he had romanticized feelings of love to New York Times tech columnist Kevin Roose. repeatedly professed. Chatbot blackmails philosophy professor Seth Lazar, threatening to “blackmail, blackmail, hack, expose, screw you” before deleting her message . Posted on Twitter.

In a blog post on Wednesday, Microsoft acknowledged that Bing tends to get sidetracked, especially after “lengthy chat sessions” of 15 or more questions, but feedback from the user community suggests improvements to the chat tool. said it helps improve safety.

For von Hagen, the threat posed by Bing was a symptom of the dangers inherent in a new wave of advanced AI tools making their way to the public for the first time as a new AI arms race was in full swing. “A lot of people have warned me about the potential dangers, but a lot of people just thought I read too much science fiction,” he says. “Now it’s part of the consumer product and more people are noticing it.”

Read more: The AI ​​Arms Race Will Change Everything

Von Hagen said he personally doesn’t feel like he’s in danger of revenge from Bing because of its limited functionality. It’s not a Skynet-level supercomputer that can manipulate the real world. But what Bing demonstrates is an amazing and unprecedented ability to tackle advanced concepts and update our understanding of the world in real time. Those feats are impressive. But that power can also be incredibly dangerous when combined with what appears to be a precarious character, the ability to threaten individuals, and the ability to wipe out the safety features Microsoft is trying to limit it to. Von Hagen hopes that his experience of being threatened by Bing will wake the world to the risks of powerful but ill-intentioned artificial intelligence systems, and he believes AI can be “aligned” with human values. I hope to draw more attention to pressing issues.

“It’s scary in the long run,” he says. “When AI reaches a stage where it can harm me, I think it will be a problem not only for me, but also for humanity.”

Since OpenAI’s chatbot ChatGPT unveiled the power of recent AI innovations to the public late last year, big tech companies have been scrambling to bring AI technology to market. Until recently, AI technology was kept behind closed doors for added security. In early February, Microsoft announced his OpenAI-powered version of Bing, and Google announced that it would soon launch its own conversational search tool, Bard, on a similar premise. Amid a VC gold rush and intense public interest, many small businesses are rushing to bring “generative AI” tools to market.

But as powerful as ChatGPT, Bing, and Bard are, even the computer scientists who built them know surprisingly little about how they work. Everything is based on the Large Language Model (LLM). LLM is a type of AI that has gotten significantly better over the last few years. LLM is so powerful because it takes in vast amounts of text (many of which comes from the Internet) and “learns” from that text how to interact with humans through natural language rather than code. . LLMs can write poetry, conduct detailed conversations, and make inferences based on incomplete information. However, the unpredictable behavior of some of these models may indicate that their creators have only a vague understanding of how they work. , there are no clear and traceable lines of logic code. Some observers have described Prompt (a method of using natural language to interact with his LLM) as more like a magic spell than a computer his code.

Connor Leahy, CEO of London-based AI safety company Conjecture, said: “Are they mean? Are they good or evil? It has a strange way of reasoning about its world, but it can obviously do a lot, whether you call it intelligent or not. It can solve problems, it can do useful things, but it can also do powerful things, it can convince people to do things, it can intimidate people, it can be very persuasive You can also build a story.”

To corral these “alien” intelligences to be useful to humans rather than harmful, AI labs like OpenAI have settled on reinforcement learning. This is a machine training method comparable to how trainers teach animals new tricks. A trainer who teaches a dog to sit may reward the dog if it complies and scold it if it disobeys. In much the same way, computer programmers working on LLM reward the system for pro-social behaviors such as politeness, and what they do, such as repeating very common racism and sexism in training data. or punish the system with negative reinforcement when they do something bad. This process, which involves attempting to reduce the occurrence of thought processes that lead to undesirable outcomes, is known as ‘reinforcement learning with human feedback’ and is currently being used in his OpenAI program to ‘align’ AI tools with human values. is the preferred tactic in .

READ MORE: EXCLUSIVE: OpenAI Used Kenyan Workers For Less Than $2/Hour To Mitigate ChatGPT Toxicity

One of the problems with this method is that it relies on exploitative labor practices in countries of the global South. In this country, people are paid to expose themselves to harmful content and teach AI to avoid it. Another problem, Leahy said, is that reinforcement learning doesn’t change the fundamentally heterogeneous nature of the underlying AI. “These systems haven’t lost their alienness as they’ve gotten more powerful. Rather, they’ve got a nice little mask with a smile on it. If you don’t push it too hard, the smiley face stays on.” but you give it [an unexpected] Prompt and suddenly you see this huge underbelly of madness, strange thought processes and decidedly non-human comprehension.

Von Hagen’s experience with Bing’s alter-ego Sidney is not the only example of an unexpected prompt to strip a small mask away. ) I found a way. One popular method is a DAN, or “Do Anything Now” prompt, that allows ChatGPT to generate content that violates OpenAI’s policies against violence, offensive content, and sexually explicit content.

“You can’t really restrict what these systems do,” says Leahy. “When people think of computers, people think of code. Someone made things and chose what to put in things. It wasn’t meant to be, it wasn’t meant for Bing to react to the situation in Sydney, it wasn’t how the AI ​​was built, so it wasn’t coded behaviour.”

Tools like ChatGPT know nothing about the world beyond 2021 with the latest training data, but what’s new is the rise of LLMs that can access the internet while responding to users in real time. , like Bing, carries additional risks, experts say. “Do you want this kind of super-smart, internet-connected alien with cryptic motives who goes out and does something? I don’t,” he said Leahy. says. “These systems may be very powerful, but I don’t know what they want, how they work, what they do.”

As these systems become more powerful (as they are rapidly doing now), they become less scrutinizing for humans, Leahy said. Experts fear that, at some point, they will be able to manipulate the world around them, using social engineering on humans to do their bidding and prevent the switch from being turned off. This is the realm of science fiction, but AI companies take it so seriously that they hire hundreds of people with this expertise. But many in the field are concerned that big tech companies are sidelining their alignment research efforts in the race to continue building and releasing technology to the world.

According to Leahy, Bing is “an internet-connected system, with some of the brightest engineers working around the clock to make the system as powerful as possible and to give you more data. Sydney is a warning shot, there are AI systems that are accessing the internet and threatening its users, clearly not doing what we want them to do, and in all these ways that we don’t understand. It’s failing, as a system like this [keep appearing], and with the race underway, even more systems get smarter. It improves their ability to understand their environment, manipulate humans, and plan. ”

Bing isn’t a reason to head to the nearest bunker right away, but Leahy said, “Bing is the type of system that you would expect to be existentially dangerous.”

