



In preparation for a time when artificial intelligence is so powerful that it could pose a serious and imminent threat to people, Google DeepMind on Friday released a framework for peering inside its AI models to determine whether they are approaching dangerous capabilities.

The paper published Friday describes a process in which DeepMind's models are reevaluated every time the computational power used to train them increases sixfold, or after every three months of fine-tuning. For the time between evaluations, the paper says, early warning evaluations will be designed.
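The reevaluation schedule described above amounts to a simple two-condition trigger. The following is a minimal sketch of that rule in Python, not DeepMind's implementation: the `ModelState` record, its field names, and the `needs_reevaluation` function are hypothetical, and only the sixfold compute threshold and the three-month fine-tuning interval come from the article.

```python
from dataclasses import dataclass

# Thresholds as reported in the article.
COMPUTE_FACTOR = 6.0      # reevaluate when training compute has grown sixfold
FINE_TUNE_MONTHS = 3.0    # or after three months of fine-tuning


@dataclass
class ModelState:
    """Hypothetical record of a model's progress since its last evaluation."""
    compute_at_last_eval: float         # training compute at the last evaluation (any unit)
    current_compute: float              # training compute now, in the same unit
    fine_tune_months_since_eval: float  # months of fine-tuning since the last evaluation


def needs_reevaluation(state: ModelState) -> bool:
    """Return True if either trigger described in the article has been reached."""
    compute_trigger = state.current_compute >= COMPUTE_FACTOR * state.compute_at_last_eval
    fine_tune_trigger = state.fine_tune_months_since_eval >= FINE_TUNE_MONTHS
    return compute_trigger or fine_tune_trigger
```

Under these assumptions, a model whose training compute has grown fivefold after two months of fine-tuning would not yet be due for reevaluation, while one that reaches either threshold would be.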

According to a statement shared exclusively with Semafor, DeepMind plans to work with other companies, academia, and lawmakers to improve the framework. It plans to begin implementing its auditing tools by 2025.

Currently, evaluating powerful, state-of-the-art AI models is an ad hoc process that constantly evolves as researchers develop new techniques. Red teams spend weeks or months testing a model, trying different prompts that might circumvent its safety measures. Companies then deploy techniques ranging from reinforcement learning to special prompts to bring the model into compliance.

This approach works for now because current models are not powerful enough to pose much of a threat, but the researchers believe a more robust process is needed as models gain capabilities. Critics worry that by the time people realize the technology has gone too far, it will be too late.

The Frontier Safety Framework released by DeepMind attempts to address this issue. This is one of several methods announced by major technology companies such as Meta, OpenAI, and Microsoft to alleviate concerns about AI.

These risks are beyond the reach of current models, but the company said it hopes that implementing and improving the framework will help it prepare to address them.

