



This week at Google's Mountain View headquarters, a man in a rainbow-colored gown emerged from a giant coffee cup in a vibrant but somewhat surreal demonstration of the company's latest achievements in generative AI.

At the I/O event, electronic musician and YouTuber Marc Rebillet played around with an AI music tool that can generate synced tracks from prompts such as "viola" or "808 hip-hop beats." He told the developers in attendance that the AI had "figured out a way to fill in the sparse elements of my loop." It was, he said, "like having a weird friend who says, let's try this, let's try that."

Rebillet was describing an AI assistant: a personalized bot that helps you work, create, and communicate better, and that interfaces with the digital world on your behalf. This new class of products was highlighted this week amid a flurry of AI announcements from Google, its AI arm DeepMind, and Microsoft-backed OpenAI.

Within days of each other, the companies announced upgraded multimodal AI tools, meaning the tools can interpret audio, video, images, and code in a single interface and carry out complex tasks such as live translation and planning a family vacation.

In a video demonstration, Google's prototype AI assistant Astra, powered by a Gemini model, responded to voice commands based on its analysis of what it saw through a phone camera or a pair of smart glasses.

It was able to identify sequences of code, suggest improvements to electrical circuit diagrams, recognize London's King's Cross area through the camera lens, and remind a user where they had left their glasses.

Musician and YouTuber Marc Rebillet introduced Google's MusicFX AI generation tool at the tech company's I/O event this week. (Google)

Meanwhile, at OpenAI's product launch on Monday, chief technology officer Mira Murati and her colleagues showed how the company's new AI model, GPT-4o, can translate speech in live conversation and use a humanlike tone of voice while parsing text, images, video, and code. "This is extremely important as we look to the future of interaction between ourselves and machines," Murati told the FT.

AI-powered smart assistants have been around for nearly a decade, but these latest advances allow smoother, faster voice interactions and a superior level of understanding, thanks to the large language models (LLMs) that power the new AI models. Now a new scramble is under way among technology groups to bring so-called AI agents to consumers.

Google CEO Sundar Pichai said this week that agents are best understood as intelligent systems that can reason, plan, and remember, think many steps ahead, and work across software and systems to get something done on the user's behalf.

Apple is also expected to be a major player in this race alongside Google and OpenAI. Industry insiders predict that a major upgrade to Apple's voice assistant, Siri, is on the horizon, as the company rolls out new in-house AI chips designed to run generative models on devices.

Meta, for its part, launched its AI assistant across Facebook, Instagram, and WhatsApp in more than a dozen countries in April. Startups such as Rabbit and Humane are also trying to break into this space with devices that act as standalone AI helpers.

Although analysts note that this week's big announcements remain largely concepts rather than finished products, it is clear to industry watchers that AI assistants and agents will be key to bringing the latest AI technology to the masses.

OpenAI chief technology officer Mira Murati says the company's new AI model, GPT-4o, points to the future of interaction between us and machines. (Philip Pacheco/Bloomberg)

"There is no doubt that now is the time for personal [artificial] intelligence," said Mustafa Suleyman, CEO of Microsoft AI, who was not involved in either of this week's releases. Suleyman previously co-founded Inflection, a startup that built a consumer AI assistant known as Pi, but left the company in March.

"Silicon Valley has always framed technology as functional utilities for getting things done efficiently and quickly," he says. "But what's surprising is that these tools are now in the creative realm of product makers. The technology is mature enough that there is a new kind of clay we can all use to invent with, and it is now becoming a reality."

For nearly a decade, technology groups have been racing to bring AI to consumers through virtual assistants like Apple's Siri, Microsoft's Cortana, and Amazon's Alexa. Virtual assistants are now built into a variety of devices.

Google, for example, unveiled its AI assistant in 2016, and Pichai painted a picture of a post-smartphone world where intelligence is embedded in everything from speakers to glasses.

But eight years later, smartphones are still consumers' primary interface to the web. The main obstacles to mass adoption have been slow response times from AI agents and errors in understanding and carrying out human instructions and needs.

The introduction in 2017 of transformers, the core technology behind chatbots such as ChatGPT, Gemini, and Claude, significantly improved the natural language processing that underpins AI assistants.

But when it comes to developing AI assistants that the public wants to use, the killer feature is speed, according to Ben Thompson, a technology analyst and author of the influential industry newsletter Stratechery.

Virtual assistants like Amazon Alexa are now built into a variety of devices. (David Ryder/Bloomberg)

"The fun comes when you push the boundaries of speed and latency. The joy and playfulness of getting immediate feedback is so different from sitting and waiting… it's like a parlor trick," he said this week on the Sharp Tech podcast.

Thompson said he had noticed this with Google's AI search mode, known as Search Generative Experience, which shows AI-generated answers to queries alongside the traditional list of links.

"I'm using it more and, frankly, even unintentionally, using ChatGPT less, because it's so much faster and more consistent," he said. Google knows better than anyone that engagement can shift on a matter of milliseconds.

But OpenAI's flagship bot is no slouch. A version of the GPT-4o model was able to translate smoothly between Italian and English in real-time conversation. The model also adopted a chatty, if slightly flirtatious, tone when talking to a male engineer on stage. Thompson said that what OpenAI is really improving is the user experience and its actual ChatGPT product: "That's what it takes to win in the consumer [technology] space, which is much larger than enterprise."

Waiting in the wings is Apple. Investors are eager to learn more about the company's AI plans, as its shares have lagged those of Alphabet and Amazon this year.

This week, OpenAI announced a ChatGPT desktop app for the Mac. The iPhone maker is also said to be exploring potential partnerships with both OpenAI and Google's Gemini, and has been hiring experts and publishing research papers that offer valuable insight into its behind-the-scenes work on building AI models.

A version of OpenAI's GPT-4o model could fluidly translate between Italian and English in real-time conversations. (Jaap Arriens/NurPhoto/Getty Images)

Insiders say Apple's advantage lies in its huge existing user base, with more than 2.2 billion active devices worldwide, which puts it in a position to steer how people integrate generative tools such as virtual assistants into their daily lives.

Apple is likely to partner with OpenAI to build next-level Siri technology, predicts Wedbush analyst Dan Ives. An assistant that can perform complex tasks for iPhone users could eventually become a paid subscription service, he wrote in a note, similar to how the company currently monetizes other services such as iCloud.

After Monday's OpenAI demo, Bank of America analysts reiterated their Buy rating on Apple stock, highlighting the potential that virtual assistant and AI capabilities hold for app developers in the company's App Store ecosystem. Apple has already earned $6 billion to $7 billion in App Store fees each quarter since 2020, according to Sensor Tower estimates.

Google's edge, however, lies in its suite of consumer apps, from email to calendar tools, into which AI agents can be integrated.

"We have always wanted to build a universal agent that is useful in everyday life. Our work to make this vision a reality goes back many years. That's why we made [the chatbot] Gemini multimodal from the beginning," Google DeepMind CEO Demis Hassabis told reporters this week.

"At any given moment, we are processing a stream of different sensory information, making sense of it, and making decisions. Imagine an agent that can see and hear what we do, better understand the context we're in, and respond quickly in conversation, making the pace and quality of interaction feel much more natural."

Even though AI companies are rushing to develop consumer bots that help people with their daily tasks, it may be a while before these become commonplace.

Generative AI is still in its infancy and remains prone to errors, including "hallucinations" in which it fabricates false information. This can be a serious problem when an assistant is completing work-related tasks where accuracy matters more than creativity.

Scaling up is also a big challenge, says Suleyman. "It's a very competitive market… distribution and brand are important. Apple and Google… they have a big advantage in that sense."

Mustafa Suleyman, CEO of Microsoft AI, says scaling up AI assistants is a big challenge in a highly competitive market. (David Paul Morris/Bloomberg)

Suleyman joined Microsoft in March after his startup Inflection pivoted from a consumer to an enterprise model. "[Pi] was a deeply engaging product, but it's very difficult to reach Gemini-level scale," he says.

But Bret Taylor, chairman of OpenAI's board and CEO of the new AI agent startup Sierra, says the replacement of existing consumer interfaces presents an opportunity for a wide range of companies.

"Startups can stand out and succeed amid big technology shifts because there aren't necessarily market leaders right now," he says.

Big tech companies and their partners may be best positioned to make the most of this moment, but Yann LeCun, Meta's chief AI scientist, says the models need to be open if AI assistants are to spread beyond the Western world.

"In the near future, every interaction with the digital world will be done through some kind of AI assistant. We will be constantly talking to these AI assistants. Our entire digital diet will be mediated by AI systems," he said at a Meta event in London last month. "This cannot come only from companies on the West Coast of the United States. These systems need to be diverse."

Additional reporting by Michael Acton and George Hammond in San Francisco

