



Last week, OpenAI released GPT-4o, the latest version of its flagship large language model (LLM).

The LLM stands out for its multimodality, with the ability to reason in real time across audio, vision, and text.

For years, building AI that understands multiple modalities has proven difficult. Even building pipelines for tasks such as speech-to-text conversion has been hard, owing to issues such as long processing times.

Now, with GPT-4o, you can do that almost instantly.

Until now, AI platform providers have invested heavily in such multimodal projects, and Redis CEO Rowan Trollope has suggested that this investment is now obsolete.

The former Five9 CEO wrote on X (formerly Twitter):

Hundreds of millions of dollars of research and development into contact center AI agents have been made obsolete with the development of OpenAI GPT-4o. You need to focus on backend automation.

This isn't the first time that ChatGPT has made contact center innovation research and development obsolete.

Consider how previous LLM releases undermined the work of many analytics providers, who spent hundreds of hours engineering natural language processing (NLP) models to measure things like intent and sentiment. LLMs can do all of this out of the box.

Another example is agent-assist innovation. Whereas companies used to spend significant R&D resources building use cases such as isolating key data points within a customer conversation, ChatGPT and other LLMs can do it instantly.

In fact, this is ultimately why many companies choose to implement LLMs in their contact centers in the first place. The use cases already existed, but now they are much more accessible.

Consider the following image from the October 2023 Gartner study. This shows that customer service is a major recipient of companies' GenAI investments.

This is expected to continue, and with the introduction of multimodal LLMs, use cases will become more innovative. Real-time translation is a great example.

Real-time translation and other multimodal use cases

Conversational AI vendors have brought a variety of real-time translation models to market in recent years, with brands like Cognigy making them available for voice channels as well.

These apps typically first use speech-to-text to create a transcript from the customer's audio.

That transcript is sent through a translation engine, such as Google Translate, and the agent receives the text translation within their workspace.

From there, the agent types a response, which is translated into the original language through the engine and played through a text-to-speech audio stream.

The main problem with these experiences is the dead air between the customer speaking and the agent typing the response. It's a rapport killer.
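To make the source of that dead air concrete, here is a minimal sketch of the cascade described above. Every function, phrase, and latency figure is a hypothetical placeholder rather than any vendor's actual API or measured timing; the point is only that each hop runs in sequence, so the customer waits for the sum of all delays.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One hop in the cascade. All latencies are made-up placeholders."""
    name: str
    run: Callable[[str], str]
    latency_s: float


def run_cascade(stages: List[Stage], payload: str) -> Tuple[str, float]:
    """Run stages strictly in order; the customer waits for the sum of delays."""
    waited = 0.0
    for stage in stages:
        payload = stage.run(payload)
        waited += stage.latency_s
    return payload, waited


# Hypothetical stand-ins for real speech, translation, and typing steps.
PHRASES = {"gracias": "thank you", "you are welcome": "de nada"}


def speech_to_text(audio: str) -> str:
    prefix = "audio:"
    return audio[len(prefix):] if audio.startswith(prefix) else audio


def translate(text: str) -> str:
    return PHRASES.get(text, text)


def agent_types_reply(text: str) -> str:
    return "you are welcome"


def text_to_speech(text: str) -> str:
    return "audio:" + text


CASCADE = [
    Stage("speech-to-text", speech_to_text, 1.0),
    Stage("translate for the agent", translate, 0.5),
    Stage("agent reads and types", agent_types_reply, 8.0),
    Stage("translate for the customer", translate, 0.5),
    Stage("text-to-speech", text_to_speech, 1.0),
]

if __name__ == "__main__":
    reply, dead_air = run_cascade(CASCADE, "audio:gracias")
    print(reply)
    print(f"customer waited about {dead_air:.1f}s")
```

With these assumed numbers, the agent's typing dominates the wait, but the four machine hops still add delay on top of it.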

Thankfully, multimodal LLMs can potentially solve the problem with out-of-the-box real-time translation.

Take the example, released by OpenAI, of GPT-4o translating a live conversation between native speakers of English and Spanish.

Alongside translation, consider how AI can adjust the agent's accent to be more familiar to the customer and ensure complete understanding.

Krisp already offers this capability. However, GPT-4o has the potential to make it even more widely available. Indeed, one of OpenAI's demos showed the model changing its voice style on the fly.

As a final example, consider how GPT-4o can transform customer and virtual agent conversations.

For example, many leading conversational AI vendors are augmenting their solutions with image recognition (IR) to recognize entities in photos and make automated recommendations. Multimodal LLMs provide this functionality out of the box.

Adding that functionality to virtual agents has the potential to enable many virtual agent use cases across a variety of sectors including retail, utilities, and the public sector.

Take local councils as an example of the latter. When someone tweets a photo of defective street furniture and shares its location, GPT-4o can identify whether it belongs to the city council.

This validation allows the LLM to trigger automated, personalized responses and kick off pre-planned workflows to resolve such issues.
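As a rough illustration of that hand-off, the sketch below routes a report once a multimodal model has classified the photo. The `AssetReport` fields, workflow names, and reply wording are all assumptions made for this example; they do not reflect GPT-4o's actual output format or any council's real system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AssetReport:
    """Hypothetical structured result a multimodal model might return for a photo."""
    asset_type: str
    is_damaged: bool
    owned_by_council: bool
    location: Optional[str]


# Hypothetical mapping from asset type to a pre-planned workflow name.
WORKFLOWS = {
    "street light": "lighting-repair",
    "bench": "street-furniture-repair",
}


def route_report(report: AssetReport) -> Dict[str, Optional[str]]:
    """Decide which workflow (if any) to trigger and what to reply."""
    if not report.is_damaged:
        return {"workflow": None,
                "reply": "Thanks for the photo. We could not see any damage."}
    if not report.owned_by_council:
        return {"workflow": None,
                "reply": "Thanks for flagging this. It does not appear to be a council asset."}
    if report.location is None:
        return {"workflow": None,
                "reply": "Thanks for the report. Could you share the location?"}
    # Unknown asset types fall back to a generic inspection workflow.
    workflow = WORKFLOWS.get(report.asset_type, "general-inspection")
    reply = f"Thanks, we have logged the damaged {report.asset_type} at {report.location}."
    return {"workflow": workflow, "reply": reply}


if __name__ == "__main__":
    print(route_report(AssetReport("bench", True, True, "High Street")))
```

Keeping the routing rules in plain code, separate from the model call, is one way to keep the orchestration auditable while the model handles only recognition.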

As Trollope suggested, ensuring proper enterprise orchestration of such use cases could be the next battleground, especially given the sophistication of these out-of-the-box capabilities.

GPT-4o: The broader company story

The release of GPT-4o provides further insight into the future of customer interactions.

For example, the following demo of two GPT-4o instances interacting and singing could shed light on a future where machine customers and agents converse on behalf of their human counterparts.

However, it's also interesting to think about how OpenAI has chosen to present all of these demos via a smartphone, making GPT-4o's flicking between modalities almost an extension of the senses.

This suggests the company is moving into the mobile market to further expand the adoption of generative AI, which is perhaps not surprising given recent reports that OpenAI is in talks with Apple over deeper integration of its technology into iOS.

Additionally, the smartphone example highlights the impact of multimodal LLMs on consumers' daily workflows.

But perhaps most pertinently, OpenAI brings us closer to real-time, affordable AI in the enterprise by opening up multimodal capabilities to all users.

For example, people in the financial industry can use this model in workflows that compare documents, find errors, and send emails.

Previously, these steps required step-by-step documentation, scripting, and slow, inflexible process flows. Now, AI can dynamically adapt and automate these workflows, significantly increasing efficiency.

Also, consider a large consumer packaged goods (CPG) organization that uses planograms to manage product placement.

Traditionally, auditing these placements required photographs and manual analysis. GPT-4o allows the company to analyze video footage in real time, overcoming previous limitations such as poor lighting and space constraints.

These are just two of many examples highlighting how GPT-4o can automate complex workflows and power real-time interactions across the enterprise.

However, as Trollope surmised, the success of that integration will depend on customizing the model to specific needs and ensuring accurate, context-aware responses, as well as developing backend integrations.

