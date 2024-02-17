



Google on Thursday surprised the AI world by releasing Gemini Pro 1.5, a new version of its recently released next-generation AI model Gemini Pro.

It's not necessarily groundbreaking for a company the size of Google to announce a new update, but what's important about Gemini Pro 1.5 is how much it has improved on, and how different it is from, version 1.

Gemini Pro 1.5 has a significantly larger context window than other models on the market: up to 10 million tokens, compared to GPT-4's 128,000 tokens. It is also technically more powerful than Gemini Pro 1.0 and Gemini Advanced 1.0, which powers the paid version of the Gemini chatbot.

Google claims the new model is also more reliable, can pinpoint specific moments, and works natively across video, audio, images, and text. This is a big deal as AI moves into the real world through AR interfaces such as Meta Quest, Apple Vision Pro, and Ray-Ban smart sunglasses.

What's so different about Gemini Pro 1.5?

The token context length in Gemini Pro 1.5 is a whopping 10 million. This is the amount of content that can be stored in memory for a single chat or response.

This is enough to hold hours of video or multiple books within a single conversation, and Google says the model can find any information within that window with a high level of accuracy.

Jeff Dean, lead researcher at Google DeepMind, wrote on X that the model also has advanced multimodal capabilities across code, text, images, audio, and video.

This means, he writes, that “entire books, very long document collections, hundreds of thousands of lines of codebases spread across hundreds of files, entire movies, entire podcast series, etc. can be manipulated in sophisticated ways.”

In a “needle in a haystack” test, which hides a specific piece of information in the vast amount of data held in the context window and asks the model to find it, the model retrieved the information with 99.7% accuracy, even with 10 million tokens of data.
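The idea behind a needle-in-a-haystack test can be sketched in a few lines of Python. This is an illustration only, not Google's evaluation harness: `build_haystack`, `stub_model`, and `recall` are hypothetical names, the filler text and the "magic number" needle are made up, and `stub_model` is a simple text search standing in for a real long-context model.

```python
import re

def build_haystack(filler, needle, n_sentences, depth):
    """Repeat a filler sentence and insert the needle at a relative depth (0.0 to 1.0)."""
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def stub_model(context, question):
    """Hypothetical stand-in for a model call: a plain regex search over the context."""
    match = re.search(r"magic number is (\w+)", context)
    return match.group(1) if match else ""

def recall(ask, secret, depths, n_sentences=1000):
    """Fraction of needle positions at which the answer contains the secret."""
    needle = f"The magic number is {secret}."
    hits = 0
    for depth in depths:
        context = build_haystack("The grass is green.", needle, n_sentences, depth)
        answer = ask(context, "What is the magic number?")
        hits += secret in answer
    return hits / len(depths)

print(recall(stub_model, "7481", [0.0, 0.25, 0.5, 0.75, 1.0]))
```

To test a real model, `stub_model` would be swapped for a function that sends the context and question to that model; the reported accuracy is then the share of needle positions and context lengths at which the secret comes back.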

What are some use cases for such large context windows?

While promoting the new model, Google showed off its video analysis capabilities using the 45-minute silent film “Sherlock Jr.,” directed by Buster Keaton.

When scanning one frame per second, the input required a total of 648,000 tokens. Gemini was then able to answer questions about the movie, such as “Tell me about the piece of paper that was taken out of the character's pocket,” and give the exact time code.
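As a rough back-of-the-envelope check, the figures quoted above imply a per-frame token cost. The sketch below uses only the numbers in this article (45 minutes, one frame per second, 648,000 tokens, a 10 million token window); the per-frame figure is derived, not one stated by Google, and it assumes every token in the input went to video frames. The function names are illustrative.

```python
# Figures quoted in the article.
FILM_MINUTES = 45
FRAMES_PER_SECOND = 1
TOTAL_TOKENS = 648_000
CONTEXT_WINDOW = 10_000_000

def tokens_per_frame(minutes, fps, total_tokens):
    """Average tokens spent on each sampled frame."""
    frames = minutes * 60 * fps
    return total_tokens / frames

def hours_of_video(context_tokens, per_frame, fps):
    """Hours of video that fit in the window at the same sampling rate."""
    seconds = context_tokens / (per_frame * fps)
    return seconds / 3600

per_frame = tokens_per_frame(FILM_MINUTES, FRAMES_PER_SECOND, TOTAL_TOKENS)
print(per_frame)  # 240.0
print(round(hours_of_video(CONTEXT_WINDOW, per_frame, FRAMES_PER_SECOND), 1))  # 11.6
```

Under these assumptions, a 10 million token window would hold roughly eleven and a half hours of video sampled at one frame per second.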

Gemini could tell exactly what was written on the paper and the moment when the paper appeared in full on the screen.

In another example, developers drew a quick sketch of a scene as rough stick-figure art, gave it to Gemini, and asked for a timestamp for that particular scene. Gemini returned a timestamp that was accurate to the second.

Other amazing benefits of big context

Another aspect that has not received much coverage is the possibility of storing, learning, and even creating new languages. Jim Fan, a senior research scientist and AI agent expert at Nvidia, pointed to Gemini Pro 1.5's zero-shot ability to understand linguistics in surprising ways.

On X he wrote: “v1.5 follows a complete language manual at inference time and learns English to Kalamang translations based purely on context.”

Kalamang is a language spoken by about 200 people in New Guinea, and Gemini had no information about it during its initial training, so it had nothing to draw on.

For the test, the model was given 500 pages of linguistic documentation, a dictionary, and approximately 400 parallel sentences, all supplied in context. It was then able to use them to learn the language and translate words and phrases from English to Kalamang.

When will Gemini Pro 1.5 be available?

Gemini 1.5 Pro – High-performance multimodal model with 10M token context length. Today we are releasing the first demonstration of the functionality of the Gemini 1.5 series with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long… pic.twitter.com/2KLro4VwLT February 15, 2024

Gemini Pro 1.5 is already available to some enterprise customers using Vertex AI or Google Cloud's Generative AI studio. At some point it will be rolled out to the Gemini chatbot, where the maximum context could be close to 128,000 tokens, similar to ChatGPT Plus.

This is a game-changing moment for the AI sector, on a day when OpenAI launched a video model and Meta discovered how to use video to teach AI about the real world.

What we are seeing in these advanced multimodal models is an interplay between the digital and the real, where AI is gaining a deeper understanding of humanity and how we see the world.

