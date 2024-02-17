



OpenAI's Sora text-to-video AI model may be spreading like wildfire across the internet, but OpenAI isn't the only company to announce major developments in AI. Days after rebranding its AI chatbot as Gemini, Google has announced its next-generation AI model, Gemini 1.5 Pro. The new model is built on a Mixture-of-Experts (MoE) architecture and is claimed to be much more advanced than its contemporaries.

With Gemini 1.5 Pro, Google appears to have announced a model that is significantly ahead of its predecessors. Gemini 1.5 Pro is the first model in the Gemini 1.5 line that the company is releasing for early testing. It is a mid-sized multimodal model optimized to scale across a wide range of tasks. Here we take a look at what is new in Gemini 1.5 Pro.

What is Gemini 1.5 Pro?

What sets Gemini 1.5 Pro apart is its long-context understanding across modalities. Google claims that Gemini 1.5 Pro achieves results similar to the recently launched Gemini 1.0 Ultra while using much less compute. Its headline feature is the ability to consistently process up to 1 million tokens of information. According to Google, this is the longest context window of any large-scale foundation model to date. To put that in perspective, Gemini 1.0 has a context window of up to 32,000 tokens, GPT-4 Turbo has 128,000 tokens, and Claude 2.1 has 200,000 tokens.

The model comes with a standard context window of 128,000 tokens, but Google is allowing a limited number of developers and enterprise customers to try it with a context window of up to 1 million tokens. Gemini 1.5 Pro is currently in preview, and developers can test it through Google AI Studio and Vertex AI.

Google claims that it has been consistently testing, refining, and enhancing Gemini 1.0's capabilities since its launch, and 1.5 Pro is the result of that effort. In terms of underlying technology, 1.5 Pro is built on the Mixture-of-Experts (MoE) architecture. MoE can be understood as an ensemble approach in which the overall problem is divided into a number of subtasks and a separate expert is trained on each subtask. In essence, an MoE model routes different inputs to different learners, or experts, so that only the most relevant experts are activated for a given input.
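To make the routing idea concrete, here is a minimal sketch of generic top-k MoE gating in plain Python. Google has not published the routing details of Gemini 1.5 Pro, so everything here is an assumption made for illustration: the names `moe_forward` and `make_expert`, the toy "experts" that merely scale their input, and the hand-picked gate weights are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: multiplies every input value by `scale`."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    gate_weights holds one weight row per expert; the gate score of an
    expert is the dot product of its row with x (an illustrative choice).
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the top_k experts; all other experts are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalise over the chosen experts
        expert_out = experts[i](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
    output, chosen = moe_forward([1.0, 2.0, 3.0], gate_weights, experts)
    print("experts used:", chosen)
    print("mixed output:", output)
```

Because only `top_k` of the experts run for any given input, the work per input stays roughly constant as more experts are added, which is the usual efficiency argument for this family of architectures.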

Google describes this as a step change in its approach, building on research and engineering innovations across nearly every part of its foundation model development and infrastructure. The company claims that the new MoE architecture makes Gemini 1.5 Pro more efficient to train and serve.

What are some use cases for Gemini 1.5 Pro?

Gemini 1.5 Pro can reportedly take in up to 700,000 words or around 30,000 lines of code in a single prompt. This is reportedly 35 times more than Gemini 1.0 Pro can handle. Additionally, Gemini 1.5 Pro can process up to 11 hours of audio and 1 hour of video in a wide range of languages. A demo video posted on Google's official YouTube channel used a 402-page PDF to demonstrate the model's long-context understanding. The demo also showed live interaction with the model, using the PDF file as a prompt. The PDF came to 326,658 tokens and the images to 256 tokens. A total of 327,309 tokens were used in the demo.

Another demo showed Gemini 1.5 Pro working with a 44-minute video, a recording of the silent film “Sherlock Jr.,” along with a number of multimodal prompts. The video came to 696,161 tokens and the images to 256 tokens. In the demo, the user asks the model to find specific moments in the video and related information. The model responds with timestamps and details of those moments.

Meanwhile, another demo showed how the model interacts with 100,633 lines of code using a series of multimodal prompts.
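As a rough sanity check on these figures, the short sketch below compares the reported demo token counts with the context windows quoted earlier in this article. The token counts and window sizes come from the article itself. The helper name `models_that_fit` and the dictionary labels are hypothetical, and the check ignores any extra tokens used by follow-up prompts.

```python
# Context windows as quoted in this article (in tokens).
CONTEXT_WINDOWS = {
    "Gemini 1.0": 32_000,
    "GPT-4 Turbo": 128_000,
    "Claude 2.1": 200_000,
    "Gemini 1.5 Pro (standard)": 128_000,
    "Gemini 1.5 Pro (limited preview)": 1_000_000,
}

# Token counts reported for the two demos.
DEMO_INPUTS = {
    "402-page PDF demo (total)": 327_309,
    "44-minute video demo (video only)": 696_161,
}


def models_that_fit(token_count, windows=CONTEXT_WINDOWS):
    """Return the names of all context windows large enough for token_count."""
    return [name for name, limit in windows.items() if token_count <= limit]


if __name__ == "__main__":
    for demo, tokens in DEMO_INPUTS.items():
        print(f"{demo}: {tokens:,} tokens fits in {models_that_fit(tokens)}")

    # How much larger the 1 million token window is than the Gemini 1.0 window.
    ratio = CONTEXT_WINDOWS["Gemini 1.5 Pro (limited preview)"] / CONTEXT_WINDOWS["Gemini 1.0"]
    print(f"1M window vs Gemini 1.0 window: {ratio:.2f}x")
```

On these numbers, both demo inputs exceed every window listed except the 1 million token preview window.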

How much does it cost and when will it be released?

Google has reportedly said that, during the preview, Gemini 1.5 Pro with the 1 million token context window is free to use. Google may introduce pricing tiers for the model in the future, starting at the standard 128,000-token context window and scaling up to 1 million tokens.

Gemini 1.5 Pro marks a new frontier in AI development at Google. In December 2023, Google announced its most flexible AI model, Gemini 1.0, in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. At the time of launch, Google claimed that Gemini 1.0 outperformed several state-of-the-art models on a variety of benchmarks, including coding and text. The Gemini series is known for next-generation features and sophisticated reasoning, and every Gemini size is multimodal, able to understand text, images, audio, and more.

