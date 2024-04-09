



Posted by: Tris Warkentin – Director of Product Management and Jane Fine – Senior Product Manager

In February, we announced Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. The community's response has been truly inspiring, with impressive fine-tuned variants, Kaggle notebooks, integrations into tools and services, and recipes for retrieval-augmented generation (RAG) using databases like MongoDB.

Today, we're excited to announce our first additions to the Gemma family, which open up more possibilities for ML developers to innovate responsibly: CodeGemma, for code completion, code generation, and instruction following, and RecurrentGemma, an efficiency-optimized architecture for research experimentation. We're also sharing updates to Gemma and its terms, aimed at improvements based on valuable feedback from our community and partners.

Introducing the first two Gemma variants

CodeGemma: code completion, generation, and chat for developers and enterprises

Building on the foundation of the Gemma models, CodeGemma brings powerful yet lightweight coding capabilities to the community. CodeGemma is available in three variants: a 7B pre-trained variant specialized for code completion and code generation, a 7B instruction-tuned variant for code chat and instruction following, and a 2B pre-trained variant for fast code completion that fits on your local computer. The CodeGemma models offer several advantages:

Intelligent code completion and generation: Complete lines and functions, and even generate entire blocks of code, whether you're working locally or using cloud resources.

Enhanced accuracy: Trained on 500 billion tokens of primarily English data from web documents, mathematics, and code, CodeGemma models generate code that is not only more syntactically correct but also semantically meaningful, helping reduce errors and debugging time.

Multi-language proficiency: A valuable coding assistant for Python, JavaScript, Java, and other popular languages.

Streamlined workflows: Integrate CodeGemma models into your development environment to write less boilerplate and focus on the code that matters: the important, interesting, and differentiated parts.

This table compares the performance of CodeGemma with other similar models on both single-line and multi-line code completion tasks. See our technical report for more information.
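Code completion with a pre-trained CodeGemma checkpoint is commonly framed as fill-in-the-middle (FIM): the text before and after the cursor is wrapped in control tokens, and the model generates the missing span. The sketch below only builds that prompt string. It assumes the `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` spellings listed in the CodeGemma model card (check them against the tokenizer you actually load), and the helper names are hypothetical.

```python
# Minimal sketch of a prefix-suffix-middle (FIM) prompt for code completion.
# Assumption: these control-token spellings match the CodeGemma model card;
# verify them against your tokenizer before use.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the code that belongs between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_fim_prompt_at_cursor(source: str, cursor: int) -> str:
    """Hypothetical editor helper: split a buffer at the cursor offset."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor must lie within the source text")
    return build_fim_prompt(source[:cursor], source[cursor:])


if __name__ == "__main__":
    buffer = "def mean(xs):\n    return \n"
    cursor = buffer.index("return ") + len("return ")
    # The model's completion (e.g. a body for `return`) would be inserted at `cursor`.
    print(build_fim_prompt_at_cursor(buffer, cursor))
```

Keeping the prompt construction separate from model loading makes it easy to reuse the same helper with whichever runtime you choose (JAX, PyTorch, Hugging Face Transformers, and so on).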

Learn more about CodeGemma in our report or try it out with this quickstart guide.

RecurrentGemma: Efficient and faster inference with larger batch sizes for researchers

RecurrentGemma is a technically distinct model that uses recurrent neural networks and local attention to improve memory efficiency. While it achieves benchmark scores similar to the Gemma 2B model, RecurrentGemma's unique architecture provides the following benefits:

Reduced memory usage: Lower memory requirements allow you to generate longer samples on devices with limited memory, such as a single GPU or a CPU.

Higher throughput: Because of its reduced memory usage, RecurrentGemma can run inference at significantly larger batch sizes, generating substantially more tokens per second (especially when generating long sequences).

Research innovation: RecurrentGemma showcases a non-transformer model that achieves high performance, highlighting advances in deep learning research.

This graph shows how RecurrentGemma maintains its sampling speed regardless of sequence length, while Transformer-based models like Gemma slow down as sequences get longer.
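The memory and throughput benefits come from the size of the decoding state. A Transformer with global attention caches a key and a value for every past token, so its cache grows with sequence length; a model built from recurrent layers and local attention keeps a fixed-size recurrent state plus at most a bounded window of cached tokens. The back-of-the-envelope sketch below makes that arithmetic concrete. All dimensions, the window size, and the recurrent-state size are hypothetical placeholders rather than the published Gemma or RecurrentGemma configurations, and model weights and activations are ignored.

```python
# Back-of-the-envelope comparison of per-sequence decoding state.
# All dimensions below are hypothetical placeholders, not published model
# configurations; weights and activations are ignored.


def global_kv_cache_bytes(seq_len: int, *, layers: int, kv_heads: int,
                          head_dim: int, bytes_per_value: int = 2) -> int:
    """Global attention caches a key and a value (factor 2) for every past
    token in every layer, so this grows linearly with seq_len."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


def bounded_state_bytes(seq_len: int, *, attn_layers: int, kv_heads: int,
                        head_dim: int, window: int, recurrent_values: int,
                        bytes_per_value: int = 2) -> int:
    """Local attention caches at most `window` tokens and recurrent layers
    carry a fixed-size state, so this stops growing once seq_len > window."""
    cached_tokens = min(seq_len, window)
    local_kv = global_kv_cache_bytes(cached_tokens, layers=attn_layers,
                                     kv_heads=kv_heads, head_dim=head_dim,
                                     bytes_per_value=bytes_per_value)
    return local_kv + recurrent_values * bytes_per_value


def max_batch_size(budget_bytes: int, per_sequence_bytes: int) -> int:
    """How many sequences' decoding state fit in a fixed memory budget."""
    return budget_bytes // per_sequence_bytes


if __name__ == "__main__":
    budget = 8 * 1024**3  # hypothetical 8 GiB available for decoding state
    for n in (2_048, 8_192, 32_768):
        dense = global_kv_cache_bytes(n, layers=18, kv_heads=1, head_dim=256)
        bounded = bounded_state_bytes(n, attn_layers=6, kv_heads=1, head_dim=256,
                                      window=2_048, recurrent_values=500_000)
        print(f"{n:>6} tokens: global {dense / 2**20:7.1f} MiB "
              f"(batch <= {max_batch_size(budget, dense)}), "
              f"bounded {bounded / 2**20:7.1f} MiB "
              f"(batch <= {max_batch_size(budget, bounded)})")
```

Under these assumptions the global-attention cache, and therefore the maximum batch size, scales with sequence length, while the bounded state stays flat past the window. That is the same trend the graph describes for sampling speed.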

To understand the underlying technology, read our paper. For hands-on exploration, try the notebook that shows how to fine-tune the model.

Built on Gemma foundations, with expanded capabilities

Grounded in the same principles as the original Gemma models, the new model variants offer:

Open availability: Availability to everyone and flexible terms of use encourage innovation and collaboration.

High performance and efficiency: The models advance the capabilities of open models with code-specific domain expertise and an optimized design for exceptionally fast completion and generation.

Responsible design: Our commitment to responsible AI helps ensure that the models deliver safe and reliable results.

Flexibility for diverse software and hardware:
Both CodeGemma and RecurrentGemma: Built with JAX and compatible with JAX, PyTorch, Hugging Face Transformers, and Gemma.cpp, enabling local experimentation and cost-effective deployment across a variety of hardware, including laptops, desktops, NVIDIA GPUs, and Google Cloud TPUs.
CodeGemma: Additionally compatible with Keras, NVIDIA NeMo, TensorRT-LLM, Optimum-NVIDIA, and MediaPipe, and available on Vertex AI.
RecurrentGemma: Support for all of the products above will be added in the coming weeks.

Gemma 1.1 update

Alongside the new model variants, we're releasing Gemma 1.1, which includes performance improvements. We've also listened to developer feedback, fixed bugs, and updated our terms to provide more flexibility.

Get started today

These first Gemma variants are available starting today in various places worldwide, including Kaggle, Hugging Face, and Vertex AI Model Garden. Here's how to get started:

We encourage you to try out the CodeGemma and RecurrentGemma models and share your feedback on Kaggle. Together, let's shape the future of AI-powered content creation and understanding.

