Google Gemma 4: The Next Generation of Open Models

Google Gemma 4: A Leap Forward in Open-Source AI

Today marks a significant milestone in the world of Artificial Intelligence. We’re thrilled to introduce Google Gemma 4 – our most intelligent open models to date. Purpose-built for advanced reasoning and agentic workflows, Gemma 4 delivers an unprecedented level of intelligence-per-parameter, pushing the boundaries of what’s possible with open-source AI.

A Thriving Gemmaverse

This breakthrough builds on incredible community momentum. Since the launch of our first generation, developers have downloaded Gemma over 400 million times, fostering a vibrant Gemmaverse of more than 100,000 variants. We listened closely to the needs of innovators, and Gemma 4 is our answer: breakthrough capabilities made widely accessible under an Apache 2.0 license.

Open model performance vs size on Arena.ai’s chat arena as of 4/1.

Powered by Gemini 3 Technology

Built from the same world-class research and technology as Gemini 3, Gemma 4 is the most capable model family you can run on your hardware. These models complement our Gemini models, giving developers the industry’s most powerful combination of both open and proprietary tools.

Four Versatile Sizes

We are releasing Gemma 4 in four versatile sizes to suit a wide range of applications:

  • Effective 2B (E2B)
  • Effective 4B (E4B)
  • 26B Mixture of Experts (MoE)
  • 31B Dense

The entire family moves beyond simple chat to handle complex logic and agentic workflows. Our larger models deliver state-of-the-art performance for their sizes, with the 31B model currently ranking as the #3 open model in the world on the industry-standard Arena AI text leaderboard, and the 26B model securing the #6 spot. Across the family, Gemma 4 models consistently outperform competitors up to 20x their size.

Efficiency and Accessibility

For developers, this new level of intelligence-per-parameter means achieving frontier-level capabilities with significantly less hardware overhead. At the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal capabilities, low-latency processing, and seamless ecosystem integration.

We’ve sized the Gemma 4 models specifically to run and fine-tune efficiently on hardware – from billions of Android devices worldwide, to laptop GPUs, all the way up to developer workstations and accelerators. By using these highly optimized models, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks.

Real-World Impact

We’ve already seen incredible success with this approach. For instance, INSAIT created a pioneering Bulgarian-first language model (BgGPT), and we worked with Yale University on Cell2Sentence-Scale to discover new pathways for cancer therapy.

Key Features of Gemma 4

  • Optimized for Accessibility: Unquantized bfloat16 weights fit efficiently on a single 80GB NVIDIA H100 GPU. Quantized versions run natively on consumer GPUs.
  • Mixture of Experts (MoE): The 26B MoE model activates only 3.8 billion parameters during inference for exceptionally fast performance.
  • Dense Model: The 31B Dense model maximizes raw quality and provides a powerful foundation for fine-tuning.
  • Edge-Optimized Models: E2B and E4B models prioritize multimodal capabilities and low-latency processing on edge devices.
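The memory and efficiency claims above can be checked with simple back-of-envelope arithmetic. The sketch below is illustrative only: the bytes-per-parameter figures for each precision are standard, but real deployments also need memory for the KV cache and activations, so actual footprints will be somewhat larger.

```python
# Back-of-envelope memory estimates for the Gemma 4 sizes listed above.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# 31B dense in unquantized bfloat16 (2 bytes per parameter):
dense_bf16 = weight_memory_gb(31, 2.0)   # 62.0 GB -> fits on one 80 GB H100

# The same model quantized to 4-bit (0.5 bytes per parameter):
dense_int4 = weight_memory_gb(31, 0.5)   # 15.5 GB -> consumer-GPU territory

# 26B MoE with 3.8B active parameters: per-token compute scales
# roughly with the active fraction, not the total parameter count.
active_fraction = 3.8 / 26               # ~15% of a dense forward pass

print(f"bf16: {dense_bf16:.1f} GB, int4: {dense_int4:.1f} GB, "
      f"MoE active: {active_fraction:.0%}")
```

This is why the 31B dense model is described as fitting on a single 80 GB H100 in bfloat16, and why the MoE variant can be markedly faster at inference despite its larger total parameter count.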

Open Source and Secure

Building the future of AI requires a collaborative approach. That’s why Gemma 4 is released under a commercially permissive Apache 2.0 license, providing complete developer flexibility and digital sovereignty. These models undergo the same rigorous infrastructure security protocols as our proprietary models, ensuring a trusted and reliable foundation.
