
Google launches new open Gemma 4 12B multimodal model for laptops with 16 GB of RAM
Google DeepMind has introduced Gemma 4 12B, a 12-billion-parameter open AI model designed to run multimodal tasks directly on standard laptops. It processes text, images, and audio together without separate encoders, which Google says reduces processing time, memory use, and latency. The model can run locally on devices with 16 GB of system RAM or VRAM, making it practical for many consumer and enterprise laptops.
According to Google, Gemma 4 12B has about half the memory footprint of Gemma 4 26B while matching much of its benchmark performance. It is also the first mid-sized Gemma model with native audio processing, and it supports speech recognition, code generation, image understanding, and video analysis. In one test, it analyzed a five-minute keynote by processing 313 frames alongside the audio.
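A rough way to see why parameter count drives these memory figures is to multiply parameters by bytes per parameter. The sketch below assumes weights only (no KV cache, activations, or runtime overhead), treats the nominal 12B and 26B counts as exact, and uses bf16, int8, and int4 purely as illustrative precisions; the article does not say which format Google assumes for the 16 GB figure.

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Assumptions: weights only, nominal parameter counts, decimal GB (1e9 bytes).
# The precisions below are illustrative, not Google's published configuration.

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def main() -> None:
    models = {"Gemma 4 12B": 12e9, "Gemma 4 26B": 26e9}
    for name, params in models.items():
        for precision in BYTES_PER_PARAM:
            print(f"{name:12s} {precision:5s} ~{weight_gb(params, precision):5.1f} GB")
    # Ratio of nominal parameter counts, for comparison with "about half".
    ratio = models["Gemma 4 12B"] / models["Gemma 4 26B"]
    print(f"12B / 26B parameter ratio: {ratio:.2f}")


if __name__ == "__main__":
    main()
```

Under these assumptions the 12B weights take about 24 GB at bf16, 12 GB at int8, and 6 GB at int4, against 52, 26, and 13 GB for the 26B model, and the parameter ratio is 0.46. That suggests the 16 GB figure likely relies on 8-bit or lower-precision weights, though the article does not confirm this.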
Gemma 4 12B also includes Multi-Token Prediction drafters by default, improving generation speed and efficiency. Google says the model supports complex multistep reasoning and agentic workflows that previously required larger Gemma models. It is available through Hugging Face, Kaggle, Ollama, Google AI Edge Gallery, and LM Studio under the Apache 2.0 license, which permits commercial use.
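The article does not explain how the drafters speed up generation. Drafters are commonly used for speculative decoding: a cheap drafter proposes several tokens, and the full model verifies them, keeping the longest prefix it agrees with. Assuming that is the mechanism here, the sketch below shows generic greedy speculative decoding with hypothetical toy integer "models" (`toy_target`, `toy_drafter`). It is not Gemma's implementation.

```python
from typing import Callable, List

# A toy "model" maps the token sequence so far to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, drafter: NextToken,
                     context: List[int], k: int) -> List[int]:
    """One greedy draft-and-verify step. Returns the tokens accepted this step."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    draft: List[int] = []
    for _ in range(k):
        draft.append(drafter(context + draft))

    # 2. The target model checks each drafted position. A real system scores
    #    all k positions in one batched forward pass; this loop is for clarity.
    accepted: List[int] = []
    for tok in draft:
        expected = target(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the verification pass yields one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, drafter: NextToken,
             prompt: List[int], n_tokens: int, k: int = 4) -> List[int]:
    """Generate n_tokens after the prompt using speculative decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, drafter, out, k))
    return out[: len(prompt) + n_tokens]


def greedy(target: NextToken, prompt: List[int], n_tokens: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out


# Hypothetical toy models over the tokens 0..9.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_drafter(seq: List[int]) -> int:
    # Agrees with the target except after token 5, where it guesses wrong.
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_drafter, [0], 12))
    print(greedy(toy_target, [0], 12))
```

Both calls print the same sequence, because greedy speculative decoding keeps only tokens the target model would have chosen itself. The drafter's accuracy changes how many tokens each verification pass accepts, which is where any speedup comes from, not what gets generated.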



Comments
This is actually a pretty cool one, as it can take in audio too, which is a bit rare.