


Google Claims Gemma 3 Reaches 98% of DeepSeek's Accuracy Using Only One GPU
Google says its new open-source AI model, Gemma 3, achieves nearly the same performance as DeepSeek AI's R1 while using just one Nvidia H100 GPU, compared to an estimated 32 for R1. ZDNet reports: Using "Elo" scores, a common measurement system used to rank chess players and athletes, Google claims Gemma 3 comes within 98% of the score of DeepSeek's R1: 1338 versus 1363 for R1. That means R1 is superior to Gemma 3. However, based on Google's estimate, the search giant claims that it would take 32 of Nvidia's mainstream "H100" GPU chips to achieve R1's score, whereas Gemma 3 uses only one H100 GPU.
Google's balance of compute and Elo score is a "sweet spot," the company claims. In a blog post, Google bills the new program as "the most capable model you can run on a single GPU or TPU," referring to the company's custom AI chip, the "tensor processing unit." "Gemma 3 delivers state-of-the-art performance for its size, outperforming Llama-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on LMArena's leaderboard," the blog post relates, referring to the Elo scores. "This helps you to create engaging user experiences that can fit on a single GPU or TPU host."
Google's model also tops Meta's Llama 3's Elo score, which it estimates would require 16 GPUs. (Note that the numbers of H100 chips used by the competition are Google's estimates; DeepSeek AI has only disclosed an example of using 1,814 of Nvidia's less-powerful H800 GPUs to serve answers with R1.) More detailed information is provided in a developer blog post on HuggingFace, where the Gemma 3 repository is offered.
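For context, the "98%" figure appears to be just the ratio of the two Elo numbers, and a 25-point Elo gap can also be read as an expected head-to-head win rate using the standard Elo formula. A quick sketch (standard Elo math, nothing Google-specific):

# Ratio of Elo scores vs. expected win rate for a 25-point gap.
gemma3_elo, r1_elo = 1338, 1363
ratio = gemma3_elo / r1_elo                                # ~0.982 -- the "98%" in the headline
win_rate = 1 / (1 + 10 ** ((r1_elo - gemma3_elo) / 400))   # ~0.464 -- Gemma 3's expected score vs. R1
print(f"ratio: {ratio:.3f}, expected win rate: {win_rate:.3f}")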
I was getting all excited (Score:1)
I was getting all excited when I thought the article was talking about Gemma Chan… turns out it's just another generic AI bot.
Re: (Score:2)
I guess nobody here has ever seen “Humans”
Lol, Google has AI (Score:1)
Please clap.
I think.. impressive (Score:2)
Re: (Score:2)
Re: (Score:2)
Still fantastic, given it has inference speed around that of a 220W Nvidia GPU, and 16x the VRAM.
Re: (Score:2)
Excellent machines for sure. I've just been playing with Gemma 3 27b's vision functions and
Re: (Score:2)
I suppose; also keep in mind that measurement is the full system power. A full-brightness display is +10W, and any TB peripherals being fed by your laptop can also be expensive (up to 15W per port, I think), so my baseline may be lower than others', since I'm doing this with my brightness one step from off and nothing plugged in except the MagSafe.
And ya, I took a p
Re: (Score:2)
Enhance 224 to 176. Enhance, stop. Move in, stop. Pull out, track right, stop. Center in, pull back. Stop. Track 45 right. Stop. Center and stop. Enhance 34 to 36. Pan right and pull back. Stop. Enhance 34 to 46. Pull back. Wait a minute, go right, stop. Enhance 57 to 19. Track 45 left. Stop. Enhance 15 to 23. Give me a hard copy right there.
My binned M4 Pro/48GB gets hot enough as it is, your Max must be roasting!
Re: (Score:2)
My binned M4 Pro/48GB gets hot enough as it is, your Max must be roasting!
It wants to. I find the system fan curve will allow it to get hot enough that it starts pulling back the GPU clocks.
I'm using Temp Monitor [vimistudios.com] to set a "boost" mode, where if it detects the average GPU core temp hit 60C or above, it cranks the fans to 100%.
Can't proactively set it to full fans, because the Mac refuses any fan commands until it itself turns on its fans (since they're off when temps are reasonable).
This is an annoying change from my M1 Max MBP, which let me set the fan to whatever I wanted whene
Re: (Score:2)
Probably thermal throttling if the power drops off.
I imagine a lot of AI companies are re-evaluating their expected power consumption needs as it gets more efficient. Not good news for nuke fans. It's also nice to see Google managing to compete with a Chinese firm for a change.
That's amazing! (Score:2)
Does this mean AI only needs a single brain cell??
Re:That's amazing! [The third attempt is better!] (Score:2)
I think you're going for funny and on that basis it deserved to be FP. However, I think the significance of the story is pretty close to null. LOTS of room for optimization, though the claim of the second-system effect is that the biggest improvement comes in the second round.
personal AI (Score:2)
AI at that level these days has generally been something on the cloud that you pay fees to access. And it presumably has the entire history of your interaction with it, which is troubling. This improvement in efficiency (assuming true) makes it a lot easier for a modest-size corporation to contemplate owning the physical AI. It will result in faster proliferation of these machines. Let's hope we survive it.
Re: (Score:2)
Re: (Score:2)
There are several models that run well on machines with 128GB of VRAM, which is a budding but existing market.
Re: (Score:2)
Today, you can do that on a Mac, and soon you'll be able to do it with a Strix Halo. Probably not long from now, an Intel (assuming they can read the room)
This means not just a corporation- a person can do it.
Re: (Score:2)
It comes within 98% of their score? (Score:2)
That's really not much of a claim, is it?
Come back when it's running on my iPhone6 (Score:2)
Who has an H100 lying around?
Re: (Score:2)
Re: (Score:2)
Learned my first assembly language on that bad boy!
Re: (Score:2)
Re: (Score:2)
M2 Max (Mac Studio, MacBook Pro), M2 Ultra (Mac Studio), M3 Max (Mac Studio, MacBook Pro), M3 Ultra (Mac Studio), M4 Max (MacBook Pro).
Soon Strix Halo will be available if AMD is your thing.
On my M4 Max, I'm getting ~9t/s at FP16 and ~16t/s at Q8_0. Nice and usable.
At lower quantizations (Q4, etc) you could run it on top-of-the-line discretes (24GB VRAM, etc)
Re: (Score:2)
Re: (Score:2)
It does not seem to be the RAM that makes AI work. It seems to be the trillions of integer calculations the chip can do in a second.
If this is correct, then it raises a question: why would this affect the accuracy of the AI results at all?
Shouldn't it just be slower at producing them (not less accurate) when run on a slower machine?
Re: (Score:2)
Re: (Score:2)
Imagine how you would get it to run the calculations across the billions of parameters, while maintaining gigabytes of hidden state.
Assuming you could create a mechanism that streamed the necessary memory when required (some kind of paging), then slow wouldn't begin to describe your experience, no matter the speed of the computing elements. You'd be limited by the memory bandwidth. You would get a token every decade.
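A rough, hedged sketch of why bandwidth dominates (the bandwidth numbers below are illustrative assumptions, not measurements): each generated token has to stream roughly all of a dense model's weights through the compute units once, so throughput is about bandwidth divided by model size.

# Back-of-envelope: dense-model token generation is roughly memory-bandwidth-bound.
def tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    model_gb = params_billion * bytes_per_param   # weight bytes streamed per token
    return bandwidth_gb_s / model_gb

print(tokens_per_second(27, 2.0, 500))   # ~9 t/s: 27B at FP16 over ~500 GB/s unified memory, close to the M4 Max figure earlier in the thread
print(tokens_per_second(27, 2.0, 1))     # ~0.02 t/s: the same model paged in over a ~1 GB/s link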
Re: (Score:2)
Re: (Score:2)
The problem isn't the Pi in particular; it's going to be limited by how quickly the Pi can feed those gigabytes of data to the AI hat.
For smaller AI models (not LLMs) that kind of thing is going to work just fine.
Vision classifiers and stuff like that.
Re: (Score:2)
Re: (Score:2)
I.e., the model in question is a 27B model (27 billion parameters)
At BF16 (its native resolution), that requires 54GB of VRAM to use the model.
27GB if quantized down to INT8.
So ya, RAM is the fundamental limiting factor for what kind of models you can run on your machine.
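A minimal sketch of that arithmetic (the Q4 entry is my own illustrative addition):

# Rough VRAM needed just to hold the weights (ignores KV cache and activations).
params = 27e9   # Gemma 3 27B
for name, bytes_per_param in (("BF16", 2.0), ("INT8", 1.0), ("Q4", 0.5)):
    print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")
# BF16: 54 GB, INT8: 27 GB, Q4: ~14 GB -- which is why 24 GB cards need ~4-bit quants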
Re: (Score:2)
Compared to... (Score:2)
Interesting point that US AI guys are no longer just comparing their work to each other.
They're now genuinely treating Chinese AI models as a benchmark. A massive break with the status quo.
Re: (Score:2)
Re: (Score:2)
What's really surprising, is this is a really small model. 27B. Runs fast as hell on my MBP.
If it really does compete with R1 on anything but a couple of very specific benchmarks, that would be pretty fucking amazing.
Not actually faster at scale (Score:2)
For large-scale access to the model, it's actually more expensive to run. R1 is FP8 native, and if you have 32 GPUs anyway to speed up requests, there is no lack of memory. Only active parameter count matters.
That's the advantage of MoE: lots of memory for trivia, but at scale just as fast as a small model.
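A hedged sketch of that argument, using publicly reported parameter counts (R1 is roughly 671B total with ~37B active per token; Gemma 3 27B is dense), so treat the exact numbers as assumptions:

# Per-token weight traffic: dense models read all parameters per token,
# MoE models only the routed experts plus shared layers (the "active" parameters).
def gb_read_per_token(active_params_billion, bytes_per_param):
    return active_params_billion * bytes_per_param

print(gb_read_per_token(37, 1.0))   # R1 at FP8: ~37 GB of weights per token
print(gb_read_per_token(27, 2.0))   # Gemma 3 27B at BF16: ~54 GB per token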