Google’s Ironwood TPUs represent a bigger threat than Nvidia would have you believe
Summary
Google’s newest TPU v7 “Ironwood” accelerators narrow the hardware gap with Nvidia’s Blackwell GPUs by delivering near-comparable dense FP8 performance and memory specs while leaning on Google’s long-standing advantage: extreme scale. Ironwood offers 4.6 petaFLOPS of dense FP8 compute, 192 GB of HBM3e with ~7.4 TB/s of bandwidth, and four ICI (inter-chip interconnect) links for chip-to-chip comms. Where it truly stands out is pod scale: Google ships pods from 256 up to 9,216 chips and can, in theory, stitch dozens of pods together across its Jupiter network, yielding multi-hundred-thousand-accelerator domains that few rivals can match in that form factor.
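Those headline numbers are easy to sanity-check. A back-of-the-envelope sketch in Python, using only the per-chip figures quoted above (the per-pod totals are simple multiplication on our part, not quoted specs):

```python
# Per-chip figures as quoted in the article.
CHIP_FP8_PFLOPS = 4.6   # dense FP8 compute, petaFLOPS
CHIP_HBM_GB = 192       # HBM3e capacity, GB
CHIP_HBM_TBPS = 7.4     # HBM bandwidth, TB/s

# Pod sizes Google ships, per the article.
for chips in (256, 9_216):
    print(f"{chips:>5} chips: "
          f"{chips * CHIP_FP8_PFLOPS / 1_000:4.1f} exaFLOPS dense FP8, "
          f"{chips * CHIP_HBM_GB / 1_000:,.0f} TB of HBM, "
          f"{chips * CHIP_HBM_TBPS:,.0f} TB/s aggregate HBM bandwidth")
```

The full 9,216-chip pod works out to roughly 42 exaFLOPS of dense FP8 and about 1.8 PB of HBM, which lines up with the headline pod figures Google has been quoting.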
The platform uses a 3D torus mesh plus optical circuit switching (OCS) to reduce switch-related latency and to provide flexible partitioning and fault tolerance. That approach contrasts with Nvidia’s and AMD’s rack-scale, switch-based designs, and it makes trade-offs that suit different workloads. Large model builders such as Anthropic are already committing to massive TPU deployments, underlining the practical threat to GPU dominance, especially when software and scale are considered alongside raw chip specs.
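The torus point is worth making concrete. In a 3D torus every node connects directly to its nearest neighbour in both directions along each of three axes, with the edges wrapping around, so traffic hops chip-to-chip rather than traversing a central switch tier. A minimal sketch of the neighbour relation (the 4×4×4 geometry is purely illustrative and says nothing about Ironwood’s actual layout or per-chip link budget):

```python
def torus_neighbours(coord, dims):
    """Wrap-around neighbours of a node in a 3D torus.

    Each node links to the adjacent node in both directions along
    every axis, modulo the axis length, so nodes on the "edge" of
    the grid are topologically identical to interior ones.
    """
    x, y, z = coord
    X, Y, Z = dims
    return {
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),
    }

# A "corner" node in a 4x4x4 torus still has six neighbours,
# because the wrap-around links eliminate edges entirely.
print(sorted(torus_neighbours((0, 0, 0), (4, 4, 4))))
```

The OCS layer is what the article credits with making this flexible: it can splice a pod’s mesh into differently sized slices, or route around failed chips, without putting a packet switch in the data path.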
Key Points
- Ironwood (TPU v7) delivers ~4.6 petaFLOPS of dense FP8 compute, comparable to Nvidia’s B200 (4.5 petaFLOPS) and close to the higher-end GB-series GPUs.
- Each TPU carries 192 GB of HBM3e with ~7.4 TB/s of memory bandwidth, putting it in the same class as recent GPU offerings.
- Interconnect: four ICI links provide ~9.6 Tbps aggregate; less than some Blackwell chips (quantified in the sketch after this list) but designed for Google’s mesh topology.
- Google’s advantage is scale — pods from 256 to 9,216 chips, and networking (Jupiter + OCS) that can theoretically link many pods into massive clusters.
- Topology trade-offs: Google’s 3D torus + OCS reduces switch latency and enables flexible pod slicing, while GPU vendors use flatter switch fabrics with lower hop counts for some workloads.
- Major customers (e.g. Anthropic) are deploying large TPU fleets, showing real-world demand beyond Google’s internal use.
- Performance is now close enough that software, tooling and system-level scale will often decide which accelerator ecosystem wins for a given customer.
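On the interconnect point: the per-link arithmetic follows directly from the article’s aggregate figure, but the comparison baseline is our assumption; the piece only says the aggregate is less than some Blackwell chips. Nvidia’s published figure for fifth-generation NVLink on Blackwell is 1.8 TB/s per GPU, used here as the reference:

```python
# Implied per-link bandwidth from the article's aggregate figure.
ICI_LINKS = 4
ICI_AGG_TBPS = 9.6
print(f"Per ICI link: {ICI_AGG_TBPS / ICI_LINKS:.1f} Tbps")  # 2.4 Tbps

# Assumed baseline: Nvidia's published NVLink figure for Blackwell,
# 1.8 TB/s per GPU, converted to Tbps.
NVLINK_TBPS = 1.8 * 8   # 14.4 Tbps
print(f"Blackwell NVLink: {NVLINK_TBPS:.1f} Tbps per GPU")
print(f"Ironwood vs NVLink: {ICI_AGG_TBPS / NVLINK_TBPS:.2f}x")  # ~0.67x
```

If that baseline holds, each Ironwood chip gives up roughly a third of Blackwell’s scale-up bandwidth, a deficit the mesh topology and pod scale are meant to offset.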
Context and Relevance
This matters because the AI-infrastructure race is no longer just about single-chip peak FLOPS. Google has combined competitive per-chip performance with datacentre-scale networking and switching choices that let it deliver very large, tightly coupled TPU clusters. For organisations planning model training or inference at scale, that changes the calculus: you must weigh raw GPU features against the practicality and economics of massive TPU pods, the networking topology that fits your workload, and the software ecosystem.
Why should I read this?
Short version: if you care about AI compute, whether you pick GPUs, TPUs or both, this is a wake-up call. Google’s Ironwood isn’t just another chip; it’s a reminder that scale, networking and software together can beat a marginal FLOPS lead. Read it to understand how procurement and architecture choices might shift over the next year.
Author style
Punchy: the piece makes a clear, forceful case — Ironwood closes the gap to Blackwell and leverages Google’s pod-scale strengths. If you’re responsible for AI infrastructure, it’s worth digging into the details rather than shrugging it off as vendor hype.
Source
Source: https://go.theregister.com/feed/www.theregister.com/2025/11/06/googles_ironwood_tpus_ai/
