AI Burning Man happens next week – here’s what The Register expects at GTC 2026
Summary
Nvidia’s GTC 2026 is shaping up to be a major moment: Jensen Huang is expected to outline how Nvidia will combine its GPU and CUDA stack with the dataflow tech it acquired from Groq to tackle token-heavy generative AI workloads. The preview highlights efficiency gaps in current GPU architectures versus SRAM-heavy rivals such as Groq and Cerebras, and explains why Nvidia’s Rubin GPUs, Vera CPU, future Kyber racks and Feynman GPUs will dominate the conversation. Other likely talking points include OpenClaw/NemoClaw agent frameworks, robotics demos powered by Isaac and Omniverse, and the ongoing power/cooling implications of ever-larger datacentre systems.
Key Points
- Nvidia’s Groq acquisition aims to close a token-generation gap for latency-sensitive, high-throughput AI workloads.
- Rubin GPUs deliver a substantial uplift in memory bandwidth and dense FP throughput versus Blackwell; expect Huang to emphasise performance and efficiency gains.
- SRAM-heavy architectures (Groq, Cerebras) currently excel at low-latency token generation; Nvidia plans to blend dataflow with its CUDA ecosystem.
- Vera CPU is being positioned as a standalone Arm-based competitor for mainstream datacentre use, beyond its role in Vera-Rubin superchips.
- Kyber racks and Feynman GPUs are being telegraphed for 2027–2028, forcing the industry to plan for extreme power and cooling needs.
- Nvidia may hint at consumer-facing or integrated SoC moves, but large memory and thermal constraints make big consumer reveals unlikely at GTC.
- OpenClaw/agentic frameworks, robotics (Isaac), and Omniverse simulations will feature heavily despite security concerns around agent frameworks.
Content summary
The article explains why Nvidia needs Groq's dataflow approach: mainstream GPUs struggle when many concurrent users demand fast, interactive token streams. Benchmarks suggest SRAM-heavy chips can produce hundreds to thousands of tokens per second per stream, a capability that has swung customers such as OpenAI to Cerebras for some workloads. Nvidia's Rubin architecture offers a major throughput uplift and will appear in HGX and NVL72 systems; thermal design power is high, so liquid cooling will be necessary for many deployments. Huang is likely to discuss roadmaps for Vera CPUs, Kyber racks and Feynman GPUs, and to push Nvidia's software and systems story. Outside silicon, expect demos around agent frameworks (OpenClaw/NemoClaw), robotics and Omniverse-driven digital twins.
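The gap between GPU and SRAM-heavy decode speeds follows from a standard back-of-envelope argument: autoregressive token generation is memory-bandwidth bound, because every token requires streaming the model's weights through the chip. The sketch below illustrates that arithmetic; the model size and bandwidth figures are illustrative assumptions chosen for this example, not vendor specs or numbers from the article.

```python
# Back-of-envelope: single-stream decode speed for a bandwidth-bound model.
# Each generated token must read (roughly) the full weight set from memory, so:
#   tokens/sec <= memory_bandwidth / bytes_read_per_token
# All concrete numbers below are hypothetical, for illustration only.

def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on per-user decode rate when weight reads dominate."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical 70B-parameter model at 8-bit weights: ~70 GB streamed per token.
model_gb = 70.0

hbm_gpu = tokens_per_second(8_000, model_gb)     # assumed HBM-class GPU, ~8 TB/s
sram_chip = tokens_per_second(80_000, model_gb)  # assumed SRAM-heavy part, ~80 TB/s

print(f"HBM GPU:   ~{hbm_gpu:.0f} tokens/s per stream")
print(f"SRAM part: ~{sram_chip:.0f} tokens/s per stream")
```

Under these made-up figures the GPU lands in the low hundreds of tokens per second while the SRAM-heavy design lands in the low thousands, which is the shape of the gap the article describes. Batching, KV-cache reads and quantisation all shift the constants, but not the bandwidth-bound logic.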
Context and relevance
This preview matters if you follow AI infrastructure, datacentre planning or enterprise AI deployments. Nvidia sets hardware and software expectations across the industry — announcements about combining dataflow and CUDA, or shipping standalone Vera CPUs, will influence procurement, cooling and power planning, and which startups or incumbents win inference business. The balance between throughput, latency and cost-per-token is central to where workloads run (GPUs vs SRAM-based accelerators) and how vendors position their stacks.
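The throughput/latency/cost trade-off mentioned above is ultimately a cost-per-token calculation: amortise the system price over its useful life and divide by the tokens it actually serves. A minimal sketch, in which every input (system price, lifetime, utilisation, aggregate throughput) is a hypothetical placeholder rather than a figure from the article:

```python
# Hardware-only cost per million tokens, ignoring power, staff and networking.
# All inputs are illustrative assumptions for the sake of the arithmetic.

def cost_per_million_tokens(system_cost_usd: float,
                            lifetime_years: float,
                            utilisation: float,
                            aggregate_tokens_per_sec: float) -> float:
    """Amortised capex per million tokens served over the system's lifetime."""
    active_seconds = lifetime_years * 365 * 24 * 3600 * utilisation
    total_tokens = aggregate_tokens_per_sec * active_seconds
    return system_cost_usd / total_tokens * 1_000_000

# Hypothetical rack: $3M, 4-year life, 60% utilisation, 500k tokens/s aggregate.
price = cost_per_million_tokens(3_000_000, 4, 0.60, 500_000)
print(f"~${price:.3f} per million tokens (capex only)")
```

The point of the exercise is the sensitivity: halving utilisation or throughput doubles the cost per token, which is why latency-optimised (small-batch) serving is so much more expensive than throughput-optimised serving on the same silicon.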
Why should I read this?
Quick version: if you care who will actually run your chatbots and code assistants — and how much it will cost to do so at scale — this is the preview you want. We've cut through the hype and pulled out the bits that affect performance, power and platform choices so you don't have to sit through every keynote.
