OpenAI to serve up ChatGPT on Cerebras’ AI dinner plates in $10B+ deal

Summary

OpenAI will deploy Cerebras wafer-scale accelerators — totalling about 750 megawatts of kit through to 2028 — in a cloud service agreement reportedly worth more than $10 billion. Cerebras will build and lease datacentres for OpenAI, integrating its WSE-3 wafer‑scale engines (each with 44 GB SRAM and huge bandwidth) into OpenAI’s inference pipeline to dramatically speed up token generation and interactive workloads.

The chips trade memory capacity for bandwidth: SRAM is far faster than GPU HBM (Cerebras claims ~21 PB/s), but it is space-inefficient, so larger models must be spread across multiple chips. That increases power draw and deployment complexity (each chip is rated around 23 kW). OpenAI’s model-router approach and disaggregated inference strategies could limit costs by routing most queries to smaller models and only sending complex requests to the biggest hardware.
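
To see why the bandwidth claim matters for token generation, here is a back-of-envelope sketch. Autoregressive decoding has to stream the model's weights for every token, so memory bandwidth sets a ceiling on single-stream speed. The model size, weight precision and the "~3 TB/s" HBM figure below are illustrative assumptions, not numbers from the article; only the ~21 PB/s claim comes from Cerebras.

```python
# Back-of-envelope: decode speed ceiling ~= memory bandwidth / bytes read per token.
# All figures are illustrative assumptions, not vendor-confirmed numbers.

def tokens_per_second(params_billion: float, bytes_per_param: float,
                      bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode rate."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# Hypothetical 70B-parameter model stored at 16-bit (2 bytes) per weight.
model_b, bytes_w = 70, 2

# HBM-class accelerator vs. the ~21 PB/s (21,000 TB/s) SRAM figure Cerebras claims.
for name, bw in [("HBM GPU (~3 TB/s)", 3), ("Wafer-scale SRAM (~21,000 TB/s)", 21_000)]:
    print(f"{name}: ~{tokens_per_second(model_b, bytes_w, bw):,.0f} tokens/s ceiling")
```

Real-world rates will be well below these ceilings once a model is split across wafers and interconnect, scheduling and batching overheads kick in; the point is only that the decode ceiling scales with memory bandwidth.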

Key Points

  • Deal reportedly exceeds $10bn and covers Cerebras building/leasing datacentres for OpenAI through 2028.
  • Cerebras WSE-3 accelerators use large on‑chip SRAM (44 GB per chip) and claim extremely high memory bandwidth (~21 PB/s).
  • SRAM bandwidth enables much faster inference and higher single-user token rates — useful for real-time agents and extended reasoning.
  • SRAM is space-inefficient, so larger models must be parallelised across multiple chips, increasing power and complexity (chips rated ~23 kW).
  • OpenAI’s model router (introduced with GPT-5) should help by directing most requests to smaller models, reserving wafer‑scale hardware for heavy workloads (see the routing sketch after this list).
  • Possible disaggregated setups could run prompt processing on GPUs and offload token generation to Cerebras’ accelerators, though that requires Cerebras to host GPU systems alongside its wafer-scale kit.
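
A minimal sketch of the routing idea, assuming hypothetical backend names, thresholds and a made-up scoring heuristic; the article does not describe OpenAI's actual router logic.

```python
# Hypothetical model-router sketch: cheap requests go to a small model on
# commodity GPUs; long-context / agentic requests go to the wafer-scale pool.
# Backend names, thresholds and the heuristic are all assumptions.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    needs_tools: bool        # agent-style workload?
    reasoning_budget: int    # expected output / "thinking" tokens

def route(req: Request) -> str:
    """Pick a serving pool for the request."""
    heavy = (
        req.needs_tools
        or req.reasoning_budget > 4_000
        or req.prompt_tokens > 32_000
    )
    if not heavy:
        return "small-model-gpu-pool"      # bulk of traffic stays on cheap hardware
    # Disaggregated option: prefill (prompt processing) on GPUs,
    # decode (token generation) on the bandwidth-rich wafer-scale pool.
    return "wafer-scale-decode-pool"

print(route(Request(prompt_tokens=800, needs_tools=False, reasoning_budget=500)))
print(route(Request(prompt_tokens=50_000, needs_tools=True, reasoning_budget=20_000)))
```

The design point is economic: if the bulk of traffic never touches the wafer-scale pool, the expensive, power-hungry kit only has to absorb the long-context and agentic tail.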

Context and relevance

This is a strategic move in the inference era. Training dominance has largely gone to big GPU vendors, but inference — where latency, token throughput and cost-per-query matter — remains contested. If OpenAI can reliably serve more interactive, longer-reasoning sessions with lower latency, it improves user experience and monetisation potential while shifting datacentre design and economics. The deal signals growing vendor diversity in hyperscale AI infrastructure and highlights trade-offs between raw bandwidth, memory capacity and deployment power costs.

Why should I read this?

Because it’s huge and weirdly simple: OpenAI’s betting serious money that faster SRAM-packed chips will make ChatGPT snappier and more capable in real time. If you care about AI latency, cost of running models, or what powers the next wave of agent-style apps, this explains where the infrastructure race is heading — and why it might not all be Nvidia GPUs.

Author style

Punchy: this isn’t a tiny vendor tie-up — it’s a multi-billion-dollar infrastructure play that could reshape inference economics and user experience. Read the detail if the nuts and bolts of AI deployment matter to you.

Source

Source: https://go.theregister.com/feed/www.theregister.com/2026/01/15/openai_cerebras_ai/