AWS' Stargate-smashing Rainier AI megacluster is up and running
Summary
AWS has declared Project Rainier, its UltraCluster for AI, fully operational, with nearly half a million Trainium2 chips now running workloads across multiple datacentres. Anthropic is already using the cluster, and AWS says it will scale to more than one million Trainium2 chips for training and inference by the end of the year. The deployment reportedly went from announcement to live in under a year. AWS highlights its end-to-end control of chip, software and datacentre design as a key advantage, and at least one Rainier site in Indiana will ultimately span dozens of very large buildings. AWS has not disclosed the cluster's full geographic scope or exact capacity, and the company has also faced recent reliability problems in some regions.
Key Points
- Nearly 500,000 Trainium2 chips are active in Project Rainier across multiple AWS datacentres.
- Anthropic is running workloads on Rainier; AWS expects the cluster to exceed 1,000,000 Trainium2 chips for training and inference by the end of the year.
- AWS says Rainier is one of the world’s largest AI compute clusters and was deployed in record time (under a year).
- One Rainier site in Indiana is already partially online and is planned to include around 30 datacentre buildings of ~200,000 sq ft each.
- The build highlights AWS’s vertical integration (custom silicon, software and datacentre design), but recent AWS outages point to operational reliability risks.
Context and Relevance
This announcement ramps up the AI infrastructure race with OpenAI's Stargate initiative and other hyperscalers. Stargate-backed projects have been bringing significant megawatts online (about 200MW noted in Abilene, Texas, with plans for much more), and partners such as Oracle have made large capacity commitments. Rainier underlines AWS's strategy of bespoke hardware plus rapid rollout, a combination that can deliver scale and cost advantages for big AI customers. For enterprises, cloud providers and hardware vendors, the development affects procurement, model placement, latency and competitive positioning; for regulators and governments, it raises questions about the concentration of AI compute and about resilience, given recent cloud outages.
Why should I read this?
Short and blunt: if you care about who will power tomorrow's big models, this is essential. AWS just lit up an enormous, custom-built AI cluster in record time, and Anthropic is already running workloads on it. We've pulled out the key facts so you don't have to skim press releases and blogs.
Source
Source: https://go.theregister.com/feed/www.theregister.com/2025/10/29/aws_rainier_ai_megacluster/
