Microsoft is building datacenter superclusters that span continents
Summary
Microsoft has begun linking datacentres across large distances to create what it calls “Fairwater” superclusters — multi‑datacentre training fabrics intended to train the next generation of AI models with hundreds of trillions of parameters. The first node pair went live in October, connecting the Mount Pleasant, Wisconsin campus to a facility in Atlanta, Georgia. Fairwater sites are two‑storey, direct‑to‑chip liquid‑cooled facilities designed to use almost no water and to host diverse GPU fleets tuned to different workloads.
Microsoft plans to scale these clusters to hundreds of thousands of GPUs, citing deployments such as Nvidia GB200 NVL72 racks at Atlanta (high power density, large HBM3e memory). To stitch sites together it will rely on very high‑bandwidth, low‑latency networking — options include gear like Cisco’s 8223, Broadcom’s Jericho4 or Nvidia’s Spectrum‑XGS — while continuing to favour InfiniBand for its HPC environments. Researchers are addressing bandwidth and latency challenges via model compression and smart communication scheduling, as shown in work from DeepMind and others.
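To make the compression idea concrete, here is a minimal sketch of one widely studied technique, top‑k gradient sparsification with error feedback: each worker sends only its largest‑magnitude gradient entries over the wide‑area link and locally accumulates the rest for a later round. This is an illustrative approach from the research literature, not a description of Microsoft's or DeepMind's actual method; all function names are our own.

```python
import heapq

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude gradient entries.

    Returns (indices, values) -- the compressed message sent over the
    wide-area link -- plus the residual kept locally (error feedback),
    so unsent mass is not silently dropped but carried to the next step.
    """
    # Indices of the k entries with the largest absolute value.
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    values = [grad[i] for i in idx]
    # Residual: everything not sent, accumulated for the next round.
    residual = list(grad)
    for i in idx:
        residual[i] = 0.0
    return idx, values, residual

def densify(idx, values, size):
    """Reconstruct a dense gradient from the sparse message."""
    out = [0.0] * size
    for i, v in zip(idx, values):
        out[i] = v
    return out

grad = [0.02, -1.5, 0.003, 0.9, -0.04, 2.1]
idx, vals, residual = topk_sparsify(grad, k=2)
recovered = densify(idx, vals, len(grad))
# Only the two largest-magnitude entries (2.1 and -1.5) cross the link;
# the other four stay in the residual for a future synchronisation.
```

With k at, say, 1% of the gradient size, cross‑site traffic per step drops by roughly two orders of magnitude, which is why techniques in this family keep appearing in multi‑site training research.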
Key Points
- Microsoft’s “Fairwater” initiative links multiple datacentres to form continent‑spanning superclusters for massive AI training.
- The pilot connection links Mount Pleasant, Wisconsin, with Atlanta, Georgia — Microsoft's first live multi‑datacentre AI training fabric.
- Fairwater facilities use direct‑to‑chip liquid cooling, two‑storey designs and aim to consume “almost zero water.”
- Microsoft plans to deploy large counts of heterogeneous GPUs (including Nvidia GB200 NVL72 racks) to match workloads and availability.
- High‑capacity networking (e.g. Cisco 8223, Broadcom Jericho4, Nvidia Spectrum‑XGS or InfiniBand) is critical to bridge ~1,000 km distances without crippling performance.
- Distributing training across datacentres raises bandwidth and latency challenges; research into compression and scheduled communication (e.g. from DeepMind) helps mitigate these.
- Microsoft has been contacted for details on the specific networking tech in use; its ties to Nvidia make Spectrum‑XGS a plausible candidate.
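A back‑of‑envelope calculation shows why those ~1,000 km links matter so much. Signals in optical fibre travel at roughly two‑thirds the speed of light (about 200 km per millisecond) — a standard rule of thumb, not a measured figure for Microsoft's network — and the 800 Gb/s link rate below is purely illustrative:

```python
def one_way_latency_ms(distance_km, km_per_ms=200.0):
    """Propagation delay over fibre; ~200 km/ms is a common rule of thumb."""
    return distance_km / km_per_ms

def bytes_in_flight(bandwidth_gbps, rtt_ms):
    """Bandwidth-delay product: data 'on the wire' during one round trip."""
    return bandwidth_gbps * 1e9 / 8 * (rtt_ms / 1e3)

rtt = 2 * one_way_latency_ms(1000)      # ~10 ms round trip over ~1,000 km
in_flight = bytes_in_flight(800, rtt)   # illustrative 800 Gb/s link
# Every synchronous training step that crosses the link pays at least
# that ~10 ms RTT, and roughly 1 GB must be in flight at any moment
# just to keep an 800 Gb/s pipe full -- hence the interest in both
# fat pipes (Cisco 8223, Jericho4, Spectrum-XGS) and in scheduling
# communication so it overlaps with computation.
```

The takeaway: distance adds an irreducible latency floor, so the engineering fight is about hiding that latency behind compute, not eliminating it.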
Context and Relevance
This move reflects a broader industry shift: single datacentres won’t be enough for the largest forthcoming models. Hyperscalers are experimenting with multi‑site fabrics to trade off geography, power availability, cooling and cost while maintaining the compute scale needed for next‑generation AI. For anyone tracking AI infrastructure, cloud strategy or chip/network hardware, Fairwater signals how training supply chains and site selection are evolving — and how networking vendors are becoming central to AI economics.
Why should I read this?
Short and blunt: if you care where the next giant AI models get built, this matters. Microsoft is effectively saying “we’ll sew datacentres together” so model builders can pick cheap land, plenty of power and better cooling without being bottlenecked by a single site. Saves you sifting through spec sheets — this story tells you the big picture fast.
