Berkeley boffins build better load balancing algo with AI

Summary

Researchers at UC Berkeley used OpenEvolve, an open-source implementation of DeepMind’s AlphaEvolve, to evolve and optimise a load-balancing algorithm for large language model (LLM) inference. The AI-driven process replaced Python loops with vectorised tensor operations and introduced a zig-zag partitioning scheme, producing a version that runs in about 3.7 ms. That represents a 5x speedup over an undisclosed reference implementation and a 146x improvement over DeepSeek’s earlier open-source Python implementation.
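The article does not reproduce the evolved code, but the core idea it describes — replacing a per-expert Python loop with vectorised tensor operations and a zig-zag (serpentine) assignment of sorted experts to devices — can be sketched in a few lines. The function name, shapes and NumPy choice below are illustrative assumptions, not the authors’ implementation:

```python
# Illustrative sketch only (assumed names and shapes, not the paper's code):
# assign per-expert loads to GPUs with a vectorised zig-zag (serpentine) deal
# instead of a per-expert Python loop.
import numpy as np

def zigzag_balance(expert_loads: np.ndarray, num_gpus: int) -> np.ndarray:
    """Return a GPU index for each expert; assumes the expert count is a
    multiple of num_gpus. Experts are sorted by load and dealt out in a
    back-and-forth order so heavy and light experts mix on every GPU."""
    num_experts = expert_loads.shape[0]
    order = np.argsort(expert_loads)[::-1]              # heaviest experts first
    rounds = num_experts // num_gpus
    fwd = np.arange(num_gpus)                           # 0, 1, ..., G-1
    rev = fwd[::-1]                                     # G-1, ..., 1, 0
    # Even rounds deal left-to-right, odd rounds right-to-left: the zig-zag.
    grid = np.where((np.arange(rounds) % 2 == 0)[:, None], fwd, rev)
    assignment = np.empty(num_experts, dtype=int)
    assignment[order] = grid.ravel()                    # pure array ops, no loop
    return assignment

loads = np.array([9.0, 1.0, 7.0, 3.0, 8.0, 2.0, 6.0, 4.0])
print(zigzag_balance(loads, num_gpus=2))                # each GPU gets ~equal load
```

The zig-zag deal pairs heavy experts with light ones on each GPU, and expressing it as array indexing rather than a loop is the kind of rewrite the evolved version reportedly applied.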

The team ran OpenEvolve with a mix of Gemini 2.5 Flash and Flash Lite models, spending under $10 and roughly five hours of compute to reach the result. They also report a separate case in which ADRS (AI-Driven Research for Systems) tripled performance in a relational-analytics workload where SQL queries invoke LLM inference on every row.

The paper frames these results as evidence that AI can discover and refine algorithms faster than humans in some systems-performance domains, while noting that challenges remain around verification and evaluation for broader problem classes such as security and fault tolerance.

Key Points

  • UC Berkeley used OpenEvolve to automatically generate and optimise a load balancer for LLM expert routing.
  • The evolved algorithm cuts runtime to ~3.7 ms by using vectorised tensor ops and a zig-zag partition strategy.
  • Results: ~5x faster than an unnamed reference implementation and ~146x faster than DeepSeek’s earlier Python implementation.
  • The optimisation run cost under $10 and took roughly five hours using Gemini 2.5 Flash and Flash Lite models.
  • ADRS also produced a 3x speedup in a separate relational-analytics use case involving row-wise LLM calls.
  • Authors argue ADRS will shift human researchers toward problem formulation and validation, with verification frameworks remaining the main bottleneck to broader adoption.

Why should I read this?

Quick and dirty: AI just rewired a load balancer to run way faster than the human-written versions — and it did it cheap and fast. If you care about squeezing latency or cost out of LLM inference or systems at scale, this is the kind of trick you’ll want in your toolkit.

Author style

Punchy: this isn’t just an academic parlour trick — the results are concrete, measurable and immediately relevant to anyone running LLMs or high-performance systems. Read the details if you want to understand how ADRS could change optimisation workflows.

Context and Relevance

This work sits within a growing trend of using ML agents to discover and improve algorithms previously crafted by humans. It follows similar industry work (for example DeepMind’s AlphaEvolve) and demonstrates that automated search can produce substantial performance wins in verifiable domains such as systems performance and database optimisations. For engineers and researchers operating LLM inference pipelines or large-scale systems, ADRS techniques could soon become part of standard performance-tuning practices — provided robust evaluation and validation tools are established.

Source

Source: https://go.theregister.com/feed/www.theregister.com/2025/10/25/openevolve_ai_better_algorithms/