Amazon Believes the Future of Data Centers Relies on a Technical Challenge It Has Recently Overcome

Over the years, the tech sector has developed many variations of the fat-tree architecture. The design is generally dependable, but it is also inefficient and rigid, and it requires intricate cabling. We’re talking about physical cables here.
If you’ve ever visited a data center or a server room in an office building, you’ve probably noticed a tangle of colorful cables pouring out of metal racks. According to Rehder, cabling is one of the highest costs in networking, and Amazon currently links its global data centers with 20 million kilometers of fiber-optic cable. That is roughly the distance of 25 round trips between Earth and the moon.
In 2012, amid the skyrocketing demand for cloud computing services, a team of researchers at the University of Illinois Urbana-Champaign, including Godfrey, introduced an idea called Jellyfish. Existing network designs were struggling to handle increasing demands, prompting the researchers to suggest a “high-capacity network interconnect that, utilizing a random graph topology, lends itself to gradual expansion.” They anticipated that this random strategy could surpass the efficiency and scalability of networks based on fat-tree architecture.
“We called it Jellyfish because it’s versatile,” Godfrey explains. “You can randomly connect routers and switches, creating a flexible reservoir of network capacity that is highly efficient.”
However, Jellyfish also introduced new challenges in layout, data routing, and cabling. According to Godfrey, routing within random graphs is more complex because data can take many different paths from source to destination. Cabling also becomes harder, because the endpoints of each cable are chosen at random.
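To make the random-wiring idea concrete, here is a minimal Python sketch. It is not the Jellyfish authors' code: the function names, the switch and port counts, and the seed are all illustrative assumptions, and it omits the fix-up step a production construction would need when a switch is left with unused ports. It links randomly chosen pairs of switches that still have free ports, then counts the hops from one switch to every switch it can reach.

```python
import random
from collections import deque


def random_switch_graph(num_switches, ports_per_switch, seed=0):
    """Wire switches together at random (hypothetical helper, for illustration).

    Returns a dict mapping each switch to the set of switches it is cabled to.
    Some ports may stay unused, and the result is not guaranteed to be connected.
    """
    rng = random.Random(seed)
    links = {s: set() for s in range(num_switches)}
    while True:
        # Switches that still have at least one free port.
        open_switches = [s for s in links if len(links[s]) < ports_per_switch]
        # Pairs that can still be cabled: both have free ports, not yet linked.
        candidates = [
            (a, b)
            for i, a in enumerate(open_switches)
            for b in open_switches[i + 1:]
            if b not in links[a]
        ]
        if not candidates:
            return links
        a, b = rng.choice(candidates)
        links[a].add(b)
        links[b].add(a)


def hop_counts(links, source):
    """Breadth-first search: fewest hops from source to each reachable switch."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


if __name__ == "__main__":
    links = random_switch_graph(num_switches=20, ports_per_switch=4, seed=1)
    dist = hop_counts(links, source=0)
    print("switches reachable from switch 0:", len(dist))
    print("longest shortest path, in hops:", max(dist.values()))
```

Even in this toy version, the trade-off Godfrey describes is visible: every cable runs between two arbitrary switches, so there is no tidy layout for technicians to follow, and a router has to choose among many possible paths between any two switches.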
A few years later, Google began experimenting with a different approach: integrating optical circuit switching, or OCS, into its network designs. This method uses tiny mirrors to direct light from one port to another, allowing Google to reconfigure its optical connections in real time. But it also introduces additional engineering complexity and cost.
So Random
Meanwhile, Amazon was pursuing the “holy grail,” in the words of Giacomo Bernardi, one of the lead authors of a recent paper alongside Amazon Scholars Ratul Mahajan and Seshadhri Comandur. In a perfect scenario, a data network would be flat and efficient, resilient to hardware failures, diverse enough to optimize performance, and scalable without spiraling out of control. It would also depend on simpler, more streamlined cabling rather than increasingly complicated fiber-optic systems.
As he and his team set out to construct such a network, Bernardi says he had become fascinated with Penrose tiling, a type of aperiodic tiling named after British physicist Roger Penrose. (Some researchers, inspired by Penrose tilings, have attempted to adapt these patterns into error-correcting codes for quantum computers.) Bernardi wondered whether Amazon could adopt a similar approach and create a flat “mesh” by following a non-repeating pattern. He and his team began building a simulation to visualize what that could look like.
