With the focus on broad adoption of AI/ML technology comes the need for more energy-efficient electrical and optical interfaces. In advance of the OFC Conference, we spoke to OIF's Nathan Tracy (TE Connectivity) and Jeff Hutchins (RANOVUS Inc.) about the consortium's work exploring the next generation of energy-efficient interfaces (EEI) for AI and ML applications.
Optical networks have changed the way the world lives, works, communicates and socialises. You only have to look at the events of the last few years to realise just how much connectivity has affected people’s lives. From an environmental point of view, it has reduced the need for an everyday commute or extensive travelling to meetings, but the irony is that the networks behind the reliable connectivity are often themselves power-hungry beasts.
Recent figures from the World Broadband Association put the telecom industry’s carbon footprint alone at roughly 2% of global emissions, as the accelerated need for reliable connectivity post-pandemic has led to greater energy consumption, and therefore an expanded carbon footprint.
Then there is the datacoms market. With more than 8,000 data centre locations in the world, the market is still growing rapidly. According to the latest research from Synergy Research Group, the average capacity of hyperscale data centres to be opened over the next six years will likely be more than double that of current operational hyperscale data centres. During the same period, the total capacity of all operational hyperscale data centres is forecast to grow almost threefold.
The emergence of AI and ML in optical networks
These hyperscale data centres are increasingly integrating artificial intelligence (AI) and machine learning (ML) technologies into their compute structures. This is accomplished by implementing GPU (graphics processing unit) accelerators in compute clusters as the workhorse behind the data centre’s servers.
[Figure: AI Training Cluster Architecture]
Unfortunately, the demand for GPU accelerator interconnectivity is growing faster than Moore’s Law.
[Figure: Compute FLOPs Outpacing Moore’s Law]
These powerful new technologies have become critical competitive weapons as cloud operators strive for the upper hand in delivering higher value services to their clients and consumers. Nathan Tracy of TE Connectivity, who is OIF president, says: “We are only at the leading edge of seeing the massive changes in deployment of these technologies, but these technologies come at a significant cost in terms of the power that next generation data centres will consume and the resultant thermal power dissipation challenges. Something has to change to reduce power consumption trajectories.”
Enter OIF, where the optical networking industry’s interoperability work gets done. Founded in 1998, the not-for-profit consortium has, for the past 25 years, helped to accelerate transformation in optical networks by driving the all-important interoperability that is crucial for efficient and reliable networks.
The forum has been working hard since May on a project dedicated to studying new energy-efficient electrical and optical interfaces such as those referenced above, and ultimately identifying opportunities for interoperability standards. Jeff Hutchins of RANOVUS, who is Vice Chair for Energy Efficient Interfaces (EEI) in the OIF Physical and Link Layer (PLL) Working Group (WG), says: “One objective is to deliver education about this significant change in the industry and what will be required to deliver that change.”
To help close this growing connectivity gap, the OIF initiated the EEI Framework project last May to study the problem and look for areas where interoperable solutions can help. The compute cluster links can be organised into a few different categories depending on their link type. All are low latency links but their physical layer and protocol layers will differ depending on what is being interconnected.
These low latency links can interconnect accelerators to I/O, or to pools of disaggregated memory. They can also interconnect local clusters of accelerators as well as remote clusters. Contributors to latency include any Forward Error Correction (FEC) that is used, digital signal processing in the optical transceiver, as well as the time of flight of photons in the optical fibre. Even though the signal travels through the fibre at roughly two-thirds of the speed of light in a vacuum, each metre of fibre adds about 5 nanoseconds of latency, or 10 nanoseconds for the round trip.
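The per-metre figure can be checked with a short calculation. The Python sketch below is illustrative only: it assumes a group index of about 1.47, a typical value for standard single-mode fibre that is not taken from the OIF work, and the function names are hypothetical. It models time of flight alone and leaves out FEC and DSP latency.

```python
# Illustrative sketch: time of flight in optical fibre.
# Assumption: a group index of about 1.47, typical of standard single-mode
# fibre. This value is not from the article; adjust it for the fibre in use.

C_VACUUM_M_PER_S = 299_792_458  # speed of light in a vacuum, metres per second
FIBRE_GROUP_INDEX = 1.47        # assumed typical value


def one_way_delay_ns(length_m: float, group_index: float = FIBRE_GROUP_INDEX) -> float:
    """Return the one-way time of flight through length_m metres of fibre, in ns."""
    return length_m * group_index / C_VACUUM_M_PER_S * 1e9


def round_trip_delay_ns(length_m: float, group_index: float = FIBRE_GROUP_INDEX) -> float:
    """Return the round-trip time of flight, in ns (request out, response back)."""
    return 2.0 * one_way_delay_ns(length_m, group_index)


# Print fibre delay for a few example link lengths (lengths chosen arbitrarily).
for length_m in (1, 10, 100):
    print(
        f"{length_m:>4} m: one way {one_way_delay_ns(length_m):7.1f} ns, "
        f"round trip {round_trip_delay_ns(length_m):7.1f} ns"
    )
```

Under that assumption, one metre works out to roughly 4.9 nanoseconds one way, in line with the 5 nanosecond rule of thumb. A memory request sent over a 100 metre link would therefore wait close to a microsecond on fibre delay alone, before any FEC or DSP latency is added.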
It is easy to see the importance of low latency links, especially in relation to memory disaggregation. The time taken to respond to a memory request is limited by the overall latency. During that interval, that portion of the computation is suspended. AI compute is leading the drive for energy-efficient, low latency links, and the hyperscalers are looking to better understand how to achieve them. The key objectives of the hyperscalers centre on reduced power consumption, improved density, reduced latency, and ensuring link accountability. It's important that the industry addresses these requirements as compute demand outpaces Moore's Law and places increasing pressure on networking capabilities.
Interface solutions for reducing power consumption
The industry has begun to develop low-power pluggables for 100G Ethernet, known as Linear Pluggable Optics (LPO), which eliminate the digital signal processor (DSP) and thereby significantly reduce the power consumption of the optical links. This is a great first step, but removing the DSP can adversely impact link accountability, which may limit adoption. OIF is one of the organisations working on this challenge.
The OIF is studying approaches to make robust low-power, low latency solutions that are attractive in terms of complexity, cost and integration with existing infrastructure, and that offer reliable interoperability at the next data rates such as 224G Ethernet and 128G PCIe7.
The OIF Energy Efficient Interfaces (EEI) Framework
As part of this, the Energy Efficient Interfaces (EEI) Framework, marked by the commencement of the PLL EEI Framework Project and the subsequent EEI Physical Layer User Group (PLUG) System Vendor Requirements Project, underscores OIF’s focus on addressing the gap in interconnectivity for both electrical and optical interfaces.
As part of its work, OIF has been looking at energy-efficient links with less than fully retimed interfaces, which could offer benefits such as improved density for co-packaging applications, reduced latency and that all-important power reduction. There are numerous applications for these non-retimed links, including front panel pluggables, near packaged modules, co-packaged engines, and die-to-die electrical links.
There are also many different alternatives to legacy retimed links, all in various stages of development and trials. These include partially retimed links and non-retimed links based on linearly amplified drive, direct drive, host Tx predistortion or engine Tx predistortion. The type of link and the application need to be carefully considered, alongside the need for collaboration with other standards organisations such as PCI-SIG, CXL, UEC, and the IEEE. “OIF is in ‘the centre of the storm’ in terms of our on-going industry work on next-generation electrical interface standardisation, co-packaging architecture industry firsts, and Common Management Interface Specification (CMIS) advancements. Combined with OIF’s diverse membership base, representing the full ecosystem of electrical and optical technology and component developers, test and measurement industry leaders, end use equipment innovators and hyperscale/network operators, we have the ability to consider all these technical options and weigh the tradeoff impacts,” says Tracy.
Continues Hutchins: “There has been significant interest in these links to reduce power consumption, improve density for co-packaging applications and reduce latency. There are multiple different applications for non-retimed links and different configurations, so it’s really important to clearly define these configurations to create standards. The compliance methodology needs to be defined in order to enable interoperability of the various components.”
There are a lot of factors to consider before it comes to implementing an energy efficient interface into a network, but by not doing so, network owners and operators will arguably have an even greater challenge as they could find themselves falling behind when it comes to emerging technologies. To really optimise applications such as AI and ML, a greater understanding is required. OIF is undertaking this work on behalf of the optical communications community to ensure that the ultimate solutions are sustainable and scalable. Says Hutchins: “If we think about implementing an AI fabric into today's network, if we do nothing different and we just use today's methodologies it's just totally impractical from a power consumption and density standpoint. But if we make these changes, if we understand this challenge, then we can bring in these new architectures, which will benefit everyone.”
Find out more
You can find more detailed information about the results of OIF’s Energy Efficient Interfaces (EEI) Framework by joining the OIF at upcoming panel discussions, including DesignCon on January 31 in Santa Clara, CA and OFC on March 28 in San Diego, CA. Or become an OIF member and participate in the EEI Framework development work during OIF’s next meeting, the week of January 15 in Jacksonville, FL.