Amazon announced Wednesday that its cloud division, Amazon Web Services (AWS), has developed specialized hardware to cool next-generation Nvidia graphics processing units (GPUs) used for artificial intelligence (AI) workloads.
The Challenge of Cooling Power-Hungry Nvidia GPUs
Nvidia’s GPUs have been central to the rapid growth of generative AI, but they consume massive amounts of energy. Cooling these processors efficiently is essential, especially at the scale AWS operates.
Initially, Amazon considered building data centers equipped for widespread liquid cooling. However, Dave Brown, AWS Vice President of Compute and Machine Learning Services, explained in a YouTube video that this approach would have taken too long, and that commercially available liquid cooling solutions were inadequate.
Brown noted that such systems would either consume excessive floor space or increase water usage substantially. While those solutions might work for smaller-scale providers, they couldn’t meet AWS’s demand for liquid-cooling capacity.
To overcome these challenges, Amazon engineers designed the In-Row Heat Exchanger (IRHX), a hardware unit that can be integrated into existing and new data centers to efficiently cool high-density Nvidia GPUs.
Previous Nvidia chip generations could be cooled adequately with traditional air cooling, but the newer, denser models require liquid cooling, which the IRHX is designed to deliver.
Customers can now access AWS’s new infrastructure through computing instances named P6e. These instances are built for Nvidia’s latest dense-compute designs, such as the GB200 NVL72, a rack-scale system that packs 72 Nvidia Blackwell GPUs into a single rack for training and running large AI models.
AWS’s Growing Hardware Portfolio
Amazon isn’t new to developing its own hardware. The company has created custom chips for general-purpose and AI-specific computing, and has designed proprietary storage servers and networking equipment. By using homegrown hardware, Amazon reduces reliance on third-party suppliers, potentially improving cost efficiency and performance.
AWS reported its widest operating margin since at least 2014 in the first quarter, and it remains the primary contributor to Amazon’s net income.
Microsoft, AWS’s closest competitor in cloud infrastructure, has similarly advanced chip development. In 2023, Microsoft introduced custom cooling systems called Sidekicks to support its Maia AI chips, showing that leading cloud providers are investing heavily in both hardware and cooling innovations.
What The Author Thinks
As AI workloads grow exponentially, standard cooling methods will no longer suffice. Amazon’s IRHX demonstrates that cloud giants must innovate beyond traditional infrastructure to keep pace with hardware demands. This move not only improves performance but also showcases how vertical integration—developing both hardware and cooling technology in-house—can become a strategic advantage in the competitive cloud market.
Featured image credit: Andy Hay via Flickr