Nvidia unveiled the HGX H200 Tensor Core GPU on Monday, built on the Hopper architecture and designed to accelerate AI applications. The release builds on the H100, Nvidia’s previous flagship AI GPU, which was introduced last year. The H200 could significantly enhance AI model capabilities and reduce response times, benefiting applications like ChatGPT.
AI experts have identified a shortage of computing power, commonly referred to as “compute,” as a major obstacle to AI progress over the past year. The shortage has impeded the deployment of existing AI models and slowed the development of new ones, largely because of a scarcity of high-performance GPUs optimized for AI workloads. There are two ways to relieve the compute bottleneck: produce more chips, or make each chip more powerful. The latter approach makes the H200 an attractive option for cloud service providers.
Despite the “graphics” in the name, data center GPUs like the H200 are not intended for graphics processing. GPUs excel at AI workloads because they can perform massive numbers of matrix multiplications in parallel, the core operation behind neural networks. They play a central role both in training AI models and in running inference, where a trained model processes inputs to produce outputs.
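To make that concrete, here is a minimal NumPy sketch (purely illustrative, not Nvidia code, with arbitrary layer sizes): a single fully connected neural-network layer boils down to one large matrix multiplication, and every output value is an independent dot product, which is exactly the kind of work a GPU’s thousands of cores can compute in parallel.

```python
import numpy as np

# Toy fully connected layer: 1,024 inputs -> 4,096 outputs for a batch of 32.
# The sizes are arbitrary examples, not figures from any real model.
batch, d_in, d_out = 32, 1024, 4096
x = np.random.randn(batch, d_in).astype(np.float32)   # input activations
W = np.random.randn(d_in, d_out).astype(np.float32)   # layer weights
b = np.zeros(d_out, dtype=np.float32)                  # biases

# The core operation: one matrix multiplication plus a bias add.
# Each of the batch * d_out outputs is an independent multiply-accumulate,
# which is why massively parallel GPUs suit both training and inference.
y = x @ W + b
print(y.shape)  # (32, 4096)
```

Training repeats operations like this, plus their gradient counterparts, enormous numbers of times; inference runs only the forward pass, once per request.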
Ian Buck, vice president of hyperscale and HPC at Nvidia, emphasized the importance of efficiently processing vast amounts of data at high speeds for generative AI and high-performance computing applications. He highlighted the H200’s role in accelerating AI solutions to address critical global challenges.
For instance, OpenAI has repeatedly said that GPU shortages constrain ChatGPT’s performance, forcing service slowdowns and rate limiting. Deploying the H200 could ease those resource constraints, allowing AI language models to serve more users effectively.
Nvidia touts the H200 as the first GPU featuring HBM3e memory, providing 141GB of memory and a bandwidth of 4.8 terabytes per second. This marks a significant improvement, with Nvidia claiming a 2.4-fold increase in memory bandwidth compared to the 2020 Nvidia A100, which remains in high demand due to GPU shortages.
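As a rough sanity check on that figure (simple arithmetic only; the roughly 2 terabytes-per-second number is Nvidia’s published spec for the 80GB A100, an assumption not stated in the H200 announcement itself):

```python
# Back-of-the-envelope check of the claimed 2.4x memory-bandwidth gain.
h200_bandwidth_tbs = 4.8  # H200 with HBM3e, per Nvidia's announcement
a100_bandwidth_tbs = 2.0  # approx. A100 80GB spec (assumption, ~2.0 TB/s)
print(f"{h200_bandwidth_tbs / a100_bandwidth_tbs:.1f}x")  # -> 2.4x
```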
Nvidia will offer the H200 in various form factors, including server boards compatible with HGX H100 systems, as well as the Nvidia GH200 Grace Hopper Superchip, which combines CPU and GPU capabilities for enhanced AI performance.
Leading cloud service providers such as Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be among the first to deploy H200-based instances in the coming year. Nvidia plans to make the H200 available through global system manufacturers and cloud service providers starting in the second quarter of 2024.
Simultaneously, Nvidia has been navigating US export restrictions that limit sales of its most powerful GPUs to China. The company has responded with scaled-back AI chips designed specifically for the Chinese market, including the HGX H20, L20 PCIe, and L2 PCIe. The restrictions are intended to keep advanced technologies out of certain countries, including China and Russia, and further regulatory back-and-forth between Nvidia and US authorities seems likely in the coming months.