Nvidia brings NeMo Retriever, DGX Cloud, and the Project Ceiba supercomputer to the AWS platform.

By Yasmeeta Oon

Nov 29, 2023

Nvidia and Amazon Web Services (AWS) are strengthening their strategic partnership with a series of major announcements at the AWS re:Invent conference.

During the event, Nvidia is bringing its DGX Cloud offering to AWS, and this version introduces the Grace Hopper GH200 Superchip to AWS for the first time. Nvidia and AWS are also launching Project Ceiba, an ambitious initiative to build the world’s largest cloud AI supercomputer, which will run Nvidia technology on AWS infrastructure and deliver 64 exaflops of AI processing power. In addition, AWS is expanding its cloud offerings with four new GPU-powered instances on its EC2 service.

To help organizations get more accurate results from large language models (LLMs), Nvidia is also using AWS re:Invent to introduce NeMo Retriever. This technology uses a Retrieval Augmented Generation (RAG) approach to connect enterprise data with generative AI.

The collaboration between Nvidia and AWS spans more than 13 years; Nvidia GPUs first appeared in AWS cloud computing instances in 2010. According to Ian Buck, Nvidia’s VP of Hyperscale and HPC, the partnership has focused on driving innovation and operations at AWS, benefiting mutual customers such as Anthropic, Cohere, and Stability AI. The collaboration also extends beyond hardware to various software integrations and behind-the-scenes cooperation.

DGX Cloud Brings Supercomputing Power to AWS: DGX Cloud itself is not new. Nvidia first unveiled it in March at its GPU Technology Conference (GTC) and had previously announced DGX Cloud for Microsoft Azure and Oracle Cloud Infrastructure (OCI).

DGX Cloud provides an optimized deployment of Nvidia hardware and software that delivers supercomputing capabilities for AI applications. What sets this announcement apart is that it is the first DGX Cloud offering powered by NVIDIA Grace Hopper, which pairs Arm-based Grace CPUs with Hopper GPUs. The AWS version of DGX Cloud will run the new GH200 chips in a rack-scale architecture called GH200 NVL32, which links 32 GH200 Superchips with Nvidia’s high-speed NVLink interconnect. Each rack can deliver up to 128 petaflops of AI performance and provides a total of 20 terabytes of high-speed memory, making it a new rack-scale GPU architecture for generative AI.
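Dividing the rack-level figures quoted above by the 32 superchips gives a rough per-superchip picture. This is back-of-the-envelope arithmetic on the article’s numbers only: it assumes the 128 petaflops and 20 terabytes are spread evenly across the rack, uses decimal units, and does not say which numeric precision the petaflops figure refers to.

```python
# Back-of-the-envelope split of the 32-superchip GH200 rack figures quoted in the article.
# Assumptions: performance and memory are divided evenly across superchips; decimal units.
RACK_SUPERCHIPS = 32
RACK_AI_PETAFLOPS = 128
RACK_MEMORY_TB = 20

# Petaflops of AI performance attributable to one superchip.
pflops_per_superchip = RACK_AI_PETAFLOPS / RACK_SUPERCHIPS

# Memory per superchip, converting terabytes to gigabytes (1 TB = 1000 GB).
memory_gb_per_superchip = RACK_MEMORY_TB * 1000 / RACK_SUPERCHIPS

print(f"{pflops_per_superchip} PFLOPS and {memory_gb_per_superchip} GB per superchip")
```

Working per superchip makes it easier to compare the rack with larger systems built from the same chip.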

Project Ceiba Aims to Build the World’s Largest Cloud AI Supercomputer: Nvidia and AWS also revealed Project Ceiba, which will incorporate 16,000 Grace Hopper Superchips and use AWS’ Elastic Fabric Adapter (EFA), the AWS Nitro System, and Amazon EC2 UltraCluster scalability technologies. The resulting system is expected to deliver 64 exaflops of AI performance and house up to 9.5 petabytes of total memory. The supercomputer will be hosted on AWS infrastructure and used by Nvidia’s own research and engineering teams for AI research.
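As a rough consistency check, the Ceiba totals can be compared with the 32-superchip GH200 rack figures quoted earlier (128 petaflops and 20 terabytes per rack). The sketch assumes Ceiba scales linearly from that same building block, which the announcement does not state; it only tests whether the article’s numbers hang together.

```python
# Rough cross-check of Project Ceiba's quoted totals against the 32-superchip rack figures.
# Assumptions: identical superchips, linear scaling, decimal units (1 EF = 1000 PF, 1 PB = 1000 TB).
RACK_SUPERCHIPS = 32
RACK_PETAFLOPS = 128
RACK_MEMORY_TB = 20

CEIBA_SUPERCHIPS = 16_000
CEIBA_QUOTED_EXAFLOPS = 64
CEIBA_QUOTED_MEMORY_PB = 9.5

# How many 32-superchip racks' worth of chips Ceiba contains.
rack_equivalents = CEIBA_SUPERCHIPS / RACK_SUPERCHIPS

# Totals implied by scaling the per-rack figures.
implied_exaflops = rack_equivalents * RACK_PETAFLOPS / 1000
implied_memory_pb = rack_equivalents * RACK_MEMORY_TB / 1000

print(f"{rack_equivalents:.0f} rack equivalents")
print(f"compute: implied {implied_exaflops} EF vs quoted {CEIBA_QUOTED_EXAFLOPS} EF")
print(f"memory:  implied {implied_memory_pb} PB vs quoted {CEIBA_QUOTED_MEMORY_PB} PB")
```

The compute figure lines up with the quoted 64 exaflops, while the implied memory (about 10 petabytes) is slightly above the quoted 9.5 petabytes. That gap suggests either rounding in the published totals or a somewhat different memory configuration per superchip in Ceiba.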

NeMo Retriever Advances Enterprise-Grade Chatbots: Nvidia’s NeMo Retriever, unveiled at AWS re:Invent, is aimed at improving enterprise-grade chatbots. LLMs are typically trained on publicly available data, which limits what they know and excludes an organization’s internal information. Connecting LLMs to enterprise data sources lets them draw on the most current and accurate data, so organizations can obtain precise answers efficiently.
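The core idea of retrieval-augmented generation is that, before the model answers, relevant passages from an organization’s own data are fetched and placed into the prompt, so the answer can draw on information the model was never trained on. The sketch is a generic illustration, not NeMo Retriever’s API: the documents, the keyword-overlap `retrieve` function (a stand-in for a real retriever), and the prompt template are all hypothetical.

```python
# Generic retrieval-augmented prompt construction (illustrative only).
# Hypothetical: the documents, the keyword-overlap retriever, and the prompt template.
import re

ENTERPRISE_DOCS = [
    "Q3 travel policy: economy class is required for flights under six hours.",
    "The VPN must be used when accessing internal dashboards from home.",
    "Expense reports are due within 30 days of purchase.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Put retrieved passages ahead of the question so the LLM can ground its answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

question = "When are expense reports due?"
print(build_prompt(question, retrieve(question, ENTERPRISE_DOCS)))
```

The resulting prompt, not the bare question, is what would be sent to the LLM; swapping the toy retriever for a vector-search service leaves `build_prompt` unchanged.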

NeMo Retriever includes a collection of enterprise-grade models and prebuilt retrieval microservices designed for deployment and integration into enterprise workflows. It also provides accelerated vector search to speed up retrieval from vector databases. Early adopters of NeMo Retriever include Dropbox, SAP, and ServiceNow.
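Vector search ranks stored passages by how close their embedding vectors are to the embedding of the query. The sketch shows the basic operation with brute-force cosine similarity over a tiny in-memory index; it is not Nvidia’s accelerated implementation, and the passage IDs and three-dimensional vectors are made-up placeholders for real embedding-model output. Production vector databases typically use much higher-dimensional embeddings and approximate nearest-neighbor indexes instead of a linear scan.

```python
# Brute-force top-k vector search by cosine similarity (illustrative only).
# Hypothetical: the passage IDs and 3-D vectors stand in for real embeddings.
import math

INDEX = {
    "travel-policy": [0.9, 0.1, 0.0],
    "vpn-guide": [0.1, 0.9, 0.1],
    "expense-rules": [0.2, 0.1, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the k most similar."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

query_embedding = [0.15, 0.05, 0.95]  # stand-in for an embedded question about expenses
for doc_id, score in top_k(query_embedding, INDEX):
    print(f"{doc_id}: {score:.3f}")
```

The linear scan scores every stored vector, which is why large deployments rely on specialized indexes and hardware acceleration for this step.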

“This offers state-of-the-art accuracy and the lowest possible latency for retrieval-augmented generation,” emphasized Ian Buck.
