According to an official release from Microsoft Azure, Microsoft is applying a decade of supercomputing experience, along with its experience supporting the largest AI training workloads, to build AI infrastructure that delivers high performance at scale. Microsoft Azure's intelligent cloud, specifically virtual machines (VMs) accelerated with graphics processing units (GPUs), provides the foundation for generative AI development for Microsoft and its customers.
Microsoft is now shipping the ND H100 v5 VM, Azure's most powerful and most scalable family of AI virtual machines to date. The VMs support on-demand configurations ranging from eight to thousands of NVIDIA H100 GPUs interconnected over an NVIDIA Quantum-2 InfiniBand network, enabling significantly higher performance for AI models. Compared to the previous generation of ND A100 v4 VMs, this release includes the following innovations:
- 8 NVIDIA H100 Tensor Core GPUs interconnected via next-generation NVSwitch and NVLink 4.0.
- Each GPU is equipped with 400 Gb/s NVIDIA Quantum-2 ConnectX-7 InfiniBand, for 3.2 Tb/s per VM in a non-blocking fat-tree network.
- The eight local GPUs in each VM are interconnected with each other via NVSwitch and NVLink 4.0 with 3.8 Tb/s of pairwise bandwidth.
- 4th generation Intel Xeon Scalable processors.
- 5th-generation PCIe host-to-GPU interconnect with 64 GB/s of bandwidth per GPU.
- 16-channel 4800 MHz DDR5 memory.
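The aggregate InfiniBand figure in the list above follows directly from the per-GPU link speed; a minimal sketch of the arithmetic, using only the quoted spec values:

```python
# Per-GPU NVIDIA Quantum-2 ConnectX-7 InfiniBand link speed, in Gb/s (from the spec list)
PER_GPU_IB_GBPS = 400
GPUS_PER_VM = 8

# Aggregate non-blocking InfiniBand bandwidth per VM, in Tb/s:
# 8 links x 400 Gb/s = 3,200 Gb/s = 3.2 Tb/s
vm_ib_tbps = PER_GPU_IB_GBPS * GPUS_PER_VM / 1000
print(f"{vm_ib_tbps} Tb/s per VM")  # 3.2 Tb/s per VM
```

Because the fat-tree fabric is non-blocking, this aggregate is usable simultaneously by all eight GPUs rather than being an oversubscribed peak figure.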
NVIDIA Quantum-2, the seventh generation of the NVIDIA InfiniBand architecture, provides AI developers and scientific researchers with superb network performance and a rich feature set, including remote direct memory access (RDMA) and ultra-fast speeds of up to 400 Gb/s, to help them solve challenging problems and power advanced supercomputing data centers.
Microsoft says that large-scale AI is built into Azure's DNA. Early investments in large-scale language modeling research, such as Turing, and milestones such as building the first AI supercomputers in the cloud, laid the groundwork for generative AI. The Azure OpenAI Service enables customers to tap into the power of large-scale generative AI models. Scale has always been a central goal of Azure's optimized AI infrastructure, and Microsoft is now bringing supercomputing capabilities to startups and enterprises of all sizes without significant investment in physical hardware or software.
Now, with the preview release of the ND H100 v5, this AI supercomputing capability is on its way to becoming a standard offering in the Azure portfolio.