A new era of performance with NVIDIA H100
The fourth-generation DGX AI appliance is built around the new Hopper architecture, providing unprecedented performance in a single system and near-unlimited scalability with the DGX POD and SuperPOD enterprise-scale infrastructures. The DGX H100 features eight H100 Tensor Core GPUs, each with 80GB of memory, delivering up to 6x more performance than the previous generation of DGX appliances, and is supported by a wide range of NVIDIA AI software applications and expert support.
8x NVIDIA H100 GPUs WITH 640 GIGABYTES OF TOTAL GPU MEMORY
18x NVIDIA® NVLink® connections per GPU, 900 gigabytes per second of GPU-to-GPU bidirectional bandwidth
4x NVIDIA NVSWITCHES™
7.2 terabytes per second of bidirectional GPU-to-GPU bandwidth, 1.5x more than the previous generation
8x NVIDIA CONNECTX®-7 AND 2x NVIDIA BLUEFIELD®-3 DPU 400 GIGABITS-PER-SECOND NETWORK INTERFACES
1 terabyte per second of peak bidirectional network bandwidth
DUAL x86 CPUs AND 2 TERABYTES OF SYSTEM MEMORY
Powerful CPUs for the most intensive AI jobs
30 TERABYTES NVME SSD
High speed storage for maximum performance
The Transformer Engine uses a combination of software and specially designed hardware to accelerate the training and inference of transformer models, such as those used in language models like BERT and GPT-3. The Transformer Engine intelligently manages and dynamically switches between FP8 and FP16 calculations, automatically handling re-casting and scaling between the two levels of precision, speeding up large language models compared to the previous-generation Ampere architecture.
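The core of that re-casting step is per-tensor scaling: values are rescaled so they fit the narrow dynamic range of FP8 before the cast, then unscaled afterwards. The NumPy sketch below illustrates the idea only; the function names are illustrative, no actual FP8 cast is performed, and the only hard constant is 448, the largest finite magnitude in the FP8 E4M3 format.

```python
import numpy as np

# Largest finite magnitude representable in the FP8 E4M3 format
FP8_E4M3_MAX = 448.0

def scale_to_fp8_range(x):
    """Compute a per-tensor scale so values fill the FP8 dynamic range."""
    amax = np.abs(x).max()
    scale = FP8_E4M3_MAX / amax                      # re-casting scale factor
    x_scaled = np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return x_scaled, scale                           # hardware would cast x_scaled to FP8 here

def unscale(x_scaled, scale):
    """Recover the original dynamic range after the low-precision computation."""
    return x_scaled / scale
```

In the real hardware this scale bookkeeping happens automatically per layer, which is what lets most of a model run in FP8 without the numerics overflowing.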
The H100 Tensor Core GPUs in the DGX H100 feature fourth-generation NVLink which provides 900GB/s bidirectional bandwidth between GPUs, over 7x the bandwidth of PCIe 5.0.
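The "over 7x" figure follows from simple arithmetic on the published link rates (a back-of-the-envelope sketch; per-link and PCIe figures are the commonly quoted peak rates, not measured throughput):

```python
# Fourth-generation NVLink: 18 links per GPU, 50 GB/s bidirectional per link
nvlink_bidir = 18 * 50        # = 900 GB/s per GPU
# PCIe 5.0 x16: ~64 GB/s per direction, ~128 GB/s bidirectional
pcie5_bidir = 2 * 64          # = 128 GB/s
ratio = nvlink_bidir / pcie5_bidir
print(nvlink_bidir, round(ratio, 2))  # 900 7.03
```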
Building on the capabilities of NVLink and NVSwitch within the DGX H100, the new NVLink Switch System enables scaling up to 32 DGX H100 appliances in a SuperPOD cluster with up to 57.6TB/s of aggregate bandwidth.
Previous generations of GPU accelerators did not support confidential computing, with data encrypted only at rest in storage or in transit across the network. Hopper is the first GPU architecture to include support for confidential computing, securing data from unauthorised access while it is being processed in the DGX H100. NVIDIA confidential computing provides hardware-based isolation for multiple MIG instances sharing an H100 GPU, for single-tenant H100 GPUs, and between multiple H100 GPUs.
Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA H100 GPU. MIG can partition the H100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Now administrators can support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimising utilisation and extending the reach of accelerated computing resources to every user.
Expand GPU access to more users
With MIG, you can achieve up to 7X more GPU resources on a single H100 GPU. MIG gives researchers and developers more resources and flexibility than ever before.
Optimise GPU utilisation
MIG provides the flexibility to choose from many different instance sizes, which allows provisioning of a right-sized GPU instance for each workload, ultimately delivering optimal utilisation and maximising data centre investment.
Run simultaneous mixed workloads
MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.
Up to 7 GPU instances in a single H100
Dedicated SM, Memory, L2 Cache, Bandwidth for hardware QoS & isolation
Simultaneous workload execution with guaranteed quality of service
All MIG instances run in parallel with predictable throughput & latency
Right-sized GPU allocation
Different sized MIG instances based on target workloads
Run any type of workload on a MIG instance
Diverse deployment environment
Supported on bare metal, Docker, Kubernetes and virtualised environments
Hardware-based isolation of individual MIG instances.
Dynamic programming is a popular programming technique that breaks complex problems down into simpler subproblems, combining two methods, recursion and memoization, so that each subproblem is solved only once. Traditionally these tasks were run on CPUs or FPGAs, but the Hopper architecture introduces new DPX instructions, enabling the GPU to offload these computationally intensive algorithms and boosting performance by up to 7x.
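The recursion-plus-memoization pattern can be sketched with Levenshtein edit distance, a classic dynamic programming problem from the sequence-alignment family that DPX instructions target (this is an illustrative CPU-side Python sketch of the technique, not DPX code):

```python
from functools import lru_cache

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via recursion + memoization (dynamic programming)."""
    @lru_cache(maxsize=None)          # memoize: each (i, j) subproblem solved once
    def solve(i: int, j: int) -> int:
        if i == len(a):               # a exhausted: insert the rest of b
            return len(b) - j
        if j == len(b):               # b exhausted: delete the rest of a
            return len(a) - i
        if a[i] == b[j]:              # characters match: no edit needed
            return solve(i + 1, j + 1)
        return 1 + min(solve(i + 1, j),       # delete from a
                       solve(i, j + 1),       # insert into a
                       solve(i + 1, j + 1))   # substitute
    return solve(0, 0)

print(edit_distance("kitten", "sitting"))  # 3
```

Without the `lru_cache` memoization the recursion revisits the same `(i, j)` subproblems exponentially many times; with it, the work drops to one evaluation per cell of the DP grid, which is exactly the regular structure DPX accelerates in hardware.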
The NVIDIA GPU Cloud
NGC provides researchers and data scientists with simple access to a comprehensive catalogue of GPU-optimised software tools for deep learning and high-performance computing (HPC) that take full advantage of NVIDIA GPUs. The NGC container registry features containers for the top deep learning frameworks that are tuned, tested, certified and maintained for the NVIDIA H100. It also offers third-party managed HPC application containers, NVIDIA HPC visualisation containers and partner applications. Find out more
The DGX H100 POD
As an end-to-end AI solution provider, Scan can provide complete DGX H100 datacentre solutions featuring certified storage platforms in the form of DGX POD configurations. The advanced DGX SuperPOD pushes performance even further, adding NVIDIA BlueField networking and security technologies combined with NVIDIA Base Command orchestration software to ease management of this environment. Find out more
AI Optimised Storage
Deep learning appliances such as the DGX H100 only work as intended if the GPU accelerators are fed data consistently and rapidly enough to deliver maximum utilisation. Scan offers a wide range of AI-optimised storage appliances suitable for deployment with the DGX H100. Find out more
| NVIDIA DGX H100 | |
| --- | --- |
| GPUs | 8x NVIDIA H100 Tensor Core GPUs |
| GPU Specifications | 16,896 CUDA cores & 528 Tensor Cores per GPU |
| GPU Memory | 80GB per GPU - 640GB total |
| Host CPUs | 2x x86, spec TBC |
| System Memory | 2TB ECC registered memory |
| System Drives | 2x 1.92TB NVMe SSDs |
| Storage Drives | 8x 3.84TB NVMe SSDs |
| Networking | 8x single-port NVIDIA ConnectX-7 400Gb/s InfiniBand/Ethernet; 2x dual-port NVIDIA BlueField-3 DPUs, each with 1x 400Gb/s and 1x 200Gb/s InfiniBand/Ethernet |
| Operating System | DGX OS / Ubuntu Linux / Red Hat Enterprise Linux |
| Operating Temperature Range | 5ºC to 30ºC (41ºF to 86ºF) |
Available from late 2022
Be at the forefront of technology with the NVIDIA DGX H100