Scan AI

NVIDIA DGX H100

The ultimate AI infrastructure system

A new era of performance with NVIDIA H100

The fourth-generation DGX AI appliance is built around the new Hopper architecture, delivering unprecedented performance in a single system and near-limitless scalability via the DGX POD and SuperPOD enterprise-scale infrastructures. The DGX H100 features eight H100 Tensor Core GPUs, each with 80GB of memory, providing up to 6x more performance than previous-generation DGX appliances, and is supported by a wide range of NVIDIA AI software applications and expert support.

  • 8x NVIDIA H100 GPUs WITH 640 GIGABYTES OF TOTAL GPU MEMORY
    18x NVIDIA® NVLink® connections per GPU, 900 gigabytes per second of GPU-to-GPU bidirectional bandwidth

  • 4x NVIDIA NVSWITCHES™
    7.2 terabytes per second of bidirectional GPU-to-GPU bandwidth, 1.5X more than previous generation

  • 8x NVIDIA CONNECTX®-7 and 2x NVIDIA BLUEFIELD® DPU 400 GIGABITS-PER-SECOND NETWORK INTERFACE
    1 terabyte per second of peak bidirectional network bandwidth

  • DUAL x86 CPUs AND 2 TERABYTES OF SYSTEM MEMORY
    Powerful CPUs for the most intensive AI jobs

  • 30 TERABYTES NVME SSD
    High speed storage for maximum performance

AI Training

AI Inference

The Transformer Engine uses a combination of software and specially designed hardware to accelerate the training and inference of transformer models, such as those underpinning language models like BERT and GPT-3. It intelligently and dynamically switches between FP8 and FP16 calculations, automatically handling re-casting and scaling between the two levels of precision, speeding up large language models compared to the previous-generation Ampere architecture.
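The core idea of re-casting with scaling can be illustrated in a few lines of ordinary Python. This is a toy sketch of the concept only, not NVIDIA's Transformer Engine API: before casting a tensor to a narrower format, scale it so its values fit the representable range, then undo the scale afterwards.

```python
import numpy as np

FP16_MAX = 65504.0  # largest finite float16 value

def scaled_cast(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Scale x into float16 range, cast down, and return (fp16 tensor, scale)."""
    amax = np.abs(x).max()
    scale = FP16_MAX / amax if amax > FP16_MAX else 1.0
    return (x * scale).astype(np.float16), scale

def descale(x16: np.ndarray, scale: float) -> np.ndarray:
    """Cast back to float32 and undo the scaling."""
    return x16.astype(np.float32) / scale

# A tensor whose values would overflow float16 without scaling:
x = np.array([1.0e5, -2.0e5, 3.0], dtype=np.float32)
x16, s = scaled_cast(x)
recovered = descale(x16, s)
```

A production engine tracks these scale factors per tensor across training steps; the sketch above only shows why the re-casting must be paired with scaling at all.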

The H100 Tensor Core GPUs in the DGX H100 feature fourth-generation NVLink which provides 900GB/s bidirectional bandwidth between GPUs, over 7x the bandwidth of PCIe 5.0.
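The 7x figure can be sanity-checked with quick arithmetic, assuming a PCIe 5.0 x16 link at roughly 64GB/s per direction (~128GB/s bidirectional):

```python
# Fourth-generation NVLink vs PCIe 5.0 x16, bidirectional bandwidth per GPU
nvlink_bidir_gbs = 900        # GB/s, fourth-gen NVLink
pcie5_x16_bidir_gbs = 128     # GB/s, ~64 GB/s each direction

ratio = nvlink_bidir_gbs / pcie5_x16_bidir_gbs
print(f"{ratio:.1f}x")        # ~7.0x
```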

Building on the capabilities of NVLink and NVSwitch within the DGX H100, the new NVLink Switch System enables scaling of up to 32 DGX H100 appliances in a SuperPOD cluster with up to 57.6TB/s of aggregate bandwidth.

Previous generations of GPU accelerators did not support confidential computing: data was encrypted only at rest in storage or in transit across the network. Hopper is the first GPU architecture to include support for confidential computing, securing data from unauthorised access while it is being processed in the DGX H100. NVIDIA confidential computing provides hardware-based isolation for multiple MIG instances sharing an H100 GPU, for single-user H100 GPUs, and between multiple H100 GPUs.

Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA H100 GPU. MIG can partition the H100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Administrators can now support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimising utilisation and extending the reach of accelerated computing resources to every user.

Expand GPU access to more users

With MIG, you can achieve up to 7X more GPU resources on a single H100 GPU. MIG gives researchers and developers more resources and flexibility than ever before.

Optimise GPU utilisation

MIG provides the flexibility to choose from many different instance sizes, allowing provisioning of a right-sized GPU instance for each workload, ultimately delivering optimal utilisation and maximising data centre investment.

Run simultaneous mixed workloads

MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.

Up to 7 GPU instances in a single H100

Dedicated SM, Memory, L2 Cache, Bandwidth for hardware QoS & isolation

Simultaneous workload execution with guaranteed quality of service

All MIG instances run in parallel with predictable throughput & latency

Right-sized GPU allocation

Different sized MIG instances based on target workloads

Flexibility

To run any type of workload on a MIG instance

Diverse deployment environment

Supported with Bare metal, Docker, Kubernetes, Virtualised env.

Confidential Computing

Hardware-based isolation of individual MIG instances.
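In practice, MIG partitioning is driven from the `nvidia-smi` command line. The sketch below shows a typical workflow; it requires a MIG-capable GPU and admin rights, and the available profile names (such as `1g.10gb` on an 80GB H100) vary by GPU and driver version:

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset to take effect)
nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this GPU supports
nvidia-smi mig -lgip

# Create a 1g.10gb GPU instance and its default compute instance (-C)
nvidia-smi mig -cgi 1g.10gb -C

# List devices: MIG instances appear with their own UUIDs,
# usable in CUDA_VISIBLE_DEVICES or container runtimes
nvidia-smi -L
```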

Dynamic programming is a popular programming technique that solves complex problems by breaking them into simpler overlapping subproblems, combining recursion with memoisation so that each subproblem is solved only once. Traditionally these workloads ran on CPUs or FPGAs, but the Hopper architecture introduces new DPX instructions that enable the GPU to offload these computationally intensive algorithms, boosting performance by up to 7x.
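The recursion-plus-memoisation pattern that DPX instructions accelerate can be illustrated in ordinary Python, here with Levenshtein edit distance, a close relative of the genomics sequence-alignment workloads NVIDIA cites for DPX:

```python
from functools import lru_cache

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via recursion + memoisation: each subproblem
    (i, j) is solved once and cached - the core of dynamic programming."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j              # insert the remaining j characters
        if j == 0:
            return i              # delete the remaining i characters
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1,         # deletion
                   d(i, j - 1) + 1,         # insertion
                   d(i - 1, j - 1) + cost)  # substitution / match
    return d(len(a), len(b))

print(edit_distance("kitten", "sitting"))  # 3
```

Without the cache, the recursion revisits the same (i, j) pairs exponentially many times; memoisation collapses this to O(len(a) × len(b)) subproblems, the same table of cells that DPX instructions evaluate in hardware.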

GPU Cloud

The NVIDIA GPU Cloud

NGC provides researchers and data scientists with simple access to a comprehensive catalogue of GPU-optimised software tools for deep learning and high-performance computing (HPC) that take full advantage of NVIDIA GPUs. The NGC container registry features NVIDIA H100-tuned, tested, certified, and maintained containers for the top deep learning frameworks. It also offers third-party managed HPC application containers, NVIDIA HPC visualisation containers, and partner applications.

Find out more
DGX H100 Pod

The DGX H100 POD

As an end-to-end AI solution provider, Scan can provide complete DGX H100 datacentre solutions featuring certified storage platforms in the form of DGX POD configurations. The advanced DGX SuperPOD pushes performance even further, adding NVIDIA BlueField networking and security technologies combined with NVIDIA Base Command orchestration software to ease management of the environment.

Find out more
AI Storage

AI Optimised Storage

Deep learning appliances such as the DGX H100 only work as intended if the GPU accelerators are fed data consistently and rapidly enough to deliver maximum utilisation. Scan offers a wide range of AI-optimised storage appliances suitable for deployment with the DGX H100.

Find out more
NVIDIA DGX H100
GPUs 8x NVIDIA H100 Tensor Core GPUs
GPU Specifications 16,896 CUDA cores & 528 fourth-generation Tensor Cores per GPU
GPU Memory 80GB per GPU - 640GB total
Host CPUs 2x x86, spec TBC
System Memory 2TB ECC Reg DDR4
System Drives 2x 1.92TB NVMe SSDs
Storage Drives 8x 3.84TB NVMe SSDs
Networking 8x single-port NVIDIA ConnectX-7 400Gb/s InfiniBand/Ethernet. 2x dual-port NVIDIA BlueField-3 DPUs, each with 1x 400Gb/s InfiniBand/Ethernet and 1x 200Gb/s InfiniBand/Ethernet.
Operating System DGX OS / Ubuntu Linux / Red Hat Enterprise Linux
Power Requirement 10.2kW
Size TBC
Weight TBC
Operating Temperature Range 5ºC to 30ºC (41ºF to 86ºF)

Available from late 2022

Be at the forefront of technology with the NVIDIA DGX H100

Find out more