NVIDIA DGX Support

Support Packages for DGX H100, DGX A100, DGX Station, DGX-2 and DGX-1

 

Protect your Deep Learning Investment

NVIDIA DGX systems are cutting-edge hardware solutions designed to accelerate your deep learning and AI workloads. Getting the most out of that investment is key, and the comprehensive DGX support packages provide peace of mind that your system or systems will remain in optimum condition. The packages are available as one-, two- or three-year contracts that include the following:

Access to the latest software updates and upgrades

Rapid response and timely issue resolution through a support portal and 24x7 phone access

Onsite dispatch of a customer engineer if a field-replaceable unit (FRU) is required (not DGX Station products)

Direct communication with NVIDIA support experts

Lifecycle support for the NVIDIA DGX system AI software

Remote execution of onboard diagnostic tools to troubleshoot system issues (not DGX Station products)

A private NGC container repository for accessing NVIDIA-optimised AI software with powerful sharing and collaboration features for your organisation

Remote hardware and software support

A searchable knowledge base with how-to articles, application notes, and product documentation

Advance shipment of replacement parts for next-business-day arrival

For complete details, view the DGX Systems Appliance Support Services Terms and Conditions.

There are also media retention packages available as follows:

NVIDIA Solid State Drive Media Retention (SDMR) Service

This service allows customers to keep defective SSDs so that they retain control over sensitive data. It requires an active NVIDIA Enterprise Support Service contract as a prerequisite.

NVIDIA Comprehensive Media Retention (CMR) Service

This service permits customers to keep defective media from all components within the covered system so that they retain control over sensitive data. It also requires an active NVIDIA Enterprise Support Service contract as a prerequisite.

NVIDIA DGX H100

The fourth-generation DGX AI appliance is built around the new Hopper architecture, providing unprecedented performance in a single system and unlimited scalability with the DGX POD and SuperPOD enterprise-scale infrastructures.

• 8x NVIDIA H100 Tensor Core GPUs
• 16,896 CUDA cores & 528 TF32 Tensor Cores per GPU
• 80GB memory per GPU - 640GB total
• 2x x86 CPUs - TBC
• 2TB ECC Reg DDR5 system memory
• 2x 1.92TB NVMe SSD system drives
• 8x 3.84TB NVMe SSD storage drives

NVIDIA DGX A100

The third-generation DGX A100 appliance introduces double-precision Tensor Cores, the biggest milestone since the introduction of double-precision computing in GPUs for HPC. This model was superseded by the DGX H100 in 2022.

• 8x NVIDIA A100 Tensor Core GPUs
• 6,912 CUDA cores & 432 TF32 Tensor Cores per GPU
• 80GB memory per GPU - 640GB total
• 2x AMD EPYC 7742 - total 128 cores / 256 threads
• 2TB ECC Reg DDR4 system memory
• 2x 1.92TB NVMe SSD system drives
• 8x 3.84TB NVMe SSD storage drives

NVIDIA DGX Station A100

The second generation of the DGX Station appliance introduces A100 GPUs with double-precision Tensor Cores, bringing HPC and AI training to a desktop server format that doesn’t require a datacentre environment.

This model was discontinued in 2022.

• 4x NVIDIA A100 Tensor Core GPUs
• 6,912 CUDA cores & 432 TF32 Tensor Cores per GPU
• 80GB memory per GPU - 320GB total
• AMD EPYC 7742 - 64 cores / 128 threads
• 512GB ECC Reg DDR4 system memory
• 1.92TB NVMe SSD system drive
• 7.68TB NVMe SSD storage drive

NVIDIA DGX-2

The second-generation DGX-2 appliance was the world’s first 2 petaFLOPS system integrating 16 NVIDIA V100 Tensor Core GPUs for large-scale AI projects. It was launched in 2018 and superseded by the DGX A100 in 2020.

• 16x NVIDIA V100 Tensor Core GPUs
• 5,120 CUDA cores & 640 Tensor Cores per GPU
• 32GB memory per GPU - 512GB total
• 2x Intel Xeon Platinum 8168 - total 48 cores / 96 threads
• 1.5TB ECC Reg DDR4 system memory
• 2x 960GB NVMe SSD system drives
• 8x 3.84TB NVMe SSD storage drives

NVIDIA DGX Station V100

The original DGX Station appliance introduced V100 GPUs in a desktop server format that doesn’t require a datacentre environment, aimed at delivering AI development and training for workgroups. This model was superseded by the DGX Station A100 in 2020.

• 4x NVIDIA V100 Tensor Core GPUs
• 5,120 CUDA cores & 640 Tensor Cores per GPU
• 32GB memory per GPU - 128GB total
• Intel Xeon E5-2698 v4 - total 20 cores / 40 threads
• 256GB ECC Reg DDR4 system memory
• 4x 1.92TB SSDs

NVIDIA DGX-1

The original DGX-1 appliance was the world’s first dedicated AI supercomputer, featuring eight GPUs connected by NVLink technology. It was launched in 2016 and superseded by the DGX-2 in 2018.

• 8x NVIDIA V100 Tensor Core GPUs
• 5,120 CUDA cores & 640 Tensor Cores per GPU
• 32GB memory per GPU - 256GB total
• 2x Intel Xeon E5-2698 v4 - total 40 cores / 80 threads
• 512GB ECC Reg DDR4 system memory
• 4x 1.92TB SSDs