NVIDIA Datacentre GPU Buyers Guide

What is an NVIDIA datacentre GPU?

The GPU (Graphics Processing Unit) is the central component of a graphics card or GPU-accelerator and is critical for speeding up visualisation and compute workloads. This is achieved by offloading these workloads from the CPU and system memory into the GPU and GPU memory, where the architecture is much more parallel in nature - allowing many tasks to be performed simultaneously. Datacentre GPUs are also often referred to as professional or enterprise-grade due to the higher calibre of components used when compared to consumer GPUs.
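The offloading described above works because these workloads are data-parallel: the same operation is applied to many elements at once, and a GPU spreads that work across thousands of cores. As a rough illustration of the programming pattern (a CPU-side Python sketch using numpy, which is an assumption of this example, not anything GPU-specific):

```python
import numpy as np

# One million data points to transform.
x = np.arange(1_000_000, dtype=np.float32)

def scale_serial(data, factor):
    # One element at a time - the serial style a single CPU core executes.
    return [v * factor for v in data]

def scale_parallel(data, factor):
    # One operation over the whole array - the data-parallel pattern
    # that a GPU executes simultaneously across its many cores.
    return data * factor

result = scale_parallel(x, 2.0)
```

Both functions compute the same result; the point is that the second form expresses the work as a single bulk operation, which is exactly the shape of workload a GPU accelerates.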

Unlike NVIDIA GeForce gaming GPUs, NVIDIA datacentre GPUs do not have a family name, but simply use a letter to denote their architectural generation - more on this later. NVIDIA datacentre GPUs are designed to render high-resolution images and video concurrently - both hugely parallel workloads - and because GPUs can perform parallel operations on multiple sets of data, they are also perfect for non-graphical tasks such as scientific computing and the development of machine learning and AI models. Datacentre GPUs can also be accessed virtually by multiple users at the same time, and when used this way are known as virtual GPUs (vGPU).

It’s worth noting that not all NVIDIA datacentre GPUs are available as PCIe cards; some are instead supplied as SXM modules in proprietary systems such as NVIDIA DGX appliances, as well as in industry-standard NVIDIA HGX and MGX servers.

What makes NVIDIA datacentre GPUs special

NVIDIA datacentre GPUs offer a whole host of extra capabilities that their consumer counterparts lack.

Certified Drivers

ISVs (independent software vendors) such as Autodesk, Dassault and Siemens certify their applications against these drivers, ensuring optimal stability backed by enterprise-class customer support.

Enterprise Class

Enterprise-class components ensure better reliability and resiliency, reducing failure rates especially when used at full load for longer periods of time.

ECC Memory

Error correcting code (ECC) memory protects data from corruption, eradicating errors before they can affect the workload being processed.
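To make the idea concrete, here is a toy single-error-correcting Hamming(7,4) code in Python. This is a simplified sketch of the principle - real GPU ECC uses a SECDED scheme over much wider words - but it shows how parity bits locate and repair a flipped bit:

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword with 3 parity bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct any single flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the error
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                                   # simulate a single-bit memory error
assert hamming74_decode(word) == [1, 0, 1, 1]  # error corrected transparently
```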

Extended Memory

Larger onboard frame buffers than consumer GPUs enable larger and more complex renders and compute simulations to be processed.

Extended Warranty

The standard warranty provides cover for 3 years in professional environments and can be extended to a total of 5 years upon request.

The NVIDIA datacentre GPU range

The following table gives an overview of which GPUs are most suitable for different workloads. Machine learning (ML), deep learning (DL) and artificial intelligence (AI) are split into training and inferencing, as the two require quite different attributes. We also grade the GPUs for scientific compute workloads, often referred to as HPC, for rendering, and for cloud-native NVIDIA vGPU platforms such as virtual PCs (vPC), virtual workstations (vWS) and Omniverse Enterprise.

ML / DL / AI - TRAINING: H200, H100, A100, A30, L40S
ML / DL / AI - INFERENCING: H200, H100, A100, A30, L40S, L40, A40, A10, L4, A2
HPC: H200, H100, A100, A30
RENDERING: L40S, L40, A40, A10
vPC: A16, L4, A2
vWS: L40S, L40, A40, A10, L4
OMNIVERSE: H200, H100, A100, A30, L40S, L40, A40
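The suitability table above can be captured as a simple lookup, which is handy when scripting a shortlist. A minimal sketch (the mapping simply transcribes the table; the helper name is our own):

```python
# Workload suitability transcribed from the table above.
SUITABLE_GPUS = {
    "training":    ["H200", "H100", "A100", "A30", "L40S"],
    "inferencing": ["H200", "H100", "A100", "A30", "L40S", "L40",
                    "A40", "A10", "L4", "A2"],
    "hpc":         ["H200", "H100", "A100", "A30"],
    "rendering":   ["L40S", "L40", "A40", "A10"],
    "vpc":         ["A16", "L4", "A2"],
    "vws":         ["L40S", "L40", "A40", "A10", "L4"],
    "omniverse":   ["H200", "H100", "A100", "A30", "L40S", "L40", "A40"],
}

def shortlist(*workloads):
    """Return the GPUs suitable for every requested workload."""
    sets = [set(SUITABLE_GPUS[w]) for w in workloads]
    return sorted(set.intersection(*sets))

# GPUs that can handle both training and HPC:
print(shortlist("training", "hpc"))
```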

Going into more detail, we have ranked each card from highest to lowest performing, with additional information on their architecture, cores, memory and performance for a number of tasks. The visualisation score tells you how good a GPU is at rendering, whereas the computational scores rank tasks such as simulation, deep learning and AI workloads. These are split by calculation type - FP64 / TF64 (double precision), FP32 / TF32 (single precision) and FP16 / TF16 (half precision) - as some workloads rely on a specific type.
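The precision trade-off is easy to demonstrate: accumulating many small values in half precision loses information once the running total grows large. A quick numpy illustration (CPU-side; note that real Tensor-core maths typically accumulates FP16 products at higher precision to mitigate exactly this effect):

```python
import numpy as np

values = np.full(10_000, 0.1)  # ten thousand small increments

# Double and single precision sums stay close to the true total of 1000.
total_fp64 = float(np.sum(values.astype(np.float64)))
total_fp32 = float(np.sum(values.astype(np.float32)))

# Half precision: naive accumulation stalls once the running total is so
# large that +0.1 falls below the gap between representable fp16 numbers.
total_fp16 = np.float16(0)
for v in values.astype(np.float16):
    total_fp16 = np.float16(total_fp16 + v)

print(total_fp64)         # ~1000
print(total_fp32)         # ~1000
print(float(total_fp16))  # stalls well short of 1000
```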

H200


The H200 is the flagship datacentre GPU based on the Hopper architecture and is designed for the most demanding deep learning, AI and HPC workloads, such as LLMs and generative AI. It is only available in the SXM form factor, and is equipped with 16,896 CUDA cores and 528 4th gen Tensor cores plus a huge 141GB of ultra-reliable HBM3e ECC memory.

CUDA

CUDA cores are the workhorse in Hopper GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

DPX INSTRUCTIONS

DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.
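As an illustration of the algorithm class this targets, here is a classic dynamic-programming kernel - Levenshtein edit distance, a relative of the sequence-alignment routines used in genomics - in plain Python (a CPU sketch of the algorithm family, not of DPX itself):

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # Each cell depends only on three neighbouring sub-solutions -
            # the recurrence structure DPX instructions accelerate in hardware.
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # prints 3
```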

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 11
COMPUTE PERFORMANCE (FP32/TF32): 11
COMPUTE PERFORMANCE (FP16/TF16): 11

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

H100


The H100 is an extremely high performance datacentre GPU based on the Hopper architecture and is designed for the most demanding deep learning, AI and HPC workloads. It is available in an SXM version equipped with 16,896 CUDA cores, 528 4th gen Tensor cores and 80GB of ultra-reliable ECC memory, and as the H100 NVL PCIe version, which has the same number of cores as the SXM version but 94GB of ultra-reliable ECC memory. There’s also an older H100 PCIe version with 14,592 CUDA cores, 456 4th gen Tensor cores and 80GB of ultra-reliable ECC memory.

CUDA

CUDA cores are the workhorse in Hopper GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

DPX INSTRUCTIONS

DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 10
COMPUTE PERFORMANCE (FP32/TF32): 10
COMPUTE PERFORMANCE (FP16/TF16): 10

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

A100


*Long lead time, consider DGX or L40S instead

The A100 is the flagship datacentre GPU based on the older Ampere architecture and is designed for the most demanding deep learning, AI and HPC workloads. It is available in both PCIe and SXM form factors, equipped with 6,912 CUDA cores and 432 3rd gen Tensor cores plus either 40 or 80GB of ultra-reliable HBM2 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
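The structured-sparsity feature is built around a 2:4 pattern: in every group of four weights, only the two largest-magnitude values are kept. A toy numpy sketch of the pruning step (illustrative only - it mimics the pattern, not NVIDIA's actual compressed storage format):

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the two smallest-magnitude entries in every group of four -
    the 2:4 structured-sparsity pattern Ampere Tensor cores accelerate."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |values| in each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.8, -0.3, 0.01])
pruned = prune_2_of_4(w)

# Exactly half the weights remain, so the matrix (and the bandwidth needed
# to move it) can be stored in a compressed form at inference time.
assert np.count_nonzero(pruned) == w.size // 2
```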

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 7
COMPUTE PERFORMANCE (FP32/TF32): 7
COMPUTE PERFORMANCE (FP16/TF16): 7

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

A30


The A30 is a cut-down version of the A100, designed to hit a lower price point. It is based on the same Ampere GA100 GPU and is designed for deep learning, AI and HPC workloads. It is equipped with 3,804 CUDA cores and 224 3rd gen Tensor cores plus 24GB of ultra-reliable HBM2 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 5
COMPUTE PERFORMANCE (FP32/TF32): 5
COMPUTE PERFORMANCE (FP16/TF16): 5

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

L40S


The L40S is the flagship datacentre GPU based on the Ada Lovelace architecture and is designed primarily for high-end graphics and AI workloads. It has the same overall configuration as the L40, with 18,176 CUDA cores, 568 4th gen Tensor cores, 142 3rd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory. However, the L40S features improved Tensor cores which deliver double the performance of the L40 at TF32 and TF16, making it a far superior card for training and inferencing AI models.

CUDA

CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

RAY TRACING

Ada Lovelace GPUs feature third generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed floating-point acceleration.

VISUALISATION PERFORMANCE: 10
COMPUTE PERFORMANCE (FP64/TF64): 4
COMPUTE PERFORMANCE (FP32/TF32): 6/8
COMPUTE PERFORMANCE (FP16/TF16): 6/8

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

L40


The L40 is a high performance datacentre GPU based on the Ada Lovelace architecture and is designed primarily for visualisation applications. It is equipped with 18,176 CUDA cores, 568 4th gen Tensor cores, 142 3rd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory. The L40 should not be confused with the L40S, which has an improved Tensor core design that is twice as fast at TF32 and TF16, making the L40S a far better choice for deep learning and AI workloads.

CUDA

CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

RAY TRACING

Ada Lovelace GPUs feature third generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed floating-point acceleration.

VISUALISATION PERFORMANCE: 10
COMPUTE PERFORMANCE (FP64/TF64): 4
COMPUTE PERFORMANCE (FP32/TF32): 6
COMPUTE PERFORMANCE (FP16/TF16): 6

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

A40


The A40 is the flagship datacentre GPU based on the Ampere GA102 architecture and is designed primarily for visualisation and demanding virtualised graphics. It is equipped with 10,752 CUDA cores, 336 3rd gen Tensor cores, 84 2nd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION PERFORMANCE: 8
COMPUTE PERFORMANCE (FP64/TF64): 3
COMPUTE PERFORMANCE (FP32/TF32): 3
COMPUTE PERFORMANCE (FP16/TF16): 3

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: Yes

A10


The A10 is a cut-down version of the A40, designed to hit a lower price point. It is based on the same Ampere GA102 architecture and is designed primarily for visualisation applications and deep learning inferencing. It is equipped with 9,216 CUDA cores, 288 3rd gen Tensor cores, 72 2nd gen RT cores plus 24GB of ultra-reliable GDDR6 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION PERFORMANCE: 7
COMPUTE PERFORMANCE (FP64/TF64): 2
COMPUTE PERFORMANCE (FP32/TF32): 2
COMPUTE PERFORMANCE (FP16/TF16): 2

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

A16


The A16 is a specialist GPU accelerator for providing VDI experiences to client devices using NVIDIA vGPU services. Unlike other GPUs such as the A40, which are optimised to drive relatively graphically demanding vWS sessions, the A16 is optimised to drive everyday Windows desktop applications in vPC sessions. Featuring four Ampere GPUs, each with 1,280 CUDA cores and 16GB of server-grade error-correcting code (ECC) memory, the A16 is ideal for sessions running everyday office applications, streaming video and teleconferencing tools.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

VISUALISATION PERFORMANCE: 3
COMPUTE PERFORMANCE (FP64/TF64): N/A
COMPUTE PERFORMANCE (FP32/TF32): N/A
COMPUTE PERFORMANCE (FP16/TF16): N/A

Real Time Ray Tracing: Yes
VR Ready: No
NVLink: No

L4


The L4 is a half-height, low-power GPU based on the Ada Lovelace architecture and is designed primarily for deep learning inferencing plus less demanding graphics and video workloads. It is equipped with 7,680 CUDA cores, 240 4th gen Tensor cores, 60 3rd gen RT cores plus 24GB of server-grade error-correcting code (ECC) GDDR6 memory.

CUDA

CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed floating-point acceleration.

VISUALISATION PERFORMANCE: 4
COMPUTE PERFORMANCE (FP64/TF64): 2
COMPUTE PERFORMANCE (FP32/TF32): 2
COMPUTE PERFORMANCE (FP16/TF16): 2

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

A2


The A2 is a compact, half-height GPU based on the Ampere architecture and is designed primarily for deep learning inferencing. It is equipped with 1,280 CUDA cores, 40 3rd gen Tensor cores, 10 2nd gen RT cores plus 16GB of server-grade error-correcting code (ECC) GDDR6 memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation with hardware-support for structural sparsity.

VISUALISATION PERFORMANCE: 3
COMPUTE PERFORMANCE (FP64/TF64): 1
COMPUTE PERFORMANCE (FP32/TF32): 1
COMPUTE PERFORMANCE (FP16/TF16): 1

Real Time Ray Tracing: Yes
VR Ready: No
NVLink: No

NVIDIA Professional datacentre GPU Summary

The table below summarises each GPU's performance along with their technical specifications.

GPU order in every row: H200 | H100 | A100 | A30 | L40S | L40 | A40 | A10 | A16 | L4 | A2

RATINGS
VISUALISATION PERFORMANCE: N/A | N/A | N/A | N/A | 10 | 10 | 8 | 7 | 3 | 4 | 3
COMPUTE PERFORMANCE (FP64/TF64): 11 | 10 | 7 | 5 | 4 | 4 | 3 | 2 | N/A | 2 | 1
COMPUTE PERFORMANCE (FP32/TF32): 11 | 10 | 7 | 5 | 6/8 | 6 | 3 | 2 | N/A | 2 | 1
COMPUTE PERFORMANCE (FP16/TF16): 11 | 10 | 7 | 5 | 6/8 | 6 | 3 | 2 | N/A | 2 | 1
RAY TRACING: No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes
VR READY: No | No | No | No | Yes | Yes | Yes | Yes | No | Yes | No
NVLINK: Yes | Yes | Yes | Yes | No | No | Yes | No | No | No | No

SPECS
ARCHITECTURE: Hopper | Hopper | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ampere | Ampere | Ampere | Ada Lovelace | Ampere
FORM FACTOR: SXM5 | SXM5/PCIe 5 | SXM4/PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4
GPU: GH100 | GH100 | GA100 | GA100 | AD102 | AD102 | GA102 | GA102 | GA107 | AD104 | GA107
CUDA CORES: 16,896 | 16,896 or 14,592 | 6,912 | 3,804 | 18,176 | 18,176 | 10,752 | 9,216 | 4x 1,280 | 7,680 | 1,280
TENSOR CORES: 528 4th gen | 528 or 456 4th gen | 432 3rd gen | 224 3rd gen | 568 4th gen | 568 4th gen | 336 3rd gen | 288 3rd gen | 4x 40 3rd gen | 240 4th gen | 40 3rd gen
RT CORES: 0 | 0 | 0 | 0 | 142 3rd gen | 142 3rd gen | 84 2nd gen | 72 2nd gen | 4x 10 2nd gen | 60 3rd gen | 10 2nd gen
MEMORY: 141GB HBM3e | 80 or 94GB HBM3 | 40 or 80GB HBM2 | 24GB HBM2 | 48GB GDDR6 | 48GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 4x 16GB GDDR6 | 24GB GDDR6 | 16GB GDDR6
ECC MEMORY: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
MEMORY CONTROLLER: 5,120-bit | 5,120-bit | 5,120-bit | 3,072-bit | 384-bit | 384-bit | 384-bit | 384-bit | 384-bit | 192-bit | 128-bit
NVLINK SPEED: 900GB/sec | 900GB/sec | 600GB/sec | 200GB/sec | No | No | 112GB/sec | No | No | No | No
TDP: 300W-700W | 300W-700W | 250W | 165W | 350W | 300W | 300W | 150W | 250W | 72W | 60W

Ready to buy?

All NVIDIA datacentre GPUs must be purchased as part of a 3XS Systems server build rather than standalone, unlike their workstation counterparts. Organisations in the higher or further education sectors can obtain supported pricing, which is applied to the entire server build.


GPU-ACCELERATED SERVERS
FOR GRAPHICS

CONFIGURE NOW >

GPU-ACCELERATED SERVERS
FOR VIRTUALISATION

CONFIGURE NOW >

GPU-ACCELERATED SERVERS
FOR DEEP LEARNING & AI

CONFIGURE NOW >

We hope you’ve found this NVIDIA datacentre GPU buyer’s guide helpful, however if you would like further advice on choosing the correct GPU for your use case or project, then don’t hesitate to get in touch on 01204 474747 or at [email protected].