NVIDIA Datacentre GPU Buyers Guide

What is an NVIDIA datacentre GPU?

The GPU (Graphics Processing Unit) is the central component of a graphics card or GPU-accelerator and is critical for speeding up visualisation and compute workloads. This is achieved by offloading these workloads from the CPU and system memory into the GPU and GPU memory, where the architecture is much more parallel in nature - allowing many tasks to be performed simultaneously. Datacentre GPUs are also often referred to as professional or enterprise-grade due to the higher calibre of components used when compared to consumer GPUs.
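The offloading described above works because these workloads are data-parallel: the same operation is applied to many elements at once, and a GPU spreads that work across thousands of cores. As a rough illustration of the programming pattern (a CPU-side Python sketch using numpy, which is an assumption of this example, not anything GPU-specific):

```python
import numpy as np

# One million data points to transform.
x = np.arange(1_000_000, dtype=np.float32)

def scale_serial(data, factor):
    # One element at a time - the serial style a single CPU core executes.
    return [v * factor for v in data]

def scale_parallel(data, factor):
    # One operation over the whole array - the data-parallel pattern
    # that a GPU executes simultaneously across its many cores.
    return data * factor

result = scale_parallel(x, 2.0)
```

Both functions compute the same result; the point is that the second form expresses the work as a single bulk operation, which is exactly the shape of workload a GPU accelerates.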

Unlike NVIDIA GeForce gaming GPUs, NVIDIA datacentre GPUs do not have a family name, but simply use a letter to denote their architectural generation - more on this later. NVIDIA datacentre GPUs are designed to render high-resolution images and video concurrently - both hugely parallel workloads - and because GPUs can perform parallel operations on multiple sets of data, they are also perfect for non-graphical tasks such as scientific computing and the development of machine learning and AI models. Datacentre GPUs can also be accessed virtually by multiple users at the same time, and when used this way are known as virtual GPUs (vGPU).

It’s worth noting that not all NVIDIA datacentre GPUs are available as PCIe cards; some are instead supplied as SXM modules in proprietary systems such as NVIDIA DGX appliances, as well as in industry-standard NVIDIA HGX and MGX servers.

What makes NVIDIA datacentre GPUs special

NVIDIA datacentre GPUs offer a whole host of extra capabilities that their consumer counterparts lack.

Certified Drivers

ISVs (independent software vendors) such as Autodesk, Dassault and Siemens certify their applications against these drivers, ensuring optimal stability backed by enterprise-class customer support.

Enterprise Class

Enterprise-class components ensure better reliability and resiliency, reducing failure rates especially when used at full load for longer periods of time.

ECC Memory

Error correcting code (ECC) memory protects data from corruption, eradicating errors before they can affect the workload being processed.
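To make the idea concrete, here is a toy single-error-correcting Hamming(7,4) code in Python. This is a simplified sketch of the principle - real GPU ECC uses a SECDED scheme over much wider words - but it shows how parity bits locate and repair a flipped bit:

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword with 3 parity bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct any single flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the error
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                                   # simulate a single-bit memory error
assert hamming74_decode(word) == [1, 0, 1, 1]  # error corrected transparently
```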

Extended Memory

Larger onboard frame buffers than consumer GPUs enable larger and more complex renders and compute simulations to be processed.

Extended Warranty

The standard warranty provides cover for 3 years in professional environments and can be extended to a total of 5 years upon request.

The NVIDIA datacentre GPU range

The following table gives an overview of which GPUs are most suitable for different workloads. Machine learning (ML), deep learning (DL) and artificial intelligence (AI) are split into training and inferencing, as the two require quite different attributes. We also grade the GPUs for scientific compute workloads, often referred to as HPC, for rendering, and for cloud-native NVIDIA vGPU platforms such as virtual PCs (vPC), virtual workstations (vWS) and Omniverse Enterprise.

ML / DL / AI - TRAINING: H200, H100, A100, A30, L40S
ML / DL / AI - INFERENCING: H200, H100, A100, A30, L40S, L40, A40, A10, L4, A2
HPC: H200, H100, A100, A30
RENDERING: L40S, L40, A40, A10
vPC: A16, L4, A2
vWS: L40S, L40, A40, A10, L4
OMNIVERSE: H200, H100, A100, A30, L40S, L40, A40
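The suitability table above can be captured as a simple lookup, which is handy when scripting a shortlist. A minimal sketch (the mapping simply transcribes the table; the helper name is our own):

```python
# Workload suitability transcribed from the table above.
SUITABLE_GPUS = {
    "training":    ["H200", "H100", "A100", "A30", "L40S"],
    "inferencing": ["H200", "H100", "A100", "A30", "L40S", "L40",
                    "A40", "A10", "L4", "A2"],
    "hpc":         ["H200", "H100", "A100", "A30"],
    "rendering":   ["L40S", "L40", "A40", "A10"],
    "vpc":         ["A16", "L4", "A2"],
    "vws":         ["L40S", "L40", "A40", "A10", "L4"],
    "omniverse":   ["H200", "H100", "A100", "A30", "L40S", "L40", "A40"],
}

def shortlist(*workloads):
    """Return the GPUs suitable for every requested workload."""
    sets = [set(SUITABLE_GPUS[w]) for w in workloads]
    return sorted(set.intersection(*sets))

# GPUs that can handle both training and HPC:
print(shortlist("training", "hpc"))
```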

Going into more detail, we have ranked each card from highest to lowest performing, with additional information on their architecture, cores, memory and performance for a number of tasks. The visualisation score tells you how good a GPU is at rendering, whereas the computational scores rank tasks such as simulation, deep learning and AI workloads. These are split by calculation type - FP64 / TF64 (double precision), FP32 / TF32 (single precision) and FP16 / TF16 (half precision) - as some workloads rely on a specific type.
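The precision trade-off is easy to demonstrate: accumulating many small values in half precision loses information once the running total grows large. A quick numpy illustration (CPU-side; note that real Tensor-core maths typically accumulates FP16 products at higher precision to mitigate exactly this effect):

```python
import numpy as np

values = np.full(10_000, 0.1)  # ten thousand small increments

# Double and single precision sums stay close to the true total of 1000.
total_fp64 = float(np.sum(values.astype(np.float64)))
total_fp32 = float(np.sum(values.astype(np.float32)))

# Half precision: naive accumulation stalls once the running total is so
# large that +0.1 falls below the gap between representable fp16 numbers.
total_fp16 = np.float16(0)
for v in values.astype(np.float16):
    total_fp16 = np.float16(total_fp16 + v)

print(total_fp64)         # ~1000
print(total_fp32)         # ~1000
print(float(total_fp16))  # stalls well short of 1000
```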

H200


The H200 is the flagship datacentre GPU based on the Hopper architecture and is designed for the most demanding deep learning, AI and HPC workloads, such as LLMs and generative AI. It is only available in the SXM form factor, and is equipped with 16,896 CUDA cores and 528 4th gen Tensor cores plus a huge 141GB of ultra-reliable HBM3e ECC memory.

CUDA

CUDA cores are the workhorse in Hopper GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

DPX INSTRUCTIONS

DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.
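As an illustration of the algorithm class this targets, here is a classic dynamic-programming kernel - Levenshtein edit distance, a relative of the sequence-alignment routines used in genomics - in plain Python (a CPU sketch of the algorithm family, not of DPX itself):

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # Each cell depends only on three neighbouring sub-solutions -
            # the recurrence structure DPX instructions accelerate in hardware.
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # prints 3
```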

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 11
COMPUTE PERFORMANCE (FP32/TF32): 11
COMPUTE PERFORMANCE (FP16/TF16): 11

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

H100


The H100 is an extremely high performance datacentre GPU based on the Hopper architecture and is designed for the most demanding deep learning, AI and HPC workloads. It is available in an SXM version equipped with 16,896 CUDA cores, 528 4th gen Tensor cores and 80GB of ultra-reliable ECC memory, and as the H100 NVL PCIe version, which has the same number of cores as the SXM version but 94GB of ultra-reliable ECC memory. There’s also an older H100 PCIe version with 14,592 CUDA cores, 456 4th gen Tensor cores and 80GB of ultra-reliable ECC memory.

CUDA

CUDA cores are the workhorse in Hopper GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

DPX INSTRUCTIONS

DPX instructions accelerate dynamic programming algorithms by up to 7x on a Hopper-based GPU, compared with the previous Ampere architecture.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 10
COMPUTE PERFORMANCE (FP32/TF32): 10
COMPUTE PERFORMANCE (FP16/TF16): 10

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

A100


*Long lead time, consider DGX or L40S instead

The A100 is the flagship datacentre GPU based on the older Ampere architecture and is designed for the most demanding deep learning, AI and HPC workloads. It is available in both PCIe and SXM form factors, equipped with 6,912 CUDA cores and 432 3rd gen Tensor cores plus either 40 or 80GB of ultra-reliable HBM2 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.
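The structured-sparsity feature is built around a 2:4 pattern: in every group of four weights, only the two largest-magnitude values are kept. A toy numpy sketch of the pruning step (illustrative only - it mimics the pattern, not NVIDIA's actual compressed storage format):

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the two smallest-magnitude entries in every group of four -
    the 2:4 structured-sparsity pattern Ampere Tensor cores accelerate."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |values| in each group of four.
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.8, -0.3, 0.01])
pruned = prune_2_of_4(w)

# Exactly half the weights remain, so the matrix (and the bandwidth needed
# to move it) can be stored in a compressed form at inference time.
assert np.count_nonzero(pruned) == w.size // 2
```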

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 7
COMPUTE PERFORMANCE (FP32/TF32): 7
COMPUTE PERFORMANCE (FP16/TF16): 7

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

A30


The A30 is a cut-down version of the A100, designed to hit a lower price point. It is based on the same Ampere GA100 GPU and is designed for deep learning, AI and HPC workloads. It is equipped with 3,804 CUDA cores and 224 3rd gen Tensor cores plus 24GB of ultra-reliable HBM2 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

MIG

Multi-Instance GPU (MIG) fully isolates at the hardware level allowing memory, cache and cores to be partitioned into as many as seven independent instances, giving multiple users access to GPU acceleration.

VISUALISATION PERFORMANCE: N/A
COMPUTE PERFORMANCE (FP64/TF64): 5
COMPUTE PERFORMANCE (FP32/TF32): 5
COMPUTE PERFORMANCE (FP16/TF16): 5

Real Time Ray Tracing: No
VR Ready: No
NVLink: Yes

L40S


The L40S is the flagship datacentre GPU based on the Ada Lovelace architecture and is designed primarily for high-end graphics and AI workloads. It has the same overall configuration as the L40, with 18,176 CUDA cores, 568 4th gen Tensor cores, 142 3rd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory. However, the L40S features improved Tensor cores which deliver double the performance of the L40 at TF32 and TF16, making it a far superior card for training and inferencing AI models.

CUDA

CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

RAY TRACING

Ada Lovelace GPUs feature third generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed floating-point acceleration.

VISUALISATION PERFORMANCE: 10
COMPUTE PERFORMANCE (FP64/TF64): 4
COMPUTE PERFORMANCE (FP32/TF32): 6/8
COMPUTE PERFORMANCE (FP16/TF16): 6/8

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

L40


The L40 is a high performance datacentre GPU based on the Ada Lovelace architecture and is designed primarily for visualisation applications. It is equipped with 18,176 CUDA cores, 568 4th gen Tensor cores, 142 3rd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory. The L40 should not be confused with the L40S, which has an improved Tensor core design that is twice as fast at TF32 and TF16, making the L40S a far better choice for deep learning and AI workloads.

CUDA

CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

RAY TRACING

Ada Lovelace GPUs feature third generation RT cores delivering up to double the real-time photorealistic ray-tracing performance of the previous generation GPUs.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed floating-point acceleration.

VISUALISATION PERFORMANCE: 10
COMPUTE PERFORMANCE (FP64/TF64): 4
COMPUTE PERFORMANCE (FP32/TF32): 6
COMPUTE PERFORMANCE (FP16/TF16): 6

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

A40


The A40 is the flagship datacentre GPU based on the Ampere GA102 architecture and is designed primarily for visualisation and demanding virtualised graphics. It is equipped with 10,752 CUDA cores, 336 3rd gen Tensor cores, 84 2nd gen RT cores plus 48GB of ultra-reliable GDDR6 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION PERFORMANCE: 8
COMPUTE PERFORMANCE (FP64/TF64): 3
COMPUTE PERFORMANCE (FP32/TF32): 3
COMPUTE PERFORMANCE (FP16/TF16): 3

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: Yes

A10


The A10 is a cut-down version of the A40, designed to hit a lower price point. It is based on the same Ampere GA102 architecture and is designed primarily for visualisation applications and deep learning inferencing. It is equipped with 9,216 CUDA cores, 288 3rd gen Tensor cores, 72 2nd gen RT cores plus 24GB of ultra-reliable GDDR6 ECC memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation, with hardware support for structural sparsity.

VISUALISATION PERFORMANCE: 7
COMPUTE PERFORMANCE (FP64/TF64): 2
COMPUTE PERFORMANCE (FP32/TF32): 2
COMPUTE PERFORMANCE (FP16/TF16): 2

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

A16


The A16 is a specialist GPU accelerator for providing VDI experiences to client devices using NVIDIA vGPU services. Unlike other GPUs such as the A40, which are optimised to drive relatively graphically demanding vWS sessions, the A16 is optimised to drive everyday Windows desktop applications in vPC sessions. Featuring four Ampere GPUs, each with 1,280 CUDA cores and 16GB of server-grade error-correcting code (ECC) memory, the A16 is ideal for sessions running everyday office applications, streaming video and teleconferencing tools.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

VISUALISATION PERFORMANCE: 3
COMPUTE PERFORMANCE (FP64/TF64): N/A
COMPUTE PERFORMANCE (FP32/TF32): N/A
COMPUTE PERFORMANCE (FP16/TF16): N/A

Real Time Ray Tracing: Yes
VR Ready: No
NVLink: No

L4


The L4 is a half-height, low-power GPU based on the Ada Lovelace architecture and is designed primarily for deep learning inferencing plus less demanding graphics and video workloads. It is equipped with 7,680 CUDA cores, 240 4th gen Tensor cores, 60 3rd gen RT cores plus 24GB of server-grade error-correcting code (ECC) GDDR6 memory.

CUDA

CUDA cores are the workhorse in Ada Lovelace GPUs, as the architecture supports many cores and accelerates workloads up to 1.5x (FP32) of the previous Ampere generation.

DATA SCIENCE & AI

Fourth generation Tensor cores boost scientific computing and AI development with up to 3x faster performance compared to Ampere GPUs and support mixed floating-point acceleration.

VISUALISATION PERFORMANCE: 4
COMPUTE PERFORMANCE (FP64/TF64): 2
COMPUTE PERFORMANCE (FP32/TF32): 2
COMPUTE PERFORMANCE (FP16/TF16): 2

Real Time Ray Tracing: Yes
VR Ready: Yes
NVLink: No

A2


The A2 is a compact, half-height GPU based on the Ampere architecture and is designed primarily for deep learning inferencing. It is equipped with 1,280 CUDA cores, 40 3rd gen Tensor cores, 10 2nd gen RT cores plus 16GB of server-grade error-correcting code (ECC) GDDR6 memory.

CUDA

CUDA cores are the workhorse in Ampere GPUs, as the architecture supports many cores and accelerates workloads up to 2x (FP32) of the previous generation.

SPARSITY

Ampere GPUs provide up to double the performance for sparse models. This feature benefits AI inference and model training, as compressing sparse matrices also reduces the memory and bandwidth use.

DATA SCIENCE & AI

Third generation Tensor cores boost scientific computing and AI development with up to 2x faster performance compared to the previous generation with hardware-support for structural sparsity.

VISUALISATION PERFORMANCE: 3
COMPUTE PERFORMANCE (FP64/TF64): 1
COMPUTE PERFORMANCE (FP32/TF32): 1
COMPUTE PERFORMANCE (FP16/TF16): 1

Real Time Ray Tracing: Yes
VR Ready: No
NVLink: No

NVIDIA Professional datacentre GPU Summary

The table below summarises each GPU's performance along with their technical specifications.

GPU order in every row: H200 | H100 | A100 | A30 | L40S | L40 | A40 | A10 | A16 | L4 | A2

RATINGS
VISUALISATION PERFORMANCE: N/A | N/A | N/A | N/A | 10 | 10 | 8 | 7 | 3 | 4 | 3
COMPUTE PERFORMANCE (FP64/TF64): 11 | 10 | 7 | 5 | 4 | 4 | 3 | 2 | N/A | 2 | 1
COMPUTE PERFORMANCE (FP32/TF32): 11 | 10 | 7 | 5 | 6/8 | 6 | 3 | 2 | N/A | 2 | 1
COMPUTE PERFORMANCE (FP16/TF16): 11 | 10 | 7 | 5 | 6/8 | 6 | 3 | 2 | N/A | 2 | 1
RAY TRACING: No | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes
VR READY: No | No | No | No | Yes | Yes | Yes | Yes | No | Yes | No
NVLINK: Yes | Yes | Yes | Yes | No | No | Yes | No | No | No | No

SPECS
ARCHITECTURE: Hopper | Hopper | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Ampere | Ampere | Ampere | Ada Lovelace | Ampere
FORM FACTOR: SXM5 | SXM5/PCIe 5 | SXM4/PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4 | PCIe 4
GPU: GH100 | GH100 | GA100 | GA100 | AD102 | AD102 | GA102 | GA102 | GA107 | AD104 | GA107
CUDA CORES: 16,896 | 16,896 or 14,592 | 6,912 | 3,804 | 18,176 | 18,176 | 10,752 | 9,216 | 4x 1,280 | 7,680 | 1,280
TENSOR CORES: 528 4th gen | 528 or 456 4th gen | 432 3rd gen | 224 3rd gen | 568 4th gen | 568 4th gen | 336 3rd gen | 288 3rd gen | 4x 40 3rd gen | 240 4th gen | 40 3rd gen
RT CORES: 0 | 0 | 0 | 0 | 142 3rd gen | 142 3rd gen | 84 2nd gen | 72 2nd gen | 4x 10 2nd gen | 60 3rd gen | 10 2nd gen
MEMORY: 141GB HBM3e | 80 or 94GB HBM3 | 40 or 80GB HBM2 | 24GB HBM2 | 48GB GDDR6 | 48GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 4x 16GB GDDR6 | 24GB GDDR6 | 16GB GDDR6
ECC MEMORY: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
MEMORY CONTROLLER: 5,120-bit | 5,120-bit | 5,120-bit | 3,072-bit | 384-bit | 384-bit | 384-bit | 384-bit | 384-bit | 192-bit | 128-bit
NVLINK SPEED: 900GB/sec | 900GB/sec | 600GB/sec | 200GB/sec | No | No | 112GB/sec | No | No | No | No
TDP: 300W-700W | 300W-700W | 250W | 165W | 350W | 300W | 300W | 150W | 250W | 72W | 60W

Ready to buy?

All NVIDIA datacentre GPUs must be purchased as part of a 3XS Systems server build rather than standalone, unlike their workstation counterparts. Organisations in the higher or further education sectors can obtain supported pricing, which is applied to the entire server build.


GPU-ACCELERATED SERVERS
FOR GRAPHICS

CONFIGURE NOW >

GPU-ACCELERATED SERVERS
FOR VIRTUALISATION

CONFIGURE NOW >

GPU-ACCELERATED SERVERS
FOR DEEP LEARNING & AI

CONFIGURE NOW >

We hope you’ve found this NVIDIA datacentre GPU buyer’s guide helpful, however if you would like further advice on choosing the correct GPU for your use case or project, then don’t hesitate to get in touch on 01204 474747 or at [email protected].