NVIDIA Professional GPU Buyer’s Guide

What is an NVIDIA Professional GPU?

The GPU (Graphics Processing Unit) is the central component of a graphics card or GPU-accelerator and has the main task of accelerating visualisation or compute workloads in order to increase system performance. This is achieved by offloading data from the CPU and system memory into the GPU and GPU memory, where the architecture is much more parallel in nature - allowing many tasks to be performed simultaneously.

Professional GPUs are designed to render very high resolution images and video concurrently, both of which are hugely parallel workloads. Because GPUs can perform parallel operations on multiple sets of data, they are also well suited to non-graphical tasks such as deep learning and other artificial intelligence (AI) workloads, as well as HPC scientific computations.

In this guide we’ll look at the range of NVIDIA professional GPUs on the market and the workloads each is designed to handle best. As GPU architectures and specifications are constantly evolving, we’ll also offer insight into the relative performance of the various GPU generations.

Purchasing Options

Although all our professional GPUs can be purchased as a standalone product there are two options that allow discounted pricing to be obtained.



Workstations & Servers

If a professional GPU is purchased as part of a 3XS workstation or server build then the price per unit will be discounted when compared to the regular standalone price.

Educational Users

For organisations that fall into either higher education or further education sectors, supported pricing can be obtained on many of our professional GPU models. The discounts available can be further increased if purchased as part of a 3XS System too.

Why Buy an NVIDIA Professional GPU?

NVIDIA professional GPU cards feature a whole host of attributes that make them well suited to business systems such as workstations and servers.

Certified Drivers

Hundreds of ISVs such as Autodesk, ANSYS, Dassault Systèmes, PTC and Siemens certify their applications with NVIDIA professional GPUs. Using certified drivers ensures optimal stability and enterprise-class customer support if you do run into any issues.

Enterprise Class

NVIDIA professional GPUs are constructed from enterprise-class components, ensuring better reliability and resiliency. Failure rates of professional GPU cards are considerably lower, especially when used at full load for long periods of time.

ECC Memory

Many NVIDIA professional GPU models feature error correcting code (ECC) memory. ECC detects and corrects memory errors before they can corrupt the workload being processed.

Extended Memory

NVIDIA professional GPUs feature considerably larger onboard frame buffers than consumer GPUs, enabling larger and more complex renders and compute simulations to be processed.

Security

NVIDIA professional GPUs offer higher security - for instance, USB-C ports on professional cards can be disabled, a critical feature if the cards are deployed in secure environments or in systems that hold sensitive information.

Extended Warranty

NVIDIA professional GPU cards offer enhanced warranty cover. The standard warranty provides cover for 3 years in professional environments and can be extended to a total of 5 years upon request.

The NVIDIA Professional GPU Hierarchy

With prices ranging from over £10,000 to as little as £130, there’s an NVIDIA professional GPU for every project and budget. The range is made up of the former Quadro and Tesla GPU families, with the addition of the latest Ampere-based cards - each having their own set of features and benefits aimed at specific tasks, which we’ll explain in the next section. To make the overview a little clearer we’ve also divided all the GPUs into five main categories relating to their performance: Ultra High-End, High-End, Mid-Range, Entry-Level and Specialist.

Ultra High-End: RTX A6000, RTX 8000, RTX 6000
High-End: RTX 5000, RTX 4000
Mid-Range: P2200, P1000
Entry-Level: P620, P400
Specialist: A100, A40, GV100, T4

NVIDIA RTX GPUs

The most powerful professional visualisation and compute GPUs are the NVIDIA RTX series, which are based on the latest Ampere or previous generation Turing GPU architecture. The RTX GPU cards bring a host of features to the professional user, enabling far greater creativity, control and capability.



Real-Time Ray Tracing

The RTX platform realises the dream of real-time cinematic-quality rendering through optimised ray-tracing APIs such as Microsoft DXR - providing the ability to render photorealistic objects and environments in real time with accurate shadows, reflections, and refractions.

AI

The RTX platform brings powerful AI-enhanced capabilities to visual applications. This dramatically accelerates creativity by freeing up time and resources through intelligent manipulation of images, automation of repetitive tasks, and optimisation of compute-intensive processes.

8K

Lifelike visuals are the result of not only how something looks, but also of how it behaves. By combining CUDA cores and APIs, the RTX platform enables accurate modelling of the behaviour of real-world objects at all resolutions, including native 8K video content.


Virtual Reality

The RTX platform features advances in programmable shading such as variable-rate shading, texture-space shading and multi-view rendering. These enable the creation of richer visuals with more fluid interactivity with large models and scenes, and the ability to create more immersive experiences in VR.

Scientific Computing

Maximise productivity, reduce time to insight, and lower the cost of your data science projects with the new breed of RTX GPUs that contain a pre-installed software stack featuring NVIDIA RAPIDS for a fully integrated data science solution. Access to NGC is also included for further machine learning frameworks and applications.

NVIDIA GPU Cloud (NGC)

GPU-accelerated containers, available from the NVIDIA GPU Cloud (NGC), allow multiple RTX workloads to be performed using the same systems without overlap or interference. Data, applications and frameworks are all segregated for optimum performance.

The NVIDIA RTX platform ushers in a new generation of RTX certified applications that simulate the physical world at unprecedented speeds. Enhanced with new AI, ray tracing, and simulation technologies, RTX is a full-stack platform that enables incredible 3D designs, photorealistic simulations, and stunning visual effects in hundreds of leading content creation applications.

As mentioned above, the many processing cores in a GPU are designed for hugely parallel workloads, so they are ideal for delivering data science, deep learning and AI results in ever shorter time frames. However, a number of factors define how quickly you can see results, the primary one being the accuracy required of those results, or precision. Precision refers to how many significant digits, or in computer terms how many ‘bits’, are used to store any given result - for example 3.14 is less precise than 3.141592654. Having more bits to represent each number lets data scientists represent a larger range of values, with room for a fluctuating number of digits on either side of the decimal point during the course of a computation - this is called floating point (FP).

Within GPU specification sheets you will see terms like FP64, FP32 or FP16. FP32 stores each number in 32 bits and is termed single precision; FP64 uses 64 bits and is called double precision; and FP16 uses 16 bits and is termed half precision. The higher the precision a machine uses, the more computational resources, data transfer and memory storage it requires, so calculations cost more performance and consume more power.

Since not every workload requires high precision, AI researchers can benefit from mixing and matching different levels of precision. Once an AI model is trained and ready for inference, the precision is often lowered further to 8-bit or even 4-bit integers - referred to as INT8 and INT4.
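To make the trade-off between bits and precision concrete, the short Python sketch below stores the same number in FP16, FP32 and FP64 and reports how much accuracy each format keeps. It is only an illustration that runs on the CPU using Python's standard `struct` module; the `FORMATS` table and the `precision_report` helper are names made up for this example and are not part of any NVIDIA software.

```python
import math
import struct

# struct format codes for IEEE 754 half, single and double precision.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}


def precision_report(value: float) -> dict:
    """Round-trip `value` through each format and record size and error."""
    report = {}
    for name, fmt in FORMATS.items():
        # pack() rounds the value to the target format; unpack() reads it back.
        stored = struct.unpack(fmt, struct.pack(fmt, value))[0]
        report[name] = {
            "bits": struct.calcsize(fmt) * 8,
            "stored": stored,
            "abs_error": abs(stored - value),
        }
    return report


if __name__ == "__main__":
    for name, row in precision_report(math.pi).items():
        print(f"{name}: {row['bits']} bits, stored {row['stored']!r}, "
              f"error {row['abs_error']:.2e}")
```

Python's own floating point numbers are already FP64, so the FP64 round trip is exact, while FP32 and FP16 each lose progressively more digits in exchange for halving the storage per value.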

As we consider each GPU card in the guide we will offer a score from zero to ten regarding its performance for visualisation (graphical workloads), FP64, FP32, FP16 and INT8 (compute workloads) - this way you can understand any given card’s suitability for tasks or calculations thus helping you decide on the best choice for your projects.

RTX A6000


The RTX A6000 is the latest ultra-high-end workstation GPU card and the first based on the Ampere architecture, which supports real-time ray tracing, accelerated AI, and photorealistic VR. It features 10,752 CUDA cores, 336 Tensor cores and 84 RT cores, combined with 48GB of server-grade error correcting code (ECC) memory and PCIe Gen4 connectivity. Supporting four displays, the RTX A6000 is a supremely powerful graphics card for specialist workstations.

VISUALISATION PERFORMANCE 10/10
COMPUTE PERFORMANCE (FP64) 4/10
COMPUTE PERFORMANCE (FP32) 10/10
COMPUTE PERFORMANCE (FP16) 7/10
COMPUTE PERFORMANCE (INT8) 7/10

Real Time Ray Tracing Yes

VR Ready Yes


Quadro RTX 8000


The Quadro RTX 8000 is the top of the range Turing architecture GPU, which supports hardware-accelerated ray tracing and AI. It features 4608 CUDA cores, 576 Tensor cores and 72 RT cores. It also features double the GPU memory of the next closest card, the RTX 6000, with a huge 48GB. Supporting four displays, the RTX 8000 is an immensely powerful graphics card for specialist workstations.

VISUALISATION PERFORMANCE 9/10
COMPUTE PERFORMANCE (FP64) 2/10
COMPUTE PERFORMANCE (FP32) 6/10
COMPUTE PERFORMANCE (FP16) 2/10
COMPUTE PERFORMANCE (INT8) 6/10

Real Time Ray Tracing Yes

VR Ready Yes


Quadro RTX 6000


The Quadro RTX 6000 offers the same specifications as the RTX 8000, just with half the GPU memory. Based on the Turing architecture which supports hardware-accelerated ray tracing and AI, the RTX 6000 sports 4608 CUDA cores, 576 Tensor cores, 72 RT cores plus 24GB of memory. Supporting four displays the RTX 6000 is an immensely powerful graphics card for ultra-high-end workstations.

VISUALISATION PERFORMANCE 8/10
COMPUTE PERFORMANCE (FP64) 2/10
COMPUTE PERFORMANCE (FP32) 6/10
COMPUTE PERFORMANCE (FP16) 2/10
COMPUTE PERFORMANCE (INT8) 6/10

Real Time Ray Tracing Yes

VR Ready Yes


Quadro RTX 5000


The first high-end professional graphics card is the Quadro RTX 5000. Based on the Turing architecture which supports hardware-accelerated ray tracing and AI, the RTX 5000 has a real edge over its predecessors based on older architectures such as Pascal. The RTX 5000 sports 3072 CUDA cores, 384 Tensor cores, 48 RT cores plus 16GB of memory. Supporting four displays the RTX 5000 is the premium card for a high-end workstation.

VISUALISATION PERFORMANCE 7/10
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 4/10
COMPUTE PERFORMANCE (FP16) 1/10
COMPUTE PERFORMANCE (INT8) 4/10

Real Time Ray Tracing Yes

VR Ready Yes


Quadro RTX 4000


The other high-end professional graphics card is the Quadro RTX 4000. Based on the Turing architecture which supports hardware-accelerated ray tracing and AI, the RTX 4000 has a real edge over its predecessors based on older architectures such as Pascal. The RTX 4000 sports 2304 CUDA cores, 288 Tensor cores, 36 RT cores plus 8GB of memory. Supporting four displays the RTX 4000 is a great choice for a high-end workstation and is the most affordable Quadro card that is VR Ready.

VISUALISATION PERFORMANCE 6/10
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 3/10
COMPUTE PERFORMANCE (FP16) 1/10
COMPUTE PERFORMANCE (INT8) 1/10

Real Time Ray Tracing Yes

VR Ready Yes


NVIDIA Quadro P-series GPUs

The NVIDIA Quadro P-series of professional GPUs is the perfect balance of performance, compelling features, and compact form factor delivering incredible creative experience and productivity across a variety of professional 3D applications. They feature a Pascal architecture with up to 1280 CUDA cores, and 5GB GDDR5X on-board memory, and the power to drive up to four 5K (5120x2880) displays at 60Hz natively. The P-series cards are certified with a broad range of sophisticated professional applications, tested by leading workstation manufacturers, and backed by a global team of support specialists.

Quadro P2200


The first mid-range NVIDIA professional graphics card is the Quadro P2200. Based on the Pascal architecture, the P2200 sports 1280 CUDA cores plus 5GB of memory. Supporting four displays the P2200 is a good choice for a mid-range workstation.

VISUALISATION PERFORMANCE 5/10
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 1/10
COMPUTE PERFORMANCE (FP16) 1/10
COMPUTE PERFORMANCE (INT8) N/A

Real Time Ray Tracing No

VR Ready No


Quadro P1000


The other mid-range graphics card in the professional range is the Quadro P1000. Based on the Pascal architecture, the P1000 sports 640 CUDA cores plus 4GB of memory. Supporting four displays the P1000 is a great choice for a mid-range workstation.

VISUALISATION PERFORMANCE 4/10
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 1/10
COMPUTE PERFORMANCE (FP16) 1/10
COMPUTE PERFORMANCE (INT8) N/A

Real Time Ray Tracing No

VR Ready No


Quadro P620


The first entry-level GPU card in the professional range is the Quadro P620. Based on the Pascal architecture, the P620 sports 512 CUDA cores plus 2GB of memory. Supporting four displays the P620 is a great choice for an entry-level workstation.

VISUALISATION PERFORMANCE 3/10
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 1/10
COMPUTE PERFORMANCE (FP16) 1/10
COMPUTE PERFORMANCE (INT8) N/A

Real Time Ray Tracing No

VR Ready No


Quadro P400


The most cost-effective graphics card in the professional range is the Quadro P400. Based on the Pascal architecture, the P400 sports 256 CUDA cores plus 2GB of memory. Supporting three displays the P400 is a good choice for an entry-level workstation.

VISUALISATION PERFORMANCE 2/10
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 1/10
COMPUTE PERFORMANCE (FP16) 1/10
COMPUTE PERFORMANCE (INT8) N/A

Real Time Ray Tracing No

VR Ready No


NVIDIA Specialist GPUs

We’ve termed these GPUs ‘specialist’ as they are designed and tuned to excel in a very specific area. One of these focussed solutions is virtual GPU (vGPU), where NVIDIA GPUs deliver virtual experiences on desktops, workstations and in server environments. vGPU accelerates graphics and compute to make virtualised workspaces accessible to creative and technical professionals working from home offices, remote sites or anywhere else. NVIDIA vGPU solutions support both compute and graphics workloads in hypervisor-based virtualisation environments.



NVIDIA Virtual Compute Server (vCS)

NVIDIA vCS provides the ability to virtualise GPUs and accelerate compute-intensive server workloads, including AI, Deep Learning, and HPC.

NVIDIA Virtual Datacentre Workstation (vDWS)

Designers and engineers working with increasingly complex models can work more efficiently, collaborate across geographies, and bring their creations to market more quickly with NVIDIA vDWS.

NVIDIA Virtual PC (vPC)

Workers who use graphics-intensive applications and often multi-task across dual monitors can leverage NVIDIA vPC to scale VDI deployments with a consistently appealing user experience.

You can learn more about vGPU solutions, and the most suitable GPUs to support them, in the vGPU Solutions section of Scan Business.


Alternatively, the GPU could be focused purely on the datacentre compute space. The NVIDIA Ampere architecture delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the toughest computing challenges. As the engine of the NVIDIA datacentre platform, Ampere-based cards can efficiently scale to thousands of GPUs or, with NVIDIA Multi-Instance GPU (MIG) technology, be partitioned into up to seven GPU instances to accelerate workloads of all sizes. These passively cooled GPUs are designed for server-specific high performance datacentre workloads.

A100


The A100 is the flagship compute acceleration card for deep learning, AI, HPC and vCS workflows. It is based on the Ampere architecture and comes in two versions - either with 40GB or 80GB of HBM2 memory. Both versions feature 6912 CUDA cores and 432 Tensor cores. It is passively-cooled and designed for server installation.

VISUALISATION PERFORMANCE N/A
COMPUTE PERFORMANCE (FP64) 10/10
COMPUTE PERFORMANCE (FP32) 7/10
COMPUTE PERFORMANCE (FP16) 10/10
COMPUTE PERFORMANCE (INT8) 10/10

Real Time Ray Tracing No

VR Ready No


A40


The A40 is a high performance compute acceleration card for deep learning, AI, HPC, vDWS and vCS workflows. It is based on the Ampere architecture and features 10,752 CUDA cores, 336 Tensor cores and 84 RT cores, combined with 48GB of server-grade error correcting code (ECC) memory. Aimed at server installation, it is passively-cooled and features PCIe Gen4 connectivity.

VISUALISATION PERFORMANCE 10/10
COMPUTE PERFORMANCE (FP64) 4/10
COMPUTE PERFORMANCE (FP32) 9/10
COMPUTE PERFORMANCE (FP16) 7/10
COMPUTE PERFORMANCE (INT8) 7/10

Real Time Ray Tracing Yes

VR Ready Yes


Quadro GV100


The GV100 is optimised for specialist applications that require a high level of precision at FP64 (double precision). It is based on the Volta architecture, and is equipped with 5120 CUDA cores and 640 Tensor cores. It features 32GB of HBM2 memory and supports four displays, making the GV100 the ultimate graphics card for FP64 operations. Unlike the other specialist NVIDIA GPUs, which are passively-cooled and so can only be installed in certified servers, the GV100 is actively-cooled so can also be installed in workstations.

VISUALISATION PERFORMANCE 9/10
COMPUTE PERFORMANCE (FP64) 8/10
COMPUTE PERFORMANCE (FP32) 5/10
COMPUTE PERFORMANCE (FP16) 2/10
COMPUTE PERFORMANCE (INT8) 3/10

Real Time Ray Tracing No

VR Ready Yes


Tesla T4


The T4 GPU specialises in multi-precision computing for AI inference and high-instance, low-powered vGPU deployments such as vPC. Based on the Turing architecture, it features 2560 CUDA cores, 320 Tensor cores and 16GB of GPU memory. It is passively-cooled and can be installed in many non-GPU specific servers thanks to its low power requirements.

VISUALISATION PERFORMANCE 5/10 (vGPU ONLY)
COMPUTE PERFORMANCE (FP64) 1/10
COMPUTE PERFORMANCE (FP32) 3/10
COMPUTE PERFORMANCE (FP16) 4/10
COMPUTE PERFORMANCE (INT8) 3/10

Real Time Ray Tracing No

VR Ready No


NVIDIA Professional GPU Summary

The table below summarises the performance ratings listed above to aid a direct comparison between cards. All scores are out of ten.

Model | RTX A6000 | RTX 8000 | RTX 6000 | RTX 5000 | RTX 4000 | P2200 | P1000 | P620 | P400 | A100 | A40 | GV100 | T4
Visualisation Performance | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | N/A | 10 | 9 | 5 (vGPU only)
Compute Performance (FP64) | 4 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 10 | 4 | 8 | 1
Compute Performance (FP32) | 10 | 6 | 6 | 4 | 3 | 1 | 1 | 1 | 1 | 7 | 9 | 5 | 3
Compute Performance (FP16) | 7 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 10 | 7 | 2 | 4
Compute Performance (INT8) | 7 | 6 | 6 | 4 | 1 | N/A | N/A | N/A | N/A | 10 | 7 | 3 | 3
Ray Tracing | Yes | Yes | Yes | Yes | Yes | No | No | No | No | N/A | Yes | No | No
VR Ready | Yes | Yes | Yes | Yes | Yes | No | No | No | No | N/A | Yes | Yes | No

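If you want to shortlist cards for a particular workload, the ratings in the summary table can be treated as a small lookup table. The Python sketch below is one possible way to do that: the scores are copied from the table, while the `METRICS` and `SCORES` names and the `rank_cards` helper are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# Column order used for every entry in SCORES.
METRICS = ("vis", "fp64", "fp32", "fp16", "int8")

# Scores out of ten, copied from the summary table in the order of METRICS.
# None stands for an N/A entry. The T4 visualisation score applies to vGPU only.
SCORES: Dict[str, Tuple[Optional[int], ...]] = {
    "RTX A6000": (10, 4, 10, 7, 7),
    "RTX 8000": (9, 2, 6, 2, 6),
    "RTX 6000": (8, 2, 6, 2, 6),
    "RTX 5000": (7, 1, 4, 1, 4),
    "RTX 4000": (6, 1, 3, 1, 1),
    "P2200": (5, 1, 1, 1, None),
    "P1000": (4, 1, 1, 1, None),
    "P620": (3, 1, 1, 1, None),
    "P400": (2, 1, 1, 1, None),
    "A100": (None, 10, 7, 10, 10),
    "A40": (10, 4, 9, 7, 7),
    "GV100": (9, 8, 5, 2, 3),
    "T4": (5, 1, 3, 4, 3),
}


def rank_cards(metric: str, minimum: int = 0) -> List[str]:
    """Return cards scoring at least `minimum` on `metric`, best first."""
    column = METRICS.index(metric)  # raises ValueError for an unknown metric
    rated = [
        (name, scores[column])
        for name, scores in SCORES.items()
        if scores[column] is not None and scores[column] >= minimum
    ]
    # sorted() is stable, so tied cards keep their table order.
    ranked = sorted(rated, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


if __name__ == "__main__":
    print("FP64 score of 8 or more:", rank_cards("fp64", minimum=8))
    print("Visualisation score of 10:", rank_cards("vis", minimum=10))
```

Cards with an N/A rating for the chosen metric are simply left out of the result. The scores only indicate relative suitability, so factors such as memory size, cooling and workstation or server compatibility still need checking against the specifications.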
NVIDIA Professional GPU Specifications

Model | RTX A6000 | RTX 8000 | RTX 6000 | RTX 5000 | RTX 4000 | P2200 | P1000 | P620 | P400 | A100 | A40 | GV100 | T4
Architecture | Ampere | Turing | Turing | Turing | Turing | Pascal | Pascal | Pascal | Pascal | Ampere | Ampere | Volta | Turing
GPU | GA102 | TU102 | TU102 | TU104 | TU104 | GP106 | GP107 | GP107 | GP107 | GA100 | GA102 | GV100 | TU104
CUDA Cores | 10,752 | 4,608 | 4,608 | 3,072 | 2,304 | 1,280 | 640 | 512 | 256 | 6,912 | 10,752 | 5,120 | 2,560
Tensor Cores | 336 | 576 | 576 | 384 | 288 | 0 | 0 | 0 | 0 | 432 | 336 | 640 | 320
RT Cores | 84 | 72 | 72 | 48 | 36 | 0 | 0 | 0 | 0 | 0 | 84 | 0 | 0
Reference Base Clock | - | 1,440MHz | 1,440MHz | 1,620MHz | 1,005MHz | 1,000MHz | 1,266MHz | 1,266MHz | 1,228MHz | 765MHz | - | 1,132MHz | 585MHz
Reference Boost Clock | - | 1,770MHz | 1,770MHz | 1,815MHz | 1,545MHz | 1,493MHz | 1,481MHz | 1,354MHz | 1,252MHz | 1,410MHz | - | 1,627MHz | 1,590MHz
Memory | 48GB GDDR6 | 48GB GDDR6 | 24GB GDDR6 | 16GB GDDR6 | 8GB GDDR6 | 5GB GDDR5X | 4GB GDDR5 | 2GB GDDR5 | 2GB GDDR5 | 40 or 80GB HBM2 | 48GB GDDR6 | 32GB HBM2 | 16GB GDDR6
ECC Memory | Yes | Yes | Yes | Yes | No | No | No | No | No | Yes | Yes | Yes | Yes
Memory Controller | 384-bit | 384-bit | 384-bit | 256-bit | 256-bit | 160-bit | 128-bit | 128-bit | 64-bit | 5,120-bit | 384-bit | 4,096-bit | 256-bit
NVLink Speed | 112GB/sec | 100GB/sec | 100GB/sec | 50GB/sec | No | No | No | No | No | 32GB/sec 600GB/sec | 112GB/sec | 200GB/sec | No
Workstation compatible | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes | No
Server compatible | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
TDP | 300W | 295W | 295W | 265W | 160W | 75W | 47W | 40W | 30W | 250W | 300W | 250W | 75W

We hope you’ve found this professional GPU buyer’s guide helpful. If you would like further advice on choosing the correct GPU for your use case or project, don’t hesitate to get in touch on 01204 474747 or at [email protected]