AI Training Hardware Buyers Guide

Once a development pipeline has been established and an AI model is ready for production, the next step is to train it. The training phase requires far more GPU and storage resources than development, as many iterations will be needed - training is therefore the most expensive part of any AI project. It requires, at minimum, a multi-GPU server supported by fast storage and connected via high-throughput, low-latency networking. If these three components of your AI infrastructure aren't matched and optimised, productivity and efficiency will suffer - the fastest GPU-accelerated supercomputer is a waste of money if it is connected to storage that cannot keep its GPUs fully utilised.

This guide takes you through the myriad AI training hardware options, explaining their suitability for different projects and their ability to scale - including small language models (SLMs), large language models (LLMs), vision language models (VLMs), generative AI, agentic AI and physical AI models. Many of these servers are also suitable for inferencing large AI models - check out our AI Inferencing Hardware Buyers Guide for more information.

Your AI Project

The training of an AI model is always preceded by the project scoping, data preparation and AI development phases, as illustrated below.

Problem Statement - Project scope and high level ROI

Data Preparation - Classification, cleaning and structure

Model Development - Education and resource allocation

Model Training - Optimisation and scaling

Model Integration - Inferencing and deployment

Governance - Maintenance and compliance

It may not be immediately obvious, but ensuring your project scope is realistic and achievable has a large impact on what AI training hardware you’ll ultimately require. These early stages are crucial to get right - you can learn more by reading our AI Project Planning Guide and our AI Development Hardware Buyers Guide.

AI Model Training

Your AI model development phase will likely have involved using frameworks or optimised foundation models (FMs) to save time and effort when building your AI pipeline - all available in the NVIDIA AI Enterprise (NVAIE) software platform. NVAIE is optimised for NVIDIA GPUs and is included with the DGX Spark and DGX Station GB300 development platforms, as well as the DGX servers discussed further down this guide; it is also available on subscription for non-DGX servers.

NVAIE provides end-to-end support for AI projects such as medical imaging, autonomous vehicles, avatars, drug discovery, generative AI, agentic AI, and physical AI and robotics, scaling seamlessly across development, training and inferencing platforms.

The frameworks or FMs used in development will have a significant impact on initial model size when you reach the training phase. If you started with a single-GPU workstation, training may be possible on a single multi-GPU server; however, if your AI model was developed on multi-GPU workstations then the chances are that multiple multi-GPU servers will be required to train it. It is therefore key to understand the relative size of models, how they might scale, and the GPU hardware capable of handling their training. For context, between the 1950s and 2018 AI model size grew by seven orders of magnitude (from thousands of parameters to around 30 million) - from 2018 to 2022 alone it grew by another four orders of magnitude (from 30 million to 20 billion). Today 400-billion-parameter models are not uncommon, and GPT-4 is rumoured to have 1.8 trillion.

Type of AI Model Typical Use Case Parameters Initial Dataset Size Typical Server(s) Required
SLMs Limited scope chatbots / NLP / On-device applications 100M - 7B 1-8GB Single, multi-GPU PCIe server
LLMs Content creation / advanced chatbots / Real-time translation 7B - 20B 10-15GB Single, multi-GPU SXM server
Generative AI Advanced content creation / Personalised experiences / Drug Discovery 20B - 70B 20-40GB Multiple, multi-GPU SXM servers
Agentic AI / VLMs Personalised interactions / Data-driven insights / Autonomous vehicles 70B - 200B 60-150GB GPU server cluster
Physical AI Robotics / Digital Twins 200B - xT 200-750GB GPU server cluster

It is worth clarifying that this table is for guidance only - the absolute size (in GB) of any model is determined by the number of parameters and the size of each parameter. Similarly, there will be small agentic AI models where the function is very focused, and very large LLMs where translation of many languages is the goal. It is also worth pointing out that the dataset size mentioned is the likely final size, so you need to allow capacity for numerous versions and many iterations before reaching the final model.
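
As a rough rule of thumb, the memory needed just to hold a model's weights is the parameter count multiplied by the bytes per parameter (0.5 bytes at FP4, one byte at FP8, two at FP16), and training typically needs several times that again for gradients, optimiser states and activations. The short Python sketch below illustrates the arithmetic - the training overhead multiplier is an indicative assumption, not a measured figure.

# Rough estimate of GPU memory needed to hold a model at a given precision.
# The training overhead multiplier is an illustrative assumption - real
# requirements depend on the optimiser, batch size and parallelism strategy.

BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0, "fp16": 2.0, "fp32": 4.0}

def weight_memory_gb(params: float, precision: str = "fp16") -> float:
    """Memory (GB) to store the weights alone."""
    return params * BYTES_PER_PARAM[precision] / 1e9

def training_memory_gb(params: float, precision: str = "fp16",
                       overhead: float = 4.0) -> float:
    """Very rough training footprint: weights plus gradients, optimiser
    states and activations (the overhead multiplier is an assumption)."""
    return weight_memory_gb(params, precision) * overhead

if __name__ == "__main__":
    for label, params in [("7B SLM/LLM", 7e9), ("70B generative AI", 70e9),
                          ("200B agentic AI / VLM", 200e9)]:
        print(f"{label}: ~{weight_memory_gb(params):.0f} GB weights (FP16), "
              f"~{training_memory_gb(params):.0f} GB or more to train")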

AI Hardware

The following AI training hardware options are compared and discussed in light of the model sizes above, offering recommendations and advice about the most suitable option for various scenarios. However, as previously stated, thorough planning and scoping phases will lead to much more accurate provisional model sizes and better insight into development hardware and ultimately, training hardware choice.

Traditional AI servers are built around the industry norm of one or more x86 CPUs (either AMD or Intel) plus multiple NVIDIA GPUs, in either PCIe card or embedded SXM form factor. The major benefit of PCIe cards is that, as a system architecture established over decades, such systems are extremely cost-effective and very easy to upgrade. In contrast, SXM modules are not upgradable, but support more advanced GPUs and larger VRAM capacities.

The downside of both these form factors is that the CPU and GPU have separate pools of memory, so NVIDIA has developed a superchip approach combining an Arm-based (non-x86) CPU with two embedded GPUs on a single module. This innovative architecture enables higher-density servers and high-performance clusters, taking advantage of many CPUs and GPUs in a single stack, with coherent memory shared across the processors.

Click the tabs below to explore these options, or contact our AI team for more information or advice.

3XS RTX PRO Servers

3XS RTX PRO Servers are based on NVIDIA-certified designs. Powered by NVIDIA RTX PRO Blackwell Server GPUs, they provide best-in-class performance across a wide range of workloads including visualisation and rendering, scientific computing and HPC, and small-scale AI. Designed for datacentre environments, 3XS RTX PRO Servers are fine-tuned by our hardware engineers and workload specialists for maximum performance and reliability.

NVIDIA Elite Partner

Scan has been an accredited NVIDIA Elite Partner since 2017, awarded for our expertise in the areas of deep learning and AI.

AI Optimised

Our in-house team includes data scientists who optimise the configuration and software stack of each system for AI workloads.

Trusted by you

Scan AI solutions are trusted by thousands of organisations for their AI training needs.

7 Days Support

Our technical support engineers are available seven days a week to help with any queries.

3 Years Warranty

3XS Systems include a three-year warranty, so if anything goes faulty we’ll repair or replace it.

Architecture

3XS RTX PRO servers support up to eight NVIDIA RTX PRO Blackwell Server GPUs - offering up to 768GB VRAM - alongside NVIDIA ConnectX-7 SmartNICs. An updated version is expected shortly, featuring embedded ConnectX-8 SuperNICs acting as HCAs and PCIe switches. This powerful combination provides best-in-class performance of up to 32 PFLOPS of FP4 compute for AI models with up to 1.2 trillion parameters, and 3 PFLOPS of ray tracing in visualisation applications. Additionally, using multi-instance GPU (MIG), each GPU can be shared by up to four users, fully isolated at the hardware level.
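
Whether MIG is enabled, and how many instances each GPU can expose, can be checked programmatically. Below is a minimal sketch using the nvidia-ml-py (pynvml) bindings; it assumes an NVIDIA driver is present and simply reports MIG status rather than configuring it - partitioning itself is normally carried out by an administrator.

# Query MIG status on each NVIDIA GPU using the nvidia-ml-py bindings
# (pip install nvidia-ml-py). Sketch only - requires an NVIDIA driver
# and a MIG-capable GPU to return meaningful results.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):   # older bindings return bytes
            name = name.decode()
        try:
            current, pending = pynvml.nvmlDeviceGetMigMode(handle)
            max_instances = pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)
            print(f"GPU {i} ({name}): MIG enabled={bool(current)}, "
                  f"up to {max_instances} instances")
        except pynvml.NVMLError:
            print(f"GPU {i} ({name}): MIG not supported")
finally:
    pynvml.nvmlShutdown()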

A key selling point of RTX PRO Servers is that they are fully configurable, with a wide range of CPUs, RAM, networking, storage and software.

GPU Specifications
Name RTX PRO 6000 Blackwell Server
Architecture Blackwell
Bus PCIe 5
GPU GB202
CUDA Cores 24,064
Tensor Cores 752 (5th gen)
RT Cores 188 (4th gen)
Memory 96GB GDDR7
ECC Memory
Memory Controller 512-bit
MIG Instances 4
Confidential Computing Supported
TDP 600W
Thermal Passive

Relative Performance & Capability

A 3XS RTX PRO server configured with the maximum eight RTX PRO 6000 Blackwell Server Edition GPUs has an FP4 performance of 32 PFLOPS (quadrillion floating-point operations per second), which is adequate for many SLMs and LLMs with a specific focus.

It’s also worth noting that, unlike most other AI hardware, RTX PRO Servers are an ideal platform for demanding visualisation workloads such as rendering and digital twins, because they include dedicated RT cores that accelerate ray tracing. In addition, they support both Linux and Windows Server operating systems, the latter being a crucial requirement for many visualisation applications.

The dual nature of RTX PRO Servers is a big benefit for businesses, because it means that rather than investing in dedicated AI and visualisation servers you can switch between the two workloads at will.

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

*Performance is FP8 as Ada Lovelace / Hopper GPUs do not support FP4 **Not necessarily representing a single model - may be multiple large models at cluster scale

Scaling Up

When it comes to scaling 3XS RTX PRO Servers, multiple servers can be connected into a POD architecture configured with shared AI-optimised PEAK:AIO software-defined storage and NVIDIA networking - either Ethernet or InfiniBand.

To aid management of the GPUs, NVIDIA Run:ai software enables intelligent resource management and consumption so that users can easily access GPU fractions, multiple GPUs or clusters of servers for workloads of every size and stage of the AI lifecycle. This ensures that all available compute can be utilised and GPUs never have to sit idle. Run:ai's scheduler is a simple plug-in for Kubernetes clusters and adds high-performance orchestration to your containerised AI workloads.
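
In practice, pointing a containerised training job at the Run:ai scheduler is largely a matter of naming it in the pod specification. The sketch below builds an illustrative manifest in Python; the scheduler name and the fractional-GPU annotation follow Run:ai's documented conventions but should be treated as assumptions to verify against the version deployed on your cluster, and the container image and job name are placeholders.

# Illustrative Kubernetes pod manifest for a containerised training job
# scheduled by NVIDIA Run:ai. The scheduler name and "gpu-fraction"
# annotation are assumptions based on Run:ai conventions - check them
# against the version installed on your cluster.
import json

def runai_training_pod(name: str, image: str, gpu_fraction: str = "0.5") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {"gpu-fraction": gpu_fraction},  # request part of a GPU
        },
        "spec": {
            "schedulerName": "runai-scheduler",  # hand scheduling to Run:ai
            "containers": [{
                "name": "trainer",
                "image": image,
                "command": ["python", "train.py"],
            }],
            "restartPolicy": "Never",
        },
    }

if __name__ == "__main__":
    # JSON is valid YAML, so this can be piped straight to kubectl apply -f -
    manifest = runai_training_pod("llm-finetune", "nvcr.io/nvidia/pytorch:24.05-py3")
    print(json.dumps(manifest, indent=2))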

As an NVIDIA Elite Partner, our expert AI team can help you design and deploy 3XS RTX PRO Servers at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The 3XS RTX PRO Servers are the most cost-effective platform to train AI models, providing up to 768GB of VRAM in a single chassis, with MIG support if the system is intended to be a shared resource. RTX PRO Servers are also ideal for running visualisation workloads. The scaling options make this a cost-effective platform to grow with too, although software such as NVIDIA Run:ai is recommended to get the most from the combined GPUs. For more demanding projects you should consider HGX or DGX systems, thanks to their superior performance and scalability.

Ready to buy?

Click the link below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

CONFIGURE 3XS RTX PRO SERVERS

3XS EGX & MGX Servers

3XS EGX and MGX servers are based on NVIDIA-certified designs. Powered by NVIDIA professional PCIe GPUs, they provide data scientists and researchers with a flexible, cost-effective platform for training AI models. Designed for datacentre environments, 3XS EGX and MGX servers are fine-tuned by our hardware engineers and workload specialists, and supplied with a custom Linux Ubuntu-based software stack for maximum performance and reliability.

NVIDIA Elite Partner

Scan has been an accredited NVIDIA Elite Partner since 2017, awarded for our expertise in the areas of deep learning and AI.

AI Optimised

Our in-house team includes data scientists who optimise the configuration and software stack of each system for AI workloads.

Trusted by you

Scan AI solutions are trusted by thousands of organisations for their AI training needs.

7 Days Support

Our technical support engineers are available seven days a week to help with any queries.

3 Years Warranty

3XS Systems include a three-year warranty, so if anything goes faulty we’ll repair or replace it.

Architecture

3XS EGX and MGX servers support up to eight NVIDIA GPUs - offering up to 1.1TB VRAM - alongside various NVIDIA ConnectX networking options. This powerful combination provides high performance of up to 32 PFLOPS FP4 and supports models with up to 1.2 trillion parameters. Additionally, some GPUs support multi-instance GPU (MIG), where each GPU can be shared by up to seven users, fully isolated at the hardware level.

A key selling point of EGX and MGX servers is that they are fully configurable, with a wide range of CPUs, RAM, networking, storage and software.

GPU Specifications
Name RTX PRO 6000 Blackwell Server H200 NVL L40S RTX 6000 ADA
Architecture Blackwell Hopper Ada Lovelace Ada Lovelace
Bus PCIe 5 PCIe 5 PCIe 4 PCIe 4
GPU GB202 H200 AD102 AD102
CUDA Cores 24,064 16,896 18,176 18,176
Tensor Cores 752 (5th gen) 528 (4th gen) 568 (4th gen) 568 (4th gen)
RT Cores 188 (4th gen) 142 (3rd gen) 142 (3rd gen)
Memory 96GB GDDR7 141GB HBM3 48GB GDDR6 48GB GDDR6
ECC Memory
Memory Controller 512-bit 5,120-bit 384-bit 384-bit
MIG Instances 4 7
Confidential Computing Supported
TDP 600W 600W 350W 300W
Thermal Passive Passive Passive Active

Relative Performance & Capability

A 3XS EGX or MGX server configured with the maximum eight RTX PRO 6000 Blackwell Server Edition GPUs offers a maximum performance of 32 PetaFLOPS (quadrillion floating-point operations per second), which as you can see from the comparison table below is one of the faster non-DGX servers on the market. Alternatively, choosing eight NVIDIA H200 NVL GPUs delivers slightly less performance, due to the older Hopper architecture, but with almost 50% greater memory capacity using the faster HBM3 format. The two Ada Lovelace generation GPUs, the L40S and RTX 6000 Ada, offer a lower price point, but commensurately lower performance.
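
For the Hopper and Ada Lovelace options in this range, FP8 rather than FP4 is the lowest precision available for training, and in practice it is usually accessed through NVIDIA's Transformer Engine library rather than configured by hand. The snippet below follows the library's basic FP8 usage pattern and is a sketch only - it assumes the transformer-engine package is installed alongside an FP8-capable GPU, and the layer dimensions are arbitrary placeholders.

# Minimal FP8 training step using NVIDIA Transformer Engine (sketch).
# Requires an FP8-capable GPU (Hopper, Ada or Blackwell) and the
# transformer-engine package installed alongside PyTorch.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single FP8-capable linear layer; dimensions are arbitrary placeholders.
layer = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# Delayed-scaling FP8 recipe (HYBRID uses E4M3 forward, E5M2 backward).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# The forward GEMM runs in FP8 inside the autocast region; the backward
# pass follows the same recipe.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.sum().backward()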

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

*Performance is FP8 as Ada Lovelace / Hopper GPUs do not support FP4 **Not necessarily representing a single model - may be multiple large models at cluster scale

Scaling Up

When it comes to scaling 3XS EGX and MGX servers, multiple servers can be connected into a POD architecture configured with shared AI-optimised PEAK:AIO software-defined storage and NVIDIA networking - either Ethernet or InfiniBand.

To aid management of the GPUs, NVIDIA Run:ai software enables intelligent resource management and consumption so that users can easily access GPU fractions, multiple GPUs or clusters of servers for workloads of every size and stage of the AI lifecycle. This ensures that all available compute can be utilised and GPUs never have to sit idle. Run:ai's scheduler is a simple plug-in for Kubernetes clusters and adds high-performance orchestration to your containerised AI workloads.

As an NVIDIA Elite Partner, our expert AI team can help you design and deploy 3XS EGX and MGX servers at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The 3XS range of EGX and MGX servers offers a cost-effective way to train AI models, providing up to 1.1TB of VRAM in a single chassis, with MIG support if the system is intended to be a shared resource. The scaling options make these an attractive platform to build up too, although software such as NVIDIA Run:ai is recommended to get the most from the combined GPUs. For projects likely to scale faster, you should consider a DGX appliance or DGX cluster-based platform, as the inbuilt Base Command management software and NVIDIA AI Enterprise software stack offer significant benefits.

Ready to buy?

Click the link below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

CONFIGURE 3XS EGX / MGX SERVERS

3XS HGX Servers

Powered by either four or eight NVIDIA SXM GPUs, HGX servers provide data scientists and researchers with a powerful platform for scaling-out AI models. Designed for datacentre environments, 3XS HGX Servers are fine-tuned by our hardware engineers and workload specialists, and supplied with a custom Linux Ubuntu-based software stack for maximum performance and reliability.

NVIDIA Elite Partner

Scan has been an accredited NVIDIA Elite Partner since 2017, awarded for our expertise in the areas of deep learning and AI.

AI Optimised

Our in-house team includes data scientists who optimise the configuration and software stack of each system for AI workloads.

Trusted by you

Scan AI solutions are trusted by thousands of organisations for their AI training needs.

7 Days Support

Our technical support engineers are available seven days a week to help with any queries.

3 Years Warranty

3XS Systems include a three-year warranty, so if anything goes faulty we’ll repair or replace it.

Architecture

3XS HGX servers support up to eight NVIDIA SXM-format GPUs - offering up to 2.3TB HBM3e VRAM - alongside various NVIDIA ConnectX-7 and ConnectX-8 networking options. This powerful combination provides ground-breaking performance of up to 144 PFLOPS FP4. The GPUs are connected via NVLink and NVSwitch technologies for the fastest data processing. Additionally, these GPUs support multi-instance GPU (MIG), where each GPU can be shared by up to seven users, fully isolated at the hardware level.
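
Within a single HGX node, the usual way to put all of the NVLink-connected GPUs to work on one training job is data parallelism, for example with PyTorch's DistributedDataParallel, which exchanges gradients via NCCL over NVLink/NVSwitch. The sketch below shows the general pattern; the model and data are placeholders, and it would normally be launched with torchrun --nproc_per_node=8.

# Minimal single-node data-parallel training sketch using PyTorch DDP.
# Gradient all-reduce runs over NCCL, which uses NVLink/NVSwitch when present.
# Launch with: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimiser = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                       # placeholder data and loop
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()                          # gradients all-reduced here
        optimiser.step()
        optimiser.zero_grad()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()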

A key selling point of HGX servers is that they are highly configurable, with a wide range of CPUs, RAM, networking, storage and software stack.

HGX B300 HGX B200 HGX H200
GPUs 8x NVIDIA B300 8x NVIDIA B200 8x NVIDIA H200
FP4 TENSOR CORE** 144 PFLOPS | 72 PFLOPS 144 PFLOPS | 72 PFLOPS N/A
FP8/FP6 TENSOR CORE* 72 PFLOPS 72 PFLOPS 32 PFLOPS
INT8 TENSOR CORE* 72 POPS 72 POPS 32 POPS
FP16/BF16 TENSOR CORE* 36 PFLOPS 36 PFLOPS 16 PFLOPS
TF32 TENSOR CORE* 18 PFLOPS 18 PFLOPS 8 PFLOPS
FP32 600 TFLOPS 600 TFLOPS 540 TFLOPS
FP64/FP64 TENSOR CORE* 296 TFLOPS 296 TFLOPS 540 TFLOPS
ARCHITECTURE Blackwell Ultra Blackwell Hopper
CUDA CORES TBC TBC 8x 16,896
TENSOR CORES TBC TBC 8x 528 4th gen
TOTAL GPU MEMORY Up to 2.3TB HBM3e 1.4TB HBM3e 1.1TB HBM3
MEMORY CONTROLLER TBC TBC 5,120-bit
NVLINK 5th gen 5th gen 4th gen
NVSWITCH NVLink 5 Switch NVLink 5 Switch NVLink 4 Switch
NVSWITCH GPU-TO-GPU BANDWIDTH 1.8 TB/s 1.8 TB/s 900 GB/s
NVLINK BANDWIDTH 14.4 TB/s 14.4 TB/s 7.2 TB/s

Relative Performance & Capability

A 3XS HGX server configured with the maximum eight SXM B300 Blackwell Ultra GPUs offers a maximum performance of 144 PetaFLOPS (quadrillion floating-point operations per second), which as you can see from the comparison table below, is best-in-class for a single server. The B200 Blackwell variant offers the same performance but with less memory capacity, whilst the older Hopper version has less memory capacity and lower performance, albeit at a lower price point. Although all HGX servers are extremely capable AI training servers, the addition of FP4 capability to Blackwell-based GPUs and their larger memory does set them apart if you’re looking at working with the latest LLMs and VLMs.

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

Scaling Up

When it comes to scaling 3XS HGX servers, multiple servers can be connected into a POD architecture configured with shared AI-optimised PEAK:AIO software-defined storage and NVIDIA networking - either Ethernet or InfiniBand.

To aid management of the GPUs, NVIDIA Run:ai software enables intelligent resource management and consumption so that users can easily access GPU fractions, multiple GPUs or clusters of servers for workloads of every size and stage of the AI lifecycle. This ensures that all available compute can be utilised and GPUs never have to sit idle. Run:ai's scheduler is a simple plug-in for Kubernetes clusters and adds high-performance orchestration to your containerised AI workloads.

As an NVIDIA Elite Partner, our expert AI team can help you design and deploy 3XS HGX servers at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The 3XS HGX servers offer powerful systems for training large AI models at scale, with MIG support if the system is intended to be a shared resource. HGX servers provide the same performance as DGX appliances, but can be configured with different CPU, RAM, networking and storage options. If scaling up, although you can add NVIDIA AI Enterprise to HGX servers, you should consider DGX appliances, as NVAIE is included alongside the inbuilt Base Command management software and enterprise-level support direct from NVIDIA. If you don’t intend to scale to the heights of LLMs and VLMs, then the RTX PRO or EGX and MGX platforms may be sufficient, as they scale well too.

Ready to buy?

Click the links below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

CONFIGURE 3XS HGX SERVERS

NVIDIA DGX Appliances

Powered by eight NVIDIA Blackwell Ultra, Blackwell or Hopper SXM GPUs, NVIDIA DGX appliances provide data scientists and researchers with the most powerful platform for scaling-out AI models. Designed for datacentre environments, the NVIDIA DGX range is the ultimate in AI-optimised appliances; supplied with a complete software stack and management interface, they are supported directly by NVIDIA for maximum uptime and reliability.

Architecture

NVIDIA DGX appliances feature eight SXM GPUs - offering up to 2.3TB HBM3e VRAM - alongside various NVIDIA ConnectX-7 and ConnectX-8 networking options. This powerful combination provides ground-breaking performance of up to 144 PFLOPS FP4. The GPUs are connected via NVLink and NVSwitch technologies for the fastest data processing. Additionally, these GPUs support multi-instance GPU (MIG), where each GPU can be shared by up to seven users, fully isolated at the hardware level.

Unlike the other types of server covered in this guide, DGX appliances are not configurable - they have a pre-defined configuration of CPUs, RAM, networking, storage and software.

DGX B300 DGX B200 DGX H200
GPUs 8x NVIDIA B300 8x NVIDIA B200 8x NVIDIA H200
FP4 TENSOR CORE** 144 PFLOPS | 72 PFLOPS 144 PFLOPS | 72 PFLOPS N/A
FP8/FP6 TENSOR CORE* 72 PFLOPS 72 PFLOPS 32 PFLOPS
INT8 TENSOR CORE* 72 POPS 72 POPS 32 POPS
FP16/BF16 TENSOR CORE* 36 PFLOPS 36 PFLOPS 16 PFLOPS
TF32 TENSOR CORE* 18 PFLOPS 18 PFLOPS 8 PFLOPS
FP32 600 TFLOPS 600 TFLOPS 540 TFLOPS
FP64/FP64 TENSOR CORE* 296 TFLOPS 296 TFLOPS 540 TFLOPS
ARCHITECTURE Blackwell Ultra Blackwell Hopper
CUDA CORES TBC TBC 8x 16,896
TENSOR CORES TBC TBC 8x 528 4th gen
TOTAL GPU MEMORY Up to 2.3TB HBM3e 1.4TB HBM3e 1.1TB HBM3
MEMORY CONTROLLER TBC TBC 5,120-bit
NVLINK 5th gen 5th gen 4th gen
NVSWITCH NVLink 5 Switch NVLink 5 Switch NVLink 4 Switch
NVSWITCH GPU-TO-GPU BANDWIDTH 1.8 TB/s 1.8 TB/s 900 GB/s
NVLINK BANDWIDTH 14.4 TB/s 14.4 TB/s 7.2 TB/s

Relative Performance & Capability

An NVIDIA DGX B300 appliance offers a maximum performance of 144 PetaFLOPS (quadrillion floating-point operations per second), which as you can see from the comparison table below is best-in-class for a single server. The DGX B200 variant offers the same performance but with less memory capacity, whilst the older DGX H200 has less capacity again and lower performance. Although all three DGX appliances are extremely capable AI training servers, the addition of FP4 capability to Blackwell-based GPUs and their larger memory does set them apart if you’re looking at working with the latest LLMs and VLMs.

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

Scaling Up

When it comes to scaling NVIDIA DGX, multiple appliances can be connected into hugely-powerful BasePOD or SuperPOD architectures of up to 140 nodes.

The DGX systems are managed and controlled by NVIDIA Base Command software; SuperPODs are additionally overseen by NVIDIA Unified Fabric Manager (UFM), which simplifies datacentre networking management by combining enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to support scale-out InfiniBand clusters.

BasePOD Architecture

SuperPOD Architecture

As an NVIDIA Elite Partner, and the UK’s only NVIDIA-certified DGX Managed Services Provider, our expert AI team can help you design and deploy DGX solutions at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The NVIDIA DGX range of appliances offers powerful and complete platforms for training the largest AI models at scale, with MIG support if the system is intended to be a shared resource. DGX offers the same performance as HGX, but the fixed configuration ensures the whole system is optimised for best performance. Additionally, the NVIDIA-certified scaling options deliver predictable performance as your POD architectures grow. If you don’t need NVIDIA AI Enterprise software out of the box, you could consider a 3XS HGX server, or if you don’t intend to scale to the heights of LLMs and VLMs, then the RTX PRO or EGX and MGX platforms may be sufficient, as they scale nicely too.

Ready to buy?

Click the links below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

VIEW NVIDIA DGX APPLIANCES

NVIDIA DGX Clusters

The NVIDIA GB200 and GB300 NVL72 rack architectures are designed to handle terabyte-class models for massive recommender systems, agentic and physical AI, and for AI factory deployments. Unlike other NVIDIA DGX PODs based on DGX AI appliances, these are based on multiple NVIDIA Grace-Blackwell Superchip powered servers that provide a vast shared memory capacity across multiple nodes, enabling many trillion-parameter AI models. Each liquid-cooled rack features 36 NVIDIA Grace-Blackwell Superchips providing 36 NVIDIA Grace CPUs and 72 Blackwell or Blackwell Ultra GPUs - operating together via 5th gen NVIDIA NVLink.

Architecture

The Grace Blackwell architecture brings together the groundbreaking performance of two NVIDIA Blackwell GPUs with an NVIDIA Grace CPU in a single superchip, connected with NVIDIA NVLink Chip-2-Chip (C2C), a high-bandwidth, low-latency, memory-coherent interconnect. Each NVIDIA Grace Blackwell Superchip has up to 500GB of LPDDR5 CPU memory and 583GB of HBM3e GPU memory. NVIDIA NVSwitch and networking system enables every CPU and GPU to access each other’s memory at an astonishing 1.8TB/s.

Like DGX appliances, the nodes that make up DGX clusters are not configurable, they have a pre-defined configuration of CPUs, RAM, networking, storage and software.

Specification NVIDIA GB200 NVL72 NVIDIA GB300 NVL72
Configuration 36 Grace CPU and 72 Blackwell GPUs 36 Grace CPU and 72 Blackwell Ultra GPUs
FP4 Tensor Core 1,440 PFLOPS 1,440 PFLOPS
FP8/FP6 Tensor Core 720 PFLOPS 720 PFLOPS
INT8 Tensor Core 720 POPS 720 POPS
FP16/BF16 Tensor Core 360 PFLOPS 360 PFLOPS
TF32 Tensor Core 180 PFLOPS 180 PFLOPS
FP32 5,760 TFLOPS 5,760 TFLOPS
FP64 2,880 TFLOPS 2,880 TFLOPS
FP64 Tensor Core 2,880 TFLOPS 2,880 TFLOPS
GPU Memory | Bandwidth Up to 13.4TB HBM3e | 576 TB/s Up to 21TB HBM3e | 576 TB/s
NVLink Bandwidth 130 TB/s 130 TB/s
CPU Core Count 2,592 Arm Neoverse V2 cores 2,592 Arm Neoverse V2 cores
CPU Memory | Bandwidth Up to 17TB LPDDR5X | Up to 18.4 TB/s Up to 18TB SOCAMM with LPDDR5X | Up to 14.3 TB/s

Relative Performance & Capability

The two DGX clusters differ only in the GB200 or GB300 superchip, the latter offering greater overall memory capacity suited to AI factory-scale workloads. Additionally, further performance is derived from the unified memory architecture, which takes the usable memory capacity to around 30TB on the GB200 NVL72 and almost 40TB on the GB300 NVL72.
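
Those headline figures follow directly from the specification table above - summing the total GPU and CPU memory per rack gives the usable capacity quoted, as the quick check below shows (all values are approximate and taken from the table).

# Back-of-envelope check of per-rack unified memory, using the
# approximate totals from the specification table above.
racks = {
    "GB200 NVL72": {"gpu_hbm_tb": 13.4, "cpu_lpddr_tb": 17.0},
    "GB300 NVL72": {"gpu_hbm_tb": 21.0, "cpu_lpddr_tb": 18.0},
}

for name, mem in racks.items():
    total = mem["gpu_hbm_tb"] + mem["cpu_lpddr_tb"]
    print(f"{name}: ~{total:.0f} TB of coherent memory per rack")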

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

Scaling Up

Although the integrated rack architecture of the GB200 NVL72 and GB300 NVL72 is already AI computing at considerable scale, they are intended to be deployed in multi-rack configurations of up to eight units, enabling up to 576 fully-connected GPUs and a combined FP4 performance in excess of 11 ExaFLOPS.

As an NVIDIA Elite Partner, and the UK’s only NVIDIA-certified DGX Managed Services Provider, our expert AI team can help you design and deploy DGX solutions at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The NVIDIA DGX range of Grace Blackwell superchip clusters is the pinnacle of AI training - if you need to process AI workloads at this scale, there really is no alternative. The unified architecture is such that even the DGX-based SuperPOD solutions would struggle to match it, as the memory in standard DGX appliances is neither as large nor as unified.

Ready to buy?

Contact our AI team to enquire about NVIDIA DGX Clusters on 01204 474210 or at [email protected].

Proof of Concept

Any of these solutions can be trialled free of charge in our proof-of-concept hosted environment. Your trial will involve secure access where you can use a sample of your own data for the most realistic insights, and you’ll be guided by our expert data scientists to ensure you get the most from your PoC.

To arrange your PoC, contact our AI team.

Get Cloud Computing in 3 Simple Steps
1. Choose your GPU

Browse our available options or connect with a specialist to discuss a bespoke solution.

2. Rapid Provisioning

We’ll provision your environment and take you through a guided onboarding process.

3. Enjoy SCAN Cloud

You’re online! Set up, get to work, and access support anytime.

AI Training in the Cloud

All the AI training servers and appliances covered in this guide can also be provisioned on our Scan Cloud platform. Our cloud server solutions can be configured with a wide variety of NVIDIA GPUs or an entire DGX appliance can be selected.

Simple, Flexible Pricing with No Hidden Fees

No long-term commitments, no extra charges for storage or networking - experience the full benefits of the cloud without the drawbacks.

Reference Architecture for Unrivalled Performance

Harness the power of the latest GPUs for desktops or high-performance GPU clusters, from single GPUs to 8-way systems.

Networking and Storage Built for Performance

All GPU instances include NVMe storage and uncontended network ports, ensuring top-tier performance and data privacy.

Build It Your Way

Custom builds are Scan’s speciality - every aspect of our Infrastructure-as-a-Service (IaaS) solutions is fully customisable, with UK-based solutions ensuring data sovereignty.

Browse the available Scan Cloud options or contact our Cloud team.