AI Training Hardware Buyers Guide

Once a development pipeline has been established and an AI model is ready for production, the next step is to train it. The training phase requires far more GPU and storage resources than development, as many iterations will be needed - training is therefore the most expensive part of any AI project. It requires, at minimum, a multi-GPU server supported by fast storage and connected via high-throughput, low-latency networking. If these three components of your AI infrastructure aren't matched and optimised, productivity and efficiency will suffer - the fastest GPU-accelerated supercomputer is a waste of money if it is connected to storage that cannot keep its GPUs fully utilised.

This guide takes you through the myriad AI training hardware options, explaining their suitability for different projects and their ability to scale - including small language models (SLMs), large language models (LLMs), vision language models (VLMs), generative AI, agentic AI and physical AI models. Many of these servers are also suitable for inferencing large AI models - check out our AI Inferencing Hardware Buyers Guide for more information.

Your AI Project

The training of an AI model is always preceded by the project scoping, data preparation and AI development phases, as illustrated below.

Problem Statement - Project scope and high level ROI

Data Preparation - Classification, cleaning and structure

Model Development - Education and resource allocation

Model Training - Optimisation and scaling

Model Integration - Inferencing and deployment

Governance - Maintenance and compliance

It may not be immediately obvious, but ensuring your project scope is realistic and achievable has a large impact on what AI training hardware you’ll ultimately require. These early stages are crucial to get right - you can learn more by reading our AI Project Planning Guide and our AI Development Hardware Buyers Guide.

AI Model Training

Your AI model development phase will likely have involved using frameworks or optimised foundation models (FMs) to save time and effort when building your AI pipeline - all available in the NVIDIA AI Enterprise (NVAIE) software platform. NVAIE is optimised for NVIDIA GPUs and is included with the DGX Spark and DGX Station GB300 development platforms, as well as the DGX servers discussed further down this guide; it is also available on subscription for non-DGX servers.

NVAIE provides end-to-end support for AI projects such as medical imaging, autonomous vehicles, avatars, drug discovery, generative AI, agentic AI, and physical AI and robotics, scaling seamlessly across development, training and inferencing platforms.

The frameworks or FMs used in development will have a significant impact on initial model size when you reach the training phase. If you started with a single-GPU workstation, training may be possible on a single multi-GPU server; however, if your AI model was developed on multi-GPU workstations then the chances are that multiple multi-GPU servers will be required to train it. It is therefore key to understand the relative size of models, how they might scale, and the GPU hardware capable of handling their training. For context, between the 1950s and 2018 AI model size grew by seven orders of magnitude (from thousands of parameters to around 30 million) - from 2018 to 2022 alone it grew by another four orders of magnitude (from 30 million to 20 billion). Today 400-billion-parameter models are not uncommon, and GPT-4 is rumoured to have 1.8 trillion.

Type of AI Model Typical Use Case Parameters Initial Dataset Size Typical Server(s) Required
SLMs Limited scope chatbots / NLP / On-device applications 100M - 7B 1-8GB Single, multi-GPU PCIe server
LLMs Content creation / advanced chatbots / Real-time translation 7B - 20B 10-15GB Single, multi-GPU SXM server
Generative AI Advanced content creation / Personalised experiences / Drug Discovery 20B - 70B 20-40GB Multiple, multi-GPU SXM servers
Agentic AI / VLMs Personalised interactions / Data-driven insights / Autonomous vehicles 70B - 200B 60-150GB GPU server cluster
Physical AI Robotics / Digital Twins 200B - xT 200-750GB GPU server cluster

It is worth clarifying that this table is for guidance only - the absolute size (in GB) of any model is determined by the number of parameters and the size of each parameter. Similarly, there will be small agentic AI models where the function is very focused, and very large LLMs where translation of many languages is the goal. It is also worth pointing out that the dataset size mentioned is the likely final size, so you need to allow capacity for numerous versions and many iterations before reaching the final model.
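
As a rough rule of thumb, the memory needed just to hold a model's weights is the parameter count multiplied by the bytes per parameter (0.5 bytes at FP4, one byte at FP8, two at FP16), and training typically needs several times that again for gradients, optimiser states and activations. The short Python sketch below illustrates the arithmetic - the training overhead multiplier is an indicative assumption, not a measured figure.

# Rough estimate of GPU memory needed to hold a model at a given precision.
# The training overhead multiplier is an illustrative assumption - real
# requirements depend on the optimiser, batch size and parallelism strategy.

BYTES_PER_PARAM = {"fp4": 0.5, "fp8": 1.0, "fp16": 2.0, "fp32": 4.0}

def weight_memory_gb(params: float, precision: str = "fp16") -> float:
    """Memory (GB) to store the weights alone."""
    return params * BYTES_PER_PARAM[precision] / 1e9

def training_memory_gb(params: float, precision: str = "fp16",
                       overhead: float = 4.0) -> float:
    """Very rough training footprint: weights plus gradients, optimiser
    states and activations (the overhead multiplier is an assumption)."""
    return weight_memory_gb(params, precision) * overhead

if __name__ == "__main__":
    for label, params in [("7B SLM/LLM", 7e9), ("70B generative AI", 70e9),
                          ("200B agentic AI / VLM", 200e9)]:
        print(f"{label}: ~{weight_memory_gb(params):.0f} GB weights (FP16), "
              f"~{training_memory_gb(params):.0f} GB or more to train")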

AI Hardware

The following AI training hardware options are compared and discussed in light of the model sizes above, offering recommendations and advice about the most suitable option for various scenarios. However, as previously stated, thorough planning and scoping phases will lead to much more accurate provisional model sizes and better insight into development hardware and ultimately, training hardware choice.

Traditional AI servers are built around the industry norm of one or more x86 CPUs (either AMD or Intel) plus multiple NVIDIA GPUs, in either PCIe card or embedded SXM form factor. The major benefit of PCIe cards is that, as a system architecture established over decades, such systems are extremely cost-effective and very easy to upgrade. In contrast, SXM modules are not upgradable, but support more advanced GPUs and larger VRAM capacities.

The downside of both these form factors is that the CPU and GPU have separate pools of memory, so NVIDIA has developed a superchip approach combining an Arm-based (non-x86) CPU with two embedded GPUs on a single module. This innovative architecture enables higher-density servers and high-performance clusters, taking advantage of many CPUs and GPUs in a single stack, with coherent memory shared across the processors.

Click the tabs below to explore these options, or contact our AI team for more information or advice.

3XS RTX PRO Servers

3XS RTX PRO Servers are based on NVIDIA-certified designs. Powered by NVIDIA RTX PRO Blackwell Server GPUs, they provide best-in-class performance across a wide range of workloads including visualisation and rendering, scientific computing and HPC, and small-scale AI. Designed for datacentre environments, 3XS RTX PRO Servers are fine-tuned by our hardware engineers and workload specialists for maximum performance and reliability.

NVIDIA Elite Partner

Scan has been an accredited NVIDIA Elite Partner since 2017, awarded for our expertise in the areas of deep learning and AI.

AI Optimised

Our in-house team includes data scientists who optimise the configuration and software stack of each system for AI workloads.

Trusted by you

Scan AI solutions are trusted by thousands of organisations for their AI training needs.

7 Days Support

Our technical support engineers are available seven days a week to help with any queries.

3 Years Warranty

3XS Systems include a three-year warranty, so if anything goes faulty we’ll repair or replace it.

Architecture

3XS RTX PRO servers support up to eight NVIDIA RTX PRO Blackwell Server GPUs - offering up to 768GB VRAM - alongside NVIDIA ConnectX-7 SmartNICs. An updated version is expected shortly, featuring embedded ConnectX-8 SuperNICs acting as HCAs and PCIe switches. This powerful combination provides best-in-class performance of up to 32 PFLOPS of FP4 compute for AI models with up to 1.2 trillion parameters, and 3 PFLOPS of ray tracing in visualisation applications. Additionally, using multi-instance GPU (MIG), each GPU can be shared by up to four users, fully isolated at the hardware level.
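
Whether MIG is enabled, and how many instances each GPU can expose, can be checked programmatically. Below is a minimal sketch using the nvidia-ml-py (pynvml) bindings; it assumes an NVIDIA driver is present and simply reports MIG status rather than configuring it - partitioning itself is normally carried out by an administrator.

# Query MIG status on each NVIDIA GPU using the nvidia-ml-py bindings
# (pip install nvidia-ml-py). Sketch only - requires an NVIDIA driver
# and a MIG-capable GPU to return meaningful results.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):   # older bindings return bytes
            name = name.decode()
        try:
            current, pending = pynvml.nvmlDeviceGetMigMode(handle)
            max_instances = pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)
            print(f"GPU {i} ({name}): MIG enabled={bool(current)}, "
                  f"up to {max_instances} instances")
        except pynvml.NVMLError:
            print(f"GPU {i} ({name}): MIG not supported")
finally:
    pynvml.nvmlShutdown()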

A key selling point of RTX PRO Servers is that they are fully configurable, with a wide range of CPUs, RAM, networking, storage and software.

GPU Specifications
Name RTX PRO 6000 Blackwell Server
Architecture Blackwell
Bus PCIe 5
GPU GB202
CUDA Cores 24,064
Tensor Cores 752 (5th gen)
RT Cores 188 (4th gen)
Memory 96GB GDDR7
ECC Memory
Memory Controller 512-bit
MIG Instances 4
Confidential Computing Supported
TDP 600W
Thermal Passive

Relative Performance & Capability

A 3XS RTX PRO server configured with the maximum eight RTX PRO 6000 Blackwell Server Edition GPUs has an FP4 performance of 32 PFLOPS (quadrillion floating-point operations per second), which is adequate for many SLMs and LLMs with a specific focus.

It’s also worth noting that, unlike most other AI hardware, RTX PRO Servers are an ideal platform for demanding visualisation workloads such as rendering and digital twins, because they include dedicated RT cores that accelerate ray tracing. In addition, they support both Linux and Windows Server operating systems, the latter being a crucial requirement for many visualisation applications.

The dual nature of RTX PRO Servers is a big benefit for businesses, because it means that rather than investing in dedicated AI and visualisation servers you can switch between the two workloads at will.

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

*Performance is FP8 as Ada Lovelace / Hopper GPUs do not support FP4 **Not necessarily representing a single model - may be multiple large models at cluster scale

Scaling Up

When it comes to scaling 3XS RTX PRO Servers, multiple servers can be connected into a POD architecture configured with shared AI-optimised PEAK:AIO software-defined storage and NVIDIA networking - either Ethernet or InfiniBand.

To aid management of the GPUs, NVIDIA Run:ai software enables intelligent resource management and consumption so that users can easily access GPU fractions, multiple GPUs or clusters of servers for workloads of every size and stage of the AI lifecycle. This ensures that all available compute can be utilised and GPUs never have to sit idle. Run:ai's scheduler is a simple plug-in for Kubernetes clusters and adds high-performance orchestration to your containerised AI workloads.
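
In practice, pointing a containerised training job at the Run:ai scheduler is largely a matter of naming it in the pod specification. The sketch below builds an illustrative manifest in Python; the scheduler name and the fractional-GPU annotation follow Run:ai's documented conventions but should be treated as assumptions to verify against the version deployed on your cluster, and the container image and job name are placeholders.

# Illustrative Kubernetes pod manifest for a containerised training job
# scheduled by NVIDIA Run:ai. The scheduler name and "gpu-fraction"
# annotation are assumptions based on Run:ai conventions - check them
# against the version installed on your cluster.
import json

def runai_training_pod(name: str, image: str, gpu_fraction: str = "0.5") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {"gpu-fraction": gpu_fraction},  # request part of a GPU
        },
        "spec": {
            "schedulerName": "runai-scheduler",  # hand scheduling to Run:ai
            "containers": [{
                "name": "trainer",
                "image": image,
                "command": ["python", "train.py"],
            }],
            "restartPolicy": "Never",
        },
    }

if __name__ == "__main__":
    # JSON is valid YAML, so this can be piped straight to kubectl apply -f -
    manifest = runai_training_pod("llm-finetune", "nvcr.io/nvidia/pytorch:24.05-py3")
    print(json.dumps(manifest, indent=2))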

As an NVIDIA Elite Partner, our expert AI team can help you design and deploy 3XS RTX PRO Servers at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The 3XS RTX PRO Servers are the most cost-effective platform to train AI models, providing up to 768GB of VRAM in a single chassis, with MIG support if the system is intended to be a shared resource. RTX PRO Servers are also ideal for running visualisation workloads. The scaling options make this a cost-effective platform to grow with too, although software such as NVIDIA Run:ai is recommended to get the most from the combined GPUs. For more demanding projects you should consider HGX or DGX systems, thanks to their superior performance and scalability.

Ready to buy?

Click the link below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

CONFIGURE 3XS RTX PRO SERVERS

3XS EGX & MGX Servers

3XS EGX and MGX servers are based on NVIDIA-certified designs. Powered by NVIDIA professional PCIe GPUs, they provide data scientists and researchers with a flexible, cost-effective platform for training AI models. Designed for datacentre environments, 3XS EGX and MGX servers are fine-tuned by our hardware engineers and workload specialists, and supplied with a custom Linux Ubuntu-based software stack for maximum performance and reliability.

NVIDIA Elite Partner

Scan has been an accredited NVIDIA Elite Partner since 2017, awarded for our expertise in the areas of deep learning and AI.

AI Optimised

Our in-house team includes data scientists who optimise the configuration and software stack of each system for AI workloads.

Trusted by you

Scan AI solutions are trusted by thousands of organisations for their AI training needs.

7 Days Support

Our technical support engineers are available seven days a week to help with any queries.

3 Years Warranty

3XS Systems include a three-year warranty, so if anything goes faulty we’ll repair or replace it.

Architecture

3XS EGX and MGX servers support up to eight NVIDIA GPUs - offering up to 1.1TB VRAM - alongside various NVIDIA ConnectX networking options. This powerful combination provides high performance of up to 32 PFLOPS FP4 and supports models with up to 1.2 trillion parameters. Additionally, some GPUs support multi-instance GPU (MIG), where each GPU can be shared by up to seven users, fully isolated at the hardware level.

A key selling point of EGX and MGX servers is that they are fully configurable, with a wide range of CPUs, RAM, networking, storage and software.

GPU Specifications
Name RTX PRO 6000 Blackwell Server H200 NVL L40S RTX 6000 ADA
Architecture Blackwell Hopper Ada Lovelace Ada Lovelace
Bus PCIe 5 PCIe 5 PCIe 4 PCIe 4
GPU GB202 H200 AD102 AD102
CUDA Cores 24,064 16,896 18,176 18,176
Tensor Cores 752 (5th gen) 528 (4th gen) 568 (4th gen) 568 (4th gen)
RT Cores 188 (4th gen) 142 (3rd gen) 142 (3rd gen)
Memory 96GB GDDR7 141GB HBM3 48GB GDDR6 48GB GDDR6
ECC Memory
Memory Controller 512-bit 5,120-bit 384-bit 384-bit
MIG Instances 4 7
Confidential Computing Supported
TDP 600W 600W 350W 300W
Thermal Passive Passive Passive Active

Relative Performance & Capability

A 3XS EGX or MGX server configured with the maximum eight RTX PRO 6000 Blackwell Server Edition GPUs offers a maximum performance of 32 PetaFLOPS (quadrillion floating-point operations per second), which as you can see from the comparison table below is one of the faster non-DGX servers on the market. Alternatively, choosing eight NVIDIA H200 NVL GPUs delivers slightly less performance, due to the older Hopper architecture, but with almost 50% greater memory capacity using the faster HBM3 format. The two Ada Lovelace generation GPUs, the L40S and RTX 6000 Ada, offer a lower price point, but commensurately lower performance.
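
For the Hopper and Ada Lovelace options in this range, FP8 rather than FP4 is the lowest precision available for training, and in practice it is usually accessed through NVIDIA's Transformer Engine library rather than configured by hand. The snippet below follows the library's basic FP8 usage pattern and is a sketch only - it assumes the transformer-engine package is installed alongside an FP8-capable GPU, and the layer dimensions are arbitrary placeholders.

# Minimal FP8 training step using NVIDIA Transformer Engine (sketch).
# Requires an FP8-capable GPU (Hopper, Ada or Blackwell) and the
# transformer-engine package installed alongside PyTorch.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single FP8-capable linear layer; dimensions are arbitrary placeholders.
layer = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# Delayed-scaling FP8 recipe (HYBRID uses E4M3 forward, E5M2 backward).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# The forward GEMM runs in FP8 inside the autocast region; the backward
# pass follows the same recipe.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

out.sum().backward()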

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

*Performance is FP8 as Ada Lovelace / Hopper GPUs do not support FP4 **Not necessarily representing a single model - may be multiple large models at cluster scale

Scaling Up

When it comes to scaling 3XS EGX and MGX servers, multiple servers can be connected into a POD architecture configured with shared AI-optimised PEAK:AIO software-defined storage and NVIDIA networking - either Ethernet or InfiniBand.

To aid management of the GPUs, NVIDIA Run:ai software enables intelligent resource management and consumption so that users can easily access GPU fractions, multiple GPUs or clusters of servers for workloads of every size and stage of the AI lifecycle. This ensures that all available compute can be utilised and GPUs never have to sit idle. Run:ai's scheduler is a simple plug-in for Kubernetes clusters and adds high-performance orchestration to your containerised AI workloads.

As an NVIDIA Elite Partner, our expert AI team can help you design and deploy 3XS EGX and MGX servers at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The 3XS range of EGX and MGX servers offers a cost-effective way to train AI models, providing up to 1.1TB of VRAM in a single chassis, with MIG support if the system is intended to be a shared resource. The scaling options make these an attractive platform to build up too, although software such as NVIDIA Run:ai is recommended to get the most from the combined GPUs. For projects likely to scale faster, you should consider a DGX appliance or DGX cluster-based platform, as the inbuilt Base Command management software and NVIDIA AI Enterprise software stack offer significant benefits.

Ready to buy?

Click the link below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

CONFIGURE 3XS EGX / MGX SERVERS

3XS HGX Servers

Powered by either four or eight NVIDIA SXM GPUs, HGX servers provide data scientists and researchers with a powerful platform for scaling-out AI models. Designed for datacentre environments, 3XS HGX Servers are fine-tuned by our hardware engineers and workload specialists, and supplied with a custom Linux Ubuntu-based software stack for maximum performance and reliability.

NVIDIA Elite Partner

Scan has been an accredited NVIDIA Elite Partner since 2017, awarded for our expertise in the areas of deep learning and AI.

AI Optimised

Our in-house team includes data scientists who optimise the configuration and software stack of each system for AI workloads.

Trusted by you

Scan AI solutions are trusted by thousands of organisations for their AI training needs.

7 Days Support

Our technical support engineers are available seven days a week to help with any queries.

3 Years Warranty

3XS Systems include a three-year warranty, so if anything goes faulty we’ll repair or replace it.

Architecture

3XS HGX servers support up to eight NVIDIA SXM-format GPUs - offering up to 2.3TB HBM3e VRAM - alongside various NVIDIA ConnectX-7 and ConnectX-8 networking options. This powerful combination provides ground-breaking performance of up to 144 PFLOPS FP4. The GPUs are connected via NVLink and NVSwitch technologies for the fastest data processing. Additionally, these GPUs support multi-instance GPU (MIG), where each GPU can be shared by up to seven users, fully isolated at the hardware level.
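
Within a single HGX node, the usual way to put all of the NVLink-connected GPUs to work on one training job is data parallelism, for example with PyTorch's DistributedDataParallel, which exchanges gradients via NCCL over NVLink/NVSwitch. The sketch below shows the general pattern; the model and data are placeholders, and it would normally be launched with torchrun --nproc_per_node=8.

# Minimal single-node data-parallel training sketch using PyTorch DDP.
# Gradient all-reduce runs over NCCL, which uses NVLink/NVSwitch when present.
# Launch with: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimiser = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                       # placeholder data and loop
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()                          # gradients all-reduced here
        optimiser.step()
        optimiser.zero_grad()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()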

A key selling point of HGX servers is that they are highly configurable, with a wide range of CPUs, RAM, networking, storage and software stack.

HGX B300 HGX B200 HGX H200
GPUs 8x NVIDIA B300 8x NVIDIA B200 8x NVIDIA H200
FP4 TENSOR CORE** 144 PFLOPS | 72 PFLOPS 144 PFLOPS | 72 PFLOPS N/A
FP8/FP6 TENSOR CORE* 72 PFLOPS 72 PFLOPS 32 PFLOPS
INT8 TENSOR CORE* 72 POPS 72 POPS 32 POPS
FP16/BF16 TENSOR CORE* 36 PFLOPS 36 PFLOPS 16 PFLOPS
TF32 TENSOR CORE* 18 PFLOPS 18 PFLOPS 8 PFLOPS
FP32 600 TFLOPS 600 TFLOPS 540 TFLOPS
FP64/FP64 TENSOR CORE* 296 TFLOPS 296 TFLOPS 540 TFLOPS
ARCHITECTURE Blackwell Ultra Blackwell Hopper
CUDA CORES TBC TBC 8x 16,896
TENSOR CORES TBC TBC 8x 528 4th gen
TOTAL GPU MEMORY Up to 2.3TB HBM3e 1.4TB HBM3e 1.1TB HBM3
MEMORY CONTROLLER TBC TBC 5,120-bit
NVLINK 5th gen 5th gen 4th gen
NVSWITCH NVLink 5 Switch NVLink 5 Switch NVLink 4 Switch
NVSWITCH GPU-TO-GPU BANDWIDTH 1.8 TB/s 1.8 TB/s 900 GB/s
NVLINK BANDWIDTH 14.4 TB/s 14.4 TB/s 7.2 TB/s

Relative Performance & Capability

A 3XS HGX server configured with the maximum eight SXM B300 Blackwell Ultra GPUs offers a maximum performance of 144 PetaFLOPS (quadrillion floating-point operations per second), which as you can see from the comparison table below, is best-in-class for a single server. The B200 Blackwell variant offers the same performance but with less memory capacity, whilst the older Hopper version has less memory capacity and lower performance, albeit at a lower price point. Although all HGX servers are extremely capable AI training servers, the addition of FP4 capability to Blackwell-based GPUs and their larger memory does set them apart if you’re looking at working with the latest LLMs and VLMs.

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

Scaling Up

When it comes to scaling 3XS HGX servers, multiple servers can be connected into a POD architecture configured with shared AI-optimised PEAK:AIO software-defined storage and NVIDIA networking - either Ethernet or InfiniBand.

To aid management of the GPUs, NVIDIA Run:ai software enables intelligent resource management and consumption so that users can easily access GPU fractions, multiple GPUs or clusters of servers for workloads of every size and stage of the AI lifecycle. This ensures that all available compute can be utilised and GPUs never have to sit idle. Run:ai's scheduler is a simple plug-in for Kubernetes clusters and adds high-performance orchestration to your containerised AI workloads.

As an NVIDIA Elite Partner, our expert AI team can help you design and deploy 3XS HGX servers at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The 3XS HGX servers offer powerful systems for training large AI models at scale, with MIG support if the system is intended to be a shared resource. HGX servers provide the same performance as DGX appliances, but can be configured with different CPU, RAM, networking and storage options. If scaling up, although you can add NVIDIA AI Enterprise to HGX servers, you should consider DGX appliances, as NVAIE is included alongside the inbuilt Base Command management software and enterprise-level support direct from NVIDIA. If you don’t intend to scale to the heights of LLMs and VLMs, then the RTX PRO or EGX and MGX platforms may be sufficient, as they scale well too.

Ready to buy?

Click the links below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

CONFIGURE 3XS HGX SERVERS

NVIDIA DGX Appliances

Powered by eight NVIDIA Blackwell Ultra, Blackwell or Hopper SXM GPUs, NVIDIA DGX appliances provide data scientists and researchers with the most powerful platform for scaling-out AI models. Designed for datacentre environments, the NVIDIA DGX range is the ultimate in AI-optimised appliances; supplied with a complete software stack and management interface, they are supported directly by NVIDIA for maximum uptime and reliability.

Architecture

NVIDIA DGX appliances feature eight SXM GPUs - offering up to 2.3TB HBM3e VRAM - alongside various NVIDIA ConnectX-7 and ConnectX-8 networking options. This powerful combination provides ground-breaking performance of up to 144 PFLOPS FP4. The GPUs are connected via NVLink and NVSwitch technologies for the fastest data processing. Additionally, these GPUs support multi-instance GPU (MIG), where each GPU can be shared by up to seven users, fully isolated at the hardware level.

Unlike the other types of server covered in this guide, DGX appliances are not configurable - they have a pre-defined configuration of CPUs, RAM, networking, storage and software.

DGX B300 DGX B200 DGX H200
GPUs 8x NVIDIA B300 8x NVIDIA B200 8x NVIDIA H200
FP4 TENSOR CORE** 144 PFLOPS | 72 PFLOPS 144 PFLOPS | 72 PFLOPS N/A
FP8/FP6 TENSOR CORE* 72 PFLOPS 72 PFLOPS 32 PFLOPS
INT8 TENSOR CORE* 72 POPS 72 POPS 32 POPS
FP16/BF16 TENSOR CORE* 36 PFLOPS 36 PFLOPS 16 PFLOPS
TF32 TENSOR CORE* 18 PFLOPS 18 PFLOPS 8 PFLOPS
FP32 600 TFLOPS 600 TFLOPS 540 TFLOPS
FP64/FP64 TENSOR CORE* 296 TFLOPS 296 TFLOPS 540 TFLOPS
ARCHITECTURE Blackwell Ultra Blackwell Hopper
CUDA CORES TBC TBC 8x 16,896
TENSOR CORES TBC TBC 8x 528 4th gen
TOTAL GPU MEMORY Up to 2.3TB HBM3e 1.4TB HBM3e 1.1TB HBM3
MEMORY CONTROLLER TBC TBC 5,120-bit
NVLINK 5th gen 5th gen 4th gen
NVSWITCH NVLink 5 Switch NVLink 5 Switch NVLink 4 Switch
NVSWITCH GPU-TO-GPU BANDWIDTH 1.8 TB/s 1.8 TB/s 900 GB/s
NVLINK BANDWIDTH 14.4 TB/s 14.4 TB/s 7.2 TB/s

Relative Performance & Capability

An NVIDIA DGX B300 appliance offers a maximum performance of 144 PetaFLOPS (quadrillion floating-point operations per second), which as you can see from the comparison table below is best-in-class for a single server. The DGX B200 variant offers the same performance but with less memory capacity, whilst the older DGX H200 has less capacity again and lower performance. Although all three DGX appliances are extremely capable AI training servers, the addition of FP4 capability to Blackwell-based GPUs and their larger memory does set them apart if you’re looking at working with the latest LLMs and VLMs.

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

Scaling Up

When it comes to scaling NVIDIA DGX, multiple appliances can be connected into hugely-powerful BasePOD or SuperPOD architectures of up to 140 nodes.

The DGX systems are managed and controlled by NVIDIA Base Command software; SuperPODs are additionally overseen by NVIDIA Unified Fabric Manager (UFM), which simplifies datacentre networking management by combining enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to support scale-out InfiniBand clusters.

BasePOD Architecture

SuperPOD Architecture

As an NVIDIA Elite Partner, and the UK’s only NVIDIA-certified DGX Managed Services Provider, our expert AI team can help you design and deploy DGX solutions at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The NVIDIA DGX range of appliances offers powerful and complete platforms for training the largest AI models at scale, with MIG support if the system is intended to be a shared resource. DGX offers the same performance as HGX, but the fixed configuration ensures the whole system is optimised for best performance. Additionally, the NVIDIA-certified scaling options deliver predictable performance as your POD architectures grow. If you don’t need NVIDIA AI Enterprise software out of the box, you could consider a 3XS HGX server, or if you don’t intend to scale to the heights of LLMs and VLMs, then the RTX PRO or EGX and MGX platforms may be sufficient, as they scale nicely too.

Ready to buy?

Click the links below to view the range of AI training solutions. If you still have questions on how to select the perfect system, don't hesitate to contact one of our friendly advisors on 01204 474210 or at [email protected].

VIEW NVIDIA DGX APPLIANCES

NVIDIA DGX Clusters

The NVIDIA GB200 and GB300 NVL72 rack architectures are designed to handle terabyte-class models for massive recommender systems, agentic and physical AI, and for AI factory deployments. Unlike other NVIDIA DGX PODs based on DGX AI appliances, these are based on multiple NVIDIA Grace-Blackwell Superchip powered servers that provide a vast shared memory capacity across multiple nodes, enabling many trillion-parameter AI models. Each liquid-cooled rack features 36 NVIDIA Grace-Blackwell Superchips providing 36 NVIDIA Grace CPUs and 72 Blackwell or Blackwell Ultra GPUs - operating together via 5th gen NVIDIA NVLink.

Architecture

The Grace Blackwell architecture brings together the groundbreaking performance of two NVIDIA Blackwell GPUs with an NVIDIA Grace CPU in a single superchip, connected with NVIDIA NVLink Chip-2-Chip (C2C), a high-bandwidth, low-latency, memory-coherent interconnect. Each NVIDIA Grace Blackwell Superchip has up to 500GB of LPDDR5 CPU memory and 583GB of HBM3e GPU memory. NVIDIA NVSwitch and networking system enables every CPU and GPU to access each other’s memory at an astonishing 1.8TB/s.

Like DGX appliances, the nodes that make up DGX clusters are not configurable, they have a pre-defined configuration of CPUs, RAM, networking, storage and software.

Specification NVIDIA GB200 NVL72 NVIDIA GB300 NVL72
Configuration 36 Grace CPU and 72 Blackwell GPUs 36 Grace CPU and 72 Blackwell Ultra GPUs
FP4 Tensor Core 1,440 PFLOPS 1,440 PFLOPS
FP8/FP6 Tensor Core 720 PFLOPS 720 PFLOPS
INT8 Tensor Core 720 POPS 720 POPS
FP16/BF16 Tensor Core 360 PFLOPS 360 PFLOPS
TF32 Tensor Core 180 PFLOPS 180 PFLOPS
FP32 5,760 TFLOPS 5,760 TFLOPS
FP64 2,880 TFLOPS 2,880 TFLOPS
FP64 Tensor Core 2,880 TFLOPS 2,880 TFLOPS
GPU Memory | Bandwidth Up to 13.4TB HBM3e | 576 TB/s Up to 21TB HBM3e | 576 TB/s
NVLink Bandwidth 130 TB/s 130 TB/s
CPU Core Count 2,592 Arm Neoverse V2 cores 2,592 Arm Neoverse V2 cores
CPU Memory | Bandwidth Up to 17TB LPDDR5X | Up to 18.4 TB/s Up to 18TB SOCAMM with LPDDR5X | Up to 14.3 TB/s

Relative Performance & Capability

The two DGX clusters differ only in the GB200 or GB300 superchip, the latter offering greater overall memory capacity suited to AI factory-scale workloads. Additionally, further performance is derived from the unified memory architecture, which takes the usable memory capacity to around 30TB on the GB200 NVL72 and almost 40TB on the GB300 NVL72.
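
Those headline figures follow directly from the specification table above - summing the total GPU and CPU memory per rack gives the usable capacity quoted, as the quick check below shows (all values are approximate and taken from the table).

# Back-of-envelope check of per-rack unified memory, using the
# approximate totals from the specification table above.
racks = {
    "GB200 NVL72": {"gpu_hbm_tb": 13.4, "cpu_lpddr_tb": 17.0},
    "GB300 NVL72": {"gpu_hbm_tb": 21.0, "cpu_lpddr_tb": 18.0},
}

for name, mem in racks.items():
    total = mem["gpu_hbm_tb"] + mem["cpu_lpddr_tb"]
    print(f"{name}: ~{total:.0f} TB of coherent memory per rack")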

System 3XS RTX PRO Server with RTX PRO 6000 Blackwell Server 3XS EGX Server with L40S / RTX 6000 Ada 3XS MGX Server with RTX PRO 6000 Blackwell Server 3XS MGX Server with H200 NVL 3XS HGX Server with H200 3XS HGX Server with B200 3XS HGX Server with B300 NVIDIA DGX H200 NVIDIA DGX B200 NVIDIA DGX B300 NVIDIA DGX GB200 NVL72 cluster NVIDIA DGX GB300 NVL72 cluster
AI performance per GPU (FP4) 4 PFLOPS 1.4 PFLOPS* 4 PFLOPS 3.3 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 3.9 PFLOPS* 18 PFLOPS 18 PFLOPS 20 PFLOPS 20 PFLOPS
Memory per GPU 96GB 48GB 96GB 141GB 141GB 180GB 288GB 141GB 180GB 288GB 372GB 583GB
GPU(s) Up to 8 Up to 8 Up to 8 Up to 8 4 or 8 4 or 8 4 or 8 8 8 8 72 72
Max AI performance (FP4) Up to 32 PFLOPS Up to 11.2 PFLOPS* Up to 32 PFLOPS Up to 26.4 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS Up to 31.2 PFLOPS* 144 PFLOPS 144 PFLOPS 1,400 PFLOPS 1,400 PFLOPS
Max GPU Memory 768GB 384GB 768GB 1.1TB 1.1TB 1.44TB 2.3TB 1.1TB 1.44TB 2.3TB 13.4TB 21TB
Maximum AI model size (FP4)** 1.2 trillion 0.6 trillion 1.2 trillion 1.7 trillion 1.7 trillion 2.2 trillion 3.6 trillion 1.7 trillion 2.4 trillion 3.6 trillion 48 trillion 65 trillion
Cost ££ £ ££ £££ ££££ £££££ ££££££ ££££ £££££ ££££££ £££££££££ ££££££££££

Scaling Up

Although the integrated rack architecture of the GB200 NVL72 and GB300 NVL72 is already AI computing at considerable scale, they are intended to be deployed in multi-rack configurations of up to eight units, enabling up to 576 fully-connected GPUs and a combined FP4 performance in excess of 11 ExaFLOPS.

As an NVIDIA Elite Partner, and the UK’s only NVIDIA-certified DGX Managed Services Provider, our expert AI team can help you design and deploy DGX solutions at scale either on-premise or with our range of datacentre hosting partners.

Conclusion

The NVIDIA DGX range of Grace Blackwell superchip clusters is the pinnacle of AI training - if you need to process AI workloads at this scale, there really is no alternative. The unified architecture is such that even the DGX-based SuperPOD solutions would struggle to match it, as the memory in standard DGX appliances is neither as large nor as unified.

Ready to buy?

Contact our AI team to enquire about NVIDIA DGX Clusters on 01204 474210 or at [email protected].

Proof of Concept

Any of these solutions can be trialled free of charge in our proof-of-concept hosted environment. Your trial will involve secure access where you can use a sample of your own data for the most realistic insights, and you’ll be guided by our expert data scientists to ensure you get the most from your PoC.

To arrange your PoC, contact our AI team.

Get Cloud Computing in 3 Simple Steps
1. Choose your GPU

Browse our available options or connect with a specialist to discuss a bespoke solution.

2. Rapid Provisioning

We’ll provision your environment and take you through a guided onboarding process.

3. Enjoy SCAN Cloud

You’re online! Set up, get to work, and access support anytime.

AI Training in the Cloud

All the AI training servers and appliances covered in this guide can also be provisioned on our Scan Cloud platform. Our cloud server solutions can be configured with a wide variety of NVIDIA GPUs or an entire DGX appliance can be selected.

Simple, Flexible Pricing with No Hidden Fees

No long-term commitments, no extra charges for storage or networking - experience the full benefits of the cloud without the drawbacks.

Reference Architecture for Unrivalled Performance

Harness the power of the latest GPUs for desktops or high-performance GPU clusters, from single GPUs to 8-way systems.

Networking and Storage Built for Performance

All GPU instances include NVMe storage and uncontended network ports, ensuring top-tier performance and data privacy.

Build It Your Way

Custom builds are Scan’s speciality - every aspect of our Infrastructure-as-a-Service (IaaS) solutions is fully customisable, with UK-based solutions ensuring data sovereignty.

Browse the available Scan Cloud options or contact our Cloud team.