Scan AI

Run:AI GPU Virtualisation

Harness the full potential of your GPU infrastructure for AI workloads

Why virtualise your AI infrastructure?

State-of-the-art AI systems feature multiple GPU accelerators and deliver enormous performance and workload throughput capabilities, but usually only for a single user per physical system. Virtualisation allows you to pool these resources in order to gain greater control and visibility. At each stage of the deep learning process, data scientists have specific needs for their compute resources.


Development stages require CPU or GPU resources in interactive sessions, whereas training stages are highly compute-intensive and require GPU compute power, and lots of it. Performance and speed are critical to training AI models, but demand is not necessarily consistent: sometimes several workloads run concurrently, and at other times none are running while data scientists optimise their models. The inference stage typically requires low GPU utilisation. Dynamic resource allocation that takes the various stages of deep learning into account is therefore critical for gaining maximum value from your AI infrastructure.

The Run:AI software platform decouples data science workloads from the underlying hardware, whatever hardware you have. By pooling resources and applying an advanced scheduling mechanism to data science workflows, Run:AI greatly increases your ability to fully utilise all available resources, so compute is far less likely to be the bottleneck. Data scientists can increase the number of experiments they run, speed time to results, and ultimately meet the business goals of their AI initiatives.

Pool GPU Compute

Pool GPU compute resources to ensure visibility and control over prioritisation and allocation of resources

Guaranteed Quotas

Automatic and dynamic provisioning of GPUs to break the limitations of static allocations



Dynamically change the number of resources allocated to a job to accelerate data science delivery and increase GPU utilisation

How Run:AI Works

Run:AI pools heterogeneous resources so they can be used within two logical environments: Build, for development, and Train, for training. These natively support data scientists' differing compute requirements and increase GPU utilisation. The GPU virtualisation pool exists within a Kubernetes cluster, and both logical environments interact with the Run:AI scheduler for Build and Train workloads.

Build environment

Dedicated to building models interactively, typically using Jupyter notebooks or PyCharm, or simply by SSH-ing into a container. Performance in Build environments is typically less critical, so build workloads can be run on workstations or low-end servers.

Train environment

Dedicated to long training workloads. As performance is important in training, these workloads should run on high-end GPU servers. Containers for training can be supplemented with a checkpointing mechanism that allows automatic preemption and resumption without losing the state of the training. Users can also exceed their guaranteed quota and use more GPUs than they are assigned.
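To make the checkpointing idea concrete, here is a minimal sketch in plain Python. It is not Run:AI's mechanism: the `train` function, the `stop_after` parameter (which simulates the scheduler preempting the job) and the JSON checkpoint file are all assumptions made for illustration, and the "training" is a stand-in calculation.

```python
import json
import os
import tempfile


def train(checkpoint_path, total_steps, stop_after=None):
    """Run a toy training loop that can be interrupted and resumed.

    stop_after simulates a preemption after that many steps in this run;
    None means run to completion. (Hypothetical parameter for the sketch.)
    """
    # Resume from the last saved state if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "loss": 1.0}

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state  # preempted: every completed step is already on disk
        state["step"] += 1
        state["loss"] *= 0.9  # stand-in for a real optimisation step
        steps_this_run += 1

        # Write to a temporary file, then rename it into place, so an
        # interruption mid-write never leaves a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    first = train(path, total_steps=10, stop_after=4)  # preempted after 4 steps
    second = train(path, total_steps=10)               # picks up at step 4
    print(first["step"], second["step"])
```

A real training job would save model and optimiser state rather than two numbers, but the pattern is the same: save often, write atomically, and always check for a checkpoint on start-up.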

By pooling the resources and managing them using the Run:AI scheduler, administrators gain control. They can easily onboard new users, maintain and add new hardware to the pool, and gain visibility, including a holistic view of GPU usage and utilisation. In addition, data scientists can automatically provision resources without depending on IT admins.

Simple Workload Scheduling

The Run:AI Scheduler is a Kubernetes-based software solution for high-performance orchestration of containerised AI workloads. Bridging the efficiency of High-Performance Computing and the simplicity of Kubernetes, the scheduler allows users to easily make use of fractional GPUs, whole GPUs, and multiple nodes of GPUs for distributed training. In this way, AI workloads run based on needs, not capacity. Run:AI requires no advanced setup and works with a range of Kubernetes distributions, including vanilla Kubernetes, Red Hat OpenShift and HPE Container Platform.
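As an illustration of what fractional GPU allocation means, the sketch below packs requests for part of a GPU onto a small pool using a simple first-fit rule. It treats each GPU as one divisible unit and is an assumption-laden toy: the `place` function and the job names are hypothetical and do not reflect how the Run:AI Scheduler is implemented.

```python
from fractions import Fraction


def place(requests, num_gpus):
    """First-fit placement of GPU requests, each a share of one GPU (0 < r <= 1).

    requests is a list of (job_name, share) pairs; share may be a string
    such as "1/2". Returns (placement, free) where placement maps each job
    to a GPU index, or to None if no GPU currently has room.
    """
    free = [Fraction(1)] * num_gpus  # every GPU starts fully available
    placement = {}
    for job, share in requests:
        share = Fraction(share)
        for gpu, capacity in enumerate(free):
            if share <= capacity:
                free[gpu] -= share
                placement[job] = gpu
                break
        else:
            placement[job] = None  # would wait in the queue
    return placement, free


if __name__ == "__main__":
    jobs = [
        ("notebook-a", "1/2"),
        ("notebook-b", "1/4"),
        ("train", "1"),
        ("notebook-c", "1/2"),
    ]
    placement, free = place(jobs, num_gpus=2)
    print(placement)  # notebook-a and notebook-b share GPU 0, train takes GPU 1
    print(free)       # a quarter of GPU 0 is still unallocated
```

Two interactive notebooks share one GPU while a training job takes a whole one, and the last request waits because no single GPU has half its capacity free.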

Batch Scheduling

This refers to the grouping or ‘batching’ together of many processing jobs that can run in parallel without user intervention. Each job runs to completion and then frees up its resources, making the system much more efficient. Training jobs can be queued and then launched when resources become available. Workloads can also be stopped and restarted later if resources need to be reclaimed and allocated to more urgent jobs or to users who have received less than their share.
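The queue-launch-reclaim cycle described above can be sketched as a small priority queue with preemption. This is a simplified model under stated assumptions, not Run:AI's scheduler: the `Job` and `BatchScheduler` classes, the numeric priorities and the policy of preempting the least urgent jobs first are all invented for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                       # lower number = more urgent
    name: str = field(compare=False)
    gpus: int = field(compare=False)


class BatchScheduler:
    def __init__(self, total_gpus):
        self.free = total_gpus
        self.queue = []    # min-heap of waiting jobs, most urgent first
        self.running = []  # jobs currently holding GPUs

    def submit(self, job):
        heapq.heappush(self.queue, job)
        self._dispatch()

    def finish(self, name):
        """Mark a running job as complete and hand its GPUs back."""
        for job in self.running:
            if job.name == name:
                self.running.remove(job)
                self.free += job.gpus
                break
        self._dispatch()

    def _dispatch(self):
        # Launch queued jobs, most urgent first, while they fit.
        while self.queue:
            job = self.queue[0]
            if job.gpus > self.free:
                self._preempt_for(job)
            if job.gpus > self.free:
                break  # the head of the queue has to wait
            heapq.heappop(self.queue)
            self.free -= job.gpus
            self.running.append(job)

    def _preempt_for(self, job):
        # Only strictly less urgent jobs may be stopped, least urgent first.
        victims = sorted(
            (r for r in self.running if r.priority > job.priority),
            key=lambda r: -r.priority,
        )
        if self.free + sum(v.gpus for v in victims) < job.gpus:
            return  # preempting would not free enough GPUs, so do nothing
        for victim in victims:
            if self.free >= job.gpus:
                break
            self.running.remove(victim)
            self.free += victim.gpus
            heapq.heappush(self.queue, victim)  # restarted later


if __name__ == "__main__":
    sched = BatchScheduler(total_gpus=4)
    sched.submit(Job(5, "sweep", 4))    # fills the pool
    sched.submit(Job(1, "urgent", 2))   # reclaims GPUs from "sweep"
    print([j.name for j in sched.running], [j.name for j in sched.queue])
    sched.finish("urgent")              # "sweep" is relaunched
    print([j.name for j in sched.running])
```

In practice a preempted job would resume from its last checkpoint rather than start again from scratch.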

Gang Scheduling

Often when using distributed training to run compute intensive jobs on multiple GPU machines, all of the GPUs need to be synchronised to communicate and share information. Gang scheduling is used when containers need to be launched together, start together, recover from failures together, and end together. Networking and communication can be automated between machines by the cluster orchestrator.
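The all-or-nothing nature of gang scheduling can be shown in a few lines. The sketch below is illustrative only: `gang_schedule`, the node names and the first-fit choice of node are assumptions, and a real orchestrator would also handle failure recovery and networking.

```python
def gang_schedule(free_gpus_per_node, workers, gpus_per_worker):
    """Place every worker of a distributed job, or none of them.

    free_gpus_per_node maps node name -> free GPU count. It is updated
    only if the whole gang fits. Returns the list of nodes chosen for
    each worker, or None if the job cannot start yet.
    """
    remaining = dict(free_gpus_per_node)  # tentative plan on a copy
    assignment = []
    for _ in range(workers):
        node = next(
            (n for n, free in remaining.items() if free >= gpus_per_worker),
            None,
        )
        if node is None:
            return None  # one worker does not fit, so nothing is committed
        remaining[node] -= gpus_per_worker
        assignment.append(node)
    free_gpus_per_node.update(remaining)  # commit the whole gang at once
    return assignment


if __name__ == "__main__":
    cluster = {"node-a": 4, "node-b": 2}
    print(gang_schedule(cluster, workers=4, gpus_per_worker=2))  # too big: None
    print(cluster)                                               # unchanged
    print(gang_schedule(cluster, workers=3, gpus_per_worker=2))  # all three placed
    print(cluster)                                               # pool now empty
```

Planning on a copy and committing in one step is what prevents a half-started job from holding GPUs while it waits for the rest of its workers.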

Topology Awareness

A researcher can run a container once and get excellent performance, then run it again on the same server and get poor performance. The cause is the topology of the GPUs, the CPUs and the links between them. The same problem can occur for distributed workloads due to the topology of NICs and the links between GPU servers. The Run:AI Scheduler ensures that the physical properties of AI infrastructure are taken into account when running AI workloads, for ideal and consistent performance.
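A toy example shows why placement matters. Assume a server where GPUs are linked in NVLink pairs, and a two-GPU job arrives while one GPU is busy. The `pick_gpus` helper below is hypothetical and simply prefers the group of free GPUs with the most direct links between them; it says nothing about how the Run:AI Scheduler weighs topology. On NVIDIA systems the real link layout can be inspected with `nvidia-smi topo -m`.

```python
from itertools import combinations


def pick_gpus(free_gpus, nvlink_pairs, count=2):
    """Pick `count` free GPUs, preferring the group with the most NVLink links.

    nvlink_pairs lists the GPU index pairs that are directly connected.
    Returns a tuple of GPU indices, or None if too few GPUs are free.
    """
    links = {frozenset(pair) for pair in nvlink_pairs}

    def score(group):
        # Number of directly linked pairs inside the candidate group.
        return sum(frozenset(pair) in links for pair in combinations(group, 2))

    candidates = list(combinations(sorted(free_gpus), count))
    if not candidates:
        return None
    return max(candidates, key=score)


if __name__ == "__main__":
    nvlink = [(0, 1), (2, 3)]           # assumed layout: two NVLink pairs
    print(pick_gpus({1, 2, 3}, nvlink))  # GPU 0 is busy, so the linked pair 2 and 3 wins
```

A scheduler that ignored topology might hand out GPUs 1 and 2, which have no direct link in this layout, and the same job would run slower for no visible reason.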

King’s College London

The London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare, based at King’s College London (KCL), is using the latest virtualisation software from Run:AI to speed up research projects. The software optimises and enhances existing computing resources - NVIDIA DGX-1 and DGX-2 supercomputers plus their associated infrastructure, installed and configured by the Scan AI team.

These NVIDIA DGX platforms are used to train algorithms to create AI-powered tools for faster diagnosis, personalised therapies and effective screening, using an enormous trove of de-identified public health data from the NHS. The training requires vast amounts of GPU-accelerated compute power, which the DGX appliances provide. To improve resource allocation and scheduling, Run:AI software was added to the KCL platform. This has since doubled its GPU utilisation, supporting more than 300 experiments within a 40-day period.

Guided Proofs of Concept with Scan AI and Run:AI

The Scan AI team is unique in its ability to offer a Proof of Concept (POC) trial of the Run:AI software platform running on multiple NVIDIA GPUs. This allows you to understand how the scheduling and pooling software will improve your GPU utilisation. There are two options for how the POC can be conducted:

POC in a customer's on-premises environment

In this scenario, a prospect's data scientists can run an evaluation of Run:AI in their own environment, using their own workflows. They can choose one, multiple, or all of their servers to run their POC. A trial done this way allows researchers to continue to run experiments with Run:AI in their production environment without having to transfer experiments to a test environment. The on-premises POC allows them to compare Run:AI to their existing tools, so users can easily see benchmarks and measure the efficiency of the Run:AI system. Currently, prospective customers do not need to purchase a POC licence, but this may change in 2021.

POC in a Scan Lab Environment

In the second scenario, a prospective customer can avoid setting up a dedicated production cluster for a POC and instead use a pre-prepared Scan environment to evaluate Run:AI. Here, prospects can try out all of the features of Run:AI, including those they could not yet exercise in their own data centre. For example, they can pool disparate resources and scale distributed training across many nodes in the Scan environment, even if their own usage of DL / ML is not yet at a large scale. The Scan environment is ready to use and already has Kubernetes and Run:AI installed, so customers can avoid the potential inconvenience of installation.

Ways to Purchase

When you apply Run:AI software to an existing cluster of GPUs, no matter how disparate, you will see an immediate improvement in how your newly virtualised pool of GPU resources can be scheduled and shared out.

Run:AI software is licensed per GPU you want to virtualise - regardless of the age or specification of any GPU, making for a very easy way to improve productivity and to keep increasing your virtual GPU pool as you add GPUs to your infrastructure.

When choosing new hardware for your AI projects, including the added flexibility that Run:AI software provides couldn’t be easier. For each system, simply match the number of Run:AI licences to the number of GPUs, in a 1-, 3- or 5-year subscription. Our 3XS build team will install the software, so you have scheduling control and the ability to maximise your GPU utilisation out of the box. Furthermore, if you add Run:AI licences to your system builds every time, your GPU pool will continue to grow seamlessly with each hardware addition - the software will simply ‘discover’ the new GPUs and add them to your resource pool.

The 3XS Systems team and Run:AI have developed a range of certified appliances - designed, tested and configured to get the most out of GPU virtualisation whilst remaining cost effective. They each include a 1-year licence for Run:AI software and cover a range of specifications - from development workstations to server platforms.

| Model | Development Workstation | Training Server | Training Server |
| --- | --- | --- | --- |
| TF32 Performance | 116 TFLOPS | 450 TFLOPS | 656 TFLOPS |
| FP64 Performance | 1.9 TFLOPS | 7.2 TFLOPS | 41.6 TFLOPS |
| Cost | £18,999 ex VAT | £50,499 ex VAT | £62,499 ex VAT |
| GPUs | 4x watercooled NVIDIA GeForce RTX 3090 | | |
| GPU Specifications | 10,496 CUDA cores per GPU | 10,752 CUDA cores per GPU | 3,804 CUDA cores per GPU |
| GPU Memory | 24GB GDDR6X per GPU, 96GB total | 48GB GDDR6 per GPU, 288GB total | 24GB HBM per GPU, 192GB total |
| GPU Interconnects | GPUs paired with NVLink | GPUs paired with NVLink | GPUs paired with NVLink |
| CPU | AMD Threadripper PRO 3975WX, 32C/64T | 2x AMD EPYC 7513, combined 64C/128T | 2x AMD EPYC 7702, combined 128C/256T |
| System Memory | 256GB ECC Reg DDR4 | 512GB ECC Reg DDR4 | 1,024GB ECC Reg DDR4 |
| System Drives | 1TB SSD | 1TB SSD | 2x 1TB SSD |
| Storage Drives | 4TB HDD | 4x 3.84TB SSD | 4x 3.84TB SSD |
| Networking | 2x 10GbE | 2x 200GbE/IB | 2x 200GbE/IB |
| Operating System | Ubuntu Linux | Ubuntu Linux | Ubuntu Linux |
| Run:AI Licence | 1 Year Subscription | 1 Year Subscription | 1 Year Subscription |
| Power Requirement | 2,750W | 6,600W | 6,600W |
| Dimensions | 307 x 697 x 693mm Tower | 4U 19in Rackmount | 4U 19in Rackmount |
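Because Run:AI is licensed per GPU, the number of licences each appliance needs equals its GPU count. The table states a GPU count only for the workstation, but the count for each system can be inferred from the GPU Memory row by dividing total memory by memory per GPU. The short calculation below does this; the dictionary labels and the `implied_gpu_count` helper are made up for the example, and the results are an inference from the table rather than a stated specification.

```python
def implied_gpu_count(per_gpu_gb, total_gb):
    """Return total_gb / per_gpu_gb, insisting that it divides exactly."""
    count, remainder = divmod(total_gb, per_gpu_gb)
    if remainder:
        raise ValueError("total memory is not a whole multiple of per-GPU memory")
    return count


# (memory per GPU in GB, total GPU memory in GB), taken from the table
systems = {
    "Development Workstation": (24, 96),
    "Training Server (48GB GDDR6 GPUs)": (48, 288),
    "Training Server (24GB HBM GPUs)": (24, 192),
}

for name, (per_gpu_gb, total_gb) in systems.items():
    licences = implied_gpu_count(per_gpu_gb, total_gb)
    print(f"{name}: {licences} GPUs, so {licences} Run:AI licences")
```

This gives 4, 6 and 8 GPUs respectively, which for the workstation matches the 4x GeForce RTX 3090 listed in the GPUs row.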