Scan AI

NVIDIA DGX POD & SuperPOD

The industry standard for AI at scale

Scan AI, as a leading NVIDIA Elite Solution Provider, can deliver a variety of enterprise infrastructure architectures with the DGX A100 at their centre. The Scan AI ecosystem is designed to deliver maximum performance from GPU-accelerated hardware by combining industry-leading compute systems with AI-optimised flash storage and low-latency networking. Although these infrastructure solutions can be configured in many ways, two NVIDIA-certified architectures are tried and tested to provide optimal performance for your deep learning and AI workloads: the DGX POD and the DGX SuperPOD.

DGX POD

In combination with leading storage technology providers, Scan AI is proud to offer a portfolio of NVIDIA DGX POD reference architecture solutions that incorporate NVIDIA DGX A100, NVIDIA Mellanox networking and a certified all-flash storage platform of your choice. Delivered as fully integrated, ready-to-deploy offerings, these solutions make your datacentre AI deployments simpler and faster.

 

NetApp ONTAP AI

Combining the NetApp AFF A-series storage appliances with the DGX A100.

 

DDN A³I

Combining DDN A³I storage appliances with the DGX A100.

 

Dell-EMC PowerScale and Isilon

Combining Dell-EMC PowerScale or Isilon appliances with the DGX A100.

 

IBM Spectrum AI

Combining IBM Elastic Storage System (ESS) appliances with the DGX A100.

DGX SuperPOD

The NVIDIA DGX SuperPOD is designed to tackle the most important challenges of AI at scale, delivering unmatched levels of multi-system training performance. Traditional large compute clusters are constrained by the complexity of scaling inter-GPU communications as configurations become larger and computation is parallelised over more and more nodes, which results in diminishing performance returns. DGX SuperPOD solves this scaling problem by optimising every component in the system for the unique demands of multi-node AI infrastructure.
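The diminishing returns described above can be illustrated with a toy model. The sketch below assumes that each training step consists of a fixed amount of compute divided evenly across nodes, plus a communication cost that grows with the logarithm of the node count. The model, the function names and the numbers are all hypothetical and are not measurements of any DGX system.

```python
import math


def step_time(nodes: int, compute_s: float, comm_s_per_hop: float) -> float:
    """Toy model of one training step on `nodes` nodes.

    Assumption: compute splits evenly across nodes, while communication
    cost grows with log2(nodes). Real clusters behave in more complex ways.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return compute_s / nodes + comm_s_per_hop * math.log2(nodes)


def scaling_efficiency(nodes: int, compute_s: float, comm_s_per_hop: float) -> float:
    """Speedup over a single node divided by node count (1.0 = perfect scaling)."""
    speedup = step_time(1, compute_s, comm_s_per_hop) / step_time(
        nodes, compute_s, comm_s_per_hop
    )
    return speedup / nodes


if __name__ == "__main__":
    # Hypothetical figures: 1 s of compute per step, 10 ms per communication hop.
    for n in (1, 2, 8, 32, 128):
        print(n, round(scaling_efficiency(n, 1.0, 0.01), 3))
```

In this model, efficiency falls as nodes are added because the communication term does not shrink with the node count. Lowering the per-hop cost, which is broadly what a faster fabric aims to do, raises efficiency at every cluster size.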

Rack Design

Optimised for dense compute clusters running close to operational limits, requiring advanced cooling technology.

Networking

High-bandwidth, low-latency fabrics based on NVIDIA Mellanox InfiniBand.

Storage

Support for very large datasets with millions of objects, requiring very high input/output operations per second (IOPS) to keep GPUs fed.

Facilities

Assume higher watts per rack but gain much greater floating-point operations per second (FLOPS) per watt with reduced footprint.

Software

Performance at scale requires cluster-aware software and management.

NVIDIA DGX SuperPOD brings together a design-optimised combination of AI computing, network fabric, storage, and software. Its compute foundation is built on multiple NVIDIA DGX A100 units (a minimum of 20 and a maximum of 140), which provide unprecedented compute density and flexibility and offer up to 700 PetaFLOPS of performance. The DGX SuperPOD's high-performance network fabric includes innovative NVIDIA InfiniBand In-Network Computing technologies such as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) and congestion control. This powerful combination delivers the highest performance and scalability, with reduced operational costs and infrastructure complexity.
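The headline figure can be checked with simple arithmetic. The sketch below assumes NVIDIA's published peak of 5 petaFLOPS of AI performance and 8 GPUs per DGX A100 system; neither figure is stated on this page, and the function and constant names are illustrative only.

```python
# Assumed figures from NVIDIA's published DGX A100 specifications (not from this page).
PFLOPS_PER_DGX_A100 = 5  # peak AI performance per system, in petaFLOPS
GPUS_PER_DGX_A100 = 8    # A100 GPUs per system

MIN_SYSTEMS = 20   # smallest DGX SuperPOD configuration described above
MAX_SYSTEMS = 140  # largest DGX SuperPOD configuration described above


def superpod_peak(systems: int) -> dict:
    """Return theoretical peak figures for a SuperPOD with `systems` DGX A100 units."""
    if not MIN_SYSTEMS <= systems <= MAX_SYSTEMS:
        raise ValueError(
            f"systems must be between {MIN_SYSTEMS} and {MAX_SYSTEMS}"
        )
    return {
        "systems": systems,
        "gpus": systems * GPUS_PER_DGX_A100,
        "peak_pflops": systems * PFLOPS_PER_DGX_A100,
    }


if __name__ == "__main__":
    for n in (MIN_SYSTEMS, MAX_SYSTEMS):
        print(superpod_peak(n))
```

Under these assumptions, the maximum configuration of 140 units works out to 140 × 5 = 700 petaFLOPS, which matches the figure quoted above. These are theoretical peaks, so sustained performance on real workloads will be lower.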

To enable secure multi-tenancy and isolation of users and data, DGX SuperPOD delivers cloud-native supercomputing by integrating NVIDIA BlueField data processing units (DPUs) into each DGX A100 system. DGX SuperPOD with NVIDIA BlueField DPUs gives modern enterprises a secure, multi-tenant datacentre platform on which IT can deliver deterministic, bare-metal performance without compromise for every user and workload. The BlueField DPU not only delivers class-leading security but also offloads software stack management overheads from the CPUs, increasing performance.

To further streamline operations, DGX SuperPOD features NVIDIA Base Command Manager. The same software used to manage thousands of NVIDIA’s own systems, Base Command Manager is a best-of-breed infrastructure solution for provisioning and lifecycle management, monitoring, telemetry, logging, alerting, and scheduling.

For this hardware and software stack to perform at its best, it also requires extremely high-speed storage. In a well-architected system, storage solutions need to handle a variety of data types, such as text, tabular data, audio, and video, in parallel and with unwavering performance. Certified storage for NVIDIA DGX SuperPOD is carefully selected and tested for the unique demands of AI workloads and then optimised for each environment to ensure success. Choices include solutions from NetApp, DDN, Dell and IBM, and scale from 1 to 10 petabytes.

Secure managed hosting

Accommodating a DGX SuperPOD may not be possible on every organisation's premises, so Scan AI has teamed up with a number of secure hosting partners with UK-based datacentres. This means you can be confident that the location housing your infrastructure is equipped to manage a SuperPOD and accelerate your AI projects. This program ensures datacentre partners are accredited and provide a broader range of services, including proofs of concept and AI-as-a-service offerings. All of Scan’s chosen datacentre partners meet this standard.

Find out more