Scan AI, as a leading NVIDIA Elite Solution Provider, can deliver a variety of enterprise infrastructure architectures with the DGX A100 at their centre. The Scan AI ecosystem is designed to deliver maximum performance from GPU-accelerated hardware by combining industry-leading compute systems with AI-optimised flash storage and low-latency networking. Although these infrastructure solutions can be configured in many ways, two NVIDIA-certified architectures are tried and tested to provide optimal performance for your deep learning and AI workloads: the DGX POD and the DGX SuperPOD.
In combination with leading storage technology providers, Scan AI is proud to offer a portfolio of NVIDIA DGX POD reference architecture solutions that incorporate NVIDIA DGX A100, NVIDIA Mellanox networking and a certified all-flash storage platform of your choice. Delivered as fully integrated, ready-to-deploy offerings, these solutions make your datacentre AI deployments simpler and faster.
NetApp ONTAP AI
Combining the NetApp AFF A-series storage appliances with the DGX A100.
DDN A³I
Combining DDN A³I storage appliances with the DGX A100.
Dell-EMC PowerScale and Isilon
Combining Dell-EMC PowerScale or Isilon appliances with DGX A100.
IBM Spectrum AI
Combining IBM Elastic Storage System (ESS) appliances with the DGX A100.
The NVIDIA DGX SuperPOD is designed to tackle the most important challenges of AI at scale, delivering unmatched levels of multi-system training. Traditional large compute clusters are constrained by the complexity of scaling inter-GPU communications as configurations become larger and computation is parallelised over more and more nodes. This results in diminishing performance returns. DGX SuperPOD solves this scaling problem by optimising every component in the system for the unique demands of multi-node AI infrastructure.
Optimised for dense compute clusters running close to operational limits, requiring advanced cooling technology.
High-bandwidth, low-latency fabrics based on NVIDIA Mellanox InfiniBand.
Support for very large datasets with millions of objects, requiring very high input/output operations per second (IOPS) to keep GPUs fed.
Assume higher watts per rack but gain much greater floating-point operations per second (FLOPS) per watt with reduced footprint.
Performance at scale requires cluster-aware software and management.
NVIDIA DGX SuperPOD brings together a design-optimised combination of AI computing, network fabric, storage, and software. Its compute foundation is built on multiple NVIDIA DGX A100 units (minimum 20, maximum 140), which provide unprecedented compute density and flexibility, offering up to 700 PetaFLOPS of performance. The DGX SuperPOD's high-performance network fabric includes innovative NVIDIA InfiniBand In-Network Computing technologies such as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) and congestion control. This powerful combination delivers the highest performance and scalability, with reduced operational costs and infrastructure complexity.
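As a rough guide to how the quoted figures relate, the sketch below scales peak performance linearly with the number of DGX A100 systems. It assumes the per-system figure implied by the numbers above (700 PetaFLOPS across 140 systems) and ignores real-world scaling losses; the function name `peak_pflops` is purely illustrative and not part of any NVIDIA tool.

```python
# Figures taken from the text above: 20 to 140 DGX A100 systems,
# up to 700 PetaFLOPS at the maximum configuration.
MIN_SYSTEMS = 20
MAX_SYSTEMS = 140
PEAK_PFLOPS_AT_MAX = 700


def peak_pflops(systems: int) -> float:
    """Estimate theoretical peak PetaFLOPS for a given system count.

    Assumes perfectly linear scaling, which is an idealisation.
    """
    if not MIN_SYSTEMS <= systems <= MAX_SYSTEMS:
        raise ValueError(
            f"a DGX SuperPOD has between {MIN_SYSTEMS} and {MAX_SYSTEMS} systems"
        )
    return systems * PEAK_PFLOPS_AT_MAX / MAX_SYSTEMS


if __name__ == "__main__":
    for n in (20, 80, 140):
        print(f"{n} systems: {peak_pflops(n):.0f} PetaFLOPS (theoretical peak)")
```

On these assumptions, the minimum 20-system configuration works out at around 100 PetaFLOPS of theoretical peak performance.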
To enable secure multi-tenancy and isolation of users and data, DGX SuperPOD delivers cloud-native supercomputing by integrating NVIDIA BlueField data processing units (DPUs) into each DGX A100 system. DGX SuperPOD with NVIDIA BlueField DPUs gives modern enterprises a secure, multi-tenant datacentre platform on which IT can deliver deterministic, bare-metal performance without compromise for every user and workload. The BlueField DPU not only delivers class-leading security but also offloads software stack management overheads from the CPUs to enable increased performance.
To further streamline operations, DGX SuperPOD features NVIDIA Base Command Manager. The same software used to manage thousands of NVIDIA’s own systems, Base Command Manager is a best-of-breed infrastructure solution for provisioning and lifecycle management, monitoring, telemetry, logging, alerting, and scheduling.
For this hardware and software stack to perform at its optimal rate, it also requires extremely high-speed storage. In a well-architected system, storage solutions need to handle a variety of data types, such as text, tabular data, audio, and video, in parallel and with unwavering performance. Certified storage for NVIDIA DGX SuperPOD is carefully selected and tested for the unique demands of AI workloads, and then optimised for each environment to ensure success. Choices include solutions from NetApp, DDN, Dell and IBM, and scale from 1 to 10 petabytes.
Secure managed hosting
Accommodating a DGX SuperPOD may not be possible on every organisation's premises, so Scan AI has teamed up with a number of secure hosting partners with UK-based datacentres. This means you can be confident that the location housing your infrastructure is equipped to manage a SuperPOD and accelerate your AI projects. This programme ensures datacentre partners are accredited and provide a broader range of services, including proofs of concept and AI-as-a-service offerings. All of Scan’s chosen datacentre partners meet this standard.