High-Performance Computing

A100 introduces double-precision Tensor Cores, providing the biggest milestone since the introduction of double-precision computing in GPUs for HPC. This enables researchers to reduce a 10-hour, double-precision simulation running on NVIDIA V100 Tensor Core GPUs to just four hours on A100. HPC applications can also leverage TF32 precision in A100’s Tensor Cores to achieve up to 10X higher throughput for single-precision dense matrix multiply operations.
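As a rough illustration of the double-precision claim, the sketch below times a large FP64 matrix multiply; on A100, cuBLAS-backed frameworks such as PyTorch route FP64 GEMMs through the double-precision Tensor Cores automatically, so the calling code is unchanged. The matrix size and the use of PyTorch are illustrative assumptions, not part of the original benchmark.

```python
import time
import torch

# Illustrative sketch: a double-precision (FP64) matrix multiply.
# On an A100, cuBLAS routes FP64 GEMMs through the double-precision
# Tensor Cores automatically; the calling code does not change.
n = 8192  # assumed problem size, purely for illustration
a = torch.randn(n, n, dtype=torch.float64, device="cuda")
b = torch.randn(n, n, dtype=torch.float64, device="cuda")

torch.cuda.synchronize()
start = time.time()
c = a @ b
torch.cuda.synchronize()
elapsed = time.time() - start

# A dense matmul performs roughly 2 * n^3 floating-point operations.
print(f"FP64 GEMM: {2 * n**3 / elapsed / 1e12:.2f} TFLOPS")
```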

AI Ideation Workshops with Scan & NVIDIA

Join us for a day-long workshop to evaluate your current AI strategy, goals and needs

Find out more

The world's first AI system built on NVIDIA A100

NVIDIA DGX™ A100 is the universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. NVIDIA DGX A100 features the world’s most advanced accelerator, the NVIDIA A100 Tensor Core GPU, enabling enterprises to consolidate training, inference, and analytics into a unified, easy-to-deploy AI infrastructure that includes direct access to NVIDIA AI experts.

  • 8X NVIDIA A100 GPUS WITH 320 GB TOTAL GPU MEMORY
    12 NVLinks/GPU, 600 GB/s GPU-to-GPU Bi-directional Bandwidth

  • 6X NVIDIA NVSWITCHES
    4.8 TB/s Bi-directional Bandwidth, 2X More than Previous Generation NVSwitch

  • 9X MELLANOX CONNECTX-6 200 Gb/s NETWORK INTERFACES
    450 GB/s Peak Bi-directional Bandwidth

  • DUAL 64-CORE AMD CPUs AND 1 TB SYSTEM MEMORY
    3.2X More Cores to Power the Most Intensive AI Jobs

  • 15 TB GEN4 NVME SSD
    25 GB/s Peak Bandwidth, 2X Faster than Gen3 NVMe SSDs


The NVIDIA Ampere architecture, designed for the age of elastic computing, delivers the next giant leap by providing unmatched acceleration at every scale. The A100 GPU brings massive amounts of compute to datacentres. To keep those compute engines fully utilised, it has a class-leading 1.6 TB/s of memory bandwidth, a 67 per cent increase over the previous generation. In addition, the A100 has significantly more on-chip memory, including a 40 MB Level 2 cache, 7x larger than the previous generation, to maximise compute performance.
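For a rough sense of what 1.6 TB/s of memory bandwidth means in practice, the sketch below estimates achievable bandwidth by timing a large on-GPU tensor copy. The buffer size and the use of PyTorch are illustrative assumptions, not a formal benchmark.

```python
import time
import torch

# Illustrative sketch: estimate effective HBM bandwidth by timing a large
# device-to-device copy (one read plus one write per element).
n_bytes = 1024**3  # 1 GiB source buffer, assumed size
src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")

torch.cuda.synchronize()
start = time.time()
dst = src.clone()  # reads src and writes dst entirely in device memory
torch.cuda.synchronize()
elapsed = time.time() - start

# The copy moves 2 * n_bytes across the memory interface (read + write).
print(f"Effective bandwidth: {2 * n_bytes / elapsed / 1e9:.0f} GB/s")
```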

Tensor Cores

TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. Combining TF32 with structured sparsity on the A100 enables performance gains over Volta of up to 20x. Applications built on NVIDIA libraries can harness the benefits of TF32 with no code change required. TF32 Tensor Cores operate on FP32 inputs and produce results in FP32; non-matrix operations continue to use FP32.
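As an example of how little code is involved, the sketch below shows the TF32 switch in PyTorch (used here purely as an illustration of a cuBLAS/cuDNN-backed framework): matrices stay in FP32 at the API level, and a pair of flags controls whether matmuls and convolutions may execute on TF32 Tensor Cores.

```python
import torch

# TF32 is a Tensor Core execution mode, not a storage type: inputs and
# outputs remain ordinary FP32 tensors. In PyTorch the mode is gated by
# two flags (cuDNN convolutions allow TF32 by default on Ampere GPUs).
torch.backends.cuda.matmul.allow_tf32 = True   # matmuls may use TF32 Tensor Cores
torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions may use TF32

a = torch.randn(4096, 4096, device="cuda")     # plain FP32 tensors
b = torch.randn(4096, 4096, device="cuda")
c = a @ b                                      # may run on TF32 Tensor Cores on A100

print(c.dtype)  # torch.float32 - results are still produced in FP32
```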


Modern AI networks are big and getting bigger, with millions and in some cases billions of parameters. Not all of these parameters are needed for accurate predictions and inference, and some can be converted to zeros to make the models 'sparse' without compromising accuracy. Tensor Cores in A100 can provide up to 2x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also be used to improve the performance of model training.
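The sparsity feature targets a 2:4 fine-grained structured pattern, where at most two of every four consecutive weights are non-zero. The sketch below prunes a weight matrix to that pattern by zeroing the two smallest-magnitude values in each group of four; it illustrates only the pattern itself, not the cuSPARSELt or TensorRT machinery that actually runs it on the sparse Tensor Cores.

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero the two smallest-magnitude values in every group of four,
    producing the 2:4 structured-sparse layout A100 Tensor Cores accelerate."""
    w = weight.reshape(-1, 4)                    # groups of 4 consecutive weights
    keep = w.abs().topk(k=2, dim=1).indices      # indices of the 2 largest per group
    mask = torch.zeros_like(w).scatter_(1, keep, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 16)
w_sparse = prune_2_to_4(w)
# Exactly half of the weights are now zero, in a hardware-friendly layout.
print((w_sparse == 0).float().mean())  # tensor(0.5000)
```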

Acceleration

Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA A100 GPU. MIG can partition the A100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Administrators can now support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimising utilisation and extending the reach of accelerated computing resources to every user.
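Once an administrator has partitioned an A100 into MIG instances, each instance is addressed like an ordinary CUDA device. The sketch below shows the common pattern of pinning a process to one instance via the CUDA_VISIBLE_DEVICES environment variable; the MIG UUID is a hypothetical placeholder, and real identifiers can be listed with nvidia-smi -L.

```python
import os

# Pin this process to a single MIG instance before CUDA is initialised.
# The UUID below is a hypothetical placeholder; list the real ones with
# `nvidia-smi -L` on a MIG-enabled A100.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-00000000-1111-2222-3333-444444444444"

import torch

# The process now sees exactly one CUDA device: its MIG slice, with its own
# dedicated SMs, L2 cache and memory partition.
print(torch.cuda.device_count())                                   # 1
print(torch.cuda.get_device_properties(0).total_memory // 2**30)   # GiB in this slice
```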

Expand GPU access to more users

With MIG, you can achieve up to 7X more GPU resources on a single A100 GPU. MIG gives researchers and developers more resources and flexibility than ever before.

Optimise GPU utilisation

MIG provides the flexibility to choose many different instance sizes, which allows provisioning of a right-sized GPU instance for each workload, ultimately delivering optimal utilisation and maximising datacentre investment.

Run simultaneous mixed workloads

MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.

  • Up to 7 GPU instances in a single A100
    Dedicated SMs, memory, L2 cache and bandwidth for hardware QoS and isolation

  • Simultaneous workload execution with guaranteed quality of service
    All MIG instances run in parallel with predictable throughput and latency

  • Right-sized GPU allocation
    Different-sized MIG instances based on target workloads

  • Flexibility
    To run any type of workload on a MIG instance

  • Diverse deployment environments
    Supported on bare metal, Docker, Kubernetes and virtualised environments

NVLINK

Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA NVLink in A100 doubles the GPU-to-GPU direct bandwidth to 600 GB/s, almost 10x more than PCIe Gen4. When paired with the latest generation of NVIDIA NVSwitch, all GPUs in the server can communicate with each other at full NVLink speed for incredibly fast training.
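One way to see this bandwidth from application code is to time a direct GPU-to-GPU copy, which is carried over NVLink when peer access between the GPUs is available. The sketch below assumes at least two NVLink-connected GPUs and uses PyTorch purely for illustration.

```python
import time
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

# Illustrative sketch: time a direct GPU 0 -> GPU 1 copy. On NVLink-connected
# GPUs this transfer travels over NVLink rather than the PCIe bus.
n_bytes = 1024**3  # 1 GiB payload, assumed size
src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:0")

torch.cuda.synchronize(0)
start = time.time()
dst = src.to("cuda:1")  # device-to-device peer copy
torch.cuda.synchronize(0)
torch.cuda.synchronize(1)
elapsed = time.time() - start

print(f"GPU0 -> GPU1: {n_bytes / elapsed / 1e9:.0f} GB/s (one direction)")
```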

NVLink and NVSwitch are essential building blocks of the complete NVIDIA datacentre solution that incorporates hardware, networking, software, libraries, and optimised AI models and applications from NVIDIA GPU Cloud (NGC).


The NVIDIA GPU Cloud

NGC provides researchers and data scientists with simple access to a comprehensive catalogue of GPU-optimised software tools for deep learning and high performance computing (HPC) that take full advantage of NVIDIA GPUs. The NGC container registry features containers for the top deep learning frameworks that are tuned, tested, certified, and maintained for the NVIDIA A100, alongside third-party managed HPC application containers, NVIDIA HPC visualisation containers, and partner applications.


The DGX A100 Pod

As an end-to-end AI solution provider, Scan can deliver complete DGX A100 architectures paired with Mellanox InfiniBand low-latency, high-throughput network fabric and a choice of all-flash storage solutions. These reference architectures ensure maximum GPU performance and utilisation and are supported by complete software and services packages.

• DGX POD more attainable than ever with DGX A100
• Experience a faster start to building flexible AI infrastructure
• Proven architectures with leading storage partners
• Up to 40 PFLOPS of computing power in just 2 racks
• 700 PFLOPS of power to train the previously impossible


AI Optimised Storage

Deep learning appliances such as the DGX A100 only work as intended if the GPU accelerators are fed data consistently and rapidly enough to sustain maximum utilisation. Scan offers a wide range of AI-optimised storage appliances suitable for deployment with the DGX A100.
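On the software side, keeping the accelerators fed typically means overlapping storage reads with GPU work. The sketch below shows a generic PyTorch input pipeline with parallel workers and pinned host memory; the dataset and parameter values are illustrative assumptions, not a recommendation for any particular storage appliance.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomImages(Dataset):
    """Stand-in dataset; in practice samples would stream from fast NVMe or NAS storage."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), idx % 1000

# Parallel workers read and decode data while the GPU trains on the previous
# batch; pinned memory enables faster, asynchronous host-to-GPU copies.
loader = DataLoader(RandomImages(), batch_size=256, num_workers=8,
                    pin_memory=True, prefetch_factor=4)

device = torch.device("cuda")
for images, labels in loader:
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
    break  # single batch, just to show the pattern
```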

Proof of Concept

Sign up to try one of the AI & Deep Learning solutions available from Scan Computers

Register for PoC >
DGX with NVIDIA A100
GPUs: 8x NVIDIA A100 Tensor Core GPUs
GPU Specifications: 6,912 CUDA cores & 432 TF32 Tensor Cores per GPU
GPU Memory: 40 GB per GPU, 320 GB total
GPU Interconnect: 6x NVSwitch
Host CPUs: 2x AMD EPYC 7742, 128 cores / 256 threads total
System Memory: 1 TB ECC Reg DDR4
System Drives: 2x 1.92 TB NVMe SSDs
Storage Drives: 4x 3.84 TB NVMe SSDs
Networking: 8x single-port Mellanox ConnectX-6 VPI 200 Gb/s HDR InfiniBand; 1x dual-port Mellanox ConnectX-6 VPI 200 Gb/s Ethernet
Operating System: Ubuntu Linux
Power Requirement: 6.5 kW
Size: 6U
Weight: 123 kg
Operating Temperature Range: 5ºC to 30ºC (41ºF to 86ºF)
Find out more