High Performance Computing for the Workgroup

Data science teams are at the leading edge of AI innovation, developing projects that can transform their organisations and our world. As such, these teams need a dedicated AI platform that can plug in anywhere and is fully optimised across hardware and software to deliver groundbreaking performance for multiple, simultaneous users. The DGX Station A100 introduces double-precision Tensor Cores, the biggest milestone since the introduction of double-precision computing in GPUs. Designed for multiple, simultaneous users, DGX Station A100 uses server-grade components in an office-friendly form factor. It's the only system with four fully interconnected and Multi-Instance GPU (MIG)-capable NVIDIA A100 Tensor Core GPUs, with up to 320GB of total GPU memory, that can plug into a standard power outlet, resulting in a powerful AI appliance that you can place anywhere.


The world's first free-standing AI system built on NVIDIA A100

The NVIDIA DGX Station A100 provides datacentre-class AI server capabilities in a workstation form factor, suitable for use in a standard office environment without specialised power and cooling. Its design includes four NVIDIA A100 Tensor Core GPUs, each with either 40GB or 80GB of GPU memory, a 64-core server-grade CPU, NVMe storage and PCIe Gen4 buses. The DGX Station A100 also includes a Baseboard Management Controller (BMC), allowing system administrators to perform any required tasks over a remote connection. The four NVLink-interconnected GPUs deliver 2.5 petaFLOPS of performance and support Multi-Instance GPU (MIG), offering up to 28 separate GPU devices (seven per GPU) for parallel jobs and multiple users without impacting system performance.


1. GPUs: 4x A100 40GB (160GB total) or 4x A100 80GB (320GB total)
2. Memory: 512GB DDR4
3. GPU Interconnects: NVLink
4. Storage: OS drive 1x 1.92TB NVMe; internal storage 7.68TB U.2 NVMe
5. CPU: Single AMD EPYC 7742, 64 cores, 2.25GHz – 3.4GHz
6. Networking: Dual-port 10GbE LAN, single-port 1GbE BMC management port
7. Displays: 4GB GPU memory, 4x Mini DisplayPort
8. Cooling: Water cooled
9. Power: 1.5kW

The NVIDIA Ampere architecture, designed for the age of elastic computing, delivers the next giant leap by providing unmatched acceleration at every scale. The A100 GPU brings massive amounts of compute to datacentres. To keep those compute engines fully utilised, it has a class-leading 1.6TB/sec of memory bandwidth, a 67 per cent increase over the previous generation. In addition, the A100 has significantly more on-chip memory, including a 40MB Level 2 cache (7x larger than the previous generation), to maximise compute performance.

Tensor Cores

TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. Combining TF32 with structured sparsity on the A100 enables performance gains over Volta of up to 20x. Applications using NVIDIA libraries enable users to harness the benefits of TF32 with no code change required. TF32 Tensor Cores operate on FP32 inputs and produce results in FP32. Non-matrix operations continue to use FP32.
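
To give a feel for what "FP32 in, FP32 out" at reduced internal precision means, the sketch below emulates TF32 in plain Python. It assumes the commonly documented TF32 layout (FP32's 8-bit exponent with a 10-bit mantissa) and simply truncates the 13 extra mantissa bits; real Tensor Cores round in hardware and accumulate in FP32, so this is an illustration rather than a model of the hardware. The function name `to_tf32` is hypothetical.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 precision by zeroing the low 13 mantissa bits of an FP32 value.

    Assumption: truncation is used for simplicity; hardware rounding may differ.
    """
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep the sign bit, 8 exponent bits and the top 10 of 23 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2 ** -10, 1.0 + 2 ** -11, 3.14159):
        print(value, "->", to_tf32(value))
```

Values that need more than 10 mantissa bits lose their low-order detail, while the exponent range stays the same as FP32.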


Modern AI networks are big and getting bigger, with millions and in some cases billions of parameters. Not all of these parameters are needed for accurate predictions and inference, and some can be converted to zeros to make the models 'sparse' without compromising accuracy. Tensor Cores in A100 can provide up to 2x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also be used to improve the performance of model training.
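
As a rough illustration of converting parameters to zeros in a structured way, the sketch below applies a 2:4 pattern, assuming the scheme NVIDIA documents for A100 (at most two non-zero values in every group of four). Choosing which weights to drop by magnitude is an assumption made here for simplicity, and in practice a model is usually fine-tuned after pruning to recover accuracy. The function name `prune_2_of_4` is hypothetical.

```python
from typing import List


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in each consecutive group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Keep the indices of the two largest magnitudes; ties go to the earlier index.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -0.9, 0.5, 0.05, 2.0, 0.3, -0.4, 1.5]))
```

Because exactly half of each group is zero, the hardware can skip those multiplications in a predictable pattern, which is where the "up to 2x" figure comes from.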

Acceleration

Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA A100 GPU. MIG can partition the A100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Now administrators can support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimising utilisation and extending the reach of accelerated computing resources to every user.
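
The capacity arithmetic can be sketched as follows: seven instances per GPU across four GPUs gives 28 slots. The helper names `max_mig_instances` and `assign_jobs` are hypothetical, and the code only models slot counting; it does not call any NVIDIA tooling, and real MIG instances come in several sizes and are created with administrator tools.

```python
GPUS_PER_STATION = 4   # four A100 GPUs in a DGX Station A100
MAX_MIG_PER_GPU = 7    # MIG can partition one A100 into as many as seven instances


def max_mig_instances(gpus: int = GPUS_PER_STATION,
                      per_gpu: int = MAX_MIG_PER_GPU) -> int:
    """Upper bound on simultaneous MIG instances in one system."""
    return gpus * per_gpu


def assign_jobs(jobs, gpus: int = GPUS_PER_STATION,
                per_gpu: int = MAX_MIG_PER_GPU) -> dict:
    """Map each job to a (gpu_index, instance_index) slot, filling one GPU at a time.

    Assumption: every job takes the smallest instance size.
    """
    jobs = list(jobs)
    if len(jobs) > max_mig_instances(gpus, per_gpu):
        raise ValueError("more jobs than available MIG slots")
    return {job: divmod(slot, per_gpu) for slot, job in enumerate(jobs)}


if __name__ == "__main__":
    print(max_mig_instances())
    print(assign_jobs(["user-%d" % n for n in range(9)]))
```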

Expand GPU access to more users

With MIG, you can achieve up to 7X more GPU resources on a single A100 GPU. MIG gives researchers and developers more resources and flexibility than ever before.

Optimise GPU utilisation

MIG provides the flexibility to choose many different instance sizes, which allows provisioning of a right-sized GPU instance for each workload, ultimately delivering optimal utilisation and maximising datacentre investment.

Run simultaneous mixed workloads

MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.

Up to 7 GPU instances in a single A100

Dedicated SM, Memory, L2 Cache, Bandwidth for hardware QoS & isolation

Simultaneous workload execution with guaranteed quality of service

All MIG instances run in parallel with predictable throughput & latency

Right-sized GPU allocation

Different sized MIG instances based on target workloads

Flexibility

To run any type of workload on a MIG instance

Diverse deployment environment

Supported on bare metal, Docker, Kubernetes and virtualised environments

NVLINK

Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA NVLink in A100 doubles the GPU-to-GPU direct bandwidth to 600GB/s, almost 20x more than PCIe 4.0. NVLink is an essential building block of the complete NVIDIA datacentre solution that incorporates hardware, networking, software, libraries, and optimised AI models and applications from NVIDIA GPU Cloud (NGC).
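
To put the 600GB/s figure in perspective, the sketch below computes an idealised lower bound on transfer time as payload size divided by peak bandwidth. It ignores latency, protocol overhead and topology, so real transfers will take longer; the function name `ideal_transfer_seconds` is hypothetical.

```python
NVLINK_GB_PER_S = 600.0  # third-generation NVLink GPU-to-GPU figure quoted above


def ideal_transfer_seconds(size_gb: float,
                           bandwidth_gb_per_s: float = NVLINK_GB_PER_S) -> float:
    """Best-case time to move size_gb of data at the given peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    # Moving the contents of one 80GB GPU at peak NVLink bandwidth.
    print(ideal_transfer_seconds(80.0))
```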

The NVIDIA GPU Cloud

The NGC provides researchers and data scientists with simple access to a comprehensive catalogue of GPU-optimised software tools for deep learning and high performance computing (HPC) that take full advantage of NVIDIA GPUs. The NGC container registry features NVIDIA A100 tuned, tested, certified, and maintained containers for the top deep learning frameworks. It also offers third-party managed HPC application containers, NVIDIA HPC visualisation containers, and partner applications.

DGX Station A100 specifications
GPUs: 4x NVIDIA A100 Tensor Core GPUs
GPU Specifications: 6912 CUDA cores / 432 TF32 Tensor Cores per GPU
GPU Memory: 40GB per GPU (160GB total) or 80GB per GPU (320GB total)
GPU Interconnects: NVLink
CPU: AMD EPYC 7742, 64 cores / 128 threads
System Memory: 512GB ECC Reg DDR4
System Drives: 1x 1.92TB NVMe SSD
Storage Drives: 7.68TB U.2 NVMe SSD
Networking: 2x 10GbE LAN ports / 1x 1GbE BMC management port
Operating System: Ubuntu Linux
Power Requirement: 1.5kW
Dimensions: 256 x 639 x 518mm (W x H x D)
Weight: 43.1kg
Operating Temperature: 5°C - 35°C (41°F - 95°F)