High Performance Computing for the Workgroup
Data science teams are at the leading edge of AI innovation, developing projects that can transform their organisations and our world. As such, these teams need a dedicated AI platform that can plug in anywhere and is fully optimised across hardware and software to deliver groundbreaking performance for multiple, simultaneous users anywhere in the world. The DGX Station A100 introduces double-precision Tensor Cores, providing the biggest milestone since the introduction of double-precision computing in GPUs. Designed for multiple, simultaneous users, DGX Station A100 leverages server-grade components in an office-friendly form factor. It's the only system with four fully interconnected and Multi-Instance GPU (MIG)-capable NVIDIA A100 Tensor Core GPUs with up to 320GB of total GPU memory that can plug into a standard power outlet, resulting in a powerful AI appliance that you can place anywhere.
AI Ideation Workshops with Scan & NVIDIA
Join us for a day-long workshop to evaluate your current AI strategy, goals and needs.
The world's first free-standing AI system built on NVIDIA A100
The NVIDIA DGX Station A100 provides datacentre-class AI server capabilities in a workstation form factor, suitable for use in a standard office environment without specialised power and cooling. Its design includes four ultra-powerful NVIDIA A100 Tensor Core GPUs, each with either 40GB or 80GB of GPU memory, a 64-core server-grade CPU, NVMe storage and PCIe Gen4 buses. The DGX Station A100 also includes a Baseboard Management Controller (BMC), allowing system administrators to perform any required tasks over a remote connection. The four NVLink-interconnected GPUs deliver 2.5 petaFLOPS of performance and support Multi-Instance GPU (MIG), offering up to 28 separate GPU devices for parallel jobs and multiple users without impacting system performance.
GPUs: 4x A100 80GB (320GB total)
OS: 1x 1.92TB NVMe drive
Internal storage: 7.68TB U.2 NVMe drive
CPU: Single AMD EPYC 7742, 64 cores, 2.25GHz – 3.4GHz
Networking: Dual-port 10GbE LAN, single-port 1GbE BMC management port
Display adapter: 4GB GPU memory, 4x Mini DisplayPort
The NVIDIA Ampere architecture, designed for the age of elastic computing, delivers the next giant leap by providing unmatched acceleration at every scale. The A100 GPU brings massive amounts of compute to datacentres. To keep those compute engines fully utilised, it has a leading class 1.6TB/sec of memory bandwidth, a 67 per cent increase over the previous generation DGX. In addition, the A100 has significantly more on-chip memory, including a 40MB Level 2 cache, 7x larger than the previous generation, to maximise compute performance.
TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. Combining TF32 with structured sparsity on the A100 enables performance gains over Volta of up to 20x. Applications using NVIDIA libraries enable users to harness the benefits of TF32 with no code change required. TF32 Tensor Cores operate on FP32 inputs and produce results in FP32. Non-matrix operations continue to use FP32.
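To give a feel for what reduced mantissa precision means in practice, the following is a minimal Python sketch that emulates TF32 inputs on the CPU. It assumes the TF32 layout of 8 exponent bits and 10 mantissa bits, and it uses simple truncation of the FP32 mantissa; the rounding mode used by the hardware may differ. The function names `to_tf32` and `tf32_dot` are hypothetical and are not part of any NVIDIA library.

```python
import struct

FP32_MANTISSA_BITS = 23  # explicit mantissa bits in FP32
TF32_MANTISSA_BITS = 10  # assumption: TF32 keeps 10 explicit mantissa bits


def _to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Truncate the FP32 mantissa of x to TF32 precision (illustrative only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # 13 low bits are cleared
    mask = 0xFFFFFFFF & ~((1 << drop) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def tf32_dot(a, b) -> float:
    """Dot product with TF32-precision inputs and an FP32 result.

    The sum is accumulated in Python's FP64 and rounded to FP32 at the end,
    which only approximates FP32 accumulation on real hardware.
    """
    total = sum(to_tf32(x) * to_tf32(y) for x, y in zip(a, b))
    return _to_fp32(total)
```

Values that need more than 10 mantissa bits lose their low-order bits on input, while the exponent range stays the same as FP32. This is why matrix-heavy workloads can often tolerate TF32 without code changes.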
Modern AI networks are big and getting bigger, with millions and in some cases billions of parameters. Not all of these parameters are needed for accurate predictions and inference, and some can be converted to zeros to make the models 'sparse' without compromising accuracy. Tensor Cores in A100 can provide up to 2x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also be used to improve the performance of model training.
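As an illustration of converting parameters to zeros, here is a small Python sketch of magnitude-based pruning. It assumes the 2:4 structured pattern (at most two non-zero values in every group of four), which is not stated in the text above, and it uses magnitude as the selection rule, which is only one possible heuristic. The function name `prune_2_of_4` is hypothetical.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every group of four.

    Assumption: a 2:4 structured-sparsity pattern; real pruning tools
    usually fine-tune the model afterwards to recover accuracy.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length of weights must be a multiple of 4")

    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -0.9, 0.4, 0.05, 1.0, 2.0, -3.0, 0.5]))
```

Because the zeros follow a fixed pattern, half of each group can be skipped in a predictable way, which is where the up to 2x figure for sparse models comes from.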
Multi-Instance GPU (MIG) expands the performance and value of each NVIDIA A100 GPU. MIG can partition the A100 GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. Now administrators can support every workload, from the smallest to the largest, offering a right-sized GPU with guaranteed quality of service (QoS) for every job, optimising utilisation and extending the reach of accelerated computing resources to every user.
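The arithmetic behind the system-level figures can be sketched in a few lines of Python. The constants come from the text on this page (four GPUs, up to seven MIG instances per GPU, 80GB per GPU); the function names are hypothetical and purely illustrative.

```python
GPUS_PER_SYSTEM = 4            # DGX Station A100 has four A100 GPUs
MAX_MIG_INSTANCES_PER_GPU = 7  # each A100 can be split into up to seven instances
GPU_MEMORY_GB = 80             # 80GB variant of the A100


def max_mig_devices(gpus: int = GPUS_PER_SYSTEM,
                    per_gpu: int = MAX_MIG_INSTANCES_PER_GPU) -> int:
    """Largest number of separate GPU devices the system can expose."""
    return gpus * per_gpu


def total_gpu_memory_gb(gpus: int = GPUS_PER_SYSTEM,
                        per_gpu_gb: int = GPU_MEMORY_GB) -> int:
    """Total GPU memory across the system, in GB."""
    return gpus * per_gpu_gb


if __name__ == "__main__":
    print(max_mig_devices(), "MIG devices,", total_gpu_memory_gb(), "GB total")
```

With every GPU fully partitioned, this gives the 28 separate GPU devices and 320GB of total GPU memory quoted for the system.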
Expand GPU access to more users
With MIG, you can achieve up to 7X more GPU resources on a single A100 GPU. MIG gives researchers and developers more resources and flexibility than ever before.
Optimise GPU utilisation
MIG provides the flexibility to choose many different instance sizes, which allows provisioning of a right-sized GPU instance for each workload, ultimately delivering optimal utilisation and maximising datacentre investment.
Run simultaneous mixed workloads
MIG enables inference, training, and high-performance computing (HPC) workloads to run at the same time on a single GPU with deterministic latency and throughput.
Up to 7 GPU instances in a single A100
Dedicated SM, Memory, L2 Cache, Bandwidth for hardware QoS & isolation
Simultaneous workload execution with guaranteed quality of service
All MIG instances run in parallel with predictable throughput & latency
Right-sized GPU allocation
Different sized MIG instances based on target workloads
To run any type of workload on a MIG instance
Diverse deployment environment
Supported on bare metal, Docker, Kubernetes and virtualised environments.
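Right-sized allocation can be pictured as a packing problem. The Python sketch below places jobs onto GPUs with a first-fit decreasing heuristic, treating each A100 as seven compute slices. The instance sizes of 1, 2, 3, 4 and 7 slices are an assumption, and the sketch ignores the memory and placement rules that real MIG configuration enforces. The names `pack_jobs`, `SLICES_PER_GPU` and the job names are hypothetical.

```python
SLICES_PER_GPU = 7             # assumption: seven compute slices per A100
VALID_SIZES = (1, 2, 3, 4, 7)  # assumption: instance sizes, in slices


def pack_jobs(requests, num_gpus=4):
    """Place jobs on GPUs using first-fit decreasing.

    requests maps a job name to the number of slices it wants.
    Returns (placement, unplaced): one list of job names per GPU,
    plus the jobs that did not fit anywhere.
    """
    free = [SLICES_PER_GPU] * num_gpus
    placement = [[] for _ in range(num_gpus)]
    unplaced = []

    # Largest requests first, so big jobs are not blocked by small ones.
    ordered = sorted(requests.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size not in VALID_SIZES:
            raise ValueError(f"unsupported instance size for {name}: {size}")
        for gpu in range(num_gpus):
            if free[gpu] >= size:
                free[gpu] -= size
                placement[gpu].append(name)
                break
        else:
            unplaced.append(name)
    return placement, unplaced


if __name__ == "__main__":
    jobs = {"train": 7, "hpc": 4, "infer-a": 3, "infer-b": 1, "infer-c": 1}
    print(pack_jobs(jobs))
```

In this example a full-GPU training job, an HPC job and three inference jobs share the four GPUs, with one GPU left entirely free for further work.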
Scaling applications across multiple GPUs requires extremely fast movement of data. The third generation of NVIDIA NVLink in A100 doubles the GPU-to-GPU direct bandwidth to 600GB/s, almost 20x more than PCIe 4.0. NVLink is an essential building block of the complete NVIDIA datacentre solution that incorporates hardware, networking, software, libraries, and optimised AI models and applications from NVIDIA GPU Cloud (NGC).
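The bandwidth comparison can be checked with a short calculation. This Python sketch assumes the PCIe 4.0 baseline is a x16 link at 16 GT/s per lane with 128b/130b encoding. The page does not say which baseline is used, so both the one-direction and the bidirectional figures are shown.

```python
NVLINK3_GB_PER_S = 600.0       # GPU-to-GPU bandwidth quoted above

# Assumptions about the PCIe 4.0 baseline (not stated in the text):
PCIE4_GT_PER_S_PER_LANE = 16.0
PCIE4_LANES = 16
PCIE4_ENCODING = 128 / 130     # 128b/130b line encoding


def pcie4_x16_gb_per_s() -> float:
    """Usable PCIe 4.0 x16 bandwidth in one direction, in GB/s."""
    gbits = PCIE4_GT_PER_S_PER_LANE * PCIE4_LANES * PCIE4_ENCODING
    return gbits / 8  # bits to bytes


one_way = pcie4_x16_gb_per_s()
ratio_one_way = NVLINK3_GB_PER_S / one_way
ratio_bidirectional = NVLINK3_GB_PER_S / (2 * one_way)

if __name__ == "__main__":
    print(f"PCIe 4.0 x16, one direction: {one_way:.1f} GB/s")
    print(f"NVLink vs one direction:     {ratio_one_way:.1f}x")
    print(f"NVLink vs both directions:   {ratio_bidirectional:.1f}x")
```

Under these assumptions, the figure of almost 20x corresponds to comparing against a single direction of a PCIe 4.0 x16 link; against the bidirectional total the ratio is roughly half that.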
The NVIDIA GPU Cloud
The NGC provides researchers and data scientists with simple access to a comprehensive catalogue of GPU-optimised software tools for deep learning and high performance computing (HPC) that take full advantage of NVIDIA GPUs. The NGC container registry features NVIDIA A100 tuned, tested, certified, and maintained containers for the top deep learning frameworks. It also offers third-party managed HPC application containers, NVIDIA HPC visualisation containers, and partner applications.
Proof of Concept
Sign up to try one of the AI & Deep Learning solutions available from Scan Computers.
| Specification | Details |
|---|---|
| GPUs | 4x NVIDIA A100 Tensor Core GPUs |
| GPU Specifications | 6912 CUDA cores / 432 TF32 Tensor Cores per GPU |
| GPU Memory | 80GB per GPU - 320GB total |
| CPU | AMD EPYC 7742P - 64 cores / 128 threads |
| System Memory | 512GB ECC Reg DDR4 |
| System Drives | 1x 1.92TB NVMe SSD |
| Storage Drives | 7.68TB U.2 NVMe SSD |
| Networking | 2x 10GbE LAN ports / 1x 1GbE BMC management port |
| Operating System | Ubuntu Linux |
| Dimensions | 256 x 639 x 518mm (W x H x D) |
| Operating Temperature | 5°C - 35°C (41°F - 95°F) |