Understand Inference Performance

With inference, speed is just the beginning of performance. A complete picture of inference performance takes seven factors into account, ranging from programmability to rate of learning. The NVIDIA TensorRT Hyperscale Inference Platform addresses all seven, delivering the best inference performance at scale with the versatility to handle the growing diversity of today's networks.

Programmability

Low Latency

Accuracy

Size of Network

Throughput

Efficiency

Rate of Learning

NVIDIA T4 – Powered by Turing Tensor Cores

The NVIDIA Tesla T4 GPU is the world’s most advanced inference accelerator. Powered by NVIDIA Turing Tensor Cores, T4 brings revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. Packaged in an energy-efficient 70-watt, small PCIe form factor, T4 is optimized for scale-out servers and is purpose-built to deliver state-of-the-art inference in real time.

NVIDIA Tensor Cores

The Power of NVIDIA TensorRT

NVIDIA TensorRT is a high-performance inference platform that includes an optimizer, runtime engines, and an inference server for deploying applications in production. TensorRT speeds up applications by as much as 40X compared with CPU-only systems for video streaming, recommendation, and natural language processing.

Production-Ready Data Center Inference

The NVIDIA TensorRT inference server is a containerized microservice that lets applications use AI models in data center production. It maximizes GPU utilization, supports all popular AI frameworks, and integrates with Kubernetes and Docker.

NVIDIA Tesla T4 Specifications

Performance
  Turing Tensor Cores: 320
  NVIDIA CUDA Cores: 2,560
  Single Precision Performance (FP32): 8.1 TFLOPS
  Mixed Precision (FP16/FP32): 65 FP16 TFLOPS
  INT8 Precision: 130 INT8 TOPS
  INT4 Precision: 260 INT4 TOPS

Interconnect
  PCIe Gen3 x16

Memory
  Capacity: 16 GB GDDR6
  Bandwidth: 320+ GB/s

Power
  Usage: 70 W
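
To relate the listed figures to the efficiency factor mentioned earlier, the short Python sketch below divides each peak rate by the 70-watt power figure and compares each precision with the FP32 peak. The numbers are copied from the specification list above. The names T4_PEAK, per_watt, and speedup_over_fp32 are hypothetical helpers introduced only for this illustration, and the results are ratios of peak figures, not measured application throughput.

```python
# Peak figures copied from the specification list above.
# Units: TFLOPS for FP32 and FP16, TOPS for INT8 and INT4.
T4_PEAK = {
    "FP32": 8.1,
    "FP16": 65.0,   # mixed precision (FP16/FP32)
    "INT8": 130.0,
    "INT4": 260.0,
}
T4_POWER_WATTS = 70.0


def per_watt(peak, watts=T4_POWER_WATTS):
    """Peak tera-operations per second per watt of board power."""
    return peak / watts


def speedup_over_fp32(precision):
    """Ratio of a precision's peak rate to the FP32 peak rate."""
    return T4_PEAK[precision] / T4_PEAK["FP32"]


# Print one summary line per precision.
for name, peak in T4_PEAK.items():
    print(
        f"{name}: {per_watt(peak):.2f} tera-ops/s per watt, "
        f"{speedup_over_fp32(name):.1f}x the FP32 peak"
    )
```

Each step down in precision from FP16 to INT8 to INT4 doubles the peak rate in the list, so the per-watt figure doubles as well. Whether a given network can use the lower precisions depends on its accuracy requirements.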