Scan's TekSpek

Our Aim
To provide you with an overview of new and existing technologies, hopefully helping you understand the changes in the technology. Together with the overviews, we hope to bring topical issues to light from a series of independent reviewers, saving you the time and hassle of fact-finding over the web.

We will over time provide you with quality content which you can browse and subscribe to at your leisure.


NVIDIA GeForce RTX 3080

Date issued: 17/09/2020

NVIDIA Ampere 2nd Gen RTX 3080

NVIDIA introduced the GeForce RTX 3070, RTX 3080 and RTX 3090 graphics cards this month. The high-performance trio is based on the new Ampere architecture, the successor to the Turing design that powers the GeForce RTX 20-series available today. This TekSpek explains what Ampere is, how it works, and goes into further detail on the first available card, the GeForce RTX 3080.

NVIDIA typically introduces a new graphics architecture every two years. 2016's Pascal powered the GeForce GTX 10-series, 2018 debuted Turing, and 2020 brings Ampere into the fold. Going by timelines, NVIDIA's cards for the next two years will be based on Ampere.

Ampere Examined

This new architecture is an evolution of Turing rather than a ground-up design. It adds more horsepower through a number of methods made possible by switching to a smaller, more efficient manufacturing process from Samsung. Built on 8nm, Ampere-based cards can fit in roughly 60 percent more transistors - the building blocks of performance - in the same space as Turing. It is the job of NVIDIA's hardware architects to determine how best to spend this extra transistor budget.

The streaming multiprocessor (SM) unit is the beating heart of NVIDIA's graphics architecture. Ampere puts its increased transistor allowance to work by adding more cores that can do general-purpose CUDA processing. Whereas the previous-generation Turing's SM has 64 FP32 cores, Ampere's can run up to 128 FP32 operations per clock - a literal doubling of peak performance. That is not to say each SM will be twice as fast this time around; the vagaries of GPU design mean that, in games at least, Ampere will never achieve peak FP32 utilisation because the SM unit also needs to process what are known as integer (INT32) workloads. The datapath on the left of the picture shows that it can run FP32 or INT32, though not at the same time. Even so, having more CUDA power is a good thing. NVIDIA beefs up the associated caches to keep the units full and ready.
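
To illustrate why the doubling of FP32 cores does not translate into a doubling of real-world speed, here is a minimal sketch of an idealised per-SM throughput model. It assumes that Turing pairs its 64 FP32 cores with a separate 64-lane INT32 path, and that Ampere has 64 FP32-only lanes plus 64 lanes that run either FP32 or INT32 in any given clock. The function names and the 100:36 instruction mix are illustrative assumptions, not figures from this article, and the model ignores scheduling, memory and cache effects.

```python
def cycles_turing(fp_ops, int_ops, fp_lanes=64, int_lanes=64):
    """Idealised clocks per SM: FP32 and INT32 run side by side on separate lanes."""
    return max(fp_ops / fp_lanes, int_ops / int_lanes)


def cycles_ampere(fp_ops, int_ops, fp_only_lanes=64, shared_lanes=64):
    """Idealised clocks per SM: one datapath is FP32-only, the other is FP32 or INT32."""
    # All work shares 128 lanes, but INT32 can only use the shared datapath.
    all_lanes_bound = (fp_ops + int_ops) / (fp_only_lanes + shared_lanes)
    int_bound = int_ops / shared_lanes
    return max(all_lanes_bound, int_bound)


def ampere_speedup(fp_ops, int_ops):
    """Per-SM, per-clock speed-up of the Ampere model over the Turing model."""
    return cycles_turing(fp_ops, int_ops) / cycles_ampere(fp_ops, int_ops)


pure_fp32 = ampere_speedup(100, 0)     # no integer work at all
mixed = ampere_speedup(100, 36)        # hypothetical game-like instruction mix
even_split = ampere_speedup(100, 100)  # as much integer work as FP32 work
```

Under these assumptions the pure-FP32 case reaches the full 2x, the mixed case lands somewhere between 1x and 2x, and an even split shows no per-clock gain at all, which is the behaviour the paragraph above describes.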

Turing introduced specific hardware for raytracing (RT cores) and running deep learning (Tensor cores). Ampere improves the performance of the former by removing processing bottlenecks in the first-gen design and adds extra hardware to handle raytraced motion blur. On the Tensor core front, which runs the DLSS technology integral to GeForce RTX cards, Ampere actually reduces the number of per-SM units by half, though each core runs at twice the speed for what are known as dense matrices and four times the speed for sparse matrices. In other words, per-SM Tensor performance ought to be at least as good as Turing in every situation, and may be twice as fast.
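
The per-SM Tensor arithmetic can be checked with a short sketch using the core counts from the specification table further down. It assumes one RT core per SM, so the RT core count stands in for the SM count, and it treats a Turing Tensor core as the unit of speed. The names are illustrative only.

```python
# (Tensor cores, SMs) taken from the specification table.
# Assumption: each SM has one RT core, so RT cores == SM count.
RTX_2080_TI = (544, 68)  # Turing
RTX_3080 = (272, 68)     # Ampere


def per_sm_tensor_throughput(tensor_cores, sm_count, per_core_speed):
    """Relative Tensor throughput of one SM, in 'Turing Tensor core' units."""
    return tensor_cores / sm_count * per_core_speed


turing = per_sm_tensor_throughput(*RTX_2080_TI, per_core_speed=1)
ampere_dense = per_sm_tensor_throughput(*RTX_3080, per_core_speed=2)
ampere_sparse = per_sm_tensor_throughput(*RTX_3080, per_core_speed=4)

dense_ratio = ampere_dense / turing    # Ampere vs. Turing, dense matrices
sparse_ratio = ampere_sparse / turing  # Ampere vs. Turing, sparse matrices
```

Half the cores at twice the speed gives parity for dense matrices, and half the cores at four times the speed gives a doubling for sparse ones.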

Another key improvement rests with how Ampere is able to use its architecture toolset concurrently. The last-gen Turing can run its general rasterisation cores alongside raytracing, which helps reduce the time it takes to complete every frame. Ampere, however, is not only faster at doing exactly that on equivalent cards - 7.5ms against Turing's 13ms - it can further reduce frame-completion time by also running the Tensor cores at the same time. This is how NVIDIA reckons the GeForce RTX 3080 is up to twice as fast as the GeForce RTX 2080 Super it effectively replaces.

Ampere, therefore, has more powerful SMs to deal with games rendering. It also has more of them, and we'll get to that momentarily. Yet a modern graphics card needs to be balanced. NVIDIA takes this on board by increasing the capability of what is known as the backend of the card: the memory subsystem. Ampere GPUs incorporate brand-new GDDR6X technology co-developed with industry expert Micron. It works by using four signal voltage levels where GDDR6 uses two, so each transfer carries two bits instead of one, doubling the amount of data moved per clock cycle. Coding ensures that such large-scale data transfers aren't compromised by the large voltage changes introduced by this memory.
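
As a rough illustration of why doubling the number of voltage levels doubles the data per transfer, the sketch below encodes the same bit string with two levels (one bit per symbol) and with four levels (two bits per symbol). The level numbering is a simplification chosen for this example; it is not the actual GDDR6X signalling scheme, and it leaves out the protective coding mentioned above.

```python
def encode_two_level(bits):
    """One bit per symbol: each symbol is level 0 or level 1."""
    return [int(b) for b in bits]


def encode_four_level(bits):
    """Two bits per symbol: each symbol is one of four levels, 0 to 3."""
    if len(bits) % 2:
        raise ValueError("four-level encoding needs an even number of bits")
    return [int(bits[i:i + 2], 2) for i in range(0, len(bits), 2)]


payload = "11011000"  # one byte, written as a bit string
two_level_symbols = encode_two_level(payload)    # 8 symbols on the wire
four_level_symbols = encode_four_level(payload)  # 4 symbols on the wire
```

The four-level version needs half as many symbols to move the same byte, so at the same symbol rate it moves twice the data.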

NVIDIA ekes out as much performance as possible from Ampere by taking advantage of how the 8nm process scales with respect to frequency and voltage. As a result, Ampere cards' power budgets are significantly higher than those of previous generations - up to 350W compared to 250W - and cooling solutions need to be more robust to keep noise and temperature in check.

Keeping up with the times, Ampere upgrades the connection to the motherboard by using PCIe 4.0 for double the transfer speeds. It also adds support for HDMI 2.1, enabling 8K60 or 4K120 via a single cable. Last but not least, Ampere silicon supports hardware decoding of the new AV1 codec.
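
To put a number on the PCIe doubling, here is a small sketch of the theoretical one-way bandwidth of a x16 slot. The per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding are the commonly published figures for those standards rather than values from this article, so treat them as assumptions.

```python
ENCODING_EFFICIENCY = 128 / 130  # assumption: 128b/130b line encoding

# Assumption: published per-lane transfer rates in GT/s.
TRANSFER_RATE_GT_S = {"3.0": 8, "4.0": 16}


def pcie_bandwidth_gb_per_s(generation, lanes=16):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    gbit_per_lane = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY
    return gbit_per_lane / 8 * lanes  # 8 bits per byte


gen3_x16 = pcie_bandwidth_gb_per_s("3.0")
gen4_x16 = pcie_bandwidth_gb_per_s("4.0")
```

Under these assumptions a x16 link moves a little under 16GB/s on PCIe 3.0 and a little under 32GB/s on PCIe 4.0 in each direction.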

GeForce RTX 3080

Turing to Ampere
	RTX 3090	RTX 3080	RTX 3070	RTX 2080 Ti	RTX 2080 Super	RTX 2080	RTX 2070 Super
Launch date	Sep 2020	Sep 2020	Oct 2020	Sep 2018	July 2019	Sep 2018	July 2019
Codename	GA102	GA102	GA104	TU102	TU104	TU104	TU104
Architecture	Ampere	Ampere	Ampere	Turing	Turing	Turing	Turing
Process (nm)	8	8	8	12	12	12	12
Transistors (bn)	28.3	28.3	17.4	18.6	13.6	13.6	13.6
Die Size (mm²)	628.4	628.4	392.5	754	545	545	545
PCIe	4.0	4.0	4.0	3.0	3.0	3.0	3.0
Base Clock (MHz)	1,400	1,440	1,500	1,350	1,650	1,515	1,605
Boost Clock (MHz)	1,695	1,710	1,725	1,545	1,815	1,710	1,770
Founders Edition Clock (MHz)	1,695	1,710	1,725	1,635	1,815	1,800	1,770
Shaders	10,496	8,704	5,888	4,352	3,072	2,944	2,560
GFLOPS	35,581	29,768	20,314	13,448	11,151	10,068	9,062
Founders Edition GFLOPS	35,581	29,768	20,314	14,231	11,151	10,598	9,062
Tensor Cores	328	272	184	544	384	368	320
RT Cores	82	68	46	68	48	46	40
Memory Size	24GB	10GB	8GB	11GB	8GB	8GB	8GB
Memory Bus	384-bit	320-bit	256-bit	352-bit	256-bit	256-bit	256-bit
Memory Type	GDDR6X	GDDR6X	GDDR6	GDDR6	GDDR6	GDDR6	GDDR6
Memory Clock	19.5Gbps	19Gbps	14Gbps	14Gbps	15.5Gbps	14Gbps	14Gbps
Memory Bandwidth (GB/s)	936	760	448	616	496	448	448
ROPs	112	96	64	88	64	64	64
Texture Units	328	272	184	272	192	184	160
L2 cache (KB)	5,120	5,120	4,096	5,632	4,096	4,096	4,096
SLI	Yes	No	No	Yes	Yes	Yes	Yes
Power Connector (FE)	12-pin	12-pin	8-pin	8-pin + 8-pin	8-pin + 6-pin	8-pin + 6-pin	8-pin + 6-pin
TDP (watts)	350	320	220	250	250	215	215
Founders Edition TDP (watts)	350	320	220	260	250	225	215
Suggested MSRP	$1,499	$699	$499	$999	$699	$699	$499
Founders Edition MSRP	$1,499	$699	$499	$1,199	$699	$799	$499

NVIDIA's biggest gaming Ampere die is known as GA102. It can house up to 84 SMs for a maximum of 10,752 cores. RTX 3080 adopts 68 of them, resulting in 8,704 cores running at 1,710MHz during its boost state. The TFLOPS numbers are staggering. The card produces nearly 30 TFLOPS compared to 11.1 TFLOPS for the RTX 2080 Super - a GPU with which it shares a $699 price point.
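
The core counts and TFLOPS figures above follow from simple arithmetic, sketched below. The one assumption not stated in this article is that each core completes one fused multiply-add per clock, which is conventionally counted as two floating-point operations. The function names are illustrative.

```python
CORES_PER_SM = 128            # FP32 cores per Ampere SM
FLOPS_PER_CORE_PER_CLOCK = 2  # assumption: one fused multiply-add = 2 FLOPs


def cuda_cores(sm_count, cores_per_sm=CORES_PER_SM):
    """Total FP32 cores for a given number of SMs."""
    return sm_count * cores_per_sm


def peak_gflops(cores, boost_mhz):
    """Peak FP32 throughput in GFLOPS at the given boost clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * boost_mhz / 1000


full_ga102_cores = cuda_cores(84)                  # fully enabled die
rtx_3080_gflops = peak_gflops(cuda_cores(68), 1710)
rtx_2080_super_gflops = peak_gflops(3072, 1815)    # shader count from the table
```

Rounded to whole numbers, these results match the Shaders and GFLOPS rows of the table for both cards.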

GeForce RTX 3080 architecture floorplan - 68 out of a possible 84 SMs

Memory capacity is a healthy 10GB and bandwidth is a handsome 760GB/s. On paper, the card is leagues ahead of the equivalent solution from the last generation. In fact, it is clearly faster than the GeForce RTX 2080 Ti, beating it in every gaming metric that matters.
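
The bandwidth figures in the table can be reproduced from the bus width and the per-pin memory speed, as this short sketch shows. The dictionary layout and function name are illustrative; the numbers are taken from the table.

```python
def bandwidth_gb_per_s(bus_width_bits, data_rate_gbps):
    """Memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, memory speed in Gbps) from the specification table.
CARDS = {
    "RTX 3090": (384, 19.5),
    "RTX 3080": (320, 19),
    "RTX 3070": (256, 14),
    "RTX 2080 Ti": (352, 14),
    "RTX 2080 Super": (256, 15.5),
}

bandwidth = {name: bandwidth_gb_per_s(*spec) for name, spec in CARDS.items()}
```

On these figures the RTX 3080 has roughly 23 percent more bandwidth than the RTX 2080 Ti and about 53 percent more than the RTX 2080 Super.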

Benchmarks

Benchmarks from leading review sites show that 4K performance is head and shoulders above what is presently available. RTX 3080 is over 50 percent faster than RTX 2080 Super, and is the only card that can truly claim to be a 4K60 solution. It also offers excellent performance with raytracing and DLSS turned on, and if you are in the market for a graphics card that marries performance and relative value in one tidy package, look no further than the GeForce RTX 3080.

Scan Computers is proud to retail a wide range of NVIDIA GeForce RTX 3080 graphics cards. Head over here to peruse our selection.
