Arm Processors

Flexible and adaptable processor solutions for servers

Arm CPU technology has long been the processor of choice for smartphones and tablets and is commonplace in other embedded devices where low power consumption is a key consideration. However, in the last few years, with evolving enterprise workloads such as HPC and AI, Arm CPUs have become more attractive in a server environment. This shift has come about because an Arm-based processor works in a fundamentally different way to a traditional x86 server CPU from Intel or AMD.

A New Approach for New Workflows

In brief, the reason Arm processors have dominated the battery-powered and low-power markets is that they are RISC-based rather than CISC-based in their design. CISC (complex instruction set computer) designs are compatible with Intel's original 8086 processor from the late 1970s; any CPU based on this original design is termed an x86 processor. Although a processor born in the 70s may not seem very cutting edge, this initial design has been updated and superseded many times to arrive at the modern 64-bit x86 Intel Xeon and AMD EPYC processors we see today. Arm CPUs, on the other hand, use a RISC (reduced instruction set computer) design, which has evolved over multiple generations, from the original Acorn and BBC Micro computers of the 1980s to the Arm CPUs we have today. The table and worked example below summarise the key differences.

CISC-based Design                       | RISC-based Design
Complex instructions                    | Simple instructions
High number of instructions (100-300)   | Low number of instructions (30-40)
Variable length or size of instructions | Fixed length or size of instructions
Multiple actions per instruction        | One action per instruction
High power consumption (100W+)          | Low power consumption (5-10W)
Complex hardware, simple software       | Simple hardware, complex software
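
To make the difference concrete, here is an illustrative sketch (not real compiler output - instruction sequences vary with compiler, flags and surrounding code) of how the same C statement is typically translated for a CISC (x86-64) target versus a RISC (AArch64) target.

    /* cisc_vs_risc.c - illustrative sketch only; real compiler output varies. */
    #include <stdint.h>
    #include <stdio.h>

    void add_in_place(int64_t *counter, int64_t amount)
    {
        /* One C statement: read memory, add, write back. */
        *counter += amount;

        /* A CISC (x86-64) compiler can encode this as a single,
         * variable-length instruction that operates directly on memory:
         *
         *     add  qword ptr [rdi], rsi
         *
         * A RISC (AArch64) compiler typically emits three fixed-length
         * (32-bit) instructions, each performing one action:
         *
         *     ldr  x2, [x0]      // load the value from memory
         *     add  x2, x2, x1    // add the two registers
         *     str  x2, [x0]      // store the result back to memory
         */
    }

    int main(void)
    {
        int64_t counter = 40;
        add_in_place(&counter, 2);
        printf("%lld\n", (long long)counter);
        return 0;
    }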

As RISC-based processors use simpler instruction sets, made up of fewer, single-action instructions, they ultimately consume less power, so are ideal for devices that run predominantly on batteries or that must draw very little power. In x86 Xeon and EPYC CPUs, the complex instruction sets are processed across a number of cores, starting at 8 cores in entry models and rising to 128 at the high end; this scaling of cores increases performance, providing the speed and power to handle more demanding computing workloads. By comparison, an Arm server CPU employs a much larger number of smaller, simpler, low-power cores, so compute tasks are spread across many more cores running in parallel. This method of increasing performance is sometimes referred to as scaling out, and it can result in an Arm server delivering greater processing power while using less energy and requiring less cooling than an x86-based equivalent. It also allows different types of core within the CPU to handle different workloads. The sketch that follows illustrates the scale-out model.
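
The minimal sketch below, assuming a POSIX system, queries the number of online cores and splits an embarrassingly parallel task across one thread per core; on a 128-core Altra-class part the same code simply spawns 128 workers, which is the essence of scaling out.

    /* scale_out.c - minimal scale-out sketch (POSIX threads).
     * Build: cc -O2 -pthread scale_out.c -o scale_out */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define N 100000000UL          /* total work items */

    typedef struct { unsigned long start, end; double partial; } chunk_t;

    static void *worker(void *arg)
    {
        chunk_t *c = arg;
        double sum = 0.0;
        for (unsigned long i = c->start; i < c->end; i++)
            sum += 1.0 / (double)(i + 1);   /* stand-in for real work */
        c->partial = sum;
        return NULL;
    }

    int main(void)
    {
        long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* e.g. 128 on Altra Max */
        if (cores < 1) cores = 1;

        pthread_t *tid = malloc(cores * sizeof *tid);
        chunk_t   *chk = malloc(cores * sizeof *chk);
        unsigned long step = N / (unsigned long)cores;

        for (long t = 0; t < cores; t++) {
            chk[t].start = (unsigned long)t * step;
            chk[t].end   = (t == cores - 1) ? N : (unsigned long)(t + 1) * step;
            pthread_create(&tid[t], NULL, worker, &chk[t]);
        }

        double total = 0.0;
        for (long t = 0; t < cores; t++) {
            pthread_join(tid[t], NULL);
            total += chk[t].partial;
        }
        printf("%ld threads, result %.6f\n", cores, total);
        free(tid);
        free(chk);
        return 0;
    }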

This difference in approach is reinforced by development cadence: server CPU generations are refreshed every two to three years, whereas smartphone and tablet CPUs are updated at least once a year to maintain demand in a very competitive market. This faster cycle of development has driven more evolution in the Arm space than in x86 over the last decade, resulting in ever smaller manufacturing process nodes. Typically, x86 CPUs have gone from 14nm to 10nm and recently to 7nm, whereas Arm-based CPUs have gone through 14, 10, 7, 5 and 4nm designs in the same time. Each time the transistors shrink, a greater number can fit into the CPU, increasing its overall performance. This makes Arm CPUs even more attractive to the server market, where there is a growing emphasis on software-defined technologies and on applications built from smaller but more numerous tasks, such as high performance computing, machine learning, deep learning and AI.

Tailor your cores to your workload

Although we may refer to Arm-based servers, unlike Intel or AMD, Arm does not manufacture these CPUs. Brands such as Ampere, NVIDIA and Qualcomm license and develop Arm-based CPUs using a combination of the cores below, depending on which workloads they want to address with that processor. It may be that a specific core, originally designed for mobile use, has a particular advantage or useful characteristic when applied to a server application or workflow.

Cortex-A Cores

The Arm Cortex-A is a group of 32-bit and 64-bit processor cores optimised for supreme performance at the most cost-effective price.

Cortex-A715

Second-generation Armv9 “big” CPU for best-in-class efficient performance

• The CPU cluster workhorse across big.LITTLE configurations (see the placement sketch after this list)

• Targeted microarchitecture optimizations for 20% power efficiency improvements
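
In a big.LITTLE (or DynamIQ) system the operating system's scheduler normally decides which cluster a thread runs on, but software can also steer placement explicitly. The Linux-only sketch below pins the calling thread to a hypothetical big-core cluster; the CPU numbers are an assumption, as the mapping of logical CPUs to big and LITTLE cores varies from one SoC to another.

    /* pin_big.c - Linux-only sketch: restrict the calling thread to an
     * assumed "big" cluster. CPUs 4-7 are a hypothetical layout; the real
     * big/LITTLE numbering differs between SoCs, and the kernel's
     * energy-aware scheduler usually handles placement automatically. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);

        for (int cpu = 4; cpu <= 7; cpu++)   /* assumed big-core IDs */
            CPU_SET(cpu, &set);

        if (sched_setaffinity(0, sizeof set, &set) != 0) {   /* 0 = calling thread */
            perror("sched_setaffinity");
            return 1;
        }
        printf("thread restricted to the assumed big-core cluster\n");
        return 0;
    }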

Cortex-A710

First-generation Armv9 “big” CPU that offers a balance of performance and efficiency

• Addition of Armv9 architecture features for enhanced performance and security

• 30% increase in energy efficiency compared to Cortex-A78

Cortex-A510

First-generation Armv9 high-efficiency “LITTLE” CPU

• Large performance increases for a highly efficient CPU

• Innovative microarchitecture upgrades

• Over 3x uplift in ML performance compared to Cortex-A55

Cortex-A78

The fourth generation high-performance CPU based on DynamIQ technology. The most efficient premium Cortex-A CPU

• Built for next generation consumer devices

• Enabling immersive experiences on new form factors and foldables

• Improving ML device responsiveness and capabilities such as face and speech recognition

Cortex-A78C

Providing market-specific solutions with advanced security features and large big-core configurations

• Performance for laptop class productivity and gaming on-the-go

• Advanced data and device security with Pointer Authentication

• Improved scalability with big-core-only configurations of up to 8 cores and up to 8MB of L3 cache

Cortex-A78AE

Arm’s most advanced processor designed for safety-critical applications

• Suited to complex automated driving and industrial autonomous systems

• Split-Lock capability with hybrid mode for flexible operations

• Enhanced support for ISO 26262 ASIL B and ASIL D safety requirements

Cortex-R Cores

The Arm Cortex-R is a group of 32-bit and 64-bit processor cores optimised for real-time and safety-critical applications.

Cortex-R82

Highest performance real-time processor

• Offers efficient, high-performance compute for complex storage applications

• Supports Arm Neon technology for ML acceleration (see the sketch after this list)

• Implements MMU for rich OS support
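
As a minimal illustration of what Neon provides, the sketch below uses the Arm C Language Extensions (arm_neon.h) to add two arrays of floats four lanes at a time. It assumes an AArch64 toolchain with Neon enabled and an array length that is a multiple of four; the helper name and data are illustrative only.

    /* neon_add.c - minimal Neon (ACLE) sketch: c[i] = a[i] + b[i],
     * processing four 32-bit floats per instruction.
     * Assumes an AArch64 compiler and n being a multiple of 4. */
    #include <arm_neon.h>
    #include <stdio.h>

    static void vec_add(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            float32x4_t va = vld1q_f32(a + i);    /* load 4 lanes */
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(c + i, vaddq_f32(va, vb));  /* add and store 4 lanes */
        }
    }

    int main(void)
    {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];
        vec_add(a, b, c, 8);
        for (int i = 0; i < 8; i++)
            printf("%.0f ", c[i]);
        printf("\n");
        return 0;
    }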

Cortex-R52

Advanced processor for functional safety

• Enables advanced safety features for a range of automotive applications

• Software separation protects safety-critical code

• High-performance multicore clusters deliver real-time responsiveness

Cortex-R52+

Real-time virtualization support for functional safety

• Software separation enables integration of multiple operating systems while protecting safety-critical code

• High-performance multicore clusters deliver real-time responsiveness

Cortex-R8

High performance processor suited for storage controllers and modems

• Offers low latency

• Configurable ports support flexible design options

• Delivers the responsive power needed for high-performance mass storage applications

Cortex-R4

Smallest real-time performance processor

• Offers excellent energy efficiency and cost effectiveness

• Prioritises reliability and error management with built-in error handling

• Ideal for embedded applications including automobiles and cameras

Ethos Cores

Ethos is a family of neural processors with the highest performance for machine learning inference.

Ethos-N78

Scalable and efficient second-generation ML inference processor

• 2x faster inference with 40% lower bandwidth, 25% increased efficiency

• Addresses multiple markets from 1 to 10 TOP/s with up to 90 unique configurations

• Develop once, deploy anywhere with online and offline compilation

Ethos-U65

Powering innovation in a new world of AI devices at the edge and endpoint

• Delivers 1.0 TOP/s ML performance in about 0.6 mm²

• Partner configurable from 256 to 512 8-bit MACs

• Unified toolchain supports Cortex-M and Cortex-A based systems

Ethos-U55

Configurable and efficient embedded ML inference

• Delivers up to 0.5 TOP/s, a 480x ML uplift and a 90% energy reduction (see the arithmetic note after this list)

• Partner configurable from 32 to 256 8-bit MACs in around 0.1 mm²

• Rapid development with a single tool chain for Cortex-M and Ethos-U
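
The headline TOP/s figures for the Ethos-U parts follow directly from the size of the MAC array: each MAC performs a multiply and an accumulate (two operations) per clock cycle, so, assuming the 1GHz reference clock commonly used for such figures (not stated above), 512 MACs work out at roughly 1 TOP/s and 256 MACs at roughly 0.5 TOP/s. The short check below reproduces that arithmetic; real-world throughput also depends on clock speed and utilisation.

    /* ethos_tops.c - back-of-envelope check of the Ethos-U TOP/s figures.
     * Assumes a 1GHz reference clock and 2 operations (multiply + accumulate)
     * per MAC per cycle; actual throughput depends on clock and utilisation. */
    #include <stdio.h>

    static double tops(int macs, double clock_ghz)
    {
        return (double)macs * 2.0 * clock_ghz * 1e9 / 1e12;   /* ops/s -> TOP/s */
    }

    int main(void)
    {
        printf("Ethos-U65, 512 MACs @ 1GHz: %.2f TOP/s\n", tops(512, 1.0));
        printf("Ethos-U55, 256 MACs @ 1GHz: %.2f TOP/s\n", tops(256, 1.0));
        return 0;
    }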

Neoverse Cores

The Neoverse series of processors delivers maximum per-core performance for systems running demanding compute- and memory-intensive applications such as cloud and virtualised infrastructures.

Neoverse V1

A performance-first tier targeting HPC, HPC in the Cloud and AI or ML-accelerated applications

• Arm’s first SVE implementation for the HPC market, with 2x floating-point and 4x ML uplift over Neoverse N1 (see the SVE sketch after this list)

• Expands market-leading performance with 50% IPC uplift over Neoverse N1
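
To show what an SVE implementation offers in practice, the sketch below uses the ACLE SVE intrinsics (arm_sve.h) to add two float arrays in a vector-length-agnostic loop: the same binary uses whatever vector width the hardware provides (256-bit on Neoverse V1) without recompilation. It assumes an SVE-capable compiler and CPU; the helper name and data are illustrative.

    /* sve_add.c - vector-length-agnostic SVE sketch: c[i] = a[i] + b[i].
     * Requires an SVE-capable toolchain and CPU,
     * e.g. cc -O2 -march=armv8-a+sve sve_add.c -o sve_add */
    #include <arm_sve.h>
    #include <stdint.h>
    #include <stdio.h>

    static void vec_add(const float *a, const float *b, float *c, int64_t n)
    {
        for (int64_t i = 0; i < n; i += svcntw()) {   /* svcntw() = lanes per vector */
            svbool_t pg = svwhilelt_b32(i, n);        /* predicate covers the tail */
            svfloat32_t va = svld1(pg, a + i);
            svfloat32_t vb = svld1(pg, b + i);
            svst1(pg, c + i, svadd_x(pg, va, vb));
        }
    }

    int main(void)
    {
        float a[10], b[10], c[10];
        for (int i = 0; i < 10; i++) { a[i] = (float)i; b[i] = (float)(10 - i); }
        vec_add(a, b, c, 10);
        for (int i = 0; i < 10; i++)
            printf("%.0f ", c[i]);
        printf("\n");
        return 0;
    }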

Neoverse N2

Leading performance efficiency for cloud-to-edge infrastructure

• Market leading scalability and versatility

• Significant 40% IPC performance improvement over Neoverse N1

• Armv9 introduces key features for performance (SVE2) and security (MTE)

Neoverse N1

Accelerating the transformation to a scalable cloud-to-edge infrastructure

• Revolutionary compute performance

• Platform features specific to Infrastructure

• Designed for extreme range of scale and diversity of compute

Neoverse E1

Empowering infrastructure to meet next-generation throughput demands

• Intelligent design for highly efficient throughput

• Fully leverages diverse Arm software ecosystem

• Highly scalable throughput for edge to core data transport

Arm CPUs and 3XS Servers - the Perfect Partnership

The 3XS team of server architects has developed a range of rackmount solutions based on Ampere Altra processors, built on the Arm architecture, to tackle high-demand compute-bound workloads. Based on Gigabyte chassis, these Ampere CPUs improve total cost of ownership (TCO) by running more efficiently per core and at a lower price per core. 1U and 2U servers feature a single-socket Altra or Altra Max processor with up to 128 Armv8.2 cores, a 3.0GHz frequency, 128 PCIe 4.0 lanes and support for up to 4TB of RAM.

Form factor      | CPUs                                    | GPUs         | Target workloads
1U Server        | Up to 2x Ampere Altra or Altra Max CPUs | -            | Edge computing and SDN
2U Server        | Up to 2x Ampere Altra or Altra Max CPUs | Up to 2 GPUs | AI training, AI inferencing and HPC
4U Server        | Ampere Altra or Altra Max CPU           | Up to 8 GPUs | AI training, AI inferencing and HPC
4-node 2U Server | Up to 2x Ampere Altra or Altra Max CPUs | Up to 4 GPUs | HPC, HCI and cloud computing