Arm Processors

Flexible and adaptable processor solutions for servers

Arm CPU technology has long been the processor of choice for smartphones and tablets and is commonplace in other embedded devices where low power consumption is a key consideration. However, in the last few years, with evolving enterprise workloads such as HPC and AI, Arm CPUs have become more attractive in a server environment. This shift has come about because an Arm-based processor works in a fundamentally different way to a traditional x86 server CPU from Intel or AMD.

A New Approach for New Workflows

In brief, the reason Arm processors have dominated the battery-powered and low-power markets is that they are RISC-based rather than CISC-based in their design. CISC (complex instruction set computer) designs are compatible with Intel's original 8086 processor from the late 1970s; any CPU based on this original design is termed an x86 processor. Although a processor born in the 70s may not seem very cutting edge, this initial design has been updated and superseded many times to arrive at the modern 64-bit x86 Intel Xeon and AMD EPYC processors we see today. Arm CPUs, on the other hand, use a RISC (reduced instruction set computer) design, which has evolved over multiple generations, from the original Acorn and BBC Micro computers of the 1980s to the Arm CPUs we have today. The table and worked example below summarise the key differences.

CISC-based Design                       | RISC-based Design
Complex instructions                    | Simple instructions
High number of instructions (100-300)   | Low number of instructions (30-40)
Variable length or size of instructions | Fixed length or size of instructions
Multiple actions per instruction        | One action per instruction
High power consumption (100W+)          | Low power consumption (5-10W)
Complex hardware, simple software       | Simple hardware, complex software
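
To make the difference concrete, here is an illustrative sketch (not real compiler output - instruction sequences vary with compiler, flags and surrounding code) of how the same C statement is typically translated for a CISC (x86-64) target versus a RISC (AArch64) target.

    /* cisc_vs_risc.c - illustrative sketch only; real compiler output varies. */
    #include <stdint.h>
    #include <stdio.h>

    void add_in_place(int64_t *counter, int64_t amount)
    {
        /* One C statement: read memory, add, write back. */
        *counter += amount;

        /* A CISC (x86-64) compiler can encode this as a single,
         * variable-length instruction that operates directly on memory:
         *
         *     add  qword ptr [rdi], rsi
         *
         * A RISC (AArch64) compiler typically emits three fixed-length
         * (32-bit) instructions, each performing one action:
         *
         *     ldr  x2, [x0]      // load the value from memory
         *     add  x2, x2, x1    // add the two registers
         *     str  x2, [x0]      // store the result back to memory
         */
    }

    int main(void)
    {
        int64_t counter = 40;
        add_in_place(&counter, 2);
        printf("%lld\n", (long long)counter);
        return 0;
    }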

As RISC-based processors use simpler instruction sets, made up of fewer, single-action instructions, they ultimately consume less power, so are ideal for devices that run predominantly on batteries or that must draw very little power. In x86 Xeon and EPYC CPUs, the complex instruction sets are processed across a number of cores, starting at 8 cores in entry models and rising to 128 at the high end; this scaling of cores increases performance, providing the speed and power to handle more demanding computing workloads. By comparison, an Arm server CPU employs a much larger number of smaller, simpler, low-power cores, so compute tasks are spread across many more cores running in parallel. This method of increasing performance is sometimes referred to as scaling out, and it can result in an Arm server delivering greater processing power while using less energy and requiring less cooling than an x86-based equivalent. It also allows different types of core within the CPU to handle different workloads. The sketch that follows illustrates the scale-out model.
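
The minimal sketch below, assuming a POSIX system, queries the number of online cores and splits an embarrassingly parallel task across one thread per core; on a 128-core Altra-class part the same code simply spawns 128 workers, which is the essence of scaling out.

    /* scale_out.c - minimal scale-out sketch (POSIX threads).
     * Build: cc -O2 -pthread scale_out.c -o scale_out */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define N 100000000UL          /* total work items */

    typedef struct { unsigned long start, end; double partial; } chunk_t;

    static void *worker(void *arg)
    {
        chunk_t *c = arg;
        double sum = 0.0;
        for (unsigned long i = c->start; i < c->end; i++)
            sum += 1.0 / (double)(i + 1);   /* stand-in for real work */
        c->partial = sum;
        return NULL;
    }

    int main(void)
    {
        long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* e.g. 128 on Altra Max */
        if (cores < 1) cores = 1;

        pthread_t *tid = malloc(cores * sizeof *tid);
        chunk_t   *chk = malloc(cores * sizeof *chk);
        unsigned long step = N / (unsigned long)cores;

        for (long t = 0; t < cores; t++) {
            chk[t].start = (unsigned long)t * step;
            chk[t].end   = (t == cores - 1) ? N : (unsigned long)(t + 1) * step;
            pthread_create(&tid[t], NULL, worker, &chk[t]);
        }

        double total = 0.0;
        for (long t = 0; t < cores; t++) {
            pthread_join(tid[t], NULL);
            total += chk[t].partial;
        }
        printf("%ld threads, result %.6f\n", cores, total);
        free(tid);
        free(chk);
        return 0;
    }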

This difference in approach is reinforced by development cadence: server CPU generations are refreshed every two to three years, whereas smartphone and tablet CPUs are updated at least once a year to maintain demand in a very competitive market. This faster cycle of development has driven more evolution in the Arm space than in x86 over the last decade, resulting in ever smaller manufacturing process nodes. Typically, x86 CPUs have gone from 14nm to 10nm and recently to 7nm, whereas Arm-based CPUs have gone through 14, 10, 7, 5 and 4nm designs in the same time. Each time the transistors shrink, a greater number can fit into the CPU, increasing its overall performance. This makes Arm CPUs even more attractive to the server market, where there is a growing emphasis on software-defined technologies and on applications built from smaller but more numerous tasks, such as high performance computing, machine learning, deep learning and AI.

Tailor your cores to your workload

Although we may refer to Arm-based servers, unlike Intel or AMD, Arm does not manufacture these CPUs. Brands such as Ampere, NVIDIA and Qualcomm license and develop Arm-based CPUs using a combination of the cores below, depending on which workloads they want to address with that processor. It may be that a specific core, originally designed for mobile use, has a particular advantage or useful characteristic when applied to a server application or workflow.

Cortex-A Cores

The Arm Cortex-A is a group of 32-bit and 64-bit processor cores optimised for supreme performance at the most cost-effective price.

Cortex-A715

Second-generation Armv9 “big” CPU for best-in-class efficient performance

• The CPU cluster workhorse across big.LITTLE configurations (see the placement sketch after this list)

• Targeted microarchitecture optimizations for 20% power efficiency improvements
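
In a big.LITTLE (or DynamIQ) system the operating system's scheduler normally decides which cluster a thread runs on, but software can also steer placement explicitly. The Linux-only sketch below pins the calling thread to a hypothetical big-core cluster; the CPU numbers are an assumption, as the mapping of logical CPUs to big and LITTLE cores varies from one SoC to another.

    /* pin_big.c - Linux-only sketch: restrict the calling thread to an
     * assumed "big" cluster. CPUs 4-7 are a hypothetical layout; the real
     * big/LITTLE numbering differs between SoCs, and the kernel's
     * energy-aware scheduler usually handles placement automatically. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);

        for (int cpu = 4; cpu <= 7; cpu++)   /* assumed big-core IDs */
            CPU_SET(cpu, &set);

        if (sched_setaffinity(0, sizeof set, &set) != 0) {   /* 0 = calling thread */
            perror("sched_setaffinity");
            return 1;
        }
        printf("thread restricted to the assumed big-core cluster\n");
        return 0;
    }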

Cortex-A710

First-generation Armv9 “big” CPU that offers a balance of performance and efficiency

• Addition of Armv9 architecture features for enhanced performance and security

• 30% increase in energy efficiency compared to Cortex-A78

Cortex-A510

First-generation Armv9 high-efficiency “LITTLE” CPU

• Large performance increases for a highly efficient CPU

• Innovative microarchitecture upgrades

• Over 3x uplift in ML performance compared to Cortex-A55

Cortex-A78

The fourth generation high-performance CPU based on DynamIQ technology. The most efficient premium Cortex-A CPU

• Built for next generation consumer devices

• Enabling immersive experiences on new form factors and foldables

• Improving ML device responsiveness and capabilities such as face and speech recognition

Cortex-A78C

Providing market-specific solutions with advanced security features and large big-core configurations

• Performance for laptop class productivity and gaming on-the-go

• Advanced data and device security with Pointer Authentication

• Improved scalability with big-core-only configurations of up to 8 cores and up to 8MB of L3 cache

Cortex-A78AE

Arm’s most advanced processor designed for safety-critical applications

• Suited to complex automated driving and industrial autonomous systems

• Split-Lock capability with hybrid mode for flexible operations

• Enhanced support for ISO 26262 ASIL B and ASIL D safety requirements

Cortex-R Cores

The Arm Cortex-R is a group of 32-bit and 64-bit processor cores optimised for real-time and safety-critical applications.

Cortex-R82

Highest performance real-time processor

• Offers efficient, high-performance compute for complex storage applications

• Supports Arm Neon technology for ML acceleration (see the sketch after this list)

• Implements MMU for rich OS support
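
As a minimal illustration of what Neon provides, the sketch below uses the Arm C Language Extensions (arm_neon.h) to add two arrays of floats four lanes at a time. It assumes an AArch64 toolchain with Neon enabled and an array length that is a multiple of four; the helper name and data are illustrative only.

    /* neon_add.c - minimal Neon (ACLE) sketch: c[i] = a[i] + b[i],
     * processing four 32-bit floats per instruction.
     * Assumes an AArch64 compiler and n being a multiple of 4. */
    #include <arm_neon.h>
    #include <stdio.h>

    static void vec_add(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            float32x4_t va = vld1q_f32(a + i);    /* load 4 lanes */
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(c + i, vaddq_f32(va, vb));  /* add and store 4 lanes */
        }
    }

    int main(void)
    {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];
        vec_add(a, b, c, 8);
        for (int i = 0; i < 8; i++)
            printf("%.0f ", c[i]);
        printf("\n");
        return 0;
    }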

Cortex-R52

Advanced processor for functional safety

• Enables advanced safety features for a range of automotive applications

• Software separation protects safety-critical code

• High-performance multicore clusters deliver real-time responsiveness

Cortex-R52+

Real-time virtualization support for functional safety

• Software separation enables integration of multiple operating systems while protecting safety-critical code

• High-performance multicore clusters deliver real-time responsiveness

Cortex-R8

High performance processor suited for storage controllers and modems

• Offers low latency

• Configurable ports support flexible design options

• Delivers the responsive power needed for high-performance mass storage applications

Cortex-R4

Smallest real-time performance processor

• Offers excellent energy efficiency and cost effectiveness

• Prioritises reliability and error management with built-in error handling

• Ideal for embedded applications including automobiles and cameras

Ethos Cores

Ethos is a family of neural processors with the highest performance for machine learning inference.

Ethos-N78

Scalable and efficient second-generation ML inference processor

• 2x faster inference with 40% lower bandwidth, 25% increased efficiency

• Addresses multiple markets from 1 to 10 TOP/s with up to 90 unique configurations

• Develop once, deploy anywhere with online and offline compilation

Ethos-U65

Powering innovation in a new world of AI devices at the edge and endpoint

• Delivers 1.0 TOP/s ML performance in about 0.6 mm²

• Partner configurable from 256 to 512 8-bit MACs

• Unified toolchain supports Cortex-M and Cortex-A based systems

Ethos-U55

Configurable and efficient embedded ML inference

• Delivers up to 0.5 TOP/s, a 480x ML uplift and a 90% energy reduction (see the arithmetic note after this list)

• Partner configurable from 32 to 256 8-bit MACs in around 0.1 mm²

• Rapid development with a single tool chain for Cortex-M and Ethos-U
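
The headline TOP/s figures for the Ethos-U parts follow directly from the size of the MAC array: each MAC performs a multiply and an accumulate (two operations) per clock cycle, so, assuming the 1GHz reference clock commonly used for such figures (not stated above), 512 MACs work out at roughly 1 TOP/s and 256 MACs at roughly 0.5 TOP/s. The short check below reproduces that arithmetic; real-world throughput also depends on clock speed and utilisation.

    /* ethos_tops.c - back-of-envelope check of the Ethos-U TOP/s figures.
     * Assumes a 1GHz reference clock and 2 operations (multiply + accumulate)
     * per MAC per cycle; actual throughput depends on clock and utilisation. */
    #include <stdio.h>

    static double tops(int macs, double clock_ghz)
    {
        return (double)macs * 2.0 * clock_ghz * 1e9 / 1e12;   /* ops/s -> TOP/s */
    }

    int main(void)
    {
        printf("Ethos-U65, 512 MACs @ 1GHz: %.2f TOP/s\n", tops(512, 1.0));
        printf("Ethos-U55, 256 MACs @ 1GHz: %.2f TOP/s\n", tops(256, 1.0));
        return 0;
    }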

Neoverse Cores

The Neoverse series of processors delivers maximum per-core performance for systems running demanding compute- and memory-intensive applications such as cloud and virtualised infrastructures.

Neoverse V1

A performance-first tier targeting HPC, HPC in the Cloud and AI or ML-accelerated applications

• Arm’s first SVE implementation for the HPC market, with 2x floating-point and 4x ML uplift over Neoverse N1 (see the SVE sketch after this list)

• Expands market-leading performance with 50% IPC uplift over Neoverse N1
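
To show what an SVE implementation offers in practice, the sketch below uses the ACLE SVE intrinsics (arm_sve.h) to add two float arrays in a vector-length-agnostic loop: the same binary uses whatever vector width the hardware provides (256-bit on Neoverse V1) without recompilation. It assumes an SVE-capable compiler and CPU; the helper name and data are illustrative.

    /* sve_add.c - vector-length-agnostic SVE sketch: c[i] = a[i] + b[i].
     * Requires an SVE-capable toolchain and CPU,
     * e.g. cc -O2 -march=armv8-a+sve sve_add.c -o sve_add */
    #include <arm_sve.h>
    #include <stdint.h>
    #include <stdio.h>

    static void vec_add(const float *a, const float *b, float *c, int64_t n)
    {
        for (int64_t i = 0; i < n; i += svcntw()) {   /* svcntw() = lanes per vector */
            svbool_t pg = svwhilelt_b32(i, n);        /* predicate covers the tail */
            svfloat32_t va = svld1(pg, a + i);
            svfloat32_t vb = svld1(pg, b + i);
            svst1(pg, c + i, svadd_x(pg, va, vb));
        }
    }

    int main(void)
    {
        float a[10], b[10], c[10];
        for (int i = 0; i < 10; i++) { a[i] = (float)i; b[i] = (float)(10 - i); }
        vec_add(a, b, c, 10);
        for (int i = 0; i < 10; i++)
            printf("%.0f ", c[i]);
        printf("\n");
        return 0;
    }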

Neoverse N2

Leading performance efficiency for cloud-to-edge infrastructure

• Market leading scalability and versatility

• Significant 40% IPC performance improvement over Neoverse N1

• Armv9 introduces key features for performance (SVE2) and security (MTE)

Neoverse N1

Accelerating the transformation to a scalable cloud-to-edge infrastructure

• Revolutionary compute performance

• Platform features specific to Infrastructure

• Designed for extreme range of scale and diversity of compute

Neoverse E1

Empowering infrastructure to meet next-generation throughput demands

• Intelligent design for highly efficient throughput

• Fully leverages diverse Arm software ecosystem

• Highly scalable throughput for edge to core data transport

Arm CPUs and 3XS Servers - the Perfect Partnership

The 3XS team of server architects has developed a range of rackmount solutions based on Ampere Altra processors, built on the Arm architecture, to tackle high-demand compute-bound workloads. Based on Gigabyte chassis, these Ampere CPUs improve total cost of ownership (TCO) by running more efficiently per core and at a lower price per core. 1U and 2U servers feature a single-socket Altra or Altra Max processor with up to 128 Armv8.2 cores, a 3.0GHz frequency, 128 PCIe 4.0 lanes and support for up to 4TB of RAM.

Form factor      | CPUs                                    | GPUs         | Target workloads
1U Server        | Up to 2x Ampere Altra or Altra Max CPUs | -            | Edge computing and SDN
2U Server        | Up to 2x Ampere Altra or Altra Max CPUs | Up to 2 GPUs | AI training, AI inferencing and HPC
4U Server        | Ampere Altra or Altra Max CPU           | Up to 8 GPUs | AI training, AI inferencing and HPC
4-node 2U Server | Up to 2x Ampere Altra or Altra Max CPUs | Up to 4 GPUs | HPC, HCI and cloud computing