The recently launched 64-bit ARM architecture offers several technical advancements over its predecessor. Here are some of the benefits of ARMv8-A compared to ARMv7-A.
- AArch64-specific instruction set (A64)
- Backward compatible with the existing 32-bit AArch32 architecture
- 31 general-purpose registers
- Instructions accept 32-bit as well as 64-bit arguments
- Supports 64-bit Address space
- Includes 128-bit registers
- Compatible with IEEE 754 floating-point arithmetic
- AES encryption and SHA hashing instructions
- Revised exception handling instructions for exceptions in AArch64 state
- Support for double-precision (DP) floating-point execution
- Advanced SIMD support for complete IEEE 754 execution, including rounding modes, denormals, and NaN handling
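The point about instructions accepting both 32-bit and 64-bit arguments can be modelled in a few lines of Python. This is only an illustrative sketch: the helper names `add_x` and `add_w` are hypothetical, and they imitate an `ADD` on the 64-bit X registers and on their 32-bit W views, where a 32-bit result leaves the upper half of the destination zero.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add_x(a: int, b: int) -> int:
    """Model of a 64-bit add (X registers): the result wraps at 2**64."""
    return (a + b) & MASK64


def add_w(a: int, b: int) -> int:
    """Model of a 32-bit add (W registers).

    Only the low 32 bits of each input are used, and the result wraps at
    2**32, so the upper 32 bits of the returned value are always zero.
    """
    return ((a & MASK32) + (b & MASK32)) & MASK32


if __name__ == "__main__":
    print(hex(add_x(0xFFFFFFFF, 1)))  # carries into bit 32
    print(hex(add_w(0xFFFFFFFF, 1)))  # wraps around within 32 bits
```

The same operation therefore gives different results depending on the operand width the instruction selects, which is why the width is part of the instruction encoding rather than a processor mode.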
AArch64 Instruction Set
Every A64 instruction has a fixed length of 32 bits. The encoding keeps operands and immediate values in contiguous bit fields, which simplifies the hardware decode tables and also lets JIT compilers generate code quickly, something that matters for high-performance applications. Because each instruction decodes independently, more advanced branch prediction techniques become possible as well.
The number of general-purpose registers has also increased. The virtual rename register pool introduced with the Cortex-A9 could unroll small loops automatically, but it did not give the compiler the full benefit of better scheduling options, so implementing commonly used complex algorithms efficiently in software remained difficult. The A64 ISA therefore provides thirty-one 64-bit general-purpose registers.
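To make the idea of a fixed 32-bit word with contiguous bit fields concrete, here is a minimal Python sketch that pulls apart one A64 instruction form, ADD (shifted register). The helper names `bits` and `decode_add_shifted` are made up for this illustration, and the sketch handles only this single form rather than the whole instruction set.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return the contiguous bit field word[hi:lo], inclusive on both ends."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_add_shifted(word: int) -> dict:
    """Decode an A64 ADD (shifted register) instruction word.

    Layout: sf[31] op[30] S[29] 01011[28:24] shift[23:22] 0[21]
            Rm[20:16] imm6[15:10] Rn[9:5] Rd[4:0]
    """
    if not 0 <= word < (1 << 32):
        raise ValueError("A64 instructions are exactly 32 bits wide")
    # op = 0, S = 0 and the fixed pattern 01011 identify a plain ADD.
    if bits(word, 30, 24) != 0b0001011 or bits(word, 21, 21) != 0:
        raise ValueError("not an ADD (shifted register) encoding")
    return {
        "sf": bits(word, 31, 31),      # 1 = 64-bit (X) registers, 0 = 32-bit (W)
        "shift": bits(word, 23, 22),   # shift type applied to Rm
        "rm": bits(word, 20, 16),      # second source register
        "imm6": bits(word, 15, 10),    # shift amount
        "rn": bits(word, 9, 5),        # first source register
        "rd": bits(word, 4, 0),        # destination register
    }


if __name__ == "__main__":
    # 0x8B020020 encodes ADD X0, X1, X2
    print(decode_add_shifted(0x8B020020))
```

Each register number occupies five bits, enough to name registers 0 to 31, and every field sits at a fixed position, so a decoder never has to look at a previous instruction to find where the next one starts.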
The ISA is also simpler than before. In keeping with the original RISC goals of the ARM ISA, the new version removes the LDM/STM (load/store multiple) instructions, which added cost and complexity to implementing an efficient processor memory system. There are also fewer conditional instructions, because their implementation complexity did not deliver a corresponding benefit. The floating-point unit is here to stay, at least for the near future, so software no longer has to check for its presence and can rely on consistent underlying hardware.
The instruction set of the SIMD data engine has been revised for the new 64-bit world. It adds double-precision floating-point data processing to the existing SIMD capability, takes a simplified approach to targeted algorithms, and aligns with the latest IEEE 754-2008 standard.
Advanced SIMD & FP Instruction Set
Advanced SIMD is a media and signal processing architecture whose instructions primarily target multimedia workloads such as audio, video, 3D graphics, image, and speech processing. The floating-point instructions perform single-precision and double-precision FP operations.
Advanced SIMD and its associated implementations, along with the support software, are collectively known as NEON.
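As a rough model of what a SIMD operation on a 128-bit register does, the sketch below treats 16 bytes as either two double-precision or four single-precision lanes and adds two such values lane by lane. It is plain Python with the standard `struct` module, not NEON code; the function name `simd_add` and the lane format strings are choices made for this illustration only.

```python
import struct

REGISTER_BYTES = 16  # one 128-bit register


def simd_add(x: bytes, y: bytes, lanes: str) -> bytes:
    """Add two 128-bit values lane by lane.

    lanes is a struct format: "<2d" for two 64-bit doubles,
    "<4f" for four 32-bit floats (little-endian in both cases).
    """
    if struct.calcsize(lanes) != REGISTER_BYTES:
        raise ValueError("lane layout must fill exactly 128 bits")
    if len(x) != REGISTER_BYTES or len(y) != REGISTER_BYTES:
        raise ValueError("operands must be 16 bytes each")
    a = struct.unpack(lanes, x)
    b = struct.unpack(lanes, y)
    # Every lane is processed independently of its neighbours.
    return struct.pack(lanes, *(p + q for p, q in zip(a, b)))


if __name__ == "__main__":
    v0 = struct.pack("<2d", 1.5, 2.0)
    v1 = struct.pack("<2d", 0.25, 4.0)
    print(struct.unpack("<2d", simd_add(v0, v1, "<2d")))
```

The same 16 bytes can be viewed with either lane layout, which is how a single register file serves both single-precision and double-precision work.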
Memory Management Unit
The fundamentals of the MMU remain the same in AArch64, where a 64 KB page size is supported alongside the legacy 4 KB page size.
A 32-bit application has a 4 GB address space. Virtual address spaces from 2^32 to 2^48 bytes in size are supported at the top and bottom of the 64-bit address space.
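The arithmetic behind these figures is easy to check. The following Python sketch computes the two virtual address ranges (one growing up from the bottom of the 64-bit space, one ending at the top) for a given address width, and counts how many pages of each size cover a region. The function names `va_ranges` and `page_count` are hypothetical helpers written for this example.

```python
KIB = 1 << 10
GIB = 1 << 30
TIB = 1 << 40


def va_ranges(va_bits: int):
    """Return ((low_start, low_end), (high_start, high_end)) for a VA width.

    The low range starts at address 0; the high range ends at 2**64 - 1.
    """
    if not 32 <= va_bits <= 48:
        raise ValueError("this sketch covers widths from 32 to 48 bits")
    size = 1 << va_bits
    low = (0, size - 1)
    high = ((1 << 64) - size, (1 << 64) - 1)
    return low, high


def page_count(region_bytes: int, page_bytes: int) -> int:
    """Number of pages of page_bytes needed to cover region_bytes exactly."""
    if region_bytes % page_bytes:
        raise ValueError("region must be a whole number of pages")
    return region_bytes // page_bytes


if __name__ == "__main__":
    low, high = va_ranges(48)
    print(hex(low[1]), hex(high[0]))
    print((1 << 32) // GIB, "GiB;", (1 << 48) // TIB, "TiB")
    print(page_count(1 << 32, 4 * KIB), page_count(1 << 32, 64 * KIB))
```

A 48-bit range works out to 256 TiB, and covering a 4 GiB space takes sixteen times fewer 64 KB pages than 4 KB pages.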
Debug
ARM hardware debug support falls into two basic categories:
- Self-hosted debug, for debug facilities used by the OS or hypervisor
- Halting debug, for external “target debug” where the session runs on a separate host
Self-hosted debug is an intrinsic part of the exception model:
- Hardware watchpoints and breakpoints will generate exceptions on debug events
- Exceptions will be handled by a debug monitor alongside the OS Kernel or Hypervisor
- AArch32 self-hosted (“monitor”) capability will remain unchanged from ARMv7
AArch64 self-hosted debug is strongly integrated into the AArch64 exception model:
- Breakpoint and watchpoint addresses grow to 64 bits
- An explicit hardware single step is introduced for when the debug monitor uses AArch64
The halting debug view is not backward compatible with ARMv7:
- External debuggers will have to be changed, even for complete AArch32 operation
ARM Cortex-A57 and ARM Cortex-A53 processors
The ARM Cortex-A57 is an implementation of the 64-bit ARMv8 architecture that supports 1 to 4 cores per cluster across multiple clusters. With Level 1 caches of 32 KiB for data and 48 KiB for instructions, these processors also feature a low-latency, configurable L2 cache (up to 2 MB). The DSP and NEON SIMD extensions are mandatory on every core, driving 20-50% better performance in floating-point calculations.
Chipsets
- AMD Opteron A1100
- Freescale QorIQ LS20xx
- Qualcomm Snapdragon 808, 810
Similarly, the Cortex-A53 CPU uses a simple pipeline in a smaller configuration that targets efficient operating points while still delivering high performance. It is more power-efficient than its predecessors. The A53 has several important features, including virtualization support and a memory reach of up to 256 TB. It is also highly scalable, which means it can even be combined with the A57 in the same system.