Red Hat Research Quarterly

RISC-V extensions: what’s available and how to find them

about the author

Richard Jones

Richard Jones has been using Linux since the early 1990s, joining Red Hat in 2007. Richard is now a Senior Principal Software Engineer in Red Hat’s R&D Platform team.

Extensions available in RISC-V enable the customizations that make it ideal as a basis for open innovation. Here’s the extension situation as it stands today.


RISC-V is a new Instruction Set Architecture (ISA) that, over the next decade, will compete with x86-64 and ARM in all areas, from the lowest-end IoT devices all the way to huge servers. Unlike existing ISAs, RISC-V is a truly open source, permissionless architecture. You can start making chips using designs downloaded from GitHub without signing any legal agreement. If this sounds like Linux versus proprietary operating systems all over again, then you understand why Red Hat is interested. 

One unique aspect of RISC-V is that it is designed from the ground up for ISA extensions. As the name suggests, these add extra instructions to the ISA for implementing features like vector operations or accelerated encryption.

In this article, we’ll look at how extensions work at the lowest levels, how they are ratified and eventually standardized through RISC-V International (RVI), what extensions are out there, how extensions are grouped into profiles, and how you can discover which extensions are supported in your hardware. I will also cover QEMU support for extensions.

As this article is about extending the ISA, I will not cover non-ISA extensions in any detail.

Why extensions?

Broadly, you can extend and accelerate the capabilities of a CPU in two ways: add new instructions or implement a hardware accelerator. On other ISAs, accelerator peripherals exist for TCP offloading, AI, digital signal processing, encryption, and so on. Why would an extension be better—or, when is an extension better?

One way to think about this question is that extensions are part of the stream of instructions. They are, therefore, extremely low latency: there may be little to no overhead to using the instruction. This is in contrast to making an I/O request, where you might have to form a request packet and batch requests to get the best performance, and, of course, the request goes off the CPU and over a PCIe network.

That said, extending the CPU is far less easy than plugging in a peripheral.

How extensions are encoded

RISC-V has a very regular instruction encoding. Because arbitrary extensions are allowed, RISC-V uses a variable-length encoding, with all instructions (currently) encoded either in 32 bits or in 16 bits for a compressed subset. For this article, I won’t talk much about compressed instructions (see the section below for more on that), and I won’t discuss variable-length encodings that are 48 bits and wider, since they are not used by any extensions today.

With those assumptions in mind, the least significant 2 bits of each 32-bit instruction are always 1 1. The next 5 bits are the major opcode, which controls how the instruction is decoded.

This is not a comprehensive guide to decoding RISC-V instructions, as that is well covered in the RISC-V ISA spec, but some things are worth pointing out:

  • Some instructions have further opcode fields. For example, OP (0110011) has two extra opcode fields totaling another 10 bits. This major opcode is shared with the base ISA, the Multiply extension, the Zicond extension, the Packed SIMD extension, and more. There is no general way to tell if a particular opcode corresponds to the base ISA or an extension (without a large table).
  • Some major opcodes do correspond entirely to specific extensions. For example, AMO contains Atomic instructions (the A extension), and OP-V was originally reserved but is now used by the Vector extension.
  • Large sections of the opcode space are used for some fairly obscure features, like fused multiply-add.
  • custom-0 through custom-3 are for vendors to add their own extensions that they don’t intend to ratify. Essentially, custom-* is a free-for-all (except that the proposed 128-bit RV128I ISA will eventually use custom-2 and custom-3).
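The field layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a decoder: the table lists only a subset of the major opcode map from the RISC-V ISA spec, the function name major_opcode is my own, and the two example instruction words (add a0, a1, a2 and mul a0, a1, a2) were hand-assembled for this sketch.

```python
# Subset of the major opcode map, keyed by instruction bits [6:2].
# Anything not listed here is reported as "other".
MAJOR_OPCODES = {
    0b00000: "LOAD",
    0b00001: "LOAD-FP",
    0b00010: "custom-0",
    0b00100: "OP-IMM",
    0b00101: "AUIPC",
    0b01000: "STORE",
    0b01001: "STORE-FP",
    0b01010: "custom-1",
    0b01011: "AMO",
    0b01100: "OP",
    0b01101: "LUI",
    0b10100: "OP-FP",
    0b10101: "OP-V",
    0b10110: "custom-2",
    0b11000: "BRANCH",
    0b11001: "JALR",
    0b11011: "JAL",
    0b11100: "SYSTEM",
    0b11110: "custom-3",
}


def major_opcode(insn):
    """Return the major opcode name of a 32-bit instruction word."""
    if (insn & 0b11) != 0b11:
        raise ValueError("not a 32-bit (non-compressed) instruction")
    return MAJOR_OPCODES.get((insn >> 2) & 0b11111, "other")


# A base-ISA instruction and an M-extension instruction share one major opcode.
print(major_opcode(0x00C58533))  # add a0, a1, a2
print(major_opcode(0x02C58533))  # mul a0, a1, a2
```

Both example words land in OP, which is the point made in the first bullet: the major opcode alone does not tell you whether an instruction belongs to the base ISA or to an extension.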

In this article, I will concentrate only on extensions that are non-custom, non-vendor-specific, and on the path to ratification by RVI.

It is most important to note that (non-custom) extensions are not distinct, separate parts of the decoding space. RVI appears to want to thread extensions into the gaps between existing base instructions. Major extensions like Vector that have their own major opcode (OP-V) also have instructions in LOAD-FP and elsewhere. I have heard extensions that have their own major opcode called green-field extensions. Smaller extensions use existing major opcodes, fit into the gaps, and are known as brown-field extensions. All else being equal, brown-field extensions have a higher chance of being accepted. This means that when proposing a new extension, you will need to consider existing extensions, which is good practice anyway.

Byte ordering of instructions

Instructions are always stored little endian, with the least significant byte first in memory. This applies even on the (theoretical) big endian RISC-V machine. Note that disassembly tools like objdump print each instruction as a single hexadecimal number, most significant byte first, so the bytes of an instruction like auipc appear in documentation and objdump output in the reverse of the order in which they sit in memory.

In other words, the first byte in memory holds the least significant bits of the instruction, including the major opcode. When decoding RISC-V instructions, you can therefore tell much about how to decode the instruction and where the next instruction begins from the first byte. This is in stark contrast to x86, where simply determining the boundaries between instructions is a research project in itself!
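To make the byte ordering concrete, here is a small Python sketch. The instruction auipc a0, 0x0 is an example I picked and hand-encoded for illustration; it is not necessarily the one the original figures showed.

```python
import struct

# auipc a0, 0x0 as a 32-bit instruction word (example chosen for this sketch)
AUIPC_A0_0 = 0x00000517

# "<I" packs an unsigned 32-bit integer in little-endian byte order,
# which is how the instruction is laid out in memory.
in_memory = struct.pack("<I", AUIPC_A0_0)

print(f"as one number: {AUIPC_A0_0:08x}")
print("in memory:    ", in_memory.hex(" "))

# The first byte in memory carries bits [7:0] of the instruction, so the
# length bits [1:0] and the major opcode bits [6:2] are available at once.
first_byte = in_memory[0]
is_32_bit = (first_byte & 0b11) == 0b11
major = (first_byte >> 2) & 0b11111
print(f"32-bit encoding: {is_32_bit}, major opcode bits: {major:05b}")
```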

The classic extensions

RISC-V was originally envisaged as the base ISA—RV32I, RV64I, or RV128I—surrounded by what I will call the classic extensions:

  • I: Base integer instructions, add, subtract, jump, etc.
  • M: Multiply and divide
  • A: Atomic operations
  • F: Single-precision floating-point arithmetic
  • D: Double-precision floating-point arithmetic
  • C: Compressed instructions

Notice how some of these map fairly well to the major opcodes: for example, Atomic operations are implemented in the AMO opcode space.

We might add:

  • G: an early attempt to define a basic profile, G = IMAFD
  • E: an embedded subset with 16 base registers instead of the usual 32
  • Q: quad-precision floating point
  • H, U, and S: pseudo-extensions used in the misa CSR (Machine ISA Control and Status Register, more below) to refer to support for hypervisor, user, and supervisor modes
  • the X-prefix for custom extensions
  • the Z-prefix

It soon became obvious we were going to run out of single letters, so three prefixes were reserved for named extensions: S (e.g., Smmtt) for supervisor-mode extensions, Z (e.g., Zimop) for general extensions, and X for custom, vendor-specific extensions.

Zicsr and Zifencei were retrospectively detached from the base ISA after we realized that CSRs might not be present on very low-end hardware, and the FENCE.I instruction didn’t work very well.

One extension may also require another; for example, D requires F.

Naming extensions

There is a well-defined naming scheme describing which extensions are supported by hardware. I’ll describe below how you can find this out for your own hardware. At the time of writing in 2023, the vast majority of hardware can be described as RV64IMAFDCZicsr_Zifencei.

Extension versions can also be encoded here (e.g., RV64I1p0 would be base ISA version 1.0).
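As a rough illustration of the naming scheme, the following Python sketch splits an ISA string into its base width and extensions. It is a simplification under stated assumptions: versions are accepted only in the "NpM" form used above, single-letter extensions are any letter other than the S, X, and Z prefixes, and the function name parse_isa_string is hypothetical. The real naming rules in the ISA spec have more corner cases than this handles.

```python
import re

# Single-letter extension (any letter except the s/x/z prefixes), optional version.
_SINGLE = re.compile(r"([a-rt-wy])(?:(\d+)p(\d+))?")
# Multi-letter extension: s/x/z prefix, name ending in a letter, optional version.
_MULTI = re.compile(r"([sxz][a-z0-9]*?[a-z])(?:(\d+)p(\d+))?")


def _version(major, minor):
    return (int(major), int(minor)) if major is not None else None


def parse_isa_string(isa):
    """Return (xlen, {extension: (major, minor) or None})."""
    s = isa.lower()
    base = re.match(r"rv(32|64|128)", s)
    if base is None:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    xlen = int(base.group(1))
    extensions = {}
    for chunk in s[base.end():].split("_"):
        pos = 0
        while pos < len(chunk):
            if chunk[pos] in "sxz":
                # A multi-letter name runs to the end of its chunk.
                m = _MULTI.fullmatch(chunk, pos)
            else:
                m = _SINGLE.match(chunk, pos)
            if m is None:
                raise ValueError(f"cannot parse {chunk[pos:]!r} in {isa!r}")
            extensions[m.group(1)] = _version(m.group(2), m.group(3))
            pos = m.end()
    return xlen, extensions


print(parse_isa_string("RV64IMAFDCZicsr_Zifencei"))
print(parse_isa_string("RV64I1p0"))
```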

The curious case of compressed encoding

If you were paying close attention to how RISC-V extensions are encoded, you will see that I assumed 32-bit, non-compressed instructions.

Current RISC-V implementations support a compressed 16-bit encoding for common instructions that is, of course, limited in the range of registers that may be accessed and the opcodes available. This is similar in spirit to Armv7 Thumb instructions. The compressed extension was added very early on by the original RISC-V designers to save instruction cache space and fetch bandwidth. Later, Linux distributions started to assume that the compressed extension is always available.

The downside is that compressed instructions consume three-fourths of the available opcode space (non-compressed, 32-bit instructions must have both least significant bits 1 1). Also, instructions are no longer automatically 4-byte aligned and may also cross cache lines and page boundaries, making decoding harder, albeit still much easier than complex architectures like x86.
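The following Python sketch shows both effects on a tiny, hand-assembled byte sequence (c.nop, then auipc a0, 0x0, then c.nop). The function names are my own, and encodings of 48 bits and wider are deliberately rejected rather than handled.

```python
def instruction_length(first_halfword):
    """Length in bytes, judged from the low bits of the first 16 bits."""
    if (first_halfword & 0b11) != 0b11:
        return 2  # compressed: three of the four low-bit patterns
    if (first_halfword & 0b11100) != 0b11100:
        return 4  # standard 32-bit encoding
    raise NotImplementedError("48-bit and longer encodings are not handled")


def split_instructions(code):
    """Split little-endian machine code into (offset, bytes) pairs."""
    result = []
    offset = 0
    while offset < len(code):
        half = int.from_bytes(code[offset:offset + 2], "little")
        length = instruction_length(half)
        if offset + length > len(code):
            raise ValueError("truncated instruction at end of buffer")
        result.append((offset, code[offset:offset + length]))
        offset += length
    return result


# c.nop, auipc a0 0x0, c.nop (hand-assembled for this example)
code = bytes.fromhex("0100" "17050000" "0100")
for offset, insn in split_instructions(code):
    print(f"offset {offset}: {insn.hex(' ')}")

# Share of the low-two-bit patterns taken by compressed instructions
compressed_patterns = sum(1 for low in range(4) if low != 0b11)
print(f"{compressed_patterns} of 4 patterns are compressed")
```

Note that the 32-bit auipc starts at offset 2, so it is not 4-byte aligned.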

High-end RISC-V server vendors are pushing back against supporting compressed instructions, arguing that their machines will have huge instruction caches (so code size is not critical). These vendors would prefer to use the opcode space in other ways and would like to have uniformly sized instructions. We have yet to see how this will pan out, but don’t be surprised if servers appear that don’t implement the C extension.

This is very much a concern for Red Hat. C is a special extension as these instructions appear frequently in binaries—as many as half of all instructions can be compressed—and trap-and-emulate would be impractical. Shipping two distro variants with and without compressed instructions is not attractive. Instead, we must decide whether to require it in hardware or ban it in software.

The new extensions

The classic extensions, in hindsight, don’t cover much of what is needed for a modern server. Since then, dozens of extensions have been proposed and ratified. This section highlights the most important extensions first (for servers). You can get a complete list of extensions and their status on the RISC-V wiki.

Vector, encryption, math

The most important extensions to the classic set are Vector (V, Zv*), Bit manipulation (Zb*), and Packed SIMD (P, Zbpno, Zp*). These very roughly correspond to MMX/SSE/AVX on x86, but RISC-V adds more flexibility and a different—and simpler—programming paradigm. We expect that when hardware with these instructions appears, they will be widely used in binaries (as happens on x86).

A whole article could be written about the Vector extension (which is, in fact, a large collection of extensions).

RISC-V also has two important sets of extensions for cryptography, called Scalar Crypto and Vector Crypto. Scalar Crypto was folded into the Bit manipulation extensions (Zbkb)—for example, the zip instruction in Zbkb is called out for being useful to implement SHA3. Vector Crypto (Zvknhb, Zvbc, Zvkn, Zvks, and more!) contains extended Vector instructions useful for Elliptic curve cryptography, various Message Authentication Codes, AES, AES-GCM, and many more.

Another important group of extensions extends floating-point support, adding Bfloat16 (Zfbfmin, Zvfbfmin, Zvfbfwma), common floating-point constants and many useful floating-point operations that are not present in the classic set (Zfa), and the “f-in-x” extensions (Zfinx, Zdinx, Zhinx, Zhinxmin) that allow floating-point and integer registers to be shared.

Virtualization

RISC-V support for running virtual machines (the Hypervisor extension) was demonstrated as far back as 2017 and ratified in 2021, but it is only expected to appear in hardware in 2024. This will be vital for RISC-V adoption on servers. H (hypervisor) is mostly a complicated addition to the privileged spec involving new modes and CSRs, but some new instructions were added. In particular, there are instructions to access memory while translating guest virtual addresses (useful for emulating I/O) and extra fencing instructions.

Interrupts, cache, and memory

Many extensions have been proposed and ratified that are beneficial for server-class operating systems. The most important are probably the ones that fix the interrupt architecture of the original design, which was notably inefficient: in particular, the Advanced Interrupt Architecture (Smaia, Ssaia) and the older Fast Interrupt specification (S*clic*). (Recall that extension names prefixed with S apply to supervisor mode.) Also worth mentioning is Smrnmi, which fixes another issue with the base standard: after a Non-Maskable Interrupt, the interrupted program could not resume running. Smrnmi adds a new mnret instruction to resume after an NMI.

The original RISC-V design assumed a relaxed memory consistency similar to Arm, but some machines would prefer the stricter ordering found on x86 (for example, because you need to emulate or port code from x86). The Ztso extension changes load and store operations to use total store ordering.

Cache management operations (CMOs) are important in modern operating systems, and RISC-V defines a family of extensions for cache block operations. Zicbom provides cache block management instructions for things like invalidating blocks of cache. Zicbop provides prefetch hints. Zicboz provides instructions that store zeros over blocks of cache.

Safety and security

Hardening against attacks is a key concern for servers. One area being actively developed on all architectures is control flow integrity (CFI). RISC-V is ratifying two extensions for CFI. Zicfiss defines a shadow stack and provides new instructions to push and pop values there. Zicfilp defines places where code is allowed to branch to (especially through “computed gotos”), known as landing pads. These techniques are designed to prevent return-oriented programming (ROP) attacks after stack-smashing exploits.

Landing pads themselves are defined by further extensions—Zimop, Zcmop—that reserve some opcode space for may-be-operations (MOPs).

Miscellaneous

Other extensions relevant to servers:

  • Svinval can be used for selective TLB invalidation.
  • Zawrs adds instructions that make polling memory locations more efficient, typically used in spinlocks or when polling on a lockless queue. Zihintpause adds a new pause instruction that can also be used to reduce power consumption and memory traffic in spinlocks.
  • Zihintntl may be used to hint that memory accesses are non-temporal (i.e., do not need to be cached).
  • Zacas adds atomic compare and swap, omitted from the original Atomic instructions.
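  • Zicond adds integer conditional operations (the czero.eqz and czero.nez instructions), which can be combined into branchless conditional selects and serve a similar purpose to Armv7 conditional operations.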
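To give a flavor of one of these, here is a small Python model of Zicond as I understand the ratified spec: it defines two instructions, czero.eqz and czero.nez, each of which either copies a register or produces zero depending on a condition register. The model assumes 64-bit registers, and the select helper is a hypothetical name showing how a compiler might combine the two into a branchless conditional move.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers (RV64 assumed)


def czero_eqz(rs1, rs2):
    """rd = 0 if rs2 == 0, otherwise rs1."""
    return 0 if rs2 == 0 else rs1 & MASK64


def czero_nez(rs1, rs2):
    """rd = 0 if rs2 != 0, otherwise rs1."""
    return 0 if rs2 != 0 else rs1 & MASK64


def select(cond, if_nonzero, if_zero):
    """Branchless 'cond ? if_nonzero : if_zero' built from the two ops."""
    return czero_eqz(if_nonzero, cond) | czero_nez(if_zero, cond)


print(select(1, 10, 20))
print(select(0, 10, 20))
```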

Extension standardization and ratification

RVI has a process for taking non-custom extensions, shepherding them through standardization, and eventually ratifying them. Unlike proprietary ISAs, this process (and the arguing!) happens in the open, on GitHub pages, in mailing lists, and on open video calls. Extensions on their way through ratification are listed on the RISC-V wiki, along with links to the process and a description of the lifecycle extensions go through before ratification.

Profiles

Software generally needs some kind of baseline target to run. While some extensions can be detected at runtime and different code paths chosen, much software will be written that expects a basic set of extensions to exist.
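The two strategies can be sketched as follows. Everything here is illustrative: the baseline set is a hypothetical example rather than the contents of any ratified profile, and sum_vector is a stand-in for a code path that would really use Vector instructions.

```python
# Hypothetical baseline for illustration only, not a real profile definition.
HYPOTHETICAL_BASELINE = frozenset(
    {"i", "m", "a", "f", "d", "c", "zicsr", "zifencei"}
)


def missing_from_baseline(available, baseline=HYPOTHETICAL_BASELINE):
    """Extensions the software assumes but the hardware does not report."""
    return sorted(set(baseline) - set(available))


def sum_scalar(values):
    return sum(values)


def sum_vector(values):
    # Stand-in for a routine that would use Vector instructions.
    return sum(values)


def pick_sum(available):
    """Choose a code path at run time based on the detected extensions."""
    return sum_vector if "v" in available else sum_scalar


detected = {"i", "m", "a", "f", "d", "c", "zicsr", "zifencei", "zba"}
print("missing:", missing_from_baseline(detected))
print("using:", pick_sum(detected).__name__)
```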

For this reason, RVI defines a set of profiles named after the year they were defined and grouped into two families. Thus, at the time of writing, the latest profile is RVA22, and RVA23 is in development. A stands for the “Application processors running rich operating systems” family (i.e., servers); 22/23 are the year codes.

RVA22 includes all the classic extensions and a smattering of older system extensions. More notable is what it omits. It predates the ratification of the Vector extension, so this is only optional, and Vector Crypto and Packed SIMD are also missing. See the full RVA22 profile for more details. 

We expect this to change, as the current profiles don’t reflect the many branches of the RISC-V ecosystem. In the future, we expect to see stricter requirements for backward compatibility, only fully ratified extensions allowed, and unused opcode space forced to trap (allowing some forward compatibility through trap-and-emulate).

Discovering extensions—Where’s my CPUID? 

x86 has CPUID, a comprehensive method to detect at runtime which features the processor supports and many other aspects of the CPU, like cache sizes and so forth. There is nothing this comprehensive available in RISC-V at the moment.

For RISC-V, there are three ways to determine which extensions are available in the hardware. The oldest mechanism, now mostly deprecated, is to read the misa CSR. This register lets you read the machine XLEN (i.e., the base ISA: RV32I, RV64I, or RV128I), but your code won’t run unless it uses the right instructions in the first place, so you must know this already. It also contains 26 bits corresponding to the 26 letters of the alphabet, anticipating up to 26 extensions (minus reserved letters). As discussed before, believing there would be only 26 extensions was naive. Another problem with misa is that you cannot read versions of extensions, but the two other methods allow you to get all extensions and (in theory) their versions.
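As a sketch of what misa contains, the following Python function decodes a raw register value. The sample value is one I constructed for an RV64 machine with the I, M, A, F, D, and C extensions plus the S and U mode bits, and reading the register in the first place requires Machine mode, which this code does not attempt.

```python
def decode_misa(misa, xlen):
    """Split a misa value into (base width, extension letters)."""
    # MXL lives in the top two bits of the register: 1=32, 2=64, 3=128.
    mxl = (misa >> (xlen - 2)) & 0b11
    base = {1: 32, 2: 64, 3: 128}.get(mxl)
    # Bits 0..25 correspond to the letters A..Z.
    letters = "".join(
        chr(ord("A") + bit) for bit in range(26) if misa & (1 << bit)
    )
    return base, letters


# Constructed example value, not read from real hardware.
SAMPLE_MISA = 0x800000000014112D
print(decode_misa(SAMPLE_MISA, xlen=64))
```

Notice that nothing in the value says which version of each extension is present, and multi-letter extensions such as Zicsr have no bit at all.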

The second method is to use information from the device tree (DT). The deprecated riscv,isa field contains a full extension string with optional versions. Linux ignores the versions and contains workarounds for buggy strings in existing implementations. The replacements are riscv,isa-base and riscv,isa-extensions, which have a cleaner implementation.

The third method is to use information from the ACPI RISC-V Hart Capabilities Table (RHCT). This encodes a full extension string with optional versions.

All these methods are available directly only to code running in Machine or Supervisor modes. To pass the information up to userspace, Linux provides /proc/cpuinfo and a new system call, riscv_hwprobe. However, the information available through these is very sparse at the moment, even relative to what is available from the hardware.

If you know the machine is using DT, then in Linux the riscv,isa field can be read out directly from /sys. Here is an example from a QEMU guest:

$ cat '/sys/firmware/devicetree/base/cpus/cpu@0/riscv,isa'
rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintntl_zihintpause_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
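A program can read the same file and test for individual extensions. The Python sketch below assumes the string carries no version numbers (as in the output above) and that the sysfs file holds a NUL-terminated device tree string; the function names are hypothetical.

```python
import re
from pathlib import Path

DT_ISA = Path("/sys/firmware/devicetree/base/cpus/cpu@0/riscv,isa")


def split_isa(isa):
    """Return (xlen, set of lowercase extension names), ignoring versions."""
    first, *rest = isa.strip().lower().split("_")
    # Base width, then single letters, then possibly a glued multi-letter name.
    m = re.fullmatch(r"rv(\d+)([a-rt-wy]*)(.*)", first)
    if m is None:
        raise ValueError(f"unexpected ISA string: {isa!r}")
    extensions = set(m.group(2))
    if m.group(3):
        extensions.add(m.group(3))
    extensions.update(name for name in rest if name)
    return int(m.group(1)), extensions


def read_dt_isa(path=DT_ISA):
    """Read the riscv,isa property, dropping the trailing NUL byte."""
    return Path(path).read_bytes().rstrip(b"\0\n").decode("ascii")


if DT_ISA.exists():
    xlen, extensions = split_isa(read_dt_isa())
    print(f"RV{xlen}, Zawrs present: {'zawrs' in extensions}")
else:
    print("no riscv,isa device tree property on this machine")
```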

How QEMU implements extensions

QEMU, a virtual machine emulator and hypervisor, can emulate RISC-V in software. This is useful when you don’t have RISC-V hardware or don’t have hardware that supports a particular extension. The software emulation in QEMU is called the Tiny Code Generator (TCG).

TCG works by translating basic blocks as they are encountered. The translated code is stored in a QEMU TranslationBlock (TB) structure and referenced through a hash table of (CPU state, physical address). TBs persist so that code doesn’t need to be retranslated, but they can be invalidated by things such as writes happening to the same code page.
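The caching idea can be modeled in a few lines of Python. This is a toy sketch of the concept only: the class and method names are invented, the 4 KiB page size is an assumption, and QEMU's real data structures are considerably more involved.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class TranslationCache:
    """Toy model of a translate-once cache keyed by (CPU state, address)."""

    def __init__(self, translate):
        self._translate = translate  # callback that "translates" a block
        self._blocks = {}            # (cpu_state, phys_addr) -> translated block
        self.translations = 0        # how many times we had to translate

    def lookup(self, cpu_state, phys_addr):
        key = (cpu_state, phys_addr)
        block = self._blocks.get(key)
        if block is None:
            block = self._translate(phys_addr)
            self._blocks[key] = block
            self.translations += 1
        return block

    def invalidate_page(self, phys_addr):
        """Drop every cached block on the page containing phys_addr."""
        page = phys_addr // PAGE_SIZE
        stale = [key for key in self._blocks if key[1] // PAGE_SIZE == page]
        for key in stale:
            del self._blocks[key]


cache = TranslationCache(lambda addr: f"translated block at {addr:#x}")
cache.lookup("user-mode", 0x1000)
cache.lookup("user-mode", 0x1000)   # served from the cache
cache.invalidate_page(0x1008)       # a write lands on the same code page
cache.lookup("user-mode", 0x1000)   # must be translated again
print("translations performed:", cache.translations)
```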

TCG defines a set of basic operations like integer adds, loads, stores, labels, branches, and so on. You can recognize these when you see tcg_gen_* called in QEMU code. For example, tcg_gen_qemu_ld_i64 would be called when translating a block of code and would generate a TCG instruction to do a 64-bit load and append it to the list of translated instructions. 

However, anything complicated (such as CSRs or vector instructions) is translated into a call to a helper function. You will see helper functions defined in QEMU using the macro HELPER(<name>). When translating, a call to the helper would be generated using gen_helper_<name>. Since most RISC-V extensions are complicated, they are almost always implemented as a set of helpers.

A file in the QEMU source, target/riscv/insn32.decode, describes how instruction bit patterns are decoded. Extensions must list their new instructions here.

The file target/riscv/cpu.c contains two tables listing ISA extensions, their names, and versions. This is a very useful reference for finding out which extensions have been implemented in QEMU.

At the time of writing, QEMU supports these extensions, making it probably the most capable RISC-V platform:

  • The base RV32I and RV64I ISAs
  • The classic extensions: M A F D
  • Compressed instructions: C, Zca, Zcb, Zcf, Zcd, Zce, Zcmp, Zcmt
  • The embedded extension: E
  • The hypervisor extension: H
  • User and Supervisor modes (but note, not Machine mode): U S
  • Dynamic languages: J
  • Cache management (partial): Zicbom, Zicboz
  • Conditional ops: Zicond
  • Read and write CSRs: Zicsr
  • FENCE.I instruction: Zifencei
  • Pause hint: Zihintpause
  • Wait on reservation set: Zawrs
  • Additional scalar FP: Zfa
  • Bfloat16 (partial): Zfbfmin
  • Half-width FP: Zfh, Zfhmin
  • FP using integer regs: Zfinx, Zdinx, Zhinx
  • Bit manipulation: Zba, Zbb, Zbc, Zbs
  • Crypto scalar: Zbkb, Zbkc, Zbkx, Zk*
  • Vector (mostly complete): V, Zv*
  • Advanced Interrupt Architecture: Smaia, Ssaia
  • State enable: Smstateen
  • Count overflow & filtering: Sscofpmf
  • Time compare: Sstc
  • Hardware update of PTE A/D bits: Svadu
  • Fast TLB invalidation: Svinval
  • NAPOT pages: Svnapot
  • Page-based memory types: Svpbmt
  • T-HEAD multiple custom extensions
  • Ventana custom extensions for conditional ops

Emulation of RISC-V extensions on RISC-V

RISC-V extensions may also be emulated on RISC-V hardware using trap-and-emulate. There are two broad approaches taken:

  • Modify the OpenSBI illegal instruction handler (lib/sbi/sbi_illegal_insn.c) to catch the illegal instruction and emulate it.
  • Modify the Linux kernel illegal instruction handler (arch/riscv/kernel/traps.c).

The first method was used to implement a mostly complete emulation of the Hypervisor extension. The second method was used to implement the Vector extension for machines that lack it.
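The control flow of trap-and-emulate can be sketched with a toy model in Python. Nothing here is OpenSBI or Linux code: the "hardware" is a function that only understands add, the handler emulates a missing mul, and all names and the tuple instruction format are invented for illustration.

```python
MASK64 = (1 << 64) - 1


class IllegalInstruction(Exception):
    """Raised by the toy hardware for an instruction it does not implement."""


def hardware_execute(insn, regs):
    op, rd, rs1, rs2 = insn
    if op == "add":
        regs[rd] = (regs[rs1] + regs[rs2]) & MASK64
    else:
        raise IllegalInstruction(insn)


def illegal_instruction_handler(insn, regs):
    """Emulate what the hardware lacks; return False if we cannot."""
    op, rd, rs1, rs2 = insn
    if op == "mul":
        regs[rd] = (regs[rs1] * regs[rs2]) & MASK64
        return True
    return False


def run(program, regs):
    """Execute a program, counting how many instructions needed a trap."""
    traps = 0
    pc = 0
    while pc < len(program):
        try:
            hardware_execute(program[pc], regs)
        except IllegalInstruction:
            if not illegal_instruction_handler(program[pc], regs):
                raise  # genuinely illegal: nothing can emulate it
            traps += 1
        pc += 1  # resume after the instruction, emulated or not
    return traps


regs = {"a0": 0, "a1": 6, "a2": 7}
program = [("mul", "a0", "a1", "a2"), ("add", "a0", "a0", "a1")]
traps = run(program, regs)
print(f"a0 = {regs['a0']}, traps taken = {traps}")
```

Every emulated instruction pays for a trap and a return, which is why this approach suits rarely used instructions better than ones that appear throughout a binary.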

Modifying OpenSBI has some downsides you should be aware of:

  • On some machines, SBI is part of the platform firmware and might not be open source or user-replaceable.
  • M-mode does not use paging, so the emulation must do its own page table walk if the extension uses virtual addresses.
  • M-mode traps to SBI have extra overhead in hardware.
  • There are also security and operational concerns as M-mode has complete access to the hardware, but a bug in the operating system might be limited and recoverable (e.g., by a software watchdog).

Modifying Linux has the downside that the emulation is only available for Linux and won’t work for other operating systems, nor for the code that runs before Linux, such as SBL, SBI, u-boot, and EDK2.

RISC-V in research

RISC-V-based microarchitectures are an important part of all FPGA-based research projects at the Red Hat Collaboratory at Boston University, in part because of RISC-V’s support for custom extensions.

You can learn more about the role of RISC-V in research on open hardware in the RHRQ articles RISC-V in FPGAs: benefits and opportunities (RHRQ 4:1) and Fostering open innovation in hardware (RHRQ 2:2).
