RISC-V extensions: what’s available and how to find them

Extensions available in RISC-V enable the customizations that make it ideal as a basis for open innovation. Here’s the extension situation as it stands today.

RISC-V is a new Instruction Set Architecture (ISA) that, over the next decade, will compete with x86-64 and ARM in all areas, from the lowest-end IoT devices all the way to huge servers. Unlike existing ISAs, RISC-V is a truly open source, permissionless architecture. You can start making chips using designs downloaded from GitHub without signing any legal agreement. If this sounds like Linux versus proprietary operating systems all over again, then you understand why Red Hat is interested.

One unique aspect of RISC-V is that it is designed from the ground up for ISA extensions. As the name suggests, these add extra instructions to the ISA for implementing features like vector operations or accelerated encryption.

In this article, we’ll look at how extensions work at the lowest levels, how they are ratified and eventually standardized through RISC-V International (RVI), what extensions are out there, how extensions are grouped into profiles, and how you can discover which extensions are supported in your hardware. I will also cover QEMU support for extensions.

As this article is about extending the ISA, I will not cover non-ISA extensions in any detail.

Why extensions?

Broadly, you can extend and accelerate the capabilities of a CPU in two ways: add new instructions or implement a hardware accelerator. On other ISAs, accelerator peripherals exist for TCP offloading, AI, digital signal processing, encryption, and so on. Why would an extension be better—or, when is an extension better?

One way to think about this question is that extensions are part of the stream of instructions. They are, therefore, extremely low latency: there may be little to no overhead to using the instruction. This is in contrast to making an I/O request, where you might have to form a request packet and batch requests to get the best performance, and, of course, the request goes off the CPU and over a PCIe network.

That said, extending the CPU is far less easy than plugging in a peripheral.

How extensions are encoded

RISC-V has a very regular instruction encoding. Because arbitrary extensions are allowed, RISC-V uses a variable-length encoding, with all instructions (currently) encoded either in 32 bits or in 16 bits for a compressed subset. For this article, I won’t talk much about compressed instructions; see the section below for more on that. And I won’t discuss variable-length encodings that are 48 bits and wider, since they are not used by any extensions today.

With those assumptions in mind, we can assume the least significant 2 bits of each 32-bit instruction are always 11. The next 5 bits are the major opcode, which controls how the instruction is decoded.

This is not a comprehensive guide to decoding RISC-V instructions, as that is well covered in the RISC-V ISA spec, but some things are worth pointing out:

Some instructions have further opcode fields. For example, OP (0110011) has two extra opcode fields totaling another 10 bits. This major opcode is shared with the base ISA, the Multiply extension, the Zicond extension, the Packed SIMD extension, and more. There is no general way to tell if a particular opcode corresponds to the base ISA or an extension (without a large table).
Some major opcodes do correspond entirely to specific extensions. For example, AMO contains Atomic instructions (the A extension), and OP-V was originally reserved but is now used by the Vector extension.
Large sections of the opcode space are used for some fairly obscure features, like fused multiply-add.
custom-0 through custom-3 are for vendors to add their own extensions that they don’t intend to ratify. Essentially, custom-* is a free-for-all (except that the proposed 128-bit RV128I ISA will eventually use custom-2 and custom-3).
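To make the points above concrete, here is a minimal sketch of extracting the major opcode from a 32-bit instruction word. The table reproduces the major opcode names from the base opcode map in the RISC-V unprivileged ISA spec as I understand it; newer spec versions may assign names to slots marked reserved here. The function name major_opcode is my own, not from any library.

```python
# Major opcode names for 32-bit encodings, indexed by instruction bits [6:2].
# Each row of eight corresponds to one value of bits [6:5].
MAJOR_OPCODES = [
    # bits [6:5] = 00
    "LOAD", "LOAD-FP", "custom-0", "MISC-MEM", "OP-IMM", "AUIPC", "OP-IMM-32", "48b",
    # bits [6:5] = 01
    "STORE", "STORE-FP", "custom-1", "AMO", "OP", "LUI", "OP-32", "64b",
    # bits [6:5] = 10
    "MADD", "MSUB", "NMSUB", "NMADD", "OP-FP", "OP-V", "custom-2/rv128", "48b",
    # bits [6:5] = 11
    "BRANCH", "JALR", "reserved", "JAL", "SYSTEM", "reserved", "custom-3/rv128", ">=80b",
]


def major_opcode(inst: int) -> str:
    """Return the major opcode name of a 32-bit instruction word."""
    if inst & 0b11 != 0b11:
        raise ValueError("not a 32-bit encoding (low two bits are not 11)")
    return MAJOR_OPCODES[(inst >> 2) & 0x1F]


if __name__ == "__main__":
    # 0x00000517 is auipc a0, 0x0 and 0x022081b3 is mul x3, x1, x2.
    for word in (0x00000517, 0x022081B3):
        print(f"{word:08x} -> {major_opcode(word)}")
```

Note that the mul example lands in OP, the same major opcode as the base integer add: the major opcode alone does not say which extension an instruction belongs to.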

In this article, I will concentrate only on extensions that are non-custom, non-vendor-specific, and on the path to ratification by RVI.

It is most important to note that (non-custom) extensions are not distinct, separate parts of the decoding space. RVI appears to want to thread extensions into the gaps between existing base instructions. Major extensions like Vector that have their own major opcode (OP-V) also have instructions in LOAD-FP and elsewhere. I have heard extensions that have their own major opcode called green-field extensions. Smaller extensions use existing major opcodes, fit into the gaps, and are known as brown-field extensions. All else being equal, brown-field extensions have a higher chance of being accepted. This means that when proposing a new extension, you will need to consider existing extensions, which is good practice anyway.

Byte ordering of instructions

Instructions are always stored little endian, with the least significant byte first in memory. This applies even on the (theoretical) big endian RISC-V machine. Note that disassembly tools like objdump display each instruction as a single number, most significant byte first, so an instruction like auipc appears in documentation and objdump output with its bytes in the reverse of the order they occupy in memory. The byte that holds the opcode bits therefore comes first in memory.

In other words, when decoding RISC-V instructions, you can tell much about how to decode the instruction and where the next instruction begins from the first byte. This is in stark contrast to x86, where simply determining the boundaries between instructions is a research project in itself!
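The two views of the same instruction can be sketched in a few lines. The instruction chosen here (auipc a0, 0x0) is my own illustrative pick, and the helper name auipc is hypothetical; the encoding follows the U-type format from the ISA spec.

```python
import struct


def auipc(rd: int, imm20: int) -> int:
    """Encode auipc rd, imm20 (U-type: imm[31:12], rd, opcode 0010111)."""
    return ((imm20 & 0xFFFFF) << 12) | ((rd & 0x1F) << 7) | 0b0010111


inst = auipc(rd=10, imm20=0)           # auipc a0, 0x0 (a0 is register x10)
as_word = f"{inst:08x}"                # how documentation and objdump show it
in_memory = struct.pack("<I", inst)    # how the bytes are laid out in memory

print(as_word)                # 00000517
print(in_memory.hex(" "))     # 17 05 00 00
```

The first byte in memory (0x17) carries the low opcode bits, so a decoder reading forward through memory sees the length and major opcode before anything else.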

The classic extensions

RISC-V was originally envisaged as the base ISA—RV32I, RV64I, or RV128I—surrounded by what I will call the classic extensions:

I: Base integer instructions, add, subtract, jump, etc.
M: Multiply and divide
A: Atomic operations
F: Single-precision floating-point arithmetic
D: Double-precision floating-point arithmetic
C: Compressed instructions

Notice how some of these map fairly well to the major opcodes: for example, Atomic operations are implemented in the AMO opcode space.

We might add:

G: an early attempt to define a basic profile, G = IMAFD
E: an embedded subset with 16 base registers instead of the usual 32
Q: quad-precision floating point
H, U, and S: pseudo-extensions used in the misa CSR (Machine ISA Control and Status Register, more below) to refer to support for hypervisor, user, and supervisor modes
the X-prefix for custom extensions
the Z-prefix

It soon became obvious we were going to run out of single letters, so three prefixes were reserved for named extensions: S (e.g., Smmtt) for supervisor-mode extensions, Z (e.g., Zimop) for general extensions, and X for custom, vendor-specific extensions.

Zicsr and Zifencei were retrospectively detached from the base ISA after we realized that CSRs might not be present on very low-end hardware, and the FENCE.I instruction didn’t work very well.

One extension may also require another; for example, D requires F.

Naming extensions

There is a well-defined naming scheme describing which extensions are supported by hardware. I’ll describe below how you can find this out for your own hardware. At the time of writing in 2023, the vast majority of hardware can be described as RV64IMAFDCZicsr_Zifencei.

Extension versions can also be encoded here (e.g., RV64I1p0 would be base ISA version 1.0).
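As a rough illustration of the naming scheme, the sketch below splits an ISA string into its XLEN and a list of extensions with optional versions. It is a simplification under stated assumptions, not a full implementation of the spec's naming rules: parse_isa_string is a hypothetical name, any name starting with s, z, or x is treated as a multi-letter extension running to the next underscore, and only versions written as NpM are recognized on multi-letter names.

```python
import re


def parse_isa_string(isa: str):
    """Split an ISA string into (xlen, {extension: version or None})."""
    s = isa.lower()
    m = re.match(r"rv(32|64|128)", s)
    if not m:
        raise ValueError(f"not an ISA string: {isa!r}")
    xlen = int(m.group(1))
    rest = s[m.end():]
    exts = {}
    i = 0
    while i < len(rest):
        ch = rest[i]
        if ch == "_":                       # separator between names
            i += 1
        elif ch in "szx":                   # multi-letter name runs to the next "_"
            end = rest.find("_", i)
            end = len(rest) if end == -1 else end
            name, version = re.fullmatch(r"(.*?)(\d+p\d+)?", rest[i:end]).groups()
            exts[name] = version
            i = end
        else:                               # single-letter extension, optional version
            v = re.match(r"\d+(p\d+)?", rest[i + 1:])
            exts[ch] = v.group(0) if v else None
            i += 1 + (v.end() if v else 0)
    return xlen, exts


if __name__ == "__main__":
    print(parse_isa_string("RV64IMAFDCZicsr_Zifencei"))
    print(parse_isa_string("RV64I1p0"))
```

Older strings that use s and u as single letters would be misread by this sketch; real parsers carry workarounds for such cases.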

The curious case of compressed encoding

If you were paying close attention to how RISC-V extensions are encoded, you will see that I assumed 32-bit, non-compressed instructions.

Current RISC-V implementations support a compressed 16-bit encoding for common instructions that is, of course, limited in the range of registers that may be accessed and the opcodes available. This is similar in spirit to Armv7 Thumb instructions. The compressed extension was added very early on by the original RISC-V designers to help save instruction cache space and fetch bandwidth. Later, Linux distributions started to assume that the compressed extension is always available.

The downside is that compressed instructions consume three-fourths of the available opcode space (non-compressed, 32-bit instructions must have both least significant bits 1 1). Also, instructions are no longer automatically 4-byte aligned and may also cross cache lines and page boundaries, making decoding harder, albeit still much easier than complex architectures like x86.
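The cost in opcode space and alignment is easy to see in a sketch that walks a code buffer and finds instruction boundaries. The function names are my own, and the sample bytes are an illustrative sequence I assembled by hand (c.nop, auipc a0, 0x0, then a compressed ret); wider encodings are deliberately not handled.

```python
def insn_length(first_parcel: int) -> int:
    """Length in bytes implied by the low bits of the first 16-bit parcel."""
    if first_parcel & 0b11 != 0b11:
        return 2            # compressed: three of the four low-bit patterns
    if first_parcel & 0b11100 != 0b11100:
        return 4            # standard 32-bit encoding
    raise NotImplementedError("48-bit and wider encodings are not handled")


def split_instructions(code: bytes):
    """Yield (offset, length) for each instruction in a little-endian buffer."""
    offset = 0
    while offset < len(code):
        parcel = int.from_bytes(code[offset:offset + 2], "little")
        length = insn_length(parcel)
        yield offset, length
        offset += length


if __name__ == "__main__":
    code = bytes.fromhex("0100 17050000 8280")
    for offset, length in split_instructions(code):
        print(offset, length)
```

In this sample the 32-bit auipc starts at offset 2, so it is only 2-byte aligned, which is the alignment consequence described above.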

High-end RISC-V server vendors are pushing back against supporting compressed instructions, arguing that their machines will have huge instruction caches (so code size is not critical). These vendors would prefer to use the opcode space in other ways and would like to have uniformly sized instructions. We have yet to see how this will pan out, but don’t be surprised if servers appear that don’t implement the C extension.

This is very much a concern for Red Hat. C is a special extension as these instructions appear frequently in binaries—as many as half of all instructions can be compressed—and trap-and-emulate would be impractical. Shipping two distro variants with and without compressed instructions is not attractive. Instead, we must decide whether to require it in hardware or ban it in software.

The new extensions

The classic extensions, in hindsight, don’t cover much of what is needed for a modern server. Since then, dozens of extensions have been proposed and ratified. This section highlights the most important extensions first (for servers). You can get a complete list of extensions and their status on the RISC-V wiki.

Vector, encryption, math

The most important extensions to the classic set are Vector (V, Zv*), Bit manipulation (Zb*), and Packed SIMD (P, Zbpbo, Zp*). These very roughly correspond to MMX/SSE/AVX on x86, but RISC-V adds more flexibility and a different (and simpler) programming paradigm. We expect that when hardware with these instructions appears, they will be widely used in binaries (as happens on x86).

A whole article could be written about the vector extension (which is, in fact, a large collection of extensions).

RISC-V also has two important sets of extensions for cryptography, called Scalar Crypto and Vector Crypto. Scalar Crypto was folded into the Bit manipulation extensions (Zbkb)—for example, the zip instruction in Zbkb is called out for being useful to implement SHA3. Vector Crypto (Zvknhb, Zvbc, Zvkn, Zvks, and more!) contains extended Vector instructions useful for Elliptic curve cryptography, various Message Authentication Codes, AES, AES-GCM, and many more.

Another important group of extensions extends floating-point support, adding Bfloat16 (Zfbfmin, Zvfbfmin, Zvfbfwma), common floating-point constants and many useful floating-point operations that are not present in the classic set (Zfa), and the “f-in-x” extensions (Zfinx, Zdinx, Zhinx, Zhinxmin) that allow floating-point and integer registers to be shared.

Virtualization

RISC-V support for running virtual machines (the Hypervisor extension) was demonstrated as far back as 2017 and ratified in 2021, but it is only expected to appear in hardware in 2024. This will be vital for RISC-V adoption on servers. H (hypervisor) is mostly a complicated addition to the privileged spec involving new modes and CSRs, but some new instructions were added. In particular, there are instructions to access memory while translating guest virtual addresses (useful for emulating I/O) and extra fencing instructions.

Interrupts, cache, and memory

Many extensions have been proposed and ratified that are beneficial for server-class operating systems. The most important are probably the ones that fix the interrupt architecture of the original design, which was notably inefficient—in particular, the Advanced Interrupt Architecture (Smaia, Ssaia) and the older Fast Interrupt specification (S*clic*). (Recall that extension names prefixed with S apply to supervisor mode). Also worth mentioning is Smrnmi, which fixes another issue with the base standard: after a Non-Maskable Interrupt, the interrupted program could not resume running. This adds a new mnret instruction to resume after NMI.

The original RISC-V design assumed a relaxed memory consistency similar to Arm, but some machines would prefer the stricter ordering found on x86 (for example, because you need to emulate or port code from x86). The Ztso extension changes load and store operations to use total store ordering.

Cache management operations (CMOs) are important in modern operating systems, and RISC-V defines a family of extensions for cache block operations. Zicbom are cache block management instructions for things like invalidating blocks of cache. Zicbop instructions are prefetch hints. Zicboz instructions store zeros over blocks of cache.

Safety and security

Hardening against attacks is a key concern for servers. One area being actively developed on all architectures is control flow integrity (CFI). RISC-V is ratifying two extensions for CFI. Zicfiss defines a shadow stack and provides new instructions to push and pop values there. Zicfilp defines places where code is allowed to branch to (especially through “computed gotos”), known as landing pads. These techniques are designed to prevent return-oriented programming (ROP) attacks after stack-smashing exploits.

Landing pads themselves are defined by further extensions—Zimop, Zcmop—that reserve some opcode space for may-be-operations (MOPs).

Miscellaneous

Other extensions relevant to servers:

Svinval can be used for selective TLB invalidation.
Zawrs adds instructions that make polling memory locations more efficient, typically used in spinlocks or when polling on a lockless queue. Zihintpause adds a new pause instruction that can also be used to reduce power consumption and memory traffic in spinlocks.
Zihintntl may be used to hint that memory accesses are non-temporal (i.e., do not need to be cached).
Zacas adds atomic compare and swap, omitted from the original Atomic instructions.
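Zicond adds the conditional-zero instructions czero.eqz and czero.nez, which can be combined with ordinary arithmetic to build branchless conditional operations, similar in purpose to Armv7 conditional operations.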

Extension standardization and ratification

RVI has a process for taking non-custom extensions, shepherding them through standardization, and eventually ratifying them. Unlike proprietary ISAs, this process (and the arguing!) happens in the open, on GitHub pages, in mailing lists, and on open video calls. Extensions on their way through ratification are listed on the RISC-V wiki, along with links to the process and a description of the lifecycle extensions go through before ratification.

Profiles

Software generally needs some kind of baseline target to run. While some extensions can be detected at runtime and different code paths chosen, much software will be written that expects a basic set of extensions to exist.

For this reason, RVI defines a set of profiles named after the year they were defined and grouped into two families. Thus, at the time of writing, the latest profile is RVA22, and RVA23 is in development. A stands for the “Application processors running rich operating systems” family (i.e., servers); 22/23 are the year codes.

RVA22 includes all the classic extensions and a smattering of older system extensions. More notable is what it omits. It predates the ratification of the Vector extension, so this is only optional, and Vector Crypto and Packed SIMD are also missing. See the full RVA22 profile for more details.

We expect this to change, as the current profiles don’t reflect the many branches of the RISC-V ecosystem. In the future, we expect to see stricter requirements for backward compatibility, only fully ratified extensions allowed, and unused opcode space forced to trap (allowing some forward compatibility through trap-and-emulate).

Discovering extensions—Where’s my CPUID?

x86 has CPUID, a comprehensive method to detect at runtime which features the processor supports and many other aspects of the CPU, like cache sizes and so forth. There is nothing this comprehensive available in RISC-V at the moment.

For RISC-V, there are three ways to determine which extensions are available in the hardware. The oldest mechanism, now mostly deprecated, is to read the misa CSR. This register lets you read the machine XLEN (i.e., the base ISA: RV32I, RV64I, or RV128I), but your code won’t run unless it uses the right instructions in the first place, so you must know this already. It also contains 26 bits corresponding to the 26 letters of the alphabet, anticipating up to 26 extensions (minus reserved letters). As discussed before, believing there would be only 26 extensions was naive. Another problem with misa is that you cannot read versions of extensions, but the two other methods allow you to get all extensions and (in theory) their versions.
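The layout of misa described above can be sketched as a small decoder. This assumes the layout in the privileged spec (the MXL field in the top two bits, one bit per letter in bits 25 to 0); the function name decode_misa and the sample value are my own illustration, and real code would read the CSR from Machine mode rather than take it as an argument.

```python
def decode_misa(misa: int, mxlen: int):
    """Decode a misa value read on a machine whose M-mode register width is mxlen."""
    mxl = (misa >> (mxlen - 2)) & 0b11            # top two bits: 1=32, 2=64, 3=128
    xlen = {1: 32, 2: 64, 3: 128}.get(mxl)        # None if the field reads as 0
    letters = [chr(ord("A") + bit) for bit in range(26) if misa & (1 << bit)]
    return xlen, "".join(letters)


if __name__ == "__main__":
    # Sample value for an RV64 machine with IMAFDC plus the S and U mode bits.
    print(decode_misa(0x800000000014112D, 64))    # (64, 'ACDFIMSU')
```

Nothing in this register can express a Z, S, or X-prefixed name or a version number, which is the limitation the other two methods address.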

The second method is to use information from the device tree (DT). The deprecated riscv,isa field contains a full extension string with optional versions. Linux ignores the versions and contains workarounds for buggy strings in existing implementations. The replacements are riscv,isa-base and riscv,isa-extensions, which have a cleaner design.

The third method is to use information from the ACPI RISC-V Hart Capabilities Table (RHCT). This encodes a full extension string with optional versions.

All these methods are available directly only to code running in Machine or Supervisor modes. To pass the information up to userspace, Linux provides /proc/cpuinfo and a new system call, riscv_hwprobe. However, the information available through these is very sparse at the moment, even relative to what is available from the hardware.

If you know the machine is using DT, then in Linux the riscv,isa field can be read out directly from /sys. Here is an example from a QEMU guest:

$ cat '/sys/firmware/devicetree/base/cpus/cpu@0/riscv,isa'
rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintntl_zihintpause_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
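A program could use the same file to check for the extensions it needs. The sketch below is a minimal version under two assumptions of mine: that the property is exposed with a trailing NUL byte, as device tree string properties normally are, and that every multi-letter name is preceded by an underscore, as in the QEMU output above. The function names are hypothetical.

```python
from pathlib import Path

ISA_PATH = "/sys/firmware/devicetree/base/cpus/cpu@0/riscv,isa"


def read_dt_isa(path: str = ISA_PATH) -> str:
    """Read the riscv,isa string property, dropping the trailing NUL."""
    return Path(path).read_bytes().rstrip(b"\0").decode("ascii")


def missing_extensions(isa: str, wanted):
    """Return the wanted multi-letter extensions that are absent from isa."""
    present = set(isa.lower().split("_")[1:])   # chunks after the first one
    return [ext for ext in wanted if ext.lower() not in present]


if __name__ == "__main__":
    qemu_isa = ("rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintntl_zihintpause"
                "_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu")
    print(missing_extensions(qemu_isa, ["Zba", "Zbb", "Zicond"]))   # ['Zicond']
```

On a DT-based RISC-V Linux system, missing_extensions(read_dt_isa(), [...]) would run the same check against the real property.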

How QEMU implements extensions

QEMU, a virtual machine emulator and hypervisor, can emulate RISC-V in software. This is useful when you don’t have RISC-V hardware or don’t have hardware that supports a particular extension. The software emulation in QEMU is called the Tiny Code Generator (TCG).

TCG works by translating basic blocks as they are encountered. The translated code is stored in a QEMU TranslationBlock (TB) structure and referenced through a hash table of (CPU state, physical address). TBs persist so that code doesn’t need to be retranslated, but they can be invalidated by things such as writes happening to the same code page.
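The caching idea can be modeled in a few lines. This is a toy model of the behavior described above, not QEMU's actual data structures: the class, its methods, and the 4 KiB page size are my own assumptions, and blocks that span two pages are ignored.

```python
PAGE_SIZE = 4096


class TranslationCache:
    """Toy model: translated blocks keyed by (cpu_state, physical address)."""

    def __init__(self, translate):
        self.translate = translate    # callable(cpu_state, paddr) -> translated block
        self.blocks = {}              # (cpu_state, paddr) -> translated block
        self.by_page = {}             # page number -> keys of blocks on that page

    def lookup(self, cpu_state, paddr):
        key = (cpu_state, paddr)
        if key not in self.blocks:    # miss: translate once and remember the result
            self.blocks[key] = self.translate(cpu_state, paddr)
            self.by_page.setdefault(paddr // PAGE_SIZE, set()).add(key)
        return self.blocks[key]

    def notify_write(self, paddr):
        """A guest write to a page holding code drops every block from that page."""
        for key in self.by_page.pop(paddr // PAGE_SIZE, set()):
            del self.blocks[key]
```

A second lookup of the same key returns the stored block without calling translate again, while a write anywhere in the page forces retranslation on the next lookup.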

TCG defines a set of basic operations like integer adds, loads, stores, labels, branches, and so on. You can recognize these when you see tcg_gen_* called in QEMU code. For example, tcg_gen_qemu_ld_i64 would be called when translating a block of code and would generate a TCG instruction to do a 64-bit load and append it to the list of translated instructions.

However, anything complicated (such as CSRs or vector instructions) is translated into a call to a helper function. You will see helper functions defined in QEMU using the macro HELPER(<name>). When translating, a call to the helper would be generated using gen_helper_<name>. Since most RISC-V extensions are complicated, they are almost always implemented as a set of helpers.

A file in the QEMU source target/riscv/insn32.decode describes how instruction bit patterns are decoded. Extensions must list their new instructions here.

The file target/riscv/cpu.c contains two tables listing ISA extensions, their names, and versions. This is a very useful reference for finding out which extensions have been implemented in QEMU.

At the time of writing, QEMU supports these extensions, making it probably the most capable RISC-V platform:

The base RV32I and RV64I ISAs
The classic extensions: M A F D
Compressed instructions: C, Zca, Zcb, Zcf, Zcd, Zce, Zcmp, Zcmt
The embedded extension: E
The hypervisor extension: H
User and Supervisor modes (but note, not Machine mode): U S
Dynamic languages: J
Cache management (partial): Zicbom, Zicboz
Conditional ops: Zicond
Read and write CSRs: Zicsr
FENCE.I instruction: Zifencei
Pause hint: Zihintpause
Wait on reservation set: Zawrs
Additional scalar FP: Zfa
Bfloat16 (partial): Zfbfmin
Half-width FP: Zfh, Zfhmin
FP using integer regs: Zfinx, Zdinx, Zhinx
Bit manipulation: Zba, Zbb, Zbc, Zbs
Crypto scalar: Zbkb, Zbkc, Zbkx, Zk*
Vector (mostly complete): V, Zv*
Advanced Interrupt Architecture: Smaia, Ssaia
State enable: Smstateen
Count overflow & filtering: Sscofpmf
Time compare: Sstc
Hardware update of PTE A/D bits: Svadu
Fast TLB invalidation: Svinval
NAPOT pages: Svnapot
Page-based memory types: Svpbmt
T-HEAD multiple custom extensions
Ventana custom extensions for conditional ops

Emulation of RISC-V extensions on RISC-V

RISC-V extensions may also be emulated on RISC-V hardware using trap-and-emulate. There are two broad approaches taken:

Modify the OpenSBI illegal instruction handler (lib/sbi/sbi_illegal_insn.c) to catch the illegal instruction and emulate it.
Modify the Linux kernel illegal instruction handler (arch/riscv/kernel/traps.c).

The first method was used to implement a mostly complete emulation of the Hypervisor extension. The second method was used to implement the Vector extension for machines that lack it.
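The shape of a trap-and-emulate handler can be shown with a toy model. For brevity this sketch emulates a single instruction, mul from the M extension, on an imaginary RV64 hart that lacks a multiplier; it is not taken from OpenSBI or Linux, and ToyHart and emulate_mul are hypothetical names. A real handler would also read the trapping instruction from the right CSR or from memory and deal with privilege and address translation.

```python
class ToyHart:
    """Toy hart state: 32 integer registers and a program counter."""

    def __init__(self):
        self.regs = [0] * 32
        self.pc = 0


def emulate_mul(hart: ToyHart, inst: int) -> bool:
    """Illegal-instruction handler: emulate mul rd, rs1, rs2 if that is what trapped."""
    opcode = inst & 0x7F
    funct3 = (inst >> 12) & 0x7
    funct7 = (inst >> 25) & 0x7F
    if (opcode, funct3, funct7) != (0b0110011, 0b000, 0b0000001):
        return False                  # not ours: let the normal fault path run
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    if rd != 0:                       # x0 is hard-wired to zero
        hart.regs[rd] = (hart.regs[rs1] * hart.regs[rs2]) & (2**64 - 1)
    hart.pc += 4                      # resume after the emulated instruction
    return True


if __name__ == "__main__":
    hart = ToyHart()
    hart.regs[1], hart.regs[2] = 6, 7
    handled = emulate_mul(hart, 0x022081B3)    # mul x3, x1, x2
    print(handled, hart.regs[3], hart.pc)      # True 42 4
```

Every emulated instruction pays for a full trap, decode, and return, which is why this approach suits rarely used instructions far better than ones that appear throughout a binary.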

Modifying OpenSBI has some downsides you should be aware of:

On some machines, SBI is part of the platform firmware and might not be open source or user-replaceable.
M-mode does not use paging, so the emulation must do its own page table walk if the extension uses virtual addresses.
M-mode traps to SBI have extra overhead in hardware.
There are also security and operational concerns as M-mode has complete access to the hardware, but a bug in the operating system might be limited and recoverable (e.g., by a software watchdog).

Modifying Linux has the downside that the emulation is only available for Linux and won’t work for other operating systems, nor for the code that runs before Linux, such as SBL, SBI, u-boot, and EDK2.

RISC-V in research

RISC-V-based microarchitectures are an important part of all FPGA-based research projects at the Red Hat Collaboratory at Boston University, in part because of RISC-V’s support for custom extensions.

You can learn more about the role of RISC-V in research on open hardware in the RHRQ articles RISC-V in FPGAs: benefits and opportunities (RHRQ 4:1) and Fostering open innovation in hardware (RHRQ 2:2).

Red Hat Research Quarterly

Richard Jones

November 2023