How Genode came to RISC-V
This article supplements our recent announcement about Genode's port to the RISC-V hardware architecture with a look behind the scenes of the porting work. To learn more about our motivation to bring Genode to RISC-V, please refer to the announcement.
RISC-V is a completely open hardware architecture that is designed to scale from deeply embedded microcontrollers to high-performance general-purpose computing. It is based on a small integer instruction set architecture (ISA) that is available in 32-bit (RV32I) and 64-bit (RV64I) variants. It offers basic instructions for control transfer, integer manipulation, logical operations, bit shifts, and system functions. The base integer ISA is the only mandatory part of a RISC-V core, whereby a core may support RV32I, RV64I, or both.
Additional functionality is provided by ISA extensions. Official extensions are, for example, the standard extensions for integer multiplication and division (M), atomic instructions (A), single-precision floating point (F), and double-precision floating point (D). A core that implements the I ISA and the MAFD extensions is given the abbreviation G for general-purpose scalar instruction set.
Vendor-specific or special purpose ISA extensions are actively encouraged by the ISA design, thus widening RISC-V's field of application from small scale specialized hardware to general-purpose multi-core systems. One example of a very specialized small RISC-V processor is lowRISC's minion core concept where Rocket cores are given direct hardware I/O access. Minion cores are used to create software-defined I/O interfaces, perform data pre-processing, and may also be deployed to offload work from the main cores.
The RISC-V specification consists of two volumes, the user-level volume and privileged-architecture volume. Because the ISAs are not yet stable and may change over time, we initially implemented Genode against the user level ISA version 2.0 and the privileged architecture 1.7.
In the remainder of this article, we discuss the tools that are used to build and execute RISC-V binaries in Section Tools and debugging, our adaptations of Genode's custom kernel (base-hw) in Section The privileged architecture, and the work on Genode's user-level components in Section User-level support.
Tools and debugging
The Genode tool chain
Building the tool chain
Since RISC-V is a completely new architecture with its own instruction encoding, assembly language, and register set, an extension of existing compiler suites becomes necessary. UC Berkeley provides both an extension of the GNU compiler collection and an LLVM back end. Whereas the GCC version is feature complete, LLVM lacked support for the RV32 ISA when we started our work.
Genode does not officially support Clang/LLVM. So we picked the GNU tool chain for our RISC-V port. Luckily, the GCC version of the RISC-V project matched Genode's tool chain version at the time (4.9.2). Also the build process of the two projects is very similar: The RISC-V tool chain comes with a set of new files for the architecture and patches for binutils and GCC itself that enable the architecture. Therefore, we extended Genode's tool_chain script with the RISC-V architecture, added patches that create the necessary files and apply the changes required to GCC and binutils. This enabled us to successfully build the Genode tool chain for RISC-V.
Tool chain issues
From here the situation became kind of a "rough ride". As we resumed our work, we found that our version of the tool chain did not implement version 1.7 of the privileged ISA but an older version. A brief look at the current state of the project revealed that support for version 1.7 was added later but, at the same time, the GCC version was bumped twice, from version 4.9.2 to 5.1.0 and then to 5.2.0. Upgrading to a version other than 4.9.2 cannot be easily achieved on Genode because other platforms, like ARM or x86, would also be required to move to the new compiler version. In our experience, tool-chain upgrades normally become quite an effort. For this reason, we decided to manually revert the commits that upgraded the tool-chain version in order to be able to use the 1.7 ISA but stay with our tool-chain version. With the RISC-V-enabled Genode tool chain, we found the C++ exception support to be broken. Unfortunately, Genode heavily relies on C++ exceptions. We were able to locate the commit that broke exception support and filed a bug. Fortunately, the issue got resolved quickly.
Another issue related to C++ exceptions arises during exception-header frame creation. When a frame is created, all the FPU registers are stored, even if the code is compiled with the -msoft-float option. FPU registers can only be saved and restored with the FPU enabled, otherwise a CPU fault is raised. Therefore, we have to enable the FPU, even though we currently do not support it.
Having the updated tool chain in place, we noticed that it contained some features that did not conform to the specification. For example, the calling convention had changed, and so had the names and the meaning of some general-purpose registers. So, we again contacted the RISC-V team and learned that both ISA versions had changed and that the changes were already incorporated into the respective tool chains but were still undocumented. We decided to go with these changes because most of them were of a cosmetic nature. Note that changes may be required in the future because it remains to be seen what ISA versions will be used by hardware projects like lowRISC.
Execution environments
Emulators
For testing our Genode scenarios, two emulator options are available: a RISC-V version of Qemu and an instruction emulator called Spike. At first, we decided to go with Qemu because it is already used for other Genode platforms. After some inspection, however, we found that Qemu also does not support the privileged ISA 1.7, and its development seems to have significantly slowed. Spike seems to be the only emulator that is up to date because it is developed alongside the tool chain.
Therefore, we decided to take advantage of the instruction emulator and added Spike support to Genode's run environment.
FPGA
Once Genode was successfully running on Spike, support for the Xilinx Zynq FPGA was added. A ready-to-use bitstream for the Rocket core can be directly used on a suitable board (Zybo, Zedboard, or ZC706). As we will see in Section Debugging, the Rocket core infrastructure does not feature any devices yet. Therefore, using an FPGA currently does not give any advantage compared to using an emulator.
Debugging
For debugging purposes, RISC-V defines the so-called Host-Target Interface (HTIF). This interface is used for serial output and general device access. The CPU exposes two special registers called mtohost and mfromhost. Commands can be written to the mtohost register and data can be received from the mfromhost register. Additionally, when data becomes available, an HTIF interrupt is raised by the CPU. This interface is meant to be connected to a so-called front-end server based on Linux, outside the RISC-V core. When a command is written to the mtohost register, the instruction emulator, which has the front-end server compiled into it, will translate the command into a Linux system call (e.g., a write to standard out). On the Zynq board, the front-end server runs on a Linux OS that is executed on the board's ARM CPU. So as in the case of the emulator, a write to the mtohost register is converted into a call to the front-end server and ultimately results in a Linux system call. Please note that the HTIF interface is not documented.
We were able to retrieve the HTIF character output command and implemented a serial output driver that leads to a mtohost-register write in Genode's base-hw kernel.
Spike features assembly-level debugging support where the machine state can be inspected, memory can be dumped, and even breakpoints can be set. This feature especially helped us debug more complex scenarios.
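Since the HTIF is undocumented, the following is only a minimal sketch of how a character-output command word for mtohost might be composed. The field layout (device number in bits 63-56, command in bits 55-48, payload in the lower 48 bits) and the values 1/1 for console output are assumptions that are not taken from this article. The names htif_command and htif_putchar are hypothetical.

```cpp
#include <cstdint>

/* Assumed identifiers of the console device and its "put character" command */
enum : std::uint64_t { HTIF_DEV_CONSOLE = 1, HTIF_CMD_PUTCHAR = 1 };

/* Compose a command word under the assumed layout:
 * device in bits 63-56, command in bits 55-48, payload in bits 47-0 */
constexpr std::uint64_t htif_command(std::uint64_t dev, std::uint64_t cmd,
                                     std::uint64_t payload)
{
    return (dev << 56) | (cmd << 48) | (payload & 0xffffffffffffull);
}

/* Command word that would be written to mtohost to print one character */
constexpr std::uint64_t htif_putchar(char c)
{
    return htif_command(HTIF_DEV_CONSOLE, HTIF_CMD_PUTCHAR,
                        static_cast<unsigned char>(c));
}
```

On real hardware or in the emulator, the resulting value would be written to the mtohost CSR from machine mode. The sketch covers only the composition of the value.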
The privileged architecture
As mentioned in the introduction, there exist several configuration options for the actual CPU implementation. Vendors are free to implement whatever feature set suits their needs and, therefore, we had to determine the feature set required by Genode. First of all, we decided to go with the 64-bit architecture (RV64I) because it is the most contemporary. What is actually required by Genode are the IMA extensions of RISC-V. The atomic instructions (Section Atomic instructions) are especially important because Genode supports multithreading. We also consider the floating-point extensions (FD) valuable but postponed their implementation.
Another Genode requirement is virtual-memory support, either by offering an MMU or by hardware support of a software-loaded TLB. We will discuss this topic in Section Virtual memory management. In order to drive the kernel, a timer and an associated timer interrupt must be provided by the hardware (Section Timer).
Register sets
General purpose registers
RISC-V possesses thirty-two general-purpose registers (x0 through x31), whereby x0 is hard-wired to the constant zero. If the floating-point unit (FPU) is enabled, there are thirty-two additional floating-point registers (f0 through f31). The general-purpose registers must be saved and restored on each context switch, whereas the FPU registers only need to be saved and restored if the FPU has been used (is in dirty state). The FPU state can be retrieved from a control status register (Section Control status registers).
As common in RISC architectures, load and store operations are performed register indirect, by reading a source/target address from the given register with an optional offset. Care has to be taken when it comes to data alignment because all source/target addresses have to be naturally aligned. This means, for example, that when storing an 8-byte word using the sd instruction, the resulting target address has to be 8-byte-aligned. If not, an unaligned trap will be raised. Note that these restrictions are not found on x86 platforms and can be disabled on ARM. Therefore, we had to make sure that all of Genode's allocators return at least 8-byte-aligned addresses. Some statically constructed objects had to be aligned to an 8-byte boundary as well.
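The alignment rule can be illustrated with two small helpers. This is a minimal sketch, not Genode's actual allocator code. The names align_up and is_naturally_aligned are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

/* Round 'addr' up to the next multiple of 'align'.
 * 'align' must be a power of two, e.g., 8 for an sd access. */
inline std::uint64_t align_up(std::uint64_t addr, std::uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Return true if an access of 'size' bytes at 'addr' is naturally aligned.
 * 'size' must be a power of two. */
inline bool is_naturally_aligned(std::uint64_t addr, std::uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

An allocator that passes every returned address through align_up(addr, 8) never hands out memory that would trap on an 8-byte store to its start.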
Control status registers
The machine state can be obtained and controlled through control status registers (CSRs). Each privilege mode (Section Privilege modes) has its own address range or set of registers assigned. While the lowest privilege level merely has access to counter and timer CSRs, the highest level has access to all CSRs including the ones of the lower levels.
Privilege modes
RISC-V features four privilege levels: Machine mode, hypervisor mode, supervisor mode, and user mode. Only machine mode is mandatory. Machine mode has access to all the hardware features but does not have virtual-memory support. Hypervisor mode is meant to be used for virtualization. As of the time of writing, the hypervisor mode ISA has not been specified. Supervisor mode is the level where an operating-system kernel is supposed to be executed. In contrast to the machine mode, this mode implements the MMU and offers a variety of page-table formats. Therefore, Genode's kernel will, after an initial boot-strapping phase, be executed in supervisor mode. User mode is - as usual - the place where user-level code is executed.
For some time, we wondered about the rationale behind the machine mode. After some research, we found that the upcoming privileged architecture 1.8 will describe a so-called supervisor binary interface (SBI). SBI calls are system calls to machine mode. Since RISC-V is highly configurable, this interface is meant to hide underlying hardware specifics. One example given is a cache-flushing call, which might, depending on the cache hierarchy and type of caches, do something different or even nothing on different RISC-V systems. The SBI has to be implemented only once per platform and, in an ideal world, an operating system could run unmodified in supervisor mode on all these platforms. We implemented something similar in our serial output implementation (Section Debugging).
Because all traps and interrupts (Section Traps and interrupts) are always delivered to machine mode first, it could also be used as some sort of system-management mode by loading vendor-specific firmware code during bootstrap and then switching to supervisor mode, thus locking non-vendor software out of machine mode.
Switching privilege modes
The figure above depicts the machine status register (mstatus). It is used to configure which modes are actually executed by a system. mstatus can only be written in machine mode. PRV contains the current privilege level whereas the IE flag indicates whether interrupts are enabled for the current mode. The current mode can be changed by directly writing a different value to PRV. Another option is to execute the return-from-trap-handler instruction (eret). eret shifts the values in PRV(x) to PRV(x - 1), which causes PRV1 to become the active mode. Note that IE of the previously active mode will always be enabled. If a trap is taken, the PRV values will be shifted to the left and interrupts for the mode that becomes active are disabled. Also, the trap handler of the new mode will be executed.
In Genode, we currently want to omit the hypervisor mode. Therefore, during kernel initialization, we set PRV1 to user mode (with IE1='1') and PRV to supervisor mode (with IE='0').
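The shifting of the PRV/IE pairs can be modeled as a small stack. The following is a simplified sketch for illustration only. The stack depth of four entries and the value that refills the outermost entry on eret (user mode with interrupts enabled) are assumptions, and the names Privilege_stack and kernel_init_state are hypothetical.

```cpp
#include <array>
#include <cstddef>

enum class Mode { User, Supervisor, Hypervisor, Machine };

/* One PRV/IE pair of mstatus */
struct Entry { Mode prv; bool ie; };

struct Privilege_stack
{
    enum { DEPTH = 4 };            /* assumed number of PRV/IE pairs */
    std::array<Entry, DEPTH> e;    /* e[0] is PRV/IE, e[1] is PRV1/IE1, ... */

    Mode active_mode()        const { return e[0].prv; }
    bool interrupts_enabled() const { return e[0].ie; }

    /* Trap: shift all pairs to the left, enter 'handler' with interrupts off */
    void trap(Mode handler)
    {
        for (std::size_t i = DEPTH - 1; i > 0; i--)
            e[i] = e[i - 1];
        e[0] = Entry { handler, false };
    }

    /* eret: PRV(x) moves to PRV(x - 1), so PRV1 becomes the active mode */
    void eret()
    {
        for (std::size_t i = 0; i + 1 < DEPTH; i++)
            e[i] = e[i + 1];
        e[DEPTH - 1] = Entry { Mode::User, true };   /* assumed refill value */
    }
};

/* State set up during kernel initialization as described in the text */
inline Privilege_stack kernel_init_state()
{
    Privilege_stack s { };
    for (auto &entry : s.e)
        entry = Entry { Mode::User, true };
    s.e[0] = Entry { Mode::Supervisor, false };      /* PRV, IE='0' */
    s.e[1] = Entry { Mode::User, true };             /* PRV1, IE1='1' */
    return s;
}
```

Starting from kernel_init_state(), a call of eret() makes user mode with enabled interrupts the active mode, and a subsequent trap(Mode::Supervisor) returns to supervisor mode with interrupts disabled while preserving the user-mode pair in PRV1/IE1.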
Traps and interrupts
All modes but user mode must be able to handle traps. Therefore, a trap vector can be specified for each mode in the trap vector base CSR (mtvec for machine mode, stvec for supervisor mode, ...). Depending on the mode in which a trap occurs, different offsets from the vector base are in effect when calling the trap handler.
Address | Trap caused in |
---|---|
tvec | User mode |
tvec + 0x40 | Supervisor mode |
tvec + 0x80 | Hypervisor mode |
tvec + 0xC0 | Machine mode |
Table 1:
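The offsets in Table 1 are multiples of 0x40, so the entry address can be computed from the mode in which the trap occurred. The following sketch assumes a numbering of the modes (user = 0 up to machine = 3) that was chosen here to match the order of the table. The name trap_entry is hypothetical.

```cpp
#include <cstdint>

/* Mode numbering chosen to match the row order of Table 1 (assumption) */
enum class Mode : std::uint64_t { User = 0, Supervisor = 1, Hypervisor = 2, Machine = 3 };

/* Address of the trap handler for a trap caused in mode 'origin' */
constexpr std::uint64_t trap_entry(std::uint64_t tvec, Mode origin)
{
    return tvec + 0x40 * static_cast<std::uint64_t>(origin);
}

static_assert(trap_entry(0x100, Mode::User)       == 0x100, "tvec");
static_assert(trap_entry(0x100, Mode::Supervisor) == 0x140, "tvec + 0x40");
static_assert(trap_entry(0x100, Mode::Hypervisor) == 0x180, "tvec + 0x80");
static_assert(trap_entry(0x100, Mode::Machine)    == 0x1c0, "tvec + 0xC0");
```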
The trap reason can be obtained from the cause CSR (mcause/scause). Currently, there are twelve trap causes defined (e.g., alignment, fault, illegal instruction, and environment call). If the most significant bit is set in the cause CSR, the trap is an interrupt and the lower bits encode the interrupt number. As of the time of writing, there are only three predefined interrupts: software, timer, and HTIF (Section Debugging) interrupt.
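Decoding the cause CSR as described amounts to splitting off the most significant bit. A minimal sketch for RV64, where the CSR is assumed to be 64 bits wide. The names Cause and decode_cause are hypothetical, and no concrete cause numbers are assumed.

```cpp
#include <cstdint>

struct Cause
{
    bool          interrupt;  /* true if the trap is an interrupt */
    std::uint64_t code;       /* trap cause or interrupt number */
};

/* Split a 64-bit cause value into the interrupt flag (bit 63) and the code */
constexpr Cause decode_cause(std::uint64_t cause)
{
    return Cause { (cause >> 63) != 0,
                   cause & ~(std::uint64_t(1) << 63) };
}
```

A trap handler would read mcause or scause, pass the value to decode_cause, and dispatch on the interrupt flag before looking at the code.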
In its current incarnation, RISC-V lacks any specification of a programmable interrupt controller (PIC). There has been some discussion about this missing component and the current status is that early platforms will most likely be equipped with an off-the-shelf component. A PIC will, most likely, raise a fourth interrupt, which in turn can be serviced by the PIC driver. For this reason, the PIC implementation has been omitted in the current Genode port.
As stated in Section Privilege modes, all traps and interrupts are delivered to machine mode first. Machine mode can then decide to either service the trap or forward it to another privilege mode via a trap-redirection instruction, for example, by calling mrts to forward the trap to supervisor mode. mrts causes a mode switch and enters the corresponding offset from stvec. All important machine-state CSRs (e.g., mcause or mepc - the pc where the trap occurred) are then copied to the respective supervisor variants.
Because of the machine behavior described, the hw kernel creates two exception tables, one for machine mode and one for supervisor mode. The machine-mode exception handler checks if the trap requests a write to the mtohost register (Section Debugging). If so, the request is serviced and the handler returns via eret. If the trap came from user mode, it is forwarded to supervisor mode via mrts. All other traps from machine or supervisor mode are treated as kernel errors and a core dump is performed. The supervisor trap handler implements common OS functionality like page-fault and system-call handling.
Virtual memory management
Next to the execution in physical-memory mode, called Mbare environment, RISC-V offers virtual addressing with and without MMU support. The simplest option without MMU support is called "memory-base bound" environment (Mbb). Mbb uses a base register (mbase) and a size register (mbound). Virtual addresses start at zero and are mapped to the physical address in mbase. The virtual address cannot be larger than mbound. RISC-V claims that this scheme is very cost effective to implement in hardware. There even exists a variant where base and bound can be set for instructions and data independently so that code can be shared between address spaces.
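The base-and-bound scheme boils down to one comparison and one addition. The following sketch assumes that a virtual address is valid if it is strictly smaller than mbound. Whether the bound itself is included is not stated in the text. The names Mbb, Translation, and translate are hypothetical.

```cpp
#include <cstdint>

struct Mbb
{
    std::uint64_t mbase;   /* physical address that virtual address 0 maps to */
    std::uint64_t mbound;  /* size of the virtual address range */
};

struct Translation
{
    bool          valid;   /* false if the access exceeds the bound */
    std::uint64_t phys;    /* resulting physical address if valid */
};

/* Translate 'virt' by adding the base, rejecting addresses beyond the bound */
constexpr Translation translate(Mbb regs, std::uint64_t virt)
{
    return virt < regs.mbound ? Translation { true, regs.mbase + virt }
                              : Translation { false, 0 };
}
```

With mbase = 0x80000000 and mbound = 0x10000, the virtual address 0x1234 maps to the physical address 0x80001234, whereas 0x10000 is rejected.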
The MMU options support a range of page-table formats. There is a 32-bit format called Sv32. It features a two-level hierarchy and a 34-bit physical address, which provides 16 GiB of addressable physical memory. Sv39 is a three-level 64-bit page-table format with a 39-bit virtual-address space (512 GiB) whereas Sv48 has four page-table levels and a 48-bit virtual-address space (256 TiB). Because of the varying number of page-table levels, there is a trade-off between addressable virtual memory and the page-table memory required per address space. Again, which option to implement in hardware, if any, is vendor-specific.
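The sizes quoted above follow directly from the address widths, as the following compile-time check illustrates. The helper name addressable_bytes is hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t GiB = std::uint64_t(1) << 30;
constexpr std::uint64_t TiB = std::uint64_t(1) << 40;

/* Number of bytes addressable with an address of 'bits' bits (bits < 64) */
constexpr std::uint64_t addressable_bytes(unsigned bits)
{
    return std::uint64_t(1) << bits;
}

static_assert(addressable_bytes(34) ==  16 * GiB, "34-bit addresses of Sv32");
static_assert(addressable_bytes(39) == 512 * GiB, "39-bit addresses of Sv39");
static_assert(addressable_bytes(48) == 256 * TiB, "48-bit addresses of Sv48");
```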
Because we picked the RV64 architecture for our initial Genode port, we decided to implement the Sv39 format. The format uses 38-bit physical page numbers and translates 39-bit virtual addresses. The limited width of the virtual addresses leads to a gap in the virtual address space (Figure 2) because the bits 39-63 must be equal to bit 38.
The figure above shows a page-table entry, which is 64 bits wide. Since a page table is page-sized (4 KiB), 512 entries fit into one page table. The type field not only encodes read, write, and execute permissions but also defines whether the entry points to a next-level page table or an actual mapping. This way, Sv39 supports 2-MiB mappings for level-2 entries and 1-GiB mappings for level-1 entries.
Page tables of all levels have the exact same format. So we were able to implement our kernel's page-table handling code in roughly 300 lines of C++. Caution: unlike in other page-table formats, the physical page number starts at bit 10 (not 12) of the entry. If, for example, the page at physical address 0x1000 is the target of the mapping, the value 0x400 has to be inserted.
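The constraint that bits 39-63 must equal bit 38 can be checked by testing whether bits 38-63 are either all zero or all one. A minimal sketch with the hypothetical name sv39_canonical:

```cpp
#include <cstdint>

/* Return true if bits 39-63 of 'va' are equal to bit 38.
 * Bits 38-63 span 26 bits, so the all-ones pattern is 0x3ffffff. */
constexpr bool sv39_canonical(std::uint64_t va)
{
    return (va >> 38) == 0 || (va >> 38) == 0x3ffffff;
}

static_assert( sv39_canonical(0x0000003fffffffffull), "top of lower half");
static_assert(!sv39_canonical(0x0000004000000000ull), "first address in the gap");
static_assert( sv39_canonical(0xffffffc000000000ull), "bottom of upper half");
```

The usable virtual address space thus consists of a lower and an upper half of 256 GiB each, with the gap in between.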
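The placement of the physical page number and the selection of a page-table entry can be sketched as follows. This is an illustration, not the code of Genode's kernel. It takes the example above to refer to the page at physical address 0x1000, leaves out the type field and all other low bits of the entry, and uses the hypothetical names pte_ppn_field and table_index. Level 1 denotes the top-level table, in line with the level numbering used in the text.

```cpp
#include <cassert>
#include <cstdint>

/* A 4-KiB table of 8-byte entries holds 512 entries */
static_assert(4096 / sizeof(std::uint64_t) == 512, "entries per page table");

/* Physical-page-number part of an entry for the page at 'phys_addr'.
 * The page number (address >> 12) is stored starting at bit 10. */
inline std::uint64_t pte_ppn_field(std::uint64_t phys_addr)
{
    assert((phys_addr & 0xfff) == 0);   /* must be page-aligned */
    return (phys_addr >> 12) << 10;
}

/* Index into the page table of 'level' (1 = top level, 3 = 4-KiB pages)
 * for the virtual address 'va'. Each level consumes 9 address bits. */
inline unsigned table_index(std::uint64_t va, unsigned level)
{
    assert(level >= 1 && level <= 3);
    unsigned const shift = 12 + 9 * (3 - level);
    return static_cast<unsigned>((va >> shift) & 0x1ff);
}
```

Because all levels share one entry format, the same two helpers serve every level. Only the shift in table_index depends on the level.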
A design decision we find questionable is RISC-V's address-space ID (ASID) support. Here, the same mistake that caused trouble on ARM is made. There is a separate register for the page-table-base pointer (sptbr) and the address-space-ID register (sasid). This means address-space IDs and page tables cannot be switched atomically, leaving open the question of what happens when the address-space ID is switched and the instruction that switches the page table is fetched next. On ARM, this raised some strange TLB issues with even weirder solutions (interim page tables with global mappings). In newer revisions, the address-space ID became encoded in the lower 12 bits of the page-table base.
Cacheability attributes
When looking at the page-table-entry format (Figure 3), one notices that there are no cacheability attributes defined. Architectures like ARM and x86 define these within their respective page tables, and the OS uses them for things like memory-mapped I/O (MMIO) or DMA memory, which must be accessed uncached so that hardware devices and the CPU "see" the same content.
RISC-V chooses a different approach by partitioning the physical memory into different regions. There will be regions that are wired as uncached memory whereas others represent cached memory. Caching is not supposed to be enabled or disabled at run time. While this approach solves the issue for MMIO, there is no solution for DMA memory, as of now.
Kernel entry and kernel exit
In base-hw, kernel entries and exits lead to a context switch to/from core's address space. On kernel entries, the registers of the current user context are stored in the kernel and restored upon a kernel exit before switching to the context's page tables. RISC-V offers a scratch CSR for each privileged mode. We use sscratch to store the current context pointer, so we can find it quickly upon kernel entry.
Closely related to the kernel entry/exit, we found it hard to implement position-independent code (PIC) with the current tool chain. For example, an operation like:
addi t0, t0, 2b - 1b
where we want to add the distance of label 2 from label 1 to t0 is currently not possible. Unfortunately, the kernel entry code resides in a page at the very end of the virtual address space and is not identity mapped. Therefore, it must consist of position-independent code only. By using the sscratch CSR as described above, we were able to avoid this issue altogether.
Timer
Implementing the timer should be straightforward, or so we thought. There is a time CSR for each mode (stime for supervisor mode) and a timecmp CSR, also defined for each mode. The OS programs stimecmp to the next timeout. When stime reaches the value of stimecmp, a timer interrupt is triggered.
After implementing this simple timer driver, we found that stimecmp had disappeared from the tool chain and the instruction emulator. We asked the RISC-V developers about this issue and found out that only the mtimecmp register will remain in future versions of the privileged specification. There will be an SBI call (Section Privilege modes) to program the machine timer. When the timer interrupt occurs, machine mode will have to forward the interrupt to supervisor mode. Note that machine mode must also handle the case when interrupts are currently disabled in supervisor mode. The removal is motivated by hardware costs.
We first implemented the stimecmp solution, but in the current Genode version it was replaced by the second approach. This means, Genode's RISC-V support is somewhere between privileged ISA 1.7 and 1.8. Once 1.8 is out, we will have to re-evaluate our current solution, especially regarding the SBI interface.
Atomic instructions
On Genode, the only prerequisite to implement locks and all other synchronization primitives is an atomic compare-exchange (cmpxchg) operation. RISC-V's atomic instruction extension offers the traditional load reserved (lr) and store conditional (sc) instructions, which are convenient to implement cmpxchg.
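To illustrate why a compare-exchange operation suffices as a building block, the following sketch constructs a simple spin lock on top of it. It is not Genode's lock implementation. The cmpxchg function shown here is a hypothetical stand-in that uses std::atomic so that the example is portable. On RISC-V, its body would correspond to an lr/sc loop.

```cpp
#include <atomic>

/* Stand-in for an architecture-specific cmpxchg: atomically replace 'dest'
 * by 'new_val' if it currently holds 'cmp_val'. Returns true on success. */
inline bool cmpxchg(std::atomic<int> &dest, int cmp_val, int new_val)
{
    return dest.compare_exchange_strong(cmp_val, new_val);
}

struct Spin_lock
{
    enum { UNLOCKED = 0, LOCKED = 1 };

    std::atomic<int> state { UNLOCKED };

    /* Take the lock if it is free, return false otherwise */
    bool try_lock() { return cmpxchg(state, UNLOCKED, LOCKED); }

    /* Busy-wait until the lock could be taken */
    void lock() { while (!try_lock()) { } }

    void unlock() { state.store(UNLOCKED); }
};
```

A second try_lock() on a held lock fails until unlock() is called, which is the property that higher-level synchronization primitives build upon.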
User-level support
To support basic user-level programs on Genode, only a few things are required. We had to implement the crt0.s assembly startup code, which loads a preliminary stack, calls Genode's component-local initialization code, and ultimately the _main function. Also, we had to implement the kernel's system-call bindings for RISC-V. These bindings fill registers with system-call arguments as expected by the kernel and issue the ecall instruction, which leads to an environment-call trap in the kernel. Having implemented these parts, we were able to execute Genode's run/printf scenario successfully.
Support for dynamic linking
In the next step, we enabled Genode's dynamic linker on the RISC-V platform. The linker is an executable shared library, which is linked with Genode's genode_rel.ld linker script. It contains all Genode base libraries and executes the same startup code as statically linked Genode components. As stated in Section User-level support, the first step a Genode component takes is to load a preliminary stack pointer. Since the stack symbol is global, it is located somewhere in the global offset table (GOT). In order to load the symbol, we used the load-address instruction (la). la is a tool-chain mnemonic that is translated into two RISC-V instructions, which load the value of the symbol from the GOT. Unfortunately, the GOTs of shared libraries are empty on RISC-V, resulting in a zero stack pointer in the linker. After intensive brainstorming, we came up with a solution where we would obtain the current program counter and load the offset of the stack symbol PC-relative from the text segment. As soon as we got this working, the _DYNAMIC symbol (it points to the beginning of the dynamic header of an ELF file) turned out to be zero, too, because it was also read from the global offset table. After some more investigation, we found out that there exists a mnemonic called lla (load local address - we assume) in the tool chain. This version does not involve the GOT but returns the position of the symbol within a binary. Note that both of these mnemonics are currently not documented.
Having a valid stack pointer and a pointer to the dynamic header, we implemented RISC-V-specific relocations next. This was a straightforward experience, with the only exception that all jump-slot relocations (these are used to relocate function calls) were present in the section for global data. Usually, the procedure linkage table contains code that pushes the symbol number on the stack and calls the third entry of the table where the dynamic linker has installed its entry address. This is how lazy binding is implemented. On RISC-V, there is no trace of lazy binding to be found anywhere and all function relocations have to be performed eagerly at load time.
With relocation support in place, we were able to successfully run Genode's dynamic linker test (run/ldso), which covers corner cases like cross library function calls, cross library C++ exceptions, global constructors, and even dynamic casts (these require runtime type information).
Summary and current state
Thanks to the steps described herein, Genode system scenarios can be executed on the RISC-V architecture. We extended base-hw and implemented the kernel part against RISC-V privileged specification 1.7 and user-level ISA 2.0. During this task, we found out that changes to both specifications had already been made, which made it clear that both specifications are under active development. Some research showed that the user-level ISA was frozen by the end of 2015. As for the privileged specification, there will be an updated version (1.8), which is also expected to be frozen in 2016. In Genode's userland, we enabled the fundamental services (core and init) and ported Genode's dynamic linker to RISC-V.
Because of the current lack of devices (for example, no user-level timer, no input devices, and no framebuffer), we stopped short of building more sophisticated scenarios and also postponed libc support because it would not bring a tangible benefit at this stage.
RISC-V remains a very interesting project but we are cautious until real hardware becomes available, which is supposed to happen in 2016 through the lowRISC project. There are still unresolved issues like how DMA memory is handled and the provisioning of an interrupt controller, if any at all. The highly modularized approach of the RISC-V design allows for a wide range of applications. So it remains to be seen what vendor will support which features of the architecture and how open real hardware will be (Section Privilege modes).
Most of the work presented in the article was conducted by Genode Labs in Summer 2015. The adaptation to recent ISA versions as well as the FPGA support was kindly contributed by NL Cyber Security Labs. With Genode version 16.02, the outcome has been incorporated into the mainline of the Genode OS Framework. For trying out Genode on RISC-V, please follow the instructions given in the release documentation.