CVA6: An application class RISC-V CPU core
The goal of the CVA6 project is to create a family of production-quality, open-source, application class RISC-V CPU cores. The CVA6 targets both ASIC and FPGA implementations, although individual cores may target a specific implementation technology. The CVA6 is written in SystemVerilog and is heavily parameterizable. For example, parameters can set the XLEN to be either 32 or 64 bits, and support for floating point can be enabled or disabled.
CORE-V Nomenclature
CORE-V is the name of the OpenHW Group family of RISC-V cores. CVA6 is the name of a GitHub repository for the source code for a set of application class CORE-V cores. The CV prefix identifies it as a member of the CORE-V family, and A6 indicates that it is an application class processor with a six-stage execution pipeline. However, the CVA6 “as is” is not intended to implement a specific production core. Rather, the CVA6 is expected to be the basis for a number of application class cores. The naming convention for these cores is:
CV <XLEN> <class> <# of pipeline stages> <product identifier>
Thus, the CV64A60 would be a 64-bit application class core with a six-stage pipeline. Note that in this example, the product identifier is “0”.
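The naming convention can be made concrete with a small parser. This is a hypothetical Python helper, not part of any OpenHW tooling. It assumes a width field of 32 or 64, a single-letter class, a single-digit stage count and an optional product identifier, which covers both CV64A60 and CV32A6.

```python
import re

# Hypothetical helper: CV <width> <class> <# of pipeline stages> <product identifier>
NAME_RE = re.compile(r"^CV(?P<xlen>32|64)(?P<cls>[A-Z])(?P<stages>\d)(?P<product>\w*)$")

def parse_core_name(name: str) -> dict:
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a CORE-V product name: {name!r}")
    return {
        "xlen": int(m["xlen"]),            # 32- or 64-bit core
        "class": m["cls"],                 # "A" = application class
        "stages": int(m["stages"]),        # number of pipeline stages
        "product": m["product"] or None,   # product identifier, if any
    }

print(parse_core_name("CV64A60"))
# {'xlen': 64, 'class': 'A', 'stages': 6, 'product': '0'}
```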
Organization of this Document
This documentation is split into multiple parts.
The CVA6 User Guide provides a detailed introduction to the CVA6. This document is based on the original Ariane documentation and is aimed at hardware developers integrating CVA6 into a design.
The CVA6 Requirements Specification is the top-level specification of the CVA6. One of the key attributes of this document is to specify the feature set of specific CORE-V products based on CVA6. This document focuses on _what_ the CVA6 does, without detailed consideration of _how_ a specific requirement is implemented. The target audience of this document is members of the OpenHW Group who wish to participate in the definition of future cores based on the CVA6.
The CV32A6 Design Specification describes in detail the CV32A6, the first production-quality 32-bit application processor derived from the CVA6. The primary audience for this documentation is design and verification engineers working to bring the CV32A6 to TRL-5.
OpenHW Group CVA6 User Manual
Editor: Florian Zaruba florian@openhwgroup.org
Introduction
This document describes the 6-stage, single-issue Ariane CPU, which implements the 64-bit RISC-V instruction set. It fully implements the I, M and C extensions as specified in Volume I: User-Level ISA V 2.1, as well as the draft privileged architecture specification 1.10. It implements three privilege levels, M, S and U, to fully support a Unix-like operating system.
Scope and Purpose
The purpose of the core is to run a full OS at reasonable speed and IPC. To achieve the necessary speed, the core features a 6-stage pipelined design. To increase the IPC, the CPU features a scoreboard which is meant to hide the latency to the data RAM (cache) by issuing data-independent instructions. The instruction RAM (or L1 instruction cache) has an access latency of 1 cycle on a hit, while accesses to the data RAM (or L1 data cache) have a longer latency of 3 cycles on a hit.

PC Generation
PC gen is responsible for generating the next program counter. All program counters are logical addresses. If the logical-to-physical mapping changes, a fence.vm instruction should flush the pipeline and TLBs.
This stage contains speculation on the branch target address as well as on whether the branch is taken. In addition, it houses the branch target buffer (BTB) and a branch history table (BHT).
If the BTB decodes a certain PC as a jump, the BHT decides whether the branch is taken. Because of the various stateful memory components, this stage is split into two pipeline stages. PC Gen communicates with the IF stage via a handshake: instruction fetch signals its readiness with an asserted ready signal, while PC Gen signals a valid request by asserting the fetch_valid signal.
The next PC can originate from the following sources (listed in order of increasing precedence):
Default assignment: The default assignment is to fetch PC + 4. PC Gen always fetches on a word boundary (32-bit). Compressed instructions are handled in a later pipeline step.
Branch Predict: If the BHT and BTB predict a branch on a certain PC, PC Gen sets the next PC to the predicted address and also informs the IF stage that it performed a prediction on the PC. This information is needed in various places further down the pipeline (for example, to correct a mis-prediction). Branch information which is passed down the pipeline is encapsulated in a structure called branchpredict_sbe_t. In contrast, branch prediction information which is passed up the pipeline is simply called bp_resolve_t and is used for corrective actions (see next bullet point). This naming convention should make it easy to follow the flow of branch information in the source code.
Control flow change request: A control flow change request occurs when the branch predictor has mis-predicted. This can either be a ‘real’ mis-prediction or a branch which was not recognized as one. In either case, we need to correct our action and start fetching from the correct address.
Return from environment call: A return from an environment call performs a corrective action on the PC by setting the next PC to the one stored in the [m|s]epc register.
Exception/Interrupt: If an exception (or interrupt, which is quite similar in the context of RISC-V systems) occurs, PC Gen sets the next PC based on the trap vector base address. The trap vector base address can differ depending on whether the exception traps to S-Mode or M-Mode (user mode exceptions are currently not supported). It is the purpose of the CSR Unit to figure out where to trap to and present the correct address to PC Gen.
Pipeline Flush because of CSR side effects: When a CSR with side effects is written, we need to flush the whole pipeline and start fetching again from the next instruction in order to take the updated information into account (for example, a change of the virtual memory base pointer).
Debug: Debug has the highest precedence, as it can interrupt any control flow request. It is also the only source of control flow change which can actually happen simultaneously with any of the other forced control flow changes. The debug unit reports both the request to change the PC and the PC to which the CPU should change.
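To make the precedence order concrete, here is a behavioral sketch in Python (not the SystemVerilog implementation). The field names (`fetch_pc`, `predicted_target`, `resolved_target`, `epc`, `trap_vector`, `csr_flush_pc`, `debug_pc`) are hypothetical stand-ins for the sources listed above; each optional field is `None` when that source is inactive.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NextPcInputs:
    # Hypothetical signal bundle; names are illustrative, not CVA6 port names.
    fetch_pc: int
    predicted_target: Optional[int] = None   # BTB/BHT predicted a taken branch
    resolved_target: Optional[int] = None    # control flow change request
    epc: Optional[int] = None                # return: PC from [m|s]epc
    trap_vector: Optional[int] = None        # exception/interrupt target from CSR unit
    csr_flush_pc: Optional[int] = None       # refetch after a CSR side effect
    debug_pc: Optional[int] = None           # debug request

def next_pc(i: NextPcInputs) -> int:
    # Default assignment: next word (PC Gen fetches on 32-bit boundaries).
    pc = (i.fetch_pc & ~0x3) + 4
    # Sources are visited from lowest to highest precedence, so the last
    # active one overrides all earlier ones, just like a priority mux.
    for candidate in (i.predicted_target, i.resolved_target, i.epc,
                      i.trap_vector, i.csr_flush_pc, i.debug_pc):
        if candidate is not None:
            pc = candidate
    return pc
```

Writing the assignments in ascending order and letting later ones win mirrors how such a priority selection is often expressed in an RTL `always_comb` block.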
This unit also takes care of a signal called fetch_enable, whose purpose is to prevent fetching when it is not asserted. Also note that no flushing takes place in this unit. All flush information is distributed by the controller; in fact, the controller’s only purpose is to flush the different pipeline stages.
Branch Prediction
Ariane Block Diagram
All branch prediction data structures reside in a single register-file-like data structure. It is indexed with the appropriate number of bits from the PC and contains the predicted target address as well as the state of a configurable-width saturating counter (two bits by default). The prediction result is used in the subsequent stage to jump (or not).
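A saturating counter of this kind can be sketched as follows. This is a generic Python model, not CVA6 code; the reset value of 0 (strongly not taken) is an assumption. The most significant bit of the counter selects the prediction, so a single mis-prediction does not flip a strongly biased entry.

```python
class SaturatingCounter:
    """N-bit saturating counter for one BHT entry (2 bits by default)."""

    def __init__(self, bits: int = 2, init: int = 0):
        self.max = (1 << bits) - 1
        self.value = init            # assumed reset value: strongly not taken

    def predict_taken(self) -> bool:
        # Upper half of the range (MSB set) predicts taken.
        return self.value > self.max // 2

    def update(self, taken: bool) -> None:
        # Count towards the observed outcome, saturating at both ends.
        if taken:
            self.value = min(self.value + 1, self.max)
        else:
            self.value = max(self.value - 1, 0)

c = SaturatingCounter()
c.update(True)
print(c.predict_taken())   # False: one taken branch only reaches "weakly not taken"
```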
In addition to providing prediction results, the BTB also updates its information on mis-predictions. It can either correct the saturating counter or clear the branch prediction entry. The latter is done when the branch unit saw that the predicted PC didn’t match, or when an instruction with a privilege-changing side effect is committing.
The branch outcome and the branch target address are calculated in the same functional unit; therefore, a mis-prediction on the target address is as costly as a mis-prediction on the branch decision. As the branch unit (the functional unit which does all the branch handling) is already quite critical in terms of timing, separating these two calculations is a potential improvement.
As Ariane fully implements the compressed instruction set, branches can also happen on 16-bit (half-word) instructions. As this would significantly increase the size of the BTB, the BTB is indexed with a word-aligned PC. This has the potential drawback that branch prediction always mis-predicts on an instruction fetch word which contains two compressed branches. However, such cases should be rare in practice.
A trick we played here is to index the BTB with the next PC of an unaligned instruction (i.e. the word-aligned PC of the upper 16 bits of this instruction). This naturally allows the IF stage to fetch all necessary instruction data; it actually fetches two more unused bytes, which are then discarded by the instruction re-aligner. For that reason, we also need to keep an additional bit indicating whether the instruction is in the lower or upper 16 bits.
For branch prediction, a potential source of unnecessary pipeline bubbles is aliasing. To prevent aliasing (or at least make it less likely), a couple of tag bits (upper bits from the indexed PC) are stored and compared on every access. This trade-off is necessary as we lack sufficiently fast SRAMs which could be used to host the BTB. Instead, we are forced to use registers, which have a significantly larger impact on overall area and power consumption.
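The indexing and tagging scheme from the last three paragraphs can be summarized in a small Python model. It is a sketch under stated assumptions: a direct-mapped table, the hypothetical parameters `entries` and `tag_bits` (not CVA6 defaults), and a hypothetical `upper_16` flag recording which half-word of the indexed word the lookup PC points at.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BtbEntry:
    valid: bool = False
    tag: int = 0
    target: int = 0
    upper_16: bool = False  # lookup PC points at the upper half-word of the indexed word

class Btb:
    """Direct-mapped, word-indexed BTB with partial tags (illustrative sizes)."""

    def __init__(self, entries: int = 16, tag_bits: int = 4):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.index_bits = entries.bit_length() - 1
        self.tag_mask = (1 << tag_bits) - 1
        self.table = [BtbEntry() for _ in range(entries)]

    def _split(self, pc: int) -> Tuple[int, int]:
        word = pc >> 2                                   # word-aligned PC
        index = word & ((1 << self.index_bits) - 1)
        tag = (word >> self.index_bits) & self.tag_mask  # a few upper PC bits
        return index, tag

    @staticmethod
    def lookup_pc(instr_pc: int, compressed: bool) -> int:
        # An unaligned 32-bit instruction is looked up with the PC of its
        # upper 16 bits (PC + 2), which lies in the next fetch word.
        if not compressed and instr_pc & 0x2:
            return instr_pc + 2
        return instr_pc

    def update(self, instr_pc: int, compressed: bool, target: int) -> None:
        pc = self.lookup_pc(instr_pc, compressed)
        index, tag = self._split(pc)
        self.table[index] = BtbEntry(True, tag, target, bool(pc & 0x2))

    def predict(self, fetch_pc: int) -> Optional[BtbEntry]:
        index, tag = self._split(fetch_pc)
        entry = self.table[index]
        # A tag mismatch means a different PC aliases to this entry.
        return entry if entry.valid and entry.tag == tag else None
```

Because the index ignores PC bit 1, two compressed branches in the same fetch word share one entry, which is exactly the drawback described above.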
Instruction Fetch Stage
The Instruction Fetch stage (IF) gets its information from the PC Gen stage. This includes branch prediction information (was it a predicted branch? what is the target address? was it predicted to be taken?), the current PC (word-aligned if it was a consecutive fetch) and whether this request is valid. The IF stage asks the MMU to perform address translation on the requested PC and controls the I$ (or plain instruction memory) interface. The instruction memory interface is described in more detail in .
The delicate part of instruction fetch is that it is very timing-critical. This prevents us from implementing a more elaborate handshake protocol (as round-trip times would be too large). Therefore, the IF stage signals the I$ interface that it wants to make a fetch request to memory. Depending on the cache’s state, this request may or may not be granted. If it was granted, the instruction fetch stage puts the request into an internal FIFO. It needs to do so because it has to know at any point in time how many transactions are outstanding. This is mostly because instruction fetch happens on a very speculative basis due to branch prediction: the controller can always decide to flush the instruction fetch stage, in which case it needs to discard all outstanding transactions.
The current implementation allows for a maximum of two outstanding transactions. If two transactions are already outstanding, the IF stage simply does not acknowledge any new request from PC Gen. As soon as a valid answer returns from memory (and the request is not considered outdated because of a flush), the answer is put into a FIFO together with the fetch address and the branch prediction information.
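The bookkeeping described above can be sketched in Python as follows. The limit of two comes from the text; the class name, methods and the assumption that memory answers in request order are illustrative.

```python
from collections import deque

class FetchTracker:
    """Track outstanding I$ requests so flushed ones can be discarded (sketch)."""

    MAX_OUTSTANDING = 2

    def __init__(self):
        self.pending = deque()   # [address, still_wanted] in request order

    def ready(self) -> bool:
        # Stop acknowledging PC Gen once two requests are in flight.
        return len(self.pending) < self.MAX_OUTSTANDING

    def issue(self, addr: int) -> None:
        assert self.ready()
        self.pending.append([addr, True])

    def flush(self) -> None:
        # Granted requests cannot be recalled from memory; mark them stale.
        for req in self.pending:
            req[1] = False

    def response(self, data: int):
        addr, wanted = self.pending.popleft()    # assumes in-order answers
        return (addr, data) if wanted else None  # None: dropped after a flush
```

The stale answers still have to be consumed, which is why the stage must know how many transactions are outstanding rather than just whether one is.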
Together with the answer from memory the MMU will also signal potential exceptions. Therefore this is the first place where exceptions can potentially happen (bus errors, invalid accesses and instruction page faults).
Fetch FIFO
The fetch FIFO contains all requested (valid) fetches from instruction memory. The FIFO currently has one write port and two read ports (of which only one is used). In a future implementation the second read port could potentially be used to implement macro-op fusion or widen the issue interface to cover two instructions.
The fetch FIFO also fully decouples the processor’s front-end and its back-end. On a flush request the whole fetch FIFO is reset.
Instruction Decode
Instruction decode is the first pipeline stage of the processor’s back-end. Its main purpose is to distill instructions from the data stream it gets from the IF stage, decode them and send them to the issue stage.
With the introduction of compressed instructions (and variable-length instructions in general), the ID stage gets a little more complicated: it has to search the incoming data stream for potential instructions, re-align them and (in the case of compressed instructions) decompress them. Furthermore, as we know at the end of this stage whether the decoded instruction is a branch instruction, it passes this information on to the issue stage.
Instruction Re-aligner
Instruction re-alignment Process
As mentioned above, the instruction re-aligner checks the incoming data stream for compressed instructions. Compressed instructions have their two least significant bits unequal to 11, while normal 32-bit instructions have them set to 11. The main complication arises from the fact that a compressed instruction can make a following normal instruction unaligned (i.e. the instruction starts at a half-word boundary). In the worst case, this mandates two memory accesses before the instruction can be fully decoded. We therefore need to make sure that the fetch FIFO has enough space to keep the second part of the instruction. Consequently, the instruction re-aligner needs to keep track of whether the previous instruction was unaligned or compressed to correctly decide what to do with the upcoming instruction.
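The re-alignment rules can be sketched as a Python generator over fetch words. This is a simplified model: it assumes purely sequential fetch (no branch prediction, so nothing is discarded) and little-endian 32-bit fetch words; the function name `realign` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

def realign(fetch: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int, bool]]:
    """Take (word_aligned_pc, 32-bit fetch word) pairs and yield
    (pc, instruction, is_compressed) for every instruction found."""
    pending: Optional[Tuple[int, int]] = None  # lower half of an unaligned 32-bit instr
    for pc, word in fetch:
        halves = [(pc, word & 0xFFFF), (pc + 2, word >> 16)]
        i = 0
        if pending is not None:
            # Complete the 32-bit instruction that started in the previous word.
            start_pc, low = pending
            yield start_pc, (halves[0][1] << 16) | low, False
            pending = None
            i = 1
        while i < 2:
            hpc, half = halves[i]
            if half & 0b11 != 0b11:
                yield hpc, half, True       # compressed: low two bits != 11
                i += 1
            elif i == 0:
                yield hpc, word, False      # aligned 32-bit instruction
                i += 2
            else:
                pending = (hpc, half)       # upper half starts a 32-bit instruction
                i += 1
```

For example, a fetch word holding `c.addi x1, 1` (0x0085) in its lower half followed by the first half of `addi x1, x1, 1` (0x00108093) yields the compressed instruction immediately and the 32-bit instruction only after the next word arrives, which is the extra memory access mentioned above.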
Furthermore, the branch prediction information is used to only output the correct instruction to the issue stage. As we only predict on word-aligned PCs, the passed-on branch prediction information needs to be inspected to determine which instruction we actually need in case two instructions (compressed or unaligned) are present. This means that we potentially have to discard one of the two instructions (the instruction before the branch target). For that reason, the instruction re-aligner also needs to check whether this fetch entry contains a valid and taken branch. Depending on whether it is predicted on the upper 16 bits, it has to discard the lower 16 bits accordingly. This process is illustrated in .
Compressed Decoder
As mentioned earlier, we also need to decompress all compressed instructions. This is done by a small combinatorial circuit which takes a 16-bit compressed instruction and expands it to its 32-bit equivalent. Every compressed instruction has a 32-bit equivalent.
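The expansion is a pure bit-shuffling function. The Python sketch below covers only two quadrant-1 encodings from the RISC-V C extension, C.ADDI and C.LI, both of which expand to ADDI; a real compressed decoder handles every RVC encoding. The function names are hypothetical.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's-complement field."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def expand_compressed(c: int) -> int:
    """Expand C.ADDI / C.LI to the equivalent 32-bit ADDI (sketch, two encodings only)."""
    quadrant, funct3 = c & 0b11, (c >> 13) & 0b111
    rd = (c >> 7) & 0x1F
    imm = sext(((c >> 12) & 1) << 5 | ((c >> 2) & 0x1F), 6)   # imm[5] | imm[4:0]
    if quadrant == 0b01 and funct3 == 0b000:     # C.ADDI -> addi rd, rd, imm
        rs1 = rd
    elif quadrant == 0b01 and funct3 == 0b010:   # C.LI   -> addi rd, x0, imm
        rs1 = 0
    else:
        raise NotImplementedError(f"encoding {c:#06x} not covered by this sketch")
    OP_IMM, FUNCT3_ADDI = 0b0010011, 0b000
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (FUNCT3_ADDI << 12) | (rd << 7) | OP_IMM

print(hex(expand_compressed(0x4515)))   # c.li a0, 5 -> 0x500513 (addi a0, x0, 5)
```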
Decoder
The decoder takes either the raw instruction data or the decompressed equivalent of the 16-bit instruction and decodes it accordingly. It transforms the raw bits into the most fundamental control structure in Ariane, a scoreboard entry:
PC: PC of instruction
FU: functional unit to use
OP: operation to perform in each functional unit
RS1: register source address 1
RS2: register source address 2
RD: register destination address
Result: for unfinished instructions this field also holds the immediate
Valid: is the result valid
Use I Immediate: should we use the immediate as operand b?
Use Z Immediate: use zimm as operand a
Use PC: set if we need to use the PC as operand a, PC from exception
Exception: exception has occurred
Branch predict: branch predict scoreboard data structure
Is compressed: signals a compressed instruction; the commit stage needs this information to compute the correct next PC (e.g. PC + 4 or PC + 2)
It gets incrementally processed further down the pipeline. The scoreboard entry controls operand selection, dispatch and execution. Furthermore, it contains an exception entry which strongly ties the particular instruction to its potential exception. As the first point at which an exception could have occurred is the IF stage, the decoder also makes sure that such an exception finds its way into the scoreboard entry. An illegal instruction exception can occur during decoding. If this is the case and no previous exception has happened, the decoder sets the corresponding exception field along with the faulting instruction bits (in [s|m]tval). Illegal instruction exceptions can also be raised at other points in the pipeline, but since such an exception always reports the faulting instruction bits in the [s|m]tval field, this field is set here anyway, though only if instruction fetch has not already raised an exception for this instruction.
Issue Stage
The issue stage’s purpose is to receive the decoded instructions and issue them to the various functional units. It also keeps track of all issued instructions and the functional unit status, receives the write-back data from the execute stage and contains the CPU’s register file. By using a data structure called a scoreboard (see ), it knows exactly which instructions are issued, which functional unit they are in and which register they will write back to. As previously mentioned, execution can roughly be divided into four parts: 1. issue, 2. read operands, 3. execute and 4. write-back. The issue stage handles steps one, two and four.
Ariane Scoreboard
Issue
When the issue stage gets a new decoded instruction, it checks whether the required functional unit is free or will be free in the next cycle. Then it checks whether its source operands are available and that no other currently issued instruction will write the same destination register. Furthermore, it makes sure that no new branch is issued while another branch is still unresolved. The latter is mainly needed to simplify the hardware design: by only allowing one unresolved branch, we can easily backtrack if we later find out that we have mis-predicted on it.
Ensuring that the scoreboard only allows one in-flight instruction to write a given destination register significantly eases the design of the forwarding path. The scoreboard has a combinatorial circuit which outputs the status of all 32 destination registers together with the functional unit that will produce the result. This signal is called rd_clobber.
The issue stage communicates with the various functional units independently. In particular, this means that it has to monitor their ready and valid signals and receive and store their write-back data unconditionally. It always has enough space, as it allocates a slot in the scoreboard for every issued instruction. This solves the potential structural hazards of smaller microprocessors. This modular design also allows exploring more advanced issuing techniques such as out-of-order issue ().
Instructions are issued in order, which means that program order is naturally maintained. What can happen out of order is the write-back of each functional unit. For example, suppose the issue stage issues a multiplication which takes $n$ clock cycles to produce a valid result. In the next cycle, the issue stage issues an ALU instruction such as an addition. The addition takes just one clock cycle and therefore returns before the multiplication’s result is ready. Because of this, we need to assign IDs to the issued instructions. The ID represents the (unique) position in which the scoreboard will store the result of this instruction. The ID (called the transaction ID) has enough bits to uniquely represent each slot in the scoreboard and needs to be passed along with the other data to the corresponding functional unit.
This scheme allows the functional units to operate completely independently of the issue logic. They can return transactions in any order; the scoreboard knows where to put the results as long as the corresponding ID is signaled alongside each result. This scheme even allows a functional unit to buffer results and process them entirely out of order if that makes sense for it. This is a further example of how to efficiently decouple the different modules of a processor.
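The multiplication/addition example can be replayed with a small Python model in which the scoreboard slot index doubles as the transaction ID. The size `NR_ENTRIES = 8` and the method names are assumptions; in-order retirement reflects the single commit point described for the LSU.

```python
class Scoreboard:
    """Circular buffer whose slot index is the transaction ID (sketch)."""

    NR_ENTRIES = 8   # illustrative size, not the CVA6 parameter value

    def __init__(self):
        self.slots = [None] * self.NR_ENTRIES
        self.issue_ptr = 0    # next slot to allocate
        self.commit_ptr = 0   # oldest instruction
        self.count = 0

    def issue(self, name: str) -> int:
        assert self.count < self.NR_ENTRIES, "scoreboard full"
        trans_id = self.issue_ptr
        self.slots[trans_id] = {"name": name, "valid": False, "result": None}
        self.issue_ptr = (self.issue_ptr + 1) % self.NR_ENTRIES
        self.count += 1
        return trans_id           # handed to the FU along with the operands

    def write_back(self, trans_id: int, result: int) -> None:
        # FUs may return in any order; the ID says where the result belongs.
        self.slots[trans_id]["valid"] = True
        self.slots[trans_id]["result"] = result

    def commit(self):
        # Retire strictly in program order: only the oldest entry may leave.
        head = self.slots[self.commit_ptr]
        if self.count == 0 or not head["valid"]:
            return None
        self.slots[self.commit_ptr] = None
        self.commit_ptr = (self.commit_ptr + 1) % self.NR_ENTRIES
        self.count -= 1
        return head["name"], head["result"]
```

Usage: issue `mul` then `add`, write back the addition first, and `commit()` returns `None` until the multiplication’s result arrives, after which both retire in program order.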
Read Operands
Read operands physically happens in the same cycle as the issuing of instructions, but can conceptually be thought of as another stage. As the scoreboard knows which registers are being written, it can handle the forwarding of those operands if necessary. The design goal was to execute two ALU instructions back to back (i.e. with no bubble in between). The operands come either from the register file (if no other instruction currently in the scoreboard will write that register) or are forwarded by the scoreboard (by looking at the rd_clobber signal).
The operand selection logic is a classical priority selection giving precedence to results from the scoreboard over the register file, as the functional unit always produces the more up-to-date result. To obtain the right register value, the scoreboard needs to be polled for both source operands.
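The priority selection for a single source operand can be sketched as follows, assuming scoreboard entries with the hypothetical keys `rd`, `valid` and `result` and at most one in-flight writer per register (guaranteed by the issue rules). The issue stage would call this once for each of the two source operands.

```python
from typing import List, Optional

def read_operand(rs: int, regfile: List[int], scoreboard: List[dict]) -> Optional[int]:
    """Return the value of source register `rs`, or None if it must wait."""
    if rs == 0:
        return 0                          # x0 is hard-wired to zero
    for entry in scoreboard:              # scoreboard result beats the register file
        if entry["rd"] == rs:
            return entry["result"] if entry["valid"] else None
    return regfile[rs]                    # no in-flight writer: architectural value
```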
Scoreboard
The scoreboard is implemented as a FIFO with one read and one write port with valid and acknowledge signals. In addition, it provides the aforementioned signals which tell the rest of the CPU which registers are going to be clobbered by a previously scheduled instruction. Instruction decode directly writes to the scoreboard if it is not already full. The commit stage looks for already finished instructions and updates the architectural state, which means either taking an exception or updating the register file or CSR file.
Execute Stage
The execute stage is a logical stage which encapsulates all the functional units (FUs). For the moment, the FUs are not supposed to have inter-unit dependencies, i.e. every FU must be able to perform its operation independently of every other unit. Each functional unit maintains a valid signal, with which it signals valid output data, and a ready signal, which tells the issue logic whether it is able to accept a new request. Furthermore, as briefly explained in the section about instruction issue (), they also receive a unique transaction ID. The functional unit is supposed to return this transaction ID together with the valid signal and the result. At the time of this writing, the execute stage houses an ALU, a branch unit, a load store unit (LSU), a CSR buffer and a multiply/divide unit.
ALU
The arithmetic logic unit (ALU) is a small piece of hardware which performs 32- and 64-bit addition, subtraction, shifts and comparisons. It always completes its operation in a single cycle and therefore does not contain any stateful elements. Its ready signal is always asserted, and it simply passes the transaction ID from its input to its output. Together with the two operands, it also receives an operator which tells it which operation to perform.
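A stateless, single-cycle ALU maps naturally onto a pure function. The Python sketch below works on 64-bit unsigned bit patterns; the operator names are illustrative, and the `*W` variants follow the RV64 rule of computing on 32 bits and sign-extending the result. The transaction ID is passed straight through, as described above.

```python
MASK64 = (1 << 64) - 1

def sext32(x: int) -> int:
    """Sign-extend the low 32 bits of x to a 64-bit pattern."""
    x &= 0xFFFF_FFFF
    return (x | ~0xFFFF_FFFF) & MASK64 if x & 0x8000_0000 else x

def to_signed(x: int) -> int:
    return x - (1 << 64) if x & (1 << 63) else x

def alu(op: str, a: int, b: int, trans_id: int):
    """Return (result, trans_id); no state is kept between calls."""
    ops = {
        "ADD":  lambda: (a + b) & MASK64,
        "SUB":  lambda: (a - b) & MASK64,
        "ADDW": lambda: sext32(a + b),
        "SUBW": lambda: sext32(a - b),
        "SLL":  lambda: (a << (b & 0x3F)) & MASK64,
        "SRL":  lambda: a >> (b & 0x3F),
        "SLT":  lambda: int(to_signed(a) < to_signed(b)),
        "SLTU": lambda: int(a < b),
    }
    return ops[op](), trans_id
```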
Branch Unit
The branch unit’s purpose is to manage all kinds of control flow changes, i.e. conditional and unconditional jumps. It does so by providing an adder to calculate the target address and some comparison logic to decide whether to take the branch. Furthermore, it decides whether a branch was mis-predicted and reports corrective actions to the PC Gen stage. Corrective actions include updating the BHT and setting the PC if necessary. As jumps can be predicted on any instruction (including instructions which are not jumps at all; see the aliasing problem in the PC Gen section), it needs to know whenever an instruction gets issued to a functional unit and monitor the branch prediction information. If a branch was accidentally predicted on a non-branch instruction, it also takes corrective action and resets the PC to the correct address (PC + 2 or PC + 4, depending on whether the instruction was compressed).
As briefly mentioned in the section about instruction re-alignment, the branch unit places the PC of an unaligned 32-bit instruction on the upper 16 bits (i.e. on a new word boundary). Moreover, if an instruction is compressed, this also influences the reported prediction, as the unit needs to set a bit if the prediction occurred on the lower 16 bits (i.e. the lower compressed instruction).
As can be seen, this adds a lot of costly operations to this stage, mostly comparisons and additions. Therefore, the branch unit is on the critical path of the overall design. Nevertheless, it was our design choice to keep branches a single-cycle operation. In a future version, it might still make sense to split this path. This would have costly IPC implications for the overall design, mainly because of the current restriction that the scoreboard only admits new instructions if there are no unresolved branches. With a single-cycle operation, all branches are resolved in the same cycle in which they are issued, which doesn’t introduce any pipeline stalls.
Load Store Unit (LSU)
Load/Store Unit
The load store unit is similar to every other functional unit. In addition, it has to manage the interface to the data memory (D$). In particular, it houses the DTLB (Data Translation Lookaside Buffer), the hardware page table walker (PTW) and the memory management unit (MMU). It also arbitrates access to data memory between loads, stores and the PTW, giving precedence to PTW lookups in order to resolve TLB misses as soon as possible. A high level block diagram of the LSU can be found in .
The LSU can issue load requests immediately, while stores need to be held back until the scoreboard issues a commit signal. This is done because the whole processor is designed to have only a single commit point (see ). Because issuing loads to the memory hierarchy does not have any semantic side effects, the LSU can issue them immediately, in contrast to stores. Stores alter the architectural state and are therefore placed in a store buffer, only to be committed in a later step by the commit stage. This is sometimes called a posted store, because the store request is posted to the store queue, where it waits to enter the memory hierarchy as soon as the commit signal goes high and the memory interface is not in use.
Therefore, upon a load, the LSU also needs to check the store buffer for potential aliasing. Should it find uncommitted data it stalls, since it can’t satisfy the current request.
This means:
Two loads to the same address are allowed. They will return in issue order.
Two stores to the same address are allowed. They are issued in-order by the scoreboard and stored in-order in the store buffer as long as the scoreboard didn’t give the signal to commit them.
A store followed by a load to the same address can only be satisfied once the store has been committed (marked as committed in the store buffer). Otherwise, the LSU stalls until the scoreboard commits the instruction, because we cannot guarantee that the store will eventually be committed (e.g. if an exception occurs).
For the time being, the LSU does not handle misaligned accesses. In particular, these are double-word accesses not aligned to a 64-bit boundary, word accesses not aligned to a 32-bit boundary and half-word accesses not aligned to a 16-bit boundary. If the LSU encounters such a load or store, it throws a misaligned exception and lets the exception handler resolve the load or store. In addition to misaligned exceptions, it can also throw page fault exceptions.
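The alignment rule is natural alignment: an n-byte access must start at a multiple of n. The Python sketch below checks this and returns the matching exception code; the size letters are a hypothetical convention, while the codes 4 (load address misaligned) and 6 (store/AMO address misaligned) are the cause values defined by the RISC-V privileged specification.

```python
from typing import Optional

ACCESS_BYTES = {"B": 1, "H": 2, "W": 4, "D": 8}   # byte, half word, word, double word
LOAD_ADDR_MISALIGNED = 4    # exception codes from the
STORE_ADDR_MISALIGNED = 6   # RISC-V privileged specification

def is_misaligned(addr: int, size: str) -> bool:
    """Natural alignment: an n-byte access must start at a multiple of n."""
    n = ACCESS_BYTES[size]
    return addr & (n - 1) != 0

def check_access(addr: int, size: str, is_store: bool) -> Optional[int]:
    """Return the exception code to raise, or None if the access may proceed."""
    if is_misaligned(addr, size):
        return STORE_ADDR_MISALIGNED if is_store else LOAD_ADDR_MISALIGNED
    return None
```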
To ease the design of the LSU, it is split into six major parts, each of which is described in more detail in the upcoming paragraphs:
LSU Bypass
D$ Arbiter
Load Unit
Store Unit
MMU (including TLBs and PTW)
Non-blocking data cache
LSU Bypass {#par:lsu_bypass}
The LSU bypass module is an auxiliary module which manages the LSU status information (full flag etc.) that it presents to the issue stage. This is necessary for the following reason: the design of the LSU is critical in most aspects, as it directly interfaces the relatively slow SRAMs. It additionally needs to perform some costly operations in sequence, the most costly (in terms of timing) being address generation, address translation and checking the store buffer for potential aliasing. Therefore, it is only known very late whether the current load/store can go to memory or whether additional cycles are needed; of the causes of such extra cycles, aliasing on the store buffer and TLB misses are the most prominent. As the issue stage relies on the ready signal to dispatch new instructions, this would result in an overly long path which would considerably slow down the whole design because of a few corner cases.
To mitigate this problem, a FIFO is added which can hold another request from the issue stage. Therefore, the ready flag of the functional units can be delayed by one cycle, which eases timing. The LSU bypass module further decouples the functional unit from the issue stage. This is mostly necessary because the issue stage can’t stall once it has issued an instruction. The LSU bypass gets its name because it is either bypassed or serves the load or store unit from its internal FIFO until they signal completion to the LSU bypass module.
Load Unit {#par:load_unit}
The load unit takes care of all loads. Loads are issued as soon as possible, as they do not have any side effects. Before issuing a load, the load unit needs to check the store buffer for stores which are not yet committed into the memory hierarchy in order to avoid loading stale data. As a full comparison is quite costly, only the lower 12 bits (the page offset, where physical and virtual addresses are the same) are compared. This has two major advantages: the comparison is only 12 bits wide instead of 64 bits and is therefore faster when done on the whole buffer, and the physical address is not needed, which implies that we don’t need to wait for address translation to finish. If the page offset matches one of the outstanding stores, the load unit simply stalls and waits until the store buffer is drained. As an improvement, one could implement more elaborate data forwarding, as the data in the store buffer is the most up to date. This is not done at the moment.
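The page-offset check can be sketched in a few lines of Python. One assumption beyond the text: this sketch compares offset bits [11:3] (double-word granularity) rather than all 12 bits, so that a load partially overlapping a naturally aligned store in the same double word is also caught. Matches on different pages are false positives, which only cost a stall.

```python
from typing import Iterable

DWORD_OFFSET_MASK = 0xFF8   # page-offset bits [11:3]; identical in VA and PA

def load_must_stall(load_vaddr: int, store_paddrs: Iterable[int]) -> bool:
    """Conservative aliasing check against uncommitted stores. The load's
    virtual address can be used directly: the page offset needs no translation."""
    offset = load_vaddr & DWORD_OFFSET_MASK
    return any((paddr & DWORD_OFFSET_MASK) == offset for paddr in store_paddrs)
```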
Furthermore, the load unit needs to perform address translation. It makes use of a virtually indexed and physically tagged D$ access scheme in order to reduce the number of cycles needed for load accesses. As a load can block the D$, the load unit has to kill the current request on the memory interface to give way to the hardware PTW on the cache side. A more advanced caching infrastructure (like a non-blocking cache) would alleviate this problem.
Store Unit {#par:store_unit}
The store unit manages all stores. It does so by calculating the target address and setting the appropriate byte enable bits. Furthermore, it also performs address translation and communicates with the load unit to see if any load matches an outstanding store in one of its buffers. Most of the store unit’s business logic resides in the store buffer, which is described in detail in the next section.
Store Buffer {#par:store_buffer}
The store buffer keeps track of all stores. It actually consists of two buffers: one for already committed instructions and one for outstanding instructions which are still speculative. On a flush, only the instructions which are already committed are kept, while the speculative queue is completely emptied. To prevent buffer overflows, the two queues maintain a full flag. The full flag of the speculative queue goes directly to the store unit, which then stalls the LSU bypass module and therefore does not receive any more requests. In contrast, the full signal of the commit queue goes to the commit stage, which stalls if the commit queue can’t accept any new data items. On every committed store, the commit stage also asserts the lsu_commit signal, which moves the particular entry from the speculative queue into the non-speculative (commit) queue.
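The two-queue organization can be modelled in Python as follows. Queue depths, method names and the (address, data, byte enable) entry layout are illustrative assumptions; the behavior on flush, commit and drain follows the text.

```python
from collections import deque

class StoreBuffer:
    """Speculative queue plus commit queue (sketch, illustrative depths)."""

    def __init__(self, spec_depth: int = 4, commit_depth: int = 4):
        self.speculative = deque()
        self.committed = deque()
        self.spec_depth, self.commit_depth = spec_depth, commit_depth

    def spec_full(self) -> bool:      # routed to the store unit
        return len(self.speculative) >= self.spec_depth

    def commit_full(self) -> bool:    # routed to the commit stage
        return len(self.committed) >= self.commit_depth

    def push(self, paddr: int, data: int, be: int) -> None:
        assert not self.spec_full()
        self.speculative.append((paddr, data, be))

    def lsu_commit(self) -> None:
        # The commit stage retired the oldest store: move it to the commit queue.
        assert not self.commit_full()
        self.committed.append(self.speculative.popleft())

    def flush(self) -> None:
        # Speculative stores are dropped; committed ones must still reach memory.
        self.speculative.clear()

    def drain_one(self, cache_grant: bool):
        # The oldest committed store leaves once the cache grants the request.
        if cache_grant and self.committed:
            return self.committed.popleft()
        return None
```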
As soon as a store is in the commit queue, the queue automatically tries to write the oldest store in the queue to memory, which happens as soon as the cache grants the request.
The store buffer only works with physical addresses. By the time stores are committed, their translation is already correct. For stores in the speculative queue, addresses are potentially not correct, but this resolves itself: instructions which update the address translation data structures also automatically flush the whole speculative buffer.
Memory Management Unit (MMU) {#par:mmu}
Memory Management Unit
The memory management unit (MMU) takes care of address translation (see ) and memory accesses in general. Address translation needs to be activated separately by writing the corresponding control and status register and switching to a privilege mode lower than machine mode. As soon as address translation is enabled, the MMU also handles page faults. The MMU contains an ITLB, a DTLB and a hardware page table walker (HPTW). Although the two are not logically entangled, the fetch interface is also routed through the MMU. In general, the fetch and data interfaces are handled differently; they only share the HPTW (see ).
There are two fundamentally different paths through the MMU: one from the instruction fetch (IF) stage and one from the LSU. Let us begin with the instruction fetch interface: the IF stage requests the memory content at a specific address. Instruction fetch always uses virtual addresses. Depending on whether address translation is enabled, the MMU either forwards the request transparently to the I$ or performs address translation first.
If address translation is activated, the request to the instruction cache is delayed until a valid translation is found. If no valid translation can be found, the MMU signals this with an exception. If the translation hits in the ITLB, the path is purely combinational. The TLB is implemented as a fully associative cache built from flip-flops. As a consequence, the request path to memory is quite long and can easily become timing-critical.
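The fetch path above can be sketched in Python as a bypass decision followed by a fully associative ITLB lookup. This is a simplified model under stated assumptions: only 4 KiB pages are handled, ASIDs are ignored, the translation_enabled flag stands in for the control and status register setting, and all names are hypothetical rather than CVA6 signal names.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12        # assumption: 4 KiB pages only (no superpages)
PRIV_S, PRIV_M = 1, 3  # RISC-V privilege level encodings


@dataclass
class TlbEntry:
    valid: bool
    vpn: int
    ppn: int


def itlb_lookup(entries, vaddr):
    """Fully associative lookup: any entry may hold any page.

    Hardware compares all tags in parallel in one combinational step;
    the list comprehension below is the sequential equivalent.
    """
    vpn = vaddr >> PAGE_SHIFT
    hits = [e for e in entries if e.valid and e.vpn == vpn]
    if not hits:
        return None  # miss: the fetch waits for the page table walker
    return (hits[0].ppn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


def fetch_paddr(entries, vaddr, priv, translation_enabled):
    """Address forwarded to the I$, or None while no translation exists."""
    if not translation_enabled or priv == PRIV_M:
        return vaddr  # bypass: the request goes transparently to the I$
    return itlb_lookup(entries, vaddr)
```

Every entry is compared against the incoming address, which is why the combinational path grows with the number of TLB entries.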
If an exception occurs, it is returned to the instruction fetch stage together with the valid signal, not the grant signal. This implies that multiple outstanding transactions must be supported on the exception path as well. The MMU has a dedicated buffer (FIFO) that stores these exceptions and returns them as soon as the answer is valid.
The MMU's interface on the data memory side (D$) is entirely different. It is a simple request-response interface guarded by handshake signals. Either the load unit or the store unit asks the MMU to perform address translation. Unlike on the fetch interface, address translation here is not combinational: an additional bank of registers delays the MMU's answer (on a TLB hit) by one cycle. As mentioned in the previous paragraph, address translation is timing-critical. The particular problem on the data interface is that the LSU must first generate the address, which involves another costly addition; together with address translation, this path would definitely become critical. Because the data cache is virtually indexed and physically tagged, this additional cycle costs no IPC. It does, however, make memory requests somewhat more complicated, because memory accesses may need to be aborted due to exceptions: if an exception occurs on a load request, the load unit must kill the memory request it sent in the previous cycle. An excepting load (or store) never goes to memory.
Both TLBs are fully associative and configurable in size. The size of the address space identifier (ASID) is configurable as well. The ASID could avoid flushing certain regions of the TLB (for example, when switching applications), but this is currently not implemented.
Page Table Walker (PTW)
The purpose of a page table walker has already been introduced. The page table walker listens on both the ITLB and the DTLB for incoming translation requests. If it sees that either request misses in its TLB, it saves the virtual address and starts a page table walk. If the page table walker encounters an error state, it raises a page fault exception, which is caught by the MMU and propagated to either the fetch interface or the LSU.
The page table walker gives precedence to DTLB misses. The page table walking process is described in more detail in the RISC-V Privileged Architecture.
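For the Sv39 scheme that CV64A6 supports as an option (PVL-20), the walk defined in the RISC-V Privileged Architecture can be sketched in Python as follows. This is a software model with stated simplifications: it performs no permission, A/D-bit or PMP checks, assumes a canonical virtual address, and does not model how the CVA6 PTW arbitrates between ITLB and DTLB misses. All names are hypothetical.

```python
PAGE_SHIFT = 12
PTE_SIZE = 8              # Sv39 PTEs are 8 bytes
LEVELS = 3                # Sv39: VPN[2], VPN[1], VPN[0]
VPN_BITS = 9
PPN_MASK = (1 << 44) - 1  # PPN field is PTE bits [53:10]
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3


class PageFault(Exception):
    pass


def sv39_vpn(vaddr, level):
    """Extract VPN[level] from a 39-bit Sv39 virtual address."""
    return (vaddr >> (PAGE_SHIFT + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)


def pte_address(table_ppn, vaddr, level):
    """Physical address of the PTE read at `level` of the walk."""
    return (table_ppn << PAGE_SHIFT) + sv39_vpn(vaddr, level) * PTE_SIZE


def walk(memory, root_ppn, vaddr):
    """Translate vaddr; `memory` maps PTE addresses to 64-bit PTE values."""
    table_ppn = root_ppn  # satp.PPN for the first access
    for level in range(LEVELS - 1, -1, -1):
        pte = memory.get(pte_address(table_ppn, vaddr, level), 0)
        if not pte & PTE_V or (pte & PTE_W and not pte & PTE_R):
            raise PageFault(hex(vaddr))  # invalid or reserved encoding
        ppn = (pte >> 10) & PPN_MASK
        if pte & (PTE_R | PTE_X):  # leaf PTE
            if ppn & ((1 << (VPN_BITS * level)) - 1):
                raise PageFault(hex(vaddr))  # misaligned superpage
            low_bits = PAGE_SHIFT + VPN_BITS * level
            return (ppn << PAGE_SHIFT) | (vaddr & ((1 << low_bits) - 1))
        table_ppn = ppn  # non-leaf: pointer to the next-level table
    raise PageFault(hex(vaddr))  # no leaf found at level 0
```

Each level costs one memory access, so a 4 KiB mapping takes three PTE reads; a leaf found at a higher level maps a superpage and ends the walk early.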
PMA/PMP Checks
The core supports PMA and PMP checks in physical mode as well as with virtual memory enabled. PMA checks are performed only on the final access to the (translated) physical address. However, PMPs must be checked during the page table walk as well: during a page walk, all memory accesses must pass the PMP rules.
The number of entries is configurable through the ArianeCfg.NrPMPEntries parameter. However, the core only supports granularity 8 (G=8). This simplifies the implementation since unaligned accesses do not need to be considered. There are a total of three distinct PMP units in the design; they verify instruction accesses, data loads and stores, and the page table walk, respectively.
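To make the G=8 restriction concrete: per the RISC-V Privileged Architecture, a PMP granularity of G means regions of at least 2^(G+2) bytes, so G=8 gives a 1 KiB minimum, and for NAPOT regions bits [G-2:0] of pmpaddr read as all ones. The Python sketch below decodes a NAPOT pmpaddr value into a base and size; the function names are hypothetical and only NAPOT mode is covered (TOR is omitted).

```python
def napot_range(pmpaddr):
    """Decode a NAPOT-encoded pmpaddr into (base, size) in bytes.

    pmpaddr holds physical address bits [..:2]; k trailing ones
    encode a naturally aligned region of 2**(k + 3) bytes.
    """
    k = 0
    while (pmpaddr >> k) & 1:
        k += 1
    size = 1 << (k + 3)
    base = (pmpaddr & ~((1 << (k + 1)) - 1)) << 2
    return base, size


def min_region_bytes(g):
    """Smallest PMP region for granularity G: 2**(G + 2) bytes."""
    return 1 << (g + 2)


def apply_granularity_napot(pmpaddr, g):
    """With G >= 2, bits [G-2:0] of a NAPOT pmpaddr read as all ones."""
    return pmpaddr | ((1 << (g - 1)) - 1)
```

With G=8, even a pmpaddr written with no trailing ones reads back as a 1 KiB region, so no PMP region can be smaller than 1 KiB.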
MMU Implementation Details
The MMU prioritizes instruction address translations over data address translations. The behavior of the MMU is described in the following:
As soon as a request from the instruction fetch stage arrives, the ITLB is checked for a cached entry (combinational path). Upon a miss, the PTW is invoked.
The PTW performs the page table walk over multiple cycles and updates the content of the ITLB during the walk. Every cycle, the MMU checks whether the ITLB now hits, which indicates that the page table walk has concluded.
Multiplier
The multiplier contains a division and a multiplication unit. Multiplication takes two cycles and is fully pipelined (re-timing needed). The division is a simple serial divider that needs 64 cycles in the worst case.
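A serial divider produces one quotient bit per cycle, which is where the 64-cycle worst case for 64-bit operands comes from. The Python sketch below is a textbook unsigned restoring divider, not the CVA6 implementation; it always runs the worst-case `width` iterations, whereas the real unit's duration is data-dependent, as the note to ISA-110 points out. The function name is hypothetical.

```python
def serial_divide(dividend, divisor, width=64):
    """Unsigned restoring division, one quotient bit per iteration.

    Returns (quotient, remainder, iterations). With divisor == 0 the
    loop naturally yields quotient = all ones and remainder = dividend,
    which matches the RISC-V DIVU/REMU division-by-zero results.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    quotient, remainder, iterations = 0, 0, 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit, MSB first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor  # subtraction succeeds: quotient bit is 1
            quotient |= 1
        iterations += 1  # one iteration corresponds to one cycle in hardware
    return quotient, remainder, iterations
```

For instance, `serial_divide(100, 7)` returns quotient 14 and remainder 2 after 64 iterations.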
CSR Buffer
The CSR buffer is a functional unit whose only purpose is to store the address of the CSR that the instruction is going to read or write. There are two reasons for this. First, a CSR instruction alters the architectural state, so it has to be buffered and can only be executed once the commit stage decides to commit it. Second, the scoreboard entry has only one result field, but a CSR instruction needs to keep both the data to be written and the address of the CSR it is going to alter. To avoid cluttering the scoreboard with special-case bit fields, the CSR buffer holds the address, and when the CSR instruction executes it uses the stored address.
The clear disadvantage is that, since the buffer holds just one element, two CSR instructions cannot be executed back to back without a pipeline stall. Since CSR instructions are quite rare, this is not much of a problem; some CSR instructions cause a pipeline flush anyway.
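The single-entry behavior and the resulting back-to-back stall can be illustrated with a minimal Python model. The class and method names are hypothetical; the model only captures that the entry is occupied from issue until the commit stage retires the CSR instruction.

```python
class CsrBuffer:
    """Single-entry CSR buffer (behavioral sketch, hypothetical names)."""

    def __init__(self):
        self.valid = False
        self.csr_addr = None

    @property
    def ready(self):
        # A CSR instruction can only be accepted while the entry is free.
        return not self.valid

    def issue(self, csr_addr):
        """Store the CSR address; fails (stall) if the entry is occupied."""
        if not self.ready:
            return False  # a second back-to-back CSR instruction stalls
        self.valid, self.csr_addr = True, csr_addr
        return True

    def commit(self):
        """The commit stage executes the CSR access and frees the entry."""
        addr = self.csr_addr
        self.valid, self.csr_addr = False, None
        return addr
```

For example, after a write to mstatus (CSR address 0x300) is accepted, a following access to mtvec (0x305) is refused until the first instruction commits.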
Commit Stage
The commit stage is the last stage in the processor's pipeline. Its purpose is to take incoming instructions and update the architectural state. This includes writing CSRs, committing stores and writing back data to the register file. The golden rule is that no other pipeline stage may update the architectural state under any circumstances. If a stage keeps an internal state, that state must be resettable (e.g., by a flush signal).
We can distinguish two categories of retiring instructions. The first category only writes the architectural register file. The second may also write the register file but requires some further business logic. At the time of writing, the only two places where this is necessary are the store unit, where the commit stage needs to tell the store unit to actually commit the store to memory, and the CSR buffer, which needs to be freed as soon as the corresponding CSR instruction retires.
In addition to retiring instructions, the commit stage also manages the various exception sources. In particular, at commit time exceptions can arise from three different sources. First, an exception may have occurred in any of the previous four pipeline stages (only four, as PC Gen cannot raise an exception). Second and third, an exception may occur during commit itself, which can only originate from the CSR register file or from an interrupt.
To allow precise interrupts, interrupts are considered only during commit and are associated with the instruction being committed. Because an interrupt must be associated with a particular PC, it may have to be deferred until another valid instruction is in the commit stage.
Furthermore, the commit stage controls the overall stalling of the processor. If the halt signal is asserted, it does not commit any new instruction, which generates back-pressure and eventually stalls the pipeline. The commit stage also communicates heavily with the controller to execute fence instructions (cache flushes) and other pipeline resets.
CVA6 System on Chip (SoC)
Memory Map
Base | Length | Attributes | Description |
---|---|---|---|
0x0000_0000 | 0x1000 | EX | Debug Module |
0x0001_0000 | 0x10000 | EX | ROM |
0x0200_0000 | 0xC0000 | | CLINT |
0x0C00_0000 | 0x400_0000 | | PLIC |
0x1000_0000 | 0x1000 | | UART |
0x1800_0000 | 0x1000 | | Timer |
0x2000_0000 | 0x80_0000 | | SPI |
0x3000_0000 | 0x10000 | | Ethernet |
0x4000_0000 | 0x1000 | | GPIO |
0x8000_0000 | 0x4000_0000 | EX, NI, C | DRAM |
(EX: Executable, NI: Non-idempotent, C: Cached)
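The memory map can be read as an address decoder. The Python sketch below returns the region name and attributes for a physical address; it only encodes the table above, treats CLINT and the peripherals as having no listed attributes, and uses hypothetical names. How the actual crossbar responds to unmapped addresses is not described here, so the sketch simply returns None for them.

```python
MEMORY_MAP = [
    # (base, length, attributes, description), taken from the table above
    (0x0000_0000, 0x1000, {"EX"}, "Debug Module"),
    (0x0001_0000, 0x10000, {"EX"}, "ROM"),
    (0x0200_0000, 0xC0000, set(), "CLINT"),
    (0x0C00_0000, 0x400_0000, set(), "PLIC"),
    (0x1000_0000, 0x1000, set(), "UART"),
    (0x1800_0000, 0x1000, set(), "Timer"),
    (0x2000_0000, 0x80_0000, set(), "SPI"),
    (0x3000_0000, 0x10000, set(), "Ethernet"),
    (0x4000_0000, 0x1000, set(), "GPIO"),
    (0x8000_0000, 0x4000_0000, {"EX", "NI", "C"}, "DRAM"),
]


def decode(addr):
    """Return (region, attributes) for addr, or (None, set()) if unmapped."""
    for base, length, attrs, name in MEMORY_MAP:
        if base <= addr < base + length:
            return name, attrs
    return None, set()


def is_executable(addr):
    return "EX" in decode(addr)[1]
```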
Platform-Level Interrupt Controller (PLIC)
The specification of CVA6’s platform-level interrupt controller (PLIC) is aligned with the PLIC of SiFive’s FU540-C000. It shares the same functionality and memory map and has the following interrupt sources:
Interrupt ID | Source |
---|---|
1 | UART |
2 | SPI |
3 | Ethernet |
4 | Timer 0 (OVF) |
5 | Timer 0 (CMP) |
6 | Timer 1 (OVF) |
7 | Timer 1 (CMP) |
8 – 30 | Reserved |
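Since the PLIC shares the FU540-C000 memory map, register addresses can be derived from the PLIC base address 0x0C00_0000 in the memory map. The sketch below assumes the standard SiFive PLIC layout (source priorities at base + 4 * ID, pending bits at +0x1000, per-context enable bits at +0x2000 with a 0x80 stride, per-context threshold and claim/complete registers at +0x20_0000 and +0x20_0004 with a 0x1000 stride). Which context index corresponds to which hart and privilege mode in CVA6 is an assumption left to the caller, and the function names are hypothetical.

```python
PLIC_BASE = 0x0C00_0000  # from the memory map

PLIC_SOURCES = {  # interrupt IDs from the table above
    1: "UART", 2: "SPI", 3: "Ethernet",
    4: "Timer 0 (OVF)", 5: "Timer 0 (CMP)",
    6: "Timer 1 (OVF)", 7: "Timer 1 (CMP)",
}


def plic_priority(irq_id):
    """Address of the 32-bit priority register of a source."""
    return PLIC_BASE + 4 * irq_id


def plic_pending(irq_id):
    """(word address, bit index) of a source's pending bit."""
    return PLIC_BASE + 0x1000 + 4 * (irq_id // 32), irq_id % 32


def plic_enable(ctx, irq_id):
    """(word address, bit index) of a source's enable bit for a context."""
    return PLIC_BASE + 0x2000 + 0x80 * ctx + 4 * (irq_id // 32), irq_id % 32


def plic_threshold(ctx):
    return PLIC_BASE + 0x20_0000 + 0x1000 * ctx


def plic_claim_complete(ctx):
    return PLIC_BASE + 0x20_0004 + 0x1000 * ctx
```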
CVA6 Testharness
ariane_testharness is the module where all the masters and slaves are connected through the AXI crossbar. There are two masters and ten slaves in this module; their names and interfaces are listed in the table below.
| Slaves | Interfaces | Masters | Interfaces |
| --- | --- | --- | --- |
| DRAM | master[0] | ariane | slave[0] |
| GPIO | master[1] | debug | slave[1] |
| Ethernet | master[2] | | |
| SPI | master[3] | | |
| Timer | master[4] | | |
| UART | master[5] | | |
| PLIC | master[6] | | |
| CLINT | master[7] | | |
| ROM | master[8] | | |
| Debug | master[9] | | |
The following block diagram shows the connections of the slaves and masters in the ariane_testharness module.
ariane_testharness
Ariane
The ariane core is instantiated as i_ariane in the ariane_testharness module, where it acts as a master. The following diagram shows the ariane module along with its input/output ports.
ariane
The ipi, irq and time_irq signals are sent to this module from the ariane_testharness module. The AXI request and response signals passed from ariane_testharness to the ariane module are the following:
.axi_req_o ( axi_ariane_req ),
.axi_resp_i ( axi_ariane_resp )
In the ariane_testharness module, the axi_ariane_req and axi_ariane_resp structs are linked to slave[0] (an AXI_BUS interface) so that the contents of axi_ariane_req are passed to slave[0] and the response from slave[0] is passed to the axi_ariane_resp struct. The following macros are used for this purpose.
AXI_ASSIGN_FROM_REQ(slave[0], axi_ariane_req)
AXI_ASSIGN_TO_RESP(axi_ariane_resp, slave[0])
The rvfi_o output of ariane is fed into the rvfi_tracer module.
Debug
Master
The axi_adapter acts as a master for the debug module. The following diagram shows the axi_adapter module along with its signals.
axi_adapter
The AXI request and response signals passed from the test_harness module are the following:
.axi_req_o ( dm_axi_m_req )
.axi_resp_i ( dm_axi_m_resp )
slave[1] is an AXI_BUS interface which, despite its name, acts as a master in the AXI protocol. The dm_axi_m_req and dm_axi_m_resp structs are linked to the slave[1] AXI_BUS interface so that the request signals of dm_axi_m_req are passed to slave[1] and the response signals from slave[1] are passed to the dm_axi_m_resp struct.
AXI_ASSIGN_FROM_REQ(slave[1], dm_axi_m_req)
AXI_ASSIGN_TO_RESP(dm_axi_m_resp, slave[1])
Slave
This is the memory of the debug module. The axi2mem converter is used whenever the master makes a read or write request to this memory. The axi2mem module simply waits for ar_valid or aw_valid on the master interface (the actual slave) and then drives the req_o, we_o, addr_o, be_o, user_o and data_o signals to the memory, and receives data_i and user_i from the memory.
axi2mem
The memory has been instantiated in the dm_top module with the following hierarchy:
dm_top_&_dm_mem
CLINT
CLINT is a slave in this SoC. The signals of the clint module are as follows:
clint
The ipi_o (inter-processor interrupt) and timer_irq_o (timer interrupt request) signals are generated by the clint module and are inputs of the ariane core. This module interacts with the AXI bus interface through the following assignments:
AXI_ASSIGN_TO_REQ(axi_clint_req, master[ariane_soc::CLINT])
This macro transfers the request signals of the master via the master[ariane_soc::CLINT] interface to the axi_clint_req struct.
AXI_ASSIGN_FROM_RESP(master[ariane_soc::CLINT], axi_clint_resp)
This macro assigns the response of the slave (in this case the clint module) from the axi_clint_resp struct to the master[ariane_soc::CLINT] interface.
Bootrom
The axi2mem module is used to communicate with the bootrom module. The signals of this memory are shown in the diagram below:
bootrom
The bootrom is pre-initialized with ROM_SIZE = 186.
SRAM
The complete sequence through which a request to SRAM is transferred is as follows:
sequence
dram and dram_delayed are two AXI_BUS interfaces.
The slave modport of the AXI_BUS interface for master[DRAM] is linked to the axi_riscv_atomics module, and the request of the master is passed to the dram interface (another instance of the AXI_BUS interface). All of this serves exclusive accesses; bursts are not supported for exclusive accesses. The dram and dram_delayed interfaces are also passed to the axi_delayer_intf module as the slave modport and master modport of the AXI_BUS interface, respectively. The axi_delayer_intf module is used to introduce delay. dram_delayed is also passed to the axi2mem module as a slave modport of the AXI_BUS interface. The axi2mem module, with dram_delayed as its AXI_BUS interface, interacts with the SRAM. The SRAM is a word-addressable memory with the following signals:
sram
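The axi2mem signals listed for the debug memory (req_o, we_o, addr_o, be_o, data_o and data_i) suggest how a word-addressable memory such as this SRAM consumes a request. The Python sketch below is a hypothetical model under stated assumptions: 64-bit words, a byte address divided down to a word index, and one byte-enable bit per byte lane; the user signals are omitted.

```python
class WordMemory:
    """Word-addressable memory with byte enables (hypothetical model)."""

    def __init__(self, num_words, word_bytes=8):
        self.word_bytes = word_bytes
        self.words = [0] * num_words

    def access(self, req, we, addr, be=0, wdata=0):
        """Handle one request as driven by req/we/addr/be/data.

        `addr` is a byte address; the word index drops the low bits.
        Returns the read data for reads and None otherwise.
        """
        if not req:
            return None
        idx = addr // self.word_bytes
        if not we:
            return self.words[idx]
        word = self.words[idx]
        for lane in range(self.word_bytes):
            if (be >> lane) & 1:  # only enabled byte lanes are written
                lane_mask = 0xFF << (8 * lane)
                word = (word & ~lane_mask) | (wdata & lane_mask)
        self.words[idx] = word
        return None
```

A write with be = 0x01 therefore changes only the least significant byte of the addressed word and leaves the other seven bytes intact.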
GPIO
GPIO is not implemented; an error slave has been added in its place.
UART
There are two signals for the apb_uart module in the ariane_testharness, namely tx and rx, for transmitting and receiving data. The axi2apb_64_32 module converts the five-channel AXI protocol signals into single-channel APB signals and is placed between the AXI_BUS and the apb_uart module. The signals of the apb_uart module are shown in the diagram below:
apb_uart
Only the signals related to the test_harness have been shown in the above diagram.
PLIC
PLIC is a slave in this SoC. The hierarchy through which a request is propagated to the plic_top module is as follows:
plic_hierarchy
axi2apb_64_32 is used to convert all the PLIC AXI signals into APB signals. apb_to_reg assigns the APB signals to the reg_bus interface, which communicates with the plic_top module. In the apb_to_reg module, the logical AND of the APB psel and penable signals forms the valid signal of the reg_bus interface. The signals of plic_top are shown below:
plic_top
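The psel AND penable rule means that a reg_bus request is only valid during the APB access phase (psel and penable both high), never during the setup phase (psel high, penable low). The Python sketch below models just this mapping; the dataclass and field names are hypothetical and do not reproduce the exact apb_to_reg or reg_bus port lists.

```python
from dataclasses import dataclass


@dataclass
class ApbRequest:
    paddr: int
    psel: bool
    penable: bool
    pwrite: bool
    pwdata: int = 0


@dataclass
class RegBusRequest:
    addr: int
    write: bool
    wdata: int
    valid: bool


def apb_to_reg(apb):
    """Forward the APB fields; valid = psel AND penable (access phase only)."""
    return RegBusRequest(addr=apb.paddr, write=apb.pwrite,
                         wdata=apb.pwdata, valid=apb.psel and apb.penable)
```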
Timer
The axi2apb_64_32 module converts all the timer AXI signals into APB signals. The diagram of the apb_timer is as follows.
apb_timer
The APB signals are shown as apb_timer_req and apb_timer_resp in the above diagram.
Ethernet
Ethernet is a slave in this testharness.
Ethernet support has not been added to the ariane_testharness at this time. Any read or write request from the master to this module is answered with
"ethernet.b_resp = axi_pkg::RESP_SLVERR"
where
"localparam RESP_SLVERR = 2'b10;" in axi_pkg
indicates "Slave error". This response is used when the access has reached the slave successfully, but the slave wishes to return an error condition to the originating master.
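RESP_SLVERR is one of the four 2-bit response encodings defined by the AXI specification [AXI]. The Python sketch below lists them and models an error slave that completes every access with SLVERR, on the B channel for writes and the R channel for reads; the function names are hypothetical, and burst handling (for example RLAST) is left out.

```python
from enum import IntEnum


class AxiResp(IntEnum):
    OKAY = 0b00    # normal access success
    EXOKAY = 0b01  # exclusive access success
    SLVERR = 0b10  # slave error, as returned for Ethernet and SPI here
    DECERR = 0b11  # decode error


def is_error(resp):
    return resp in (AxiResp.SLVERR, AxiResp.DECERR)


def error_slave_response(is_write):
    """(response channel, response code) for any access to an error slave."""
    channel = "B" if is_write else "R"
    return channel, AxiResp.SLVERR
```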
SPI
SPI is a slave in this testharness.
Support for the SPI protocol is present in the SoC, but it is currently turned off: the .spi_clk_o ( ), .spi_mosi ( ), .spi_miso ( ) and .spi_ss ( ) signals of SPI are left open in the ariane_testharness module. Any read or write request from the master to this module is answered with "Slave error".
Documentation
The documentation is re-generated on pushes to master. When contributing to the project please consider the [contribution guide](https://github.com/openhwgroup/cva6/blob/master/CONTRIBUTING.md).
CVA6 Requirement Specification
Revision 1.0.1
License
Copyright 2022 OpenHW Group and Thales Copyright 2018 ETH Zürich and University of Bologna
SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
Licensed under the Solderpad Hardware License v 2.1 (the “License”); you may not use this file except in compliance with the License, or, at your option, the Apache License version 2.0. You may obtain a copy of the License at https://solderpad.org/licenses/SHL-2.1/. Unless required by applicable law or agreed to in writing, any work distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Introduction
CVA6 is a RISC-V compatible application processor core that can be configured as a 32- or 64-bit core (RV32 or RV64). It includes L1 caches, optional MMU, optional PMP and optional FPU.
It is an industrial evolution of ARIANE created by ETH Zürich and the University of Bologna. It is written in SystemVerilog and maintained by the OpenHW Group.
This specification is organized as requirements that apply to the “Scope of the IP”.
The requirement list is to be approved by the OpenHW Group Technical Work Group (TWG), as well as its change requests.
The specification will be complemented by a user’s guide.
Revision 1.0.0 refers to the product of the first CVA6 project led at OpenHW Group. It is a placeholder in case of future evolutions after project freeze (PF gate).
A list of abbreviations is available at the end of this document.
Scope
Scope of the IP
The scope of the IP is the subsystem that is specified below and that will undergo verification with a 100% coverage goal. In the verification plans, the scope of the IP can be broken down into several DUTs (designs under test).
The scope of the IP is the CVA6 hardware supporting all the features used in products based on CVA6.
CVA6 exists in two main configurations: CV64A6 and CV32A6. A requirement referring to CVA6 applies to both configurations.
As displayed in the picture above, the IP comprises:
The CVA6 core;
L1 write-through cache;
Optional FPU;
Optional MMU;
Optional PMP;
CSR;
Performance counters;
AXI interface;
Interface with the P-Mesh coherence system of OpenPiton.
These are not part of the IP (several solutions can be used):
CLINT or PLIC Interrupt modules;
Debug module (such as DTM);
Support of L1 write-back cache (this might come later as an update).
In addition to these main configurations, several fine grain parameters are available.
Unless otherwise stated, an optional feature is controlled by a SystemVerilog parameter. If not selected, the optional feature will not be present in the netlist after synthesis.
The reader’s attention is drawn to the difference between an optional feature (“…shall support as an option…”) and a desired goal (“…should support…”, “…should reduce latency…”).
These are not in the scope of this specification:
SW layers, such as compiler and OSes (that can however be part of the OpenHW Group CVA6 project);
SW emulation of RISC-V optional extensions (feasible, but the scope of the IP is the core hardware);
Other features included in the testbench (main memory, firmware, interconnect…), the verification coverage of which will not be measured;
The vector coprocessor (CV-VEC) that is planned to interface with CV64A6.
Initial Release
The CVA6 is highly configurable via SystemVerilog parameters. It is not practical to fully document and verify all possible combinations of parameters, so a set of “viable IP configurations” has been defined. The full list of parameters for this configuration will be detailed in the users’ guide.
Below is the configuration of the first release of the CVA6.
Release ID | Target | ISA | XLEN | FPU | CV-X-IF | MMU | L1 D$ | L1 I$ |
---|---|---|---|---|---|---|---|---|
CV32A60X | ASIC | IMC | 32 | No | Yes | Sv32 | None | 16 kB |
Possible Future Releases
Below is a proposed list of configurations that could undergo verification and their main parameters. The full list of parameters for these configurations will be detailed in the users’ guide if and when these configurations are fully verified.
Configuration ID | Target | ISA | XLEN | FPU | CV-X-IF | MMU | L1 D$ | L1 I$ |
---|---|---|---|---|---|---|---|---|
cv32a6_imacf_sv32 | FPGA | IMACF | 32 | Yes | TBD | Sv32 | 32 kB | 16 kB |
cv32a6_imac_sv32 | FPGA | IMAC | 32 | No | TBD | Sv32 | 32 kB | 16 kB |
cv64a6_imacfd_sv39 | ASIC | IMACFD | 64 | Yes | Yes | Sv39 | 16 kB | 16 kB |
cv32a6_imac_sv0 | ASIC | IMAC | 32 | No | Yes | None | None | 4 kB |
References
Applicable specifications
To ease the reading, the reference to these specifications can be implicit in the requirements below. For the sake of precision, the requirements identify the versions of RISC-V extensions from these specifications.
[RVunpriv] “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 13, 2019.
[RVpriv] “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović and John Hauser, RISC-V Foundation, December 4, 2021.
[RVdbg] “RISC-V External Debug Support, Document Version 0.13.2”, Editors Tim Newsome and Megan Wachs, RISC-V Foundation, March 22, 2019.
[RVcompat] “RISC-V Architectural Compatibility Test Framework”, https://github.com/riscv-non-isa/riscv-arch-test.
[AXI] AXI Specification, https://developer.arm.com/documentation/ihi0022/hc.
[CV-X-IF] Placeholder for the CV-X-IF coprocessor interface currently prepared at OpenHW Group; current version in https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/.
[OpenPiton] “OpenPiton Microarchitecture Specification”, Princeton University, https://parallel.princeton.edu/openpiton/docs/micro_arch.pdf.
Reference documents
[RVcmo] “RISC-V Base Cache Management Operation ISA Extensions, version 1.0-fd39d01, 2022-01-12”
[CLINT] Core-Local Interruptor (CLINT), “SiFive E31 Core Complex Manual v2p0”, chapter 6, https://static.dev.sifive.com/SiFive-E31-Manual-v2p0.pdf
Functional requirements
General requirement
GEN‑10 |
CVA6 shall be fully compliant with RISC-V specifications [RVunpriv], [RVpriv] and [RVdbg] by implementing all mandatory features for the set of extensions that are selected and by passing [RVcompat] compatibility tests. |
As the RISC-V specification leaves room for variations, this specification specifies some of these variations.
RISC-V standard instructions
To ease tracing to verification, the extensions have been split in independent requirements.
ISA‑10 |
CV64A6 shall support RV64I base instruction set, version 2.1. |
ISA‑20 |
CV32A6 shall support RV32I base instruction set, version 2.1. |
ISA‑30 |
CVA6 shall support the M extension (integer multiply and divide), version 2.0. |
ISA‑40 |
CVA6 shall support the A extension (atomic instructions), version 2.1. |
ISA‑50 |
CV32A6 shall support as an option the F extension (single-precision floating-point), version 2.2. |
ISA‑60 |
CV64A6 shall support as an option the F and D extensions (single- and double-precision floating-point), version 2.2. |
ISA‑70 |
CV64A6 shall support as an option the F extension (single-precision without double-precision floating-point), version 2.2. |
ISA‑80 |
CVA6 shall support as an option the C extension (compressed instructions), version 2.0. |
ISA‑90 |
CVA6 shall support the Zicsr extension (CSR instructions), version 2.0. |
ISA‑100 |
CVA6 shall support the Zifencei extension, version 2.0. |
ISA‑110 |
As an option, the duration of instructions shall be independent of the operand values. Unlike other options, this one can be design-time (selected before compiling the RTL) or run-time (selected through a register). |
Note to ISA-60 and ISA-70: CV64A6 cannot support the D extension without the F extension.
Note to ISA-110: In the current design, the duration of the division is data-dependent, which can be a security issue.
Privileges and virtual memory
The MMU includes a TLB and a hardware PTW.
PVL‑10 |
CVA6 shall support machine, supervisor, user and debug privilege modes. |
PVL‑20 |
CV64A6 shall support as an option the Sv39 virtual memory, version 1.11. |
PVL‑30 |
CV32A6 shall support as an option the Sv32 virtual memory version 1.11. |
PVL‑40 |
CVA6 instances that do not feature virtual memory shall support the Bare mode. |
PVL‑50 |
CVA6 shall feature PMP (physical memory protection) as an option. |
PVL‑60 |
CV64A6 shall support as an option the H extension (hypervisor) version 1.0. |
CSR
There are no requirements related to CSR as they derive from other requirements, such as PVL-10, PVL-60… Details of CSRs will be available in the user’s manual.
Performance counters
Performance counters are important features for safety-critical applications.
HPM‑10 |
CVA6 shall implement the 64-bit |
HPM‑20 |
CVA6 shall implement as an option six generic 64-bit performance counters located in |
HPM‑30 |
Each of the six generic performance counters shall be able to count events from one of these sources: |
HPM‑40 |
The source of events counted by the six generic performance counters shall be selected by the |
HPM‑50 |
CVA6 shall allow the supervisor access of performance counters through enabling of |
HPM‑60 |
CVA6 shall allow the user access of performance counters through enabling of |
HPM‑70 |
CVA6 shall implement the |
HPM‑80 |
CVA6 shall implement the read-only |
The user’s manual will detail the list of counters, events and related controls.
Cache requirements
Caches increase the performance of the processor with regard to memory accesses. Most of their added value for the IP is specified through performance requirements in another section. Here below are specific requirements for these caches.
The project would like to adopt the recently ratified [RVcmo] specification. The analysis yet needs to be performed and will likely lead to an evolution of this specification.
L1 write-through data cache
In the requirements below, L1WTD refers to the L1 write-through data cache that is part of the CVA6.
The first two requirements express the write-through feature. Some requirements are useful for security- and safety-critical applications where a high level of timing predictability is needed.
L1W‑10 |
L1WTD shall reflect all write accesses (stores) by the CVA6 core to the external memory within an upper-bounded number of cycles. The upper-bound is fixed but not specified here. |
L1W‑20 |
L1WTD shall not change the order of write accesses to the external memory with respect to the order of write accesses (stores) received from the CVA6 core. |
L1W‑30 |
L1WTD should offer the following size/ways configurations:
|
L1W‑40 |
L1WTD shall support datasize extension to store EDC, ECC or other information. The number of bits of the extension is defined by a compile-time parameter. |
L1W‑50 |
To interface with the P-Mesh coherence system of OpenPiton, L1WTD shall have a line invalidate external command that invalidates the content of a line upon request. |
L1W‑60 |
Some physical memory regions shall be configurable as not L1WTD cacheable at design time. |
L1W‑70 |
It shall be possible to invalidate L1WTD content with the |
L1W‑80 |
The replacement policy of L1WTD shall be LFSR (pseudo-random) or LRU (least recently used). |
L1W‑90 |
L1WTD should offer a feature to transform cache ways into a scratchpad. Alternatively, this requirement can be realized with a separate scratchpad. |
L1W‑100 |
A custom CSR shall allow L1WTD to be disabled or enabled. |
Cache counters are defined in the performance counters section.
32 kbytes & 4 ways is not feasible with the current architecture. Other size/ways configurations may be implemented in the design.
The design will support one replacement policy allowed by L1W-80.
L1 Instruction cache
In the requirements below, L1I refers to the L1 instruction cache that is part of the CVA6.
Some requirements are useful for security- and safety-critical applications where a high level of timing predictability is needed.
L1I‑10 |
L1I should offer the following size/ways configurations:
|
L1I‑20 |
L1I shall support datasize extension to store EDC, ECC or other information. The number of bits of the extension is defined by a compile-time parameter. |
L1I‑30 |
To interface with the P-Mesh coherence system of OpenPiton, L1I shall have a line invalidate external command that invalidates the content of a line upon request. |
L1I‑40 |
It shall be possible to invalidate L1I content with the |
L1I‑50 |
The replacement policy of L1I shall be LFSR (pseudo-random) or LRU (least recently used). |
L1I‑60 |
L1I should offer a feature to transform cache ways into a scratchpad. Alternatively, this requirement can be realized with a separate scratchpad. |
L1I‑70 |
A custom CSR shall allow L1I to be disabled or enabled. |
Cache counters are defined in the performance counters section.
32 kbytes & 4 ways is not feasible with the current architecture. Other size/ways configurations may be implemented in the design.
The design will support one replacement policy allowed by L1I-50.
FENCE.T custom instruction
There are discussions within RISC-V International to define a specification for FENCE.T. The specification below reflects the situation prior to this RISC-V specification, based on Nils Wistoff’s work. If a RISC-V specification is ratified, the CVA6 specification will likely switch to it.
FET‑10 |
CVA6 shall support the
|
FET‑20 |
|
FENCE.T goes beyond FENCE and FENCE.I as it clears L1 caches, TLB, branch predictors… It is a countermeasure for SPECTRE-like attacks. It is also useful in safety-critical applications to increase execution time predictability.
It is not yet decided whether the FENCE.T instruction arguments can be used to select a subset of microarchitecture features to be cleared. The list of arguments, if any, will be detailed in the user’s guide.
Anticipation of verification: It can be cumbersome to prove the timing decorrelation as expressed in the requirement with digital simulations. We can simulate the microarchitecture features and explain how they satisfy the requirement as Nils Wistoff’s work demonstrated.
PPA targets
These PPA targets will likely be updated when performance monitoring is integrated in the continuous integration flow.
PPA‑10 |
CVA6 should be resource-optimized on FPGA and ASIC targets. |
PPA‑20 |
CVA6 should deliver more than 2.1 CoreMark/MHz. |
PPA‑30 |
CV32A6 should run at more than 150 MHz in the cv32a6_imac_sv32 configuration on Kintex 7 FPGA technology, commercial -2 speed grade. |
PPA‑40 |
CV64A6 should run at more than 900 MHz in the cv64a6_imacfd_sv39 configuration on 28FDSOI technology in the worst case frequency corner with the fastest threshold voltage. |
PPA‑50 |
TBD: Placeholder for single-precision floating performance per MHz. |
PPA‑60 |
TBD: Placeholder for double-precision floating performance per MHz. |
Interface requirements
Memory bus
MEM‑10 |
CVA6 memory interface shall comply with AXI5 specification including the Atomic_Transactions property support as defined in [AXI] section E1.1. |
MEM‑20 |
CVA6 AXI memory interface shall feature user bit extensions on the data bus ( |
The interface complies with AXI4. However, Atomic_Transactions is only defined in AXI5. For the sake of clarity, we do not use the AXI5-Lite interface.
Debug
DBG‑10 |
CVA6 shall implement both the Abstracted Command and Execution based features outlined in chapter 4 of [RVdbg]. |
In addition, there can be an external debug module, not in the scope of the IP.
Interrupts
IRQ‑10 |
CVA6 shall implement interrupt handling registers as per the RISC-V privilege specification and interface with a CLINT implementation. |
Coprocessor interface
XIF‑10 |
To extend the supported instructions, CVA6 shall have a coprocessor interface that supports the “Issue”, “Commit” and “Result” interfaces of the [CV-X-IF] specification. |
The goal is to have a compatible interface between CORE-V cores (CVA6, CV32E40X…). Its feasibility, including for speculative execution, still needs to be confirmed.
CVA6 can interface with several coprocessors simultaneously through a specific external feature implemented on the CV-X-IF interface.
Multi-core interface
TRI‑10 |
CVA6 shall have the Transaction-Response Interface (TRI) needed to interface with the P-Mesh coherence system of OpenPiton, according to [OpenPiton]. |
Design rules
As different teams have different design rules and to ease the integration in FPGA and ASIC design flows:
RUL‑10 |
CVA6 should have a configurable reset signal: synchronous or asynchronous, active-high or active-low. |
RUL‑20 |
CVA6 shall be a super-synchronous design with a single clock input. |
RUL‑30 |
CVA6 should not include multi-cycle paths. |
RUL‑40 |
CVA6 should not include technology-dependent blocks. |
If technology-dependent blocks are used, e.g. to improve PPA on certain targets, the equivalent technology-independent block should be available. Parameters can be used to select between the implementations.
List of abbreviations
CV32A6 Design Document
Introduction
The OpenHW Group uses semantic versioning to describe the release status of its IP. This document describes v0.1.0 of the CV32A6. This is not intended to be a formal release of CVA6. Currently, the first planned release of CVA6 is the CV32A6 v0.2.0.
CVA6 is a 6-stage, in-order, single-issue processor core that implements the RISC-V instruction set. CVA6 can be configured as a 32- or 64-bit core (RV32 or RV64), called CV32A6 or CV64A6 respectively. This document describes an initial version (v0.1.0) of the CV32A6 processor configuration.
The objective of this document is to provide enough information to allow designers to modify the RTL and verification engineers to verify it. This document is not intended for CVA6 users looking for software development information such as instructions or registers.
The CVA6 architecture is illustrated in the following figure, extracted from a paper written by F. Zaruba and L. Benini.

CVA6 Architecture
License
Standards Compliance
To ease the reading, the reference to these specifications can be implicit in the requirements below. For the sake of precision, the requirements identify the versions of RISC-V extensions from these specifications.
[CVA6req] “CVA6 requirement specification”, https://github.com/openhwgroup/cva6/blob/master/docs/specifications/cva6_requirement_specification.rst, HASH#767c465.
[RVunpriv] “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, December 13, 2019.
[RVpriv] “The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 20211203”, Editors Andrew Waterman, Krste Asanović and John Hauser, RISC-V Foundation, December 4, 2021.
[RVdbg] “RISC-V External Debug Support, Document Version 0.13.2”, Editors Tim Newsome and Megan Wachs, RISC-V Foundation, March 22, 2019.
[RVcompat] “RISC-V Architectural Compatibility Test Framework”, https://github.com/riscv-non-isa/riscv-arch-test.
[AXI] AXI Specification, https://developer.arm.com/documentation/ihi0022/hc.
[CV-X-IF] Placeholder for the CV-X-IF coprocessor interface currently prepared at OpenHW Group; current version in https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/.
[OpenPiton] “OpenPiton Microarchitecture Specification”, Princeton University, https://parallel.princeton.edu/openpiton/docs/micro_arch.pdf.
CV32A6 is a 32-bit processor fully compliant with the RISC-V specifications [RVunpriv], [RVpriv] and [RVdbg]; it passes the [RVcompat] compatibility tests, as required by [GEN-10] in [CVA6req].
Documentation framework
The framework of this document is inspired by the Common Criteria. The Common Criteria for Information Technology Security Evaluation (referred to as Common Criteria or CC) is an international standard (ISO/IEC 15408) for computer security certification.
Description of the framework:
The processor is split into modules corresponding to the main modules of the design.
Modules can contain submodules.
Each module is described in a chapter, which contains the following subchapters: Description, Functionality, Architecture and Modules, and Registers (if any).
The subchapter Description describes the main features of the module, the interconnections between the current module and the others, and the input/output interface.
The subchapter Functionality lists the module functionalities in detail. Please avoid using RTL signal names to explain the functionalities.
The subchapter Architecture and Modules provides a drawing presenting the module hierarchy, then the functionalities covered by each module.
The subchapter Registers specifies the module registers, if any.
Contributors
[TO BE COMPLETED]
CV32A6 Subsystem
The CV32A6 v0.1.0 is a subsystem composed of the modules and protocol interfaces illustrated in the figure "CV32A6 v0.1.0 modules". The processor has a modern Harvard architecture. Instructions are issued in order through the DECODE stage, executed out of order, and committed in order. The processor is single-issue: at most one instruction per cycle can be issued to the EXECUTE stage.
The CV32A6 implements a 6-stage pipeline composed of the PC Generation, Instruction Fetch, Instruction Decode, Issue, Execute and Commit stages. At least 6 cycles are needed to execute one instruction.
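As a rough illustration of what a 6-stage, single-issue pipeline implies for throughput, the Python sketch below counts cycles for an idealised instruction stream. It assumes that one instruction enters the pipeline per cycle and that no stalls, flushes, mispredicts or cache misses occur, which real CV32A6 execution will not generally satisfy. The function name is hypothetical.

```python
PIPELINE_STAGES = 6  # PC Generation, Fetch, Decode, Issue, Execute, Commit


def ideal_cycles(num_instructions: int, stages: int = PIPELINE_STAGES) -> int:
    """Cycles from the first PC generation to the last commit, ideal case only."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs `stages` cycles to traverse the pipeline;
    # in the ideal case each later instruction commits one cycle after the previous one.
    return stages + (num_instructions - 1)
```

For example, `ideal_cycles(1)` gives the 6-cycle minimum latency stated above. `ideal_cycles(100)` gives 105, which approaches one instruction per cycle once the pipeline is full.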
Instantiation
Parameter |
Type |
Value |
Description |
---|---|---|---|
|
ariane_pkg::ariane_cfg_t |
ariane_pkg::v0.1.0_Config |
CVA6 v0.1.0 configuration |
Signal |
IO |
Type |
Description |
---|---|---|---|
|
in |
logic |
subsystem clock |
|
in |
logic |
Asynchronous reset active low |
|
in |
logic[VLEN-1:0] |
Reset boot address |
|
in |
logic[XLEN-1:0] |
Hart id in a multicore environment (reflected in a CSR) |
|
in |
logic[1:0] |
Level-sensitive interrupt request lines, mip & sip (async) |
|
in |
logic |
Inter-processor interrupts (async) |
|
in |
logic |
Timer interrupt in (async) |
|
in |
logic |
Debug request (async) |
|
out |
trace_port_t |
RISC-V Formal Interface port (RVFI) |
|
out |
cvxif_req_t |
Coprocessor Interface request interface port (CV-X-IF) |
|
in |
cvxif_resp_t |
Coprocessor Interface response interface port (CV-X-IF) |
|
out |
req_t |
AXI master request interface port |
|
in |
resp_t |
AXI master response interface port |
Functionality
CV32A6 v0.1.0 implements a configuration that allows a coprocessor to be connected through the CV-X-IF coprocessor interface. However, the lack of an MMU, the A extension and a data cache prevents it from running Linux.
Standard Extension |
Specification |
Configurability |
---|---|---|
I: RV32I Base Integer Instruction Set |
[RVunpriv] |
ON |
C: Standard Extension for Compressed Instructions |
[RVunpriv] |
ON |
M: Standard Extension for Integer Multiplication and Division |
[RVunpriv] |
ON |
A: Standard Extension for Atomic Instructions |
[RVunpriv] |
OFF |
F and D: Single and Double Precision Floating-Point |
[RVunpriv] |
OFF |
Zicount: Performance Counters |
[RVunpriv] |
OFF |
Zicsr: Control and Status Register Instructions |
[RVunpriv] |
ON |
Zifencei: Instruction-Fetch Fence |
[RVunpriv] |
ON |
Privilege: Standard privilege modes M, S and U |
[RVpriv] |
ON |
SV39, SV32, SV0: MMU capability |
[RVpriv] |
OFF |
PMP: Physical Memory Protection |
[RVpriv] |
OFF |
CSR: Control and Status Registers |
[RVpriv] |
ON |
AXI: AXI interface |
[AXI] |
ON |
TRI: Transaction-Response Interface (TRI) |
[OpenPiton] |
OFF |
Micro-architecture |
Specification |
Configurability |
---|---|---|
I$: Instruction cache |
current spec |
ON |
D$: Data cache |
current spec |
OFF |
Rename: register Renaming |
current spec |
OFF |
Double Commit: out of order pipeline execute stage |
current spec |
ON |
BP: Branch Prediction |
current spec |
ON with no info storage |
CVA6 memory interface complies with the AXI5 specification, including the Atomic_Transactions property support as defined in [AXI] section E1.1.
CVA6 coprocessor interface complies with the CV-X-IF protocol specification as defined in [CV-X-IF].
The CV32A6 v0.1.0 core is fully synthesizable. It has been designed mainly for ASIC designs, but FPGA synthesis is supported as well.
For ASIC synthesis, the whole design is completely synchronous and uses positive-edge triggered flip-flops. The core occupies an area of about 80 kGE. The clock frequency can exceed 1 GHz, depending on the technology.
Architecture and Modules
The CV32A6 v0.1.0 subsystem is composed of 8 modules.

CV32A6 v0.1.0 modules
Connections between modules are illustrated in the following block diagram. FRONTEND, DECODE, ISSUE, EXECUTE, COMMIT and CONTROLLER are part of the pipeline. CACHES implements the instruction and data caches, and CSRFILE contains the registers.

CV32A6 v0.1.0 pipeline and modules
FRONTEND Module
Description
The FRONTEND module implements the first two stages of the CVA6 pipeline: the PC gen and Fetch stages.
The PC gen stage is responsible for generating the next program counter. It hosts a Branch Target Buffer (BTB), a Branch History Table (BHT) and a Return Address Stack (RAS) to speculate on the branch target address.
The Fetch stage requests data from the CACHE module, realigns the data, stores them in the instruction queue and transmits the instructions to the DECODE module. FRONTEND can fetch up to 2 instructions per cycle when C extension instructions are used. Because the instruction queue limits the data rate, at most one instruction per cycle can be sent to DECODE.
The module is connected to:
CACHES module provides fetched instructions to FRONTEND.
DECODE module receives instructions from FRONTEND.
CONTROLLER module can flush the FRONTEND PC gen stage.
EXECUTE, CONTROLLER, CSR and COMMIT modules trigger PC jumps due to a branch mispredict, an exception, a return from exception, a debug entry or a pipeline flush. They provide the related next PC value.
CSR module provides the debug mode state.
Signal |
IO |
connection |
Type |
Description |
---|---|---|---|---|
|
in |
SUBSYSTEM |
logic |
Subsystem Clock |
|
in |
SUBSYSTEM |
logic |
Asynchronous reset active low |
|
in |
CSR |
logic |
Debug mode state |
|
in |
CONTROLLER |
logic |
Fetch flush request |
|
in |
tied at zero |
logic |
flush branch prediction |
|
in |
SUBSYSTEM |
logic[VLEN-1:0] |
Next PC when reset |
|
in |
EXECUTE |
bp_resolve_t |
mispredict event and next PC |
|
in |
CSR |
logic |
Return from exception event |
|
in |
CSR |
logic[VLEN-1:0] |
Next PC when returning from exception |
|
in |
COMMIT |
logic |
Exception event |
|
in |
CSR |
logic[VLEN-1:0] |
Next PC when jumping into exception |
|
in |
CONTROLLER |
logic |
Set the PC coming from COMMIT as next PC |
|
in |
COMMIT |
logic[VLEN-1:0] |
Next PC when flushing pipeline |
|
in |
CSR |
logic |
Debug event |
|
out |
CACHES |
icache_dreq_i_t |
Handshake between CACHE and FRONTEND (fetch) |
|
in |
CACHES |
icache_dreq_o_t |
Handshake between CACHE and FRONTEND (fetch) |
|
out |
DECODE |
fetch_entry_t |
Handshake’s data between FRONTEND (fetch) and DECODE |
|
out |
DECODE |
logic |
Handshake’s valid between FRONTEND (fetch) and DECODE |
|
in |
DECODE |
logic |
Handshake’s ready between FRONTEND (fetch) and DECODE |
Functionality
PC Generation stage
PC gen generates the next program counter. The next PC can originate from the following sources, listed from lowest to highest precedence (a later source overrides an earlier one):
Reset state: At reset, the PC is assigned to the boot address.
Branch Predict: The fetched instruction is pre-decoded by the instr_scan submodule. When the instruction is a control flow instruction, three cases need to be considered:
If the instruction is a JALR and the BTB (Branch Target Buffer) returns a valid address, the next PC is predicted by the BTB. Otherwise, the JALR is not considered a control flow instruction, which will generate a mispredict.
If the instruction is a branch and the BHT (Branch History Table) returns a valid address, the next PC is predicted by the BHT. Otherwise, the branch is not considered a control flow instruction, which will generate a mispredict when the branch is taken.
If the instruction is a RET, the RAS (Return Address Stack) returns a valid address, and the RET has already been consumed by the instruction queue, the next PC is predicted by the RAS. Otherwise, the RET is considered a control flow instruction, but the next PC is not predicted, and a mispredict will be generated.
The PC gen then informs the Fetch stage that it performed a prediction on the PC. In CV32A6 v0.1.0, branch prediction is simplified: no information is stored in the BTB, BHT and RAS. JALR, branch and RET instructions are therefore not considered control flow instructions and will generate mispredicts.
Default: PC + 4 is fetched. PC gen always fetches on a word (32-bit) boundary. Compressed instructions are handled by the Fetch stage.
Mispredict: When a branch is mispredicted, EXECUTE reports the misprediction. This can be either a ‘real’ misprediction or a branch that was not recognized as one. In either case, PC gen must correct its action and start fetching from the correct address.
Replay instruction fetch: When the instruction queue is full, the instr_queue submodule requests a fetch replay and provides the address to be replayed.
Return from environment call: When CSR requests a return from an environment call, the next PC is set to the address stored in the CSR [m-s]epc register.
Exception/Interrupt: If an exception (or an interrupt, which is quite similar in the context of RISC-V subsystems) is triggered by COMMIT, the next PC is assigned to the CSR trap vector base address. The trap vector base address can differ depending on whether the exception traps to S-Mode or M-Mode (user mode exceptions are currently not supported). The CSR unit is responsible for determining where to trap to and for presenting the correct address to PC gen.
Pipeline Flush: When a CSR with side effects is written, CONTROLLER flushes the whole pipeline. FRONTEND then starts fetching again from the next instruction so that the updated information is taken into account (for example, virtual memory base pointer changes). The PC related to the flush action is provided by COMMIT. Moreover, the flush is also transmitted to the CACHES through the next CACHES fetch access, and the instruction queue is reset.
Debug: Debug has the highest precedence because it can interrupt any control flow request. It is also the only source of control flow change that can actually happen simultaneously with any of the other forced control flow changes. The debug jump is requested by CSR. The address to be jumped to is hard-coded. This debug feature is not supported by CV32A6 v0.1.0.
All program counters are logical addresses. If the logical-to-physical mapping changes, a fence.vm instruction should be used to flush the pipeline and TLBs (MMU is not enabled in CV32A6 v0.1.0).
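The precedence rules of PC gen can be pictured as a chain of overriding assignments, similar to a SystemVerilog always_comb block in which the last assignment wins. The Python sketch below assumes that the source list is ordered from lowest to highest precedence, which is consistent with Debug being stated as the highest. It also assumes a particular timing for Branch Predict: the predicted address is fetched in the current cycle, and the default increment then advances from it. The `PcGenRequests` fields and `next_pc` are hypothetical names, not CVA6 RTL signals.

```python
from dataclasses import dataclass
from typing import Optional

MASK32 = 0xFFFFFFFF


@dataclass
class PcGenRequests:
    """Hypothetical bundle of next-PC sources for one cycle (names are illustrative)."""
    npc_q: int                              # PC currently held by PC gen
    reset_load: bool = False                # first cycle after reset
    boot_addr: int = 0
    predict_addr: Optional[int] = None      # branch predict target
    if_ready: bool = False                  # fetch accepted: advance to the next word
    mispredict_addr: Optional[int] = None   # resolved target from EXECUTE
    replay_addr: Optional[int] = None       # from instr_queue when it is full
    epc: Optional[int] = None               # [m-s]epc on a return from environment call
    trap_vector: Optional[int] = None       # exception/interrupt target from CSR
    flush_pc: Optional[int] = None          # PC from COMMIT on a pipeline flush
    debug_addr: Optional[int] = None        # hard-coded debug entry address


def next_pc(r: PcGenRequests) -> int:
    # Reset state: start from the boot address instead of the held PC.
    fetch_address = r.boot_addr if r.reset_load else r.npc_q
    npc = fetch_address
    # Branch predict (assumed timing): the predicted target is fetched in this cycle.
    if r.predict_addr is not None:
        fetch_address = r.predict_addr
        npc = r.predict_addr
    # Default: next 32-bit word after the address being fetched.
    if r.if_ready:
        npc = ((fetch_address & ~0x3) + 4) & MASK32
    # Remaining sources, lowest to highest precedence: each later one wins.
    for override in (r.mispredict_addr, r.replay_addr, r.epc,
                     r.trap_vector, r.flush_pc, r.debug_addr):
        if override is not None:
            npc = override
    return npc
```

With only `if_ready` set, `next_pc` returns the next word address. When both `trap_vector` and `debug_addr` are set, the debug address wins.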
Fetch Stage
The Fetch stage controls the CACHE module through a handshake protocol. Fetched data are 32-bit blocks with word-aligned addresses. A granted fetch is realigned in the instr_realign submodule to produce instructions. The instructions are then pushed into an internal instruction FIFO called the instruction queue (instr_queue submodule). This submodule stores the instructions and the related information that allows the outstanding transactions to be identified. If CONTROLLER decides to flush the instruction queue, the outstanding transactions are discarded.
The Fetch stage asks the MMU (MMU is not enabled in CV32A6 v0.1.0) to translate the requested address.
Memory and MMU (MMU is not enabled in CV32A6 v0.1.0) can report potential exceptions generated by the memory fetch request: bus errors, invalid accesses or instruction page faults.
Architecture and Submodules

FRONTEND submodules
Instr_realign submodule
Signal |
IO |
connection |
Type |
Description |
---|---|---|---|---|
|
in |
SUBSYSTEM |
logic |
Subsystem Clock |
|
in |
SUBSYSTEM |
logic |
Asynchronous reset active low |
|
in |
FRONTEND |
logic |
Instr_align Flush |
|
in |
CACHES (reg) |
logic |
32-bit block is valid |
|
in |
CACHES (reg) |
logic[VLEN-1:0] |
32-bit block address |
|
in |
CACHES (reg) |
logic[31:0] |
32-bit block |
|
out |
FRONTEND |
logic[1:0] |
instruction is valid |
|
out |
FRONTEND |
logic[1:0][VLEN-1:0] |
Instruction address |
|
out |
instr_scan, instr_queue |
logic[1:0][31:0] |
Instruction |
|
out |
FRONTEND |
logic |
Instruction is unaligned |
The 32-bit aligned block coming from the CACHE module enters the instr_realign submodule. This submodule extracts up to two instructions from each 32-bit block, since two compressed instructions fit in one block when the C extension is used. A non-compressed instruction is not necessarily aligned on the block boundary and can instead straddle two cache blocks. In that case, two cache accesses are needed. The instr_realign submodule provides at most one instruction per cycle. An incomplete instruction is stored in the instr_realign submodule until it can be provided in a following cycle.
In case of a mispredict, flush, replay or branch predict, the instr_realign submodule is re-initialized: the internal register storing the instruction alignment state is reset.
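The realignment described above can be sketched in Python as follows. The sketch assumes little-endian 32-bit blocks delivered at word-aligned addresses. It identifies a compressed instruction by its two lowest bits not being 0b11, as defined by the RISC-V C extension. It returns every instruction completed by a block (up to two), matching the two-entry outputs in the instr_realign signal table. The class and method names are hypothetical, and exceptions are ignored.

```python
from typing import List, Optional, Tuple

Instr = Tuple[int, int, bool]  # (address, instruction bits, is_compressed)


class InstrRealign:
    """Sketch: extract 16/32-bit instructions from word-aligned 32-bit fetch blocks."""

    def __init__(self) -> None:
        # Lower half of a 32-bit instruction that straddles two blocks, if any.
        self.pending: Optional[Tuple[int, int]] = None

    def flush(self) -> None:
        # Mispredict, flush, replay or branch predict: forget the alignment state.
        self.pending = None

    def push_block(self, address: int, block: int) -> List[Instr]:
        base = address & ~0x3
        halves = [(base, block & 0xFFFF), (base + 2, (block >> 16) & 0xFFFF)]
        if address & 0x2:
            # Fetch started on the upper half-word (e.g. after a jump): skip the lower half.
            halves = halves[1:]
        out: List[Instr] = []
        for addr, half in halves:
            if self.pending is not None:
                # Complete the straddling 32-bit instruction with its upper half.
                p_addr, p_low = self.pending
                out.append((p_addr, (half << 16) | p_low, False))
                self.pending = None
            elif (half & 0x3) != 0x3:
                out.append((addr, half, True))   # compressed (16-bit) instruction
            else:
                self.pending = (addr, half)      # first half of a 32-bit instruction
        return out
```

For example, a compressed instruction followed by the first half of a 32-bit instruction yields one instruction. The 32-bit instruction is then completed when the next block arrives, which is the two-access case described above.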
Instr_queue submodule
Signal |
IO |
connection |
Type |
Description |
---|---|---|---|---|
|
in |
SUBSYSTEM |
logic |
Subsystem Clock |
|
in |
SUBSYSTEM |
logic |
Asynchronous reset active low |
|
in |
CONTROLLER |
logic |
Fetch flush request |
|
in |
instr_realign |
logic[1:0] |
Instruction is valid |
|
in |
instr_realign |
logic[1:0][31:0] |
Instruction |
|
in |
instr_realign |
logic[1:0][VLEN-1:0] |
Instruction address |
|
in |
FRONTEND |
logic[VLEN-1:0] |
Instruction predict address |
|
in |
FRONTEND |
logic[1:0] |
Instruction control flow type |
|
out |
CACHES |
logic |
Handshake’s ready between CACHE and FRONTEND (fetch stage) |
|
out |
FRONTEND |
logic[1:0] |
Indicates instructions consumed, that is to say popped by DECODE |
|
in |
CACHES (reg) |
logic |
Exception |
|
in |
CACHES (reg) |
logic[VLEN-1:0] |
Exception address |
|
out |
FRONTEND |
logic |
Replay instruction because one of the FIFOs was already full |
|
out |
FRONTEND |
logic[VLEN-1:0] |
Address at which to replay the fetch |
|
out |
DECODE |
fetch_entry_t |
Handshake’s data between FRONTEND (fetch stage) and DECODE |
|
out |
DECODE |
logic |
Handshake’s valid between FRONTEND (fetch stage) and DECODE |
|
in |
DECODE |
logic |
Handshake’s ready between FRONTEND (fetch stage) and DECODE |
The instr_queue receives the instructions extracted from the 32-bit blocks fetched from CACHES and builds a valid stream of instructions to be decoded (by DECODE), issued (by ISSUE) and executed (by EXECUTE). FRONTEND pushes into the FIFO the instructions and the related information needed in case of mispredict or exception: instructions, instruction control flow type, exception, exception address and predicted address. DECODE pops them when the decode stage is ready, and the FRONTEND is informed that the instruction has been consumed.
The instruction queue can hold at most 4 instructions.
In the instruction queue, an exception can only be a page-fault exception.
If the instruction queue is full, a replay request is sent to inform the fetch mechanism to replay the fetch.
The instruction queue can be flushed by CONTROLLER.
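The following is a minimal Python model of the queue behaviour. It covers bounded storage of four entries, a replay address returned when a push finds the queue full, in-order popping by DECODE, and a flush. The entry fields mirror the information listed above, but the class, method and field names are hypothetical. Returning the replay address from `push` is a simplification of the separate replay outputs in the signal table.

```python
from collections import deque
from typing import Deque, Optional


class InstrQueue:
    """Sketch of a bounded instruction FIFO that requests a replay when full."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.fifo: Deque[dict] = deque()

    def push(self, address: int, instr: int, cf_type: str = "none",
             predict_addr: Optional[int] = None, exception: bool = False) -> Optional[int]:
        """Store one entry; return the address to replay if the queue is full, else None."""
        if len(self.fifo) >= self.depth:
            return address  # entry dropped: the fetch must be replayed from here
        self.fifo.append({"address": address, "instr": instr, "cf_type": cf_type,
                          "predict_addr": predict_addr, "exception": exception})
        return None

    def pop(self) -> Optional[dict]:
        """DECODE side: consume the oldest entry, or return None when empty."""
        return self.fifo.popleft() if self.fifo else None

    def flush(self) -> None:
        """CONTROLLER flush: discard every stored entry."""
        self.fifo.clear()
```

After four pushes, a fifth push returns its own address as the replay request. Once DECODE has consumed an entry, the same push succeeds.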
Instr_scan submodule
Signal |
IO |
Connection |
Type |
Description |
---|---|---|---|---|
|
in |
instr_realign |
logic[31:0] |
Instruction to be predecoded |
|
out |
FRONTEND |
logic |
Return instruction |
|
out |
FRONTEND |
logic |
JAL instruction |
|
out |
FRONTEND |
logic |
Branch instruction |
|
out |
FRONTEND |
logic |
JALR instruction |
|
out |
FRONTEND |
logic |
unconditional jump instruction |
|
out |
FRONTEND |
logic[VLEN-1:0] |
Instruction immediate |
|
out |
FRONTEND |
logic |
Branch compressed instruction |
|
out |
FRONTEND |
logic |
unconditional jump compressed instruction |
|
out |
FRONTEND |
logic |
JR compressed instruction |
|
out |
FRONTEND |
logic |
Return compressed instruction |
|
out |
FRONTEND |
logic |
JALR compressed instruction |
|
out |
FRONTEND |
logic |
JAL compressed instruction |
|
out |
FRONTEND |
logic[VLEN-1:0] |
Compressed instruction immediate |
The instr_scan submodule pre-decodes the fetched instructions, which can be compressed or not. The outputs are used by the branch prediction feature. The instr_scan submodule indicates whether the instruction is compressed and provides the instruction type: branch, jump, return, jalr, imm, call or others.
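A Python sketch of this pre-decoding for 32-bit encodings is given below. Compressed instructions are only flagged, not classified. The return and call classification follows the return-address hints of the RISC-V unprivileged specification (link registers x1 and x5). Whether instr_scan uses exactly these rules is an assumption, and the function names are hypothetical.

```python
LINK_REGS = (1, 5)  # x1 (ra) and x5 (t0), the RISC-V link registers


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def scan(instr: int) -> dict:
    """Pre-decode one instruction for branch prediction (32-bit encodings only)."""
    opcode = instr & 0x7F
    rd = (instr >> 7) & 0x1F
    rs1 = (instr >> 15) & 0x1F
    is_jal, is_jalr, is_branch = opcode == 0x6F, opcode == 0x67, opcode == 0x63
    imm = 0
    if is_jal:  # J-type immediate: imm[20|10:1|11|19:12]
        imm = sext(((instr >> 31) & 0x1) << 20 | ((instr >> 12) & 0xFF) << 12
                   | ((instr >> 20) & 0x1) << 11 | ((instr >> 21) & 0x3FF) << 1, 21)
    elif is_branch:  # B-type immediate: imm[12|10:5] and imm[4:1|11]
        imm = sext(((instr >> 31) & 0x1) << 12 | ((instr >> 7) & 0x1) << 11
                   | ((instr >> 25) & 0x3F) << 5 | ((instr >> 8) & 0xF) << 1, 13)
    return {
        "compressed": (instr & 0x3) != 0x3,
        "branch": is_branch,
        "jal": is_jal,
        "jalr": is_jalr,
        # Return hint: jalr x0, 0(ra|t0); call hint: jal/jalr writing ra or t0.
        "return": is_jalr and rd == 0 and rs1 in LINK_REGS,
        "call": (is_jal or is_jalr) and rd in LINK_REGS,
        "imm": imm,
    }
```

For instance, `jal ra, 8` (0x008000EF) is reported as a call with immediate 8, and `ret` (0x00008067) is reported as a return.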
BHT (Branch History Table) submodule
Signal |
IO |
Connection |
Type |
Description |
---|---|---|---|---|
|
in |
SUBSYSTEM |
logic |
Subsystem clock |
|
in |
SUBSYSTEM |
logic |
Asynchronous reset active low |
|
in |
tied at zero |
logic |
Flush request |
|
in |
CSR |
logic |
Debug mode state |
|
in |
CACHES (reg) |
logic[VLEN-1:0] |
Virtual PC |
|
in |
EXECUTE |
bht_update_t |
Update BHT with resolved address |
|
out |
FRONTEND |
bht_prediction_t |
Prediction from bht |
When a branch instruction is resolved by EXECUTE, the related information is stored in the Branch History Table.
The information is stored in a 1024-entry table.
Each Branch History Table entry is a two-bit saturating counter, indexed by the virtual address of the instruction currently fetched from the CACHE. It states whether the current branch should be taken or not. The two-bit counter is updated by the successive executions of the branch, as shown in the following figure.

BHT saturation
The BHT is not updated if processor is in debug mode.
When a branch instruction is pre-decoded by the instr_scan submodule, the BHT indicates whether the PC address is in the BHT. In this case, the BHT predicts whether the branch is taken and provides the corresponding target address.
The BHT is never flushed.
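The two-bit saturating-counter scheme can be modelled in a few lines of Python. This is a generic sketch, not the CVA6 RTL. The reset value (counter 0, entry invalid), the PC-to-index mapping and the names are assumptions, and target-address generation is left out.

```python
from typing import List, Optional


class BranchHistoryTable:
    """Sketch: table of two-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        self.valid: List[bool] = [False] * entries
        self.counter: List[int] = [0] * entries

    def _index(self, pc: int) -> int:
        # Hypothetical indexing: drop bit 0 (instructions are 2-byte aligned).
        return (pc >> 1) % self.entries

    def predict(self, pc: int) -> Optional[bool]:
        """None if the entry is invalid, else the predicted taken/not-taken outcome."""
        i = self._index(pc)
        return self.counter[i] >= 2 if self.valid[i] else None

    def update(self, pc: int, taken: bool, debug_mode: bool = False) -> None:
        """Record a branch resolved by EXECUTE."""
        if debug_mode:
            return  # the BHT is not updated in debug mode
        i = self._index(pc)
        self.valid[i] = True
        if taken:
            self.counter[i] = min(self.counter[i] + 1, 3)  # saturate at strongly taken
        else:
            self.counter[i] = max(self.counter[i] - 1, 0)  # saturate at strongly not taken
```

The saturation means that a single not-taken outcome of a loop-closing branch that is usually taken does not immediately flip the prediction.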
BTB (Branch Target Buffer) submodule
Signal |
IO |
Connection |
Type |
Description |
---|---|---|---|---|
|
in |
SUBSYSTEM |
logic |
Subsystem clock |
|
in |
SUBSYSTEM |
logic |
Asynchronous reset active low |
|
in |
tied at zero |
logic |
Flush request state |
|
in |
CSR |
logic |
Debug mode |
|
in |
CACHES (reg) |
logic |
Virtual PC |
|
in |
EXECUTE |
btb_update_t |
Update BTB with resolved address |
|
out |
FRONTEND |
btb_prediction_t |
BTB Prediction |
When an unconditional jump to a register (JALR instruction) is mispredicted by EXECUTE, the related information, namely the JALR PC and the target address, is stored in the BTB.
The information is stored in an 8-entry table.
The BTB is not updated if processor is in debug mode.
When a JALR instruction is pre-decoded by the instr_scan submodule, the BTB indicates whether the input PC address is in the BTB. In this case, the BTB provides the corresponding target address.
The BTB is never flushed.
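As with the BHT, the BTB behaviour can be sketched in Python. The direct-mapped organisation without tags, the index function and the names are assumptions made for illustration. The sketch only shows the rules stated above: update on mispredict, lookup, and no update in debug mode. It deliberately has no flush method.

```python
from typing import List, Optional


class BranchTargetBuffer:
    """Sketch of a small direct-mapped BTB storing one target address per entry."""

    def __init__(self, entries: int = 8) -> None:
        self.entries = entries
        self.target: List[Optional[int]] = [None] * entries  # None: invalid entry

    def _index(self, pc: int) -> int:
        # Hypothetical indexing on the low PC bits; no tag is stored, so PCs that
        # differ by a multiple of 2 * entries bytes alias to the same entry.
        return (pc >> 1) % self.entries

    def lookup(self, pc: int) -> Optional[int]:
        return self.target[self._index(pc)]

    def update(self, pc: int, target: int, debug_mode: bool = False) -> None:
        """Record the resolved target of a mispredicted JALR (skipped in debug mode)."""
        if not debug_mode:
            self.target[self._index(pc)] = target
```

With only 8 entries and no tag, two JALR instructions 16 bytes apart share an entry in this sketch, so one can evict the other's target.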
RAS (Return Address Stack) submodule
Signal |
IO |
Connection |
Type |
Description |
---|---|---|---|---|
|
in |
SUBSYSTEM |
logic |
Subsystem clock |
|
in |
SUBSYSTEM |
logic |
Asynchronous reset active low |
|
in |
tied at zero |
logic |
Flush request |
|
in |
FRONTEND |
logic |
Push address in RAS |
|
in |
FRONTEND |
logic |
Pop address from RAS |
|
in |
FRONTEND |
logic[VLEN-1:0] |
Data to be pushed |
|
out |
FRONTEND |
ras_t |
Popped data |
When an unconditional jump to a known target address (JAL instruction) is consumed by the instr_queue, the return address, that is to say the PC of the instruction following the JAL, is pushed onto the stack.
The RAS depth is 2.
When a return instruction is pre-decoded by the instr_scan submodule, the RAS indicates whether it holds a valid entry. In this case, the RAS provides the corresponding return address.
The RAS is never flushed.
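The following is a short Python sketch of a depth-2 return address stack. The overflow policy (dropping the oldest entry) and the names are assumptions. With this policy, a call chain deeper than two levels loses the outermost return address, and the corresponding RET is then mispredicted.

```python
from typing import List, Optional


class ReturnAddressStack:
    """Sketch of a fixed-depth return address stack."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth
        self.stack: List[int] = []  # last element is the top of the stack

    def push(self, return_address: int) -> None:
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: the oldest return address is lost
        self.stack.append(return_address)

    def pop(self) -> Optional[int]:
        """Predicted return address, or None when no valid entry is left."""
        return self.stack.pop() if self.stack else None
```

After three nested calls, the first two pops return the two innermost return addresses, and the third pop finds the stack empty.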
RV32 Instructions
Introduction
In this document, we present the ISA (Instruction Set Architecture) of CV32A6 v0.1.0. It describes the supported instructions of the RV32I Base Integer Instruction Set and of the following extensions supported by the core:
RV32M – Standard Extension for Integer Multiplication and Division Instructions
RV32A – Standard Extension for Atomic Instructions
RV32C – Standard Extension for Compressed Instructions
RV32Zicsr – Standard Extension for CSR Instructions
RV32Zifencei – Standard Extension for Instruction-Fetch Fence
The base RISC-V ISA has fixed-length 32-bit instructions that must be naturally aligned on 4-byte boundaries. Since CV32A6 v0.1.0 supports the C extension, 16-bit compressed instructions are also supported, and instructions only need to be aligned on 2-byte boundaries. CV32A6 v0.1.0 supports:
Only 1 hart,
Misaligned accesses to the memory.
General purpose registers
As shown in Table 1.1, there are 31 general-purpose registers x1–x31, which hold integer values. Register x0 is hardwired to the constant 0. There is no hardwired subroutine return address link register, but the standard software calling convention uses register x1 to hold the return address on a call. For CV32A6 v0.1.0, the x registers are 32 bits wide. There is one additional register, also 32 bits wide: the program counter pc, which holds the address of the current instruction.
Table 1.1 shows the general-purpose registers:
5-bit Encoding (rx) | 3-bit Compressed Encoding (rx’) | Register (ISA name) | Register (ABI name) | Description |
---|---|---|---|---|
0 | | x0 | zero | Hardwired zero |
1 | | x1 | ra | Return address |
2 | | x2 | sp | Stack pointer |
3 | | x3 | gp | Global pointer |
4 | | x4 | tp | Thread pointer |
5 | | x5 | t0 | Temporaries/alternate link register |
6 - 7 | | x6 - x7 | t1 - t2 | Temporaries |
8 | 0 | x8 | s0/fp | Saved register/frame pointer |
9 | 1 | x9 | s1 | Saved registers |
10 - 11 | 2 - 3 | x10 - x11 | a0 - a1 | Function arguments/return value |
12 - 15 | 4 - 7 | x12 - x15 | a2 - a5 | Function arguments |
16 - 17 | | x16 - x17 | a6 - a7 | Function arguments |
18 - 27 | | x18 - x27 | s2 - s11 | Saved registers |
28 - 31 | | x28 - x31 | t3 - t6 | Temporaries |
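The relationship between the 3-bit compressed encoding and the full register numbers in Table 1.1 is a fixed offset of 8. The small Python helper below makes this explicit. The helper names are hypothetical.

```python
ABI_NAMES = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
]


def expand_compressed_reg(rx_prime: int) -> int:
    """Map a 3-bit compressed register field (rx') to its 5-bit register number."""
    if not 0 <= rx_prime <= 7:
        raise ValueError("rx' is a 3-bit field")
    return 8 + rx_prime  # rx' = 0..7 selects x8..x15


def abi_name(reg: int) -> str:
    """ABI name of register x<reg>."""
    return ABI_NAMES[reg]
```

This is why compressed instructions using the 3-bit register fields can only reach s0, s1 and a0 to a5.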
RV32I Base Integer Instruction Set
This section describes the RV32I base integer instruction set.
Integer Register-Immediate Instructions
ADDI: Add Immediate
Format: addi rd, rs1, imm[11:0]
Description: add sign-extended 12-bit immediate to register rs1, and store the result in register rd.
Pseudocode: x[rd] = x[rs1] + sext(imm[11:0])
Invalid values: NONE
Exception raised: NONE
ANDI: AND Immediate
Format: andi rd, rs1, imm[11:0]
Description: perform bitwise AND on register rs1 and the sign-extended 12-bit immediate and place the result in rd.
Pseudocode: x[rd] = x[rs1] & sext(imm[11:0])
Invalid values: NONE
Exception raised: NONE
ORI: OR Immediate
Format: ori rd, rs1, imm[11:0]
Description: perform bitwise OR on register rs1 and the sign-extended 12-bit immediate and place the result in rd.
Pseudocode: x[rd] = x[rs1] | sext(imm[11:0])
Invalid values: NONE
Exception raised: NONE
XORI: XOR Immediate
Format: xori rd, rs1, imm[11:0]
Description: perform bitwise XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd.
Pseudocode: x[rd] = x[rs1] ^ sext(imm[11:0])
Invalid values: NONE
Exception raised: NONE
SLTI: Set Less Than Immediate
Format: slti rd, rs1, imm[11:0]
Description: set register rd to 1 if register rs1 is less than the sign extended immediate when both are treated as signed numbers, else 0 is written to rd.
Pseudocode: if (x[rs1] < sext(imm[11:0])) x[rd] = 1 else x[rd] = 0
Invalid values: NONE
Exception raised: NONE
SLTIU: Set Less Than Immediate Unsigned
Format: sltiu rd, rs1, imm[11:0]
Description: set register rd to 1 if register rs1 is less than the sign extended immediate when both are treated as unsigned numbers, else 0 is written to rd.
Pseudocode: if (x[rs1] <u sext(imm[11:0])) x[rd] = 1 else x[rd] = 0
Invalid values: NONE
Exception raised: NONE
SLLI: Shift Left Logical Immediate
Format: slli rd, rs1, imm[4:0]
Description: shift register rs1 left by the 5-bit immediate and place the result in rd (logical shift: zeros are shifted into the lower bits).
Pseudocode: x[rd] = x[rs1] << imm[4:0]
Invalid values: NONE
Exception raised: NONE
SRLI: Shift Right Logical Immediate
Format: srli rd, rs1, imm[4:0]
Description: shift register rs1 right by the 5-bit immediate and place the result in rd (logical shift: zeros are shifted into the upper bits).
Pseudocode: x[rd] = x[rs1] >> imm[4:0]
Invalid values: NONE
Exception raised: NONE
SRAI: Shift Right Arithmetic Immediate
Format: srai rd, rs1, imm[4:0]
Description: shift register rs1 right by the 5-bit immediate and place the result in rd (arithmetic shift: the original sign bit is copied into the vacated upper bits).
Pseudocode: x[rd] = x[rs1] >>s imm[4:0]
Invalid values: NONE
Exception raised: NONE
LUI: Load Upper Immediate
Format: lui rd, imm[19:0]
Description: place the immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros.
Pseudocode: x[rd] = sext(imm[31:12] << 12)
Invalid values: NONE
Exception raised: NONE
AUIPC: Add Upper Immediate to PC
Format: auipc rd, imm[19:0]
Description: form a 32-bit offset from the 20-bit immediate, filling in the lowest 12 bits with zeros, add this offset to the pc of the AUIPC instruction, then place the result in register rd.
Pseudocode: x[rd] = pc + sext(imm[31:12] << 12)
Invalid values: NONE
Exception raised: NONE
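The register-immediate semantics above can be checked with a small Python reference model. It is a sketch, not the CVA6 ALU, and the function name is hypothetical. For LUI and AUIPC, the `imm` argument is the 20-bit immediate field. For the other instructions, it is the 12-bit field (only imm[4:0] is used for shifts). Register values are held as unsigned 32-bit integers, and all results are truncated to 32 bits.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def exec_reg_imm(op: str, rs1: int, imm: int, pc: int = 0) -> int:
    """Return the 32-bit value written to rd; rs1 is the unsigned 32-bit value of x[rs1]."""
    simm = sext(imm & 0xFFF, 12)  # I-type immediate, sign-extended
    shamt = imm & 0x1F            # RV32 shift amount imm[4:0]
    if op == "addi":
        result = rs1 + simm
    elif op == "andi":
        result = rs1 & simm
    elif op == "ori":
        result = rs1 | simm
    elif op == "xori":
        result = rs1 ^ simm
    elif op == "slti":
        result = int(sext(rs1, XLEN) < simm)   # signed comparison
    elif op == "sltiu":
        result = int(rs1 < (simm & MASK))      # sign-extend, then compare unsigned
    elif op == "slli":
        result = rs1 << shamt
    elif op == "srli":
        result = rs1 >> shamt                  # rs1 is non-negative: zeros shifted in
    elif op == "srai":
        result = sext(rs1, XLEN) >> shamt      # Python >> on negative ints is arithmetic
    elif op == "lui":
        result = (imm & 0xFFFFF) << 12
    elif op == "auipc":
        result = pc + ((imm & 0xFFFFF) << 12)
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return result & MASK
```

SLTIU compares against the sign-extended immediate reinterpreted as unsigned. As a result, `sltiu rd, rs1, 1` sets rd to 1 only when rs1 is zero.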
Integer Register-Register Instructions
ADD: Addition
Format: add rd, rs1, rs2
Description: add rs2 to register rs1, and store the result in register rd.
Pseudocode: x[rd] = x[rs1] + x[rs2]
Invalid values: NONE
Exception raised: NONE
SUB: Subtraction
Format: sub rd, rs1, rs2
Description: subtract rs2 from register rs1, and store the result in register rd.
Pseudocode: x[rd] = x[rs1] - x[rs2]
Invalid values: NONE
Exception raised: NONE
AND: AND logical operator
Format: and rd, rs1, rs2
Description: perform bitwise AND on register rs1 and rs2 and place the result in rd.
Pseudocode: x[rd] = x[rs1] & x[rs2]
Invalid values: NONE
Exception raised: NONE
OR: OR logical operator
Format: or rd, rs1, rs2
Description: perform bitwise OR on register rs1 and rs2 and place the result in rd.
Pseudocode: x[rd] = x[rs1] | x[rs2]
Invalid values: NONE
Exception raised: NONE
XOR: XOR logical operator
Format: xor rd, rs1, rs2
Description: perform bitwise XOR on register rs1 and rs2 and place the result in rd.
Pseudocode: x[rd] = x[rs1] ^ x[rs2]
Invalid values: NONE
Exception raised: NONE
SLT: Set Less Than
Format: slt rd, rs1, rs2
Description: set register rd to 1 if register rs1 is less than rs2 when both are treated as signed numbers, else 0 is written to rd.
Pseudocode: if (x[rs1] < x[rs2]) x[rd] = 1 else x[rd] = 0
Invalid values: NONE
Exception raised: NONE
SLTU: Set Less Than Unsigned
Format: sltu rd, rs1, rs2
Description: set register rd to 1 if register rs1 is less than rs2 when both are treated as unsigned numbers, else 0 is written to rd.
Pseudocode: if (x[rs1] <u x[rs2]) x[rd] = 1 else x[rd] = 0
Invalid values: NONE
Exception raised: NONE
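SLL: Shift Left Logical
Format: sll rd, rs1, rs2
Description: shift register rs1 left by the shift amount held in the lower 5 bits of register rs2 and place the result in rd (logical shift: zeros are shifted into the lower bits).
Pseudocode: x[rd] = x[rs1] << x[rs2][4:0]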
Invalid values: NONE
Exception raised: NONE
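SRL: Shift Right Logical
Format: srl rd, rs1, rs2
Description: shift register rs1 right by the shift amount held in the lower 5 bits of register rs2 and place the result in rd (logical shift: zeros are shifted into the upper bits).
Pseudocode: x[rd] = x[rs1] >> x[rs2][4:0]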
Invalid values: NONE
Exception raised: NONE
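SRA: Shift Right Arithmetic
Format: sra rd, rs1, rs2
Description: shift register rs1 right by the shift amount held in the lower 5 bits of register rs2 and place the result in rd (arithmetic shift: the original sign bit is copied into the vacated upper bits).
Pseudocode: x[rd] = x[rs1] >>s x[rs2][4:0]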
Invalid values: NONE
Exception raised: NONE
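The following is a matching Python sketch for the register-register instructions. As before, register values are held as unsigned 32-bit integers, and results wrap modulo 2^32 (overflow is ignored, as in RV32I). The function name is hypothetical.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def to_signed(value: int) -> int:
    """Interpret a 32-bit register value as a two's complement integer."""
    return value - (1 << XLEN) if value & (1 << (XLEN - 1)) else value


def exec_reg_reg(op: str, rs1: int, rs2: int) -> int:
    """Return the 32-bit value written to rd from the unsigned values of x[rs1], x[rs2]."""
    shamt = rs2 & 0x1F  # only x[rs2][4:0] is used as the shift amount on RV32
    if op == "add":
        result = rs1 + rs2
    elif op == "sub":
        result = rs1 - rs2
    elif op == "and":
        result = rs1 & rs2
    elif op == "or":
        result = rs1 | rs2
    elif op == "xor":
        result = rs1 ^ rs2
    elif op == "slt":
        result = int(to_signed(rs1) < to_signed(rs2))
    elif op == "sltu":
        result = int(rs1 < rs2)
    elif op == "sll":
        result = rs1 << shamt
    elif op == "srl":
        result = rs1 >> shamt
    elif op == "sra":
        result = to_signed(rs1) >> shamt  # arithmetic shift on the signed value
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return result & MASK
```

Shifting by a register value of 33 therefore behaves like a shift by 1, because only the lower 5 bits of rs2 are used.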
Control Transfer Instructions
Unconditional Jumps
JAL: Jump and Link
Format: jal rd, imm[20:1]
Description: the offset is sign-extended and added to the pc of the jump instruction to form the jump target address (pc is calculated using signed arithmetic), and the address of the instruction following the jump (pc+4) is stored into register rd.
Pseudocode: x[rd] = pc+4; pc += sext({imm[20:1], 1'b0})
Invalid values: NONE
Exception raised: a jump to an incorrect instruction address will usually raise an exception quickly. An instruction-address-misaligned exception is raised on a taken branch or unconditional jump if the target address is not aligned on a 2-byte boundary. A 4-byte alignment is not required because the core supports compressed instructions.
JALR: Jump and Link Register
Format: jalr rd, rs1, imm[11:0]
Description: the target address is obtained by adding the sign-extended 12-bit immediate to register rs1 (using signed arithmetic) and then setting the least-significant bit of the result to zero; the address of the instruction following the jump (pc+4) is stored in register rd.
Pseudocode: t = pc+4; pc = (x[rs1]+sext(imm[11:0]))&∼1 ; x[rd] = t
Invalid values: NONE
Exception raised: jumps to an incorrect instruction address will usually quickly raise an exception. Because the core supports compressed instructions, jump targets only need to be aligned on a 2-byte boundary instead of a 4-byte boundary; since JALR clears the least-significant bit of the target, JALR does not raise an instruction-address-misaligned exception.
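The target computations of JAL and JALR can be sketched in C as follows, assuming the immediates have already been decoded into signed byte offsets (decoding is not shown, and the `jump_result` type is hypothetical). Note that only JALR clears bit 0 of the target.

```c
#include <stdint.h>

/* Hypothetical result type: the new pc and the value written to rd. */
typedef struct {
    uint32_t next_pc;
    uint32_t link;
} jump_result;

/* JAL: offset is the decoded sext({imm[20:1], 1'b0}), always even. */
static jump_result jal(uint32_t pc, int32_t offset) {
    jump_result r;
    r.link = pc + 4u;
    r.next_pc = pc + (uint32_t)offset;          /* wraps modulo 2^32 */
    return r;
}

/* JALR: rs1 is passed by value, so rd == rs1 behaves like the
   temporary t in the pseudocode. */
static jump_result jalr(uint32_t pc, uint32_t rs1, int32_t imm12) {
    jump_result r;
    r.link = pc + 4u;
    r.next_pc = (rs1 + (uint32_t)imm12) & ~1u;  /* clear bit 0 */
    return r;
}
```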
Conditional Branches
BEQ: Branch Equal
Format: beq rs1, rs2, imm[12:1]
Description: takes the branch (pc is calculated using signed arithmetic) if registers rs1 and rs2 are equal.
Pseudocode: if (x[rs1] == x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4
Invalid values: NONE
Exception raised: no instruction fetch misaligned exception is generated for a conditional branch that is not taken.
BNE: Branch Not Equal
Format: bne rs1, rs2, imm[12:1]
Description: takes the branch (pc is calculated using signed arithmetic) if registers rs1 and rs2 are not equal.
Pseudocode: if (x[rs1] != x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4
Invalid values: NONE
Exception raised: no instruction fetch misaligned exception is generated for a conditional branch that is not taken.
BLT: Branch Less Than
Format: blt rs1, rs2, imm[12:1]
Description: takes the branch (pc is calculated using signed arithmetic) if register rs1 is less than rs2 (using signed comparison).
Pseudocode: if (x[rs1] < x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4
Invalid values: NONE
Exception raised: no instruction fetch misaligned exception is generated for a conditional branch that is not taken.
BLTU: Branch Less Than Unsigned
Format: bltu rs1, rs2, imm[12:1]
Description: takes the branch (pc is calculated using signed arithmetic) if register rs1 is less than rs2 (using unsigned comparison).
Pseudocode: if (x[rs1] <u x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4
Invalid values: NONE
Exception raised: no instruction fetch misaligned exception is generated for a conditional branch that is not taken.
BGE: Branch Greater or Equal
Format: bge rs1, rs2, imm[12:1]
Description: takes the branch (pc is calculated using signed arithmetic) if register rs1 is greater than or equal to rs2 (using signed comparison).
Pseudocode: if (x[rs1] >= x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4
Invalid values: NONE
Exception raised: no instruction fetch misaligned exception is generated for a conditional branch that is not taken.
BGEU: Branch Greater or Equal Unsigned
Format: bgeu rs1, rs2, imm[12:1]
Description: takes the branch (pc is calculated using signed arithmetic) if register rs1 is greater than or equal to rs2 (using unsigned comparison).
Pseudocode: if (x[rs1] >=u x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4
Invalid values: NONE
Exception raised: no instruction fetch misaligned exception is generated for a conditional branch that is not taken.
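All six conditional branches share the same target computation, sext({imm[12:1], 1’b0}), and differ only in the comparison. The sketch below assumes the twelve bits imm[12:1] have already been gathered from the instruction into one contiguous field; the helper names and the `branch_op` enumeration are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BEQ, BNE, BLT, BGE, BLTU, BGEU } branch_op;

/* Sign-extend the low `bits` bits of `value` (1 <= bits <= 32). */
static int32_t sext(uint32_t value, unsigned bits) {
    uint32_t sign = 1u << (bits - 1u);
    if (bits < 32u) value &= (1u << bits) - 1u;
    return (int32_t)((value ^ sign) - sign);
}

static bool branch_taken(branch_op op, uint32_t rs1, uint32_t rs2) {
    switch (op) {
    case BEQ:  return rs1 == rs2;
    case BNE:  return rs1 != rs2;
    case BLT:  return (int32_t)rs1 <  (int32_t)rs2;  /* signed */
    case BGE:  return (int32_t)rs1 >= (int32_t)rs2;  /* signed */
    case BLTU: return rs1 <  rs2;                    /* unsigned */
    case BGEU: return rs1 >= rs2;                    /* unsigned */
    }
    return false;
}

/* imm12_1 holds imm[12:1]; bit 0 of the byte offset is always 0. */
static uint32_t branch_next_pc(uint32_t pc, branch_op op,
                               uint32_t rs1, uint32_t rs2, uint32_t imm12_1) {
    if (!branch_taken(op, rs1, rs2))
        return pc + 4u;
    return pc + (uint32_t)sext((imm12_1 & 0xFFFu) << 1, 13u);
}
```

With imm[12:1] = 0xFFF the byte offset is -2, so a taken branch at 0x100 goes to 0xFE, while a not-taken branch continues at 0x104.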
Load and Store Instructions
LB: Load Byte
Format: lb rd, imm(rs1)
Description: loads an 8-bit value from memory and sign-extends it to 32 bits before storing it in rd. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: x[rd] = sext(M[x[rs1] + sext(imm[11:0])][7:0])
Invalid values: NONE
Exception raised: loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded.
LH: Load Halfword
Format: lh rd, imm(rs1)
Description: loads a 16-bit value from memory and sign-extends it to 32 bits before storing it in rd. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: x[rd] = sext(M[x[rs1] + sext(imm[11:0])][15:0])
Invalid values: NONE
Exception raised: loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded. An exception is also raised if the memory address is not aligned on a 2-byte boundary.
LW: Load Word
Format: lw rd, imm(rs1)
Description: loads a 32-bit value from memory into rd. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: x[rd] = sext(M[x[rs1] + sext(imm[11:0])][31:0])
Invalid values: NONE
Exception raised: loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded. An exception is also raised if the memory address is not aligned on a 4-byte boundary.
LBU: Load Byte Unsigned
Format: lbu rd, imm(rs1)
Description: loads an 8-bit value from memory and zero-extends it to 32 bits before storing it in rd. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: x[rd] = zext(M[x[rs1] + sext(imm[11:0])][7:0])
Invalid values: NONE
Exception raised: loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded.
LHU: Load Halfword Unsigned
Format: lhu rd, imm(rs1)
Description: loads a 16-bit value from memory and zero-extends it to 32 bits before storing it in rd. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: x[rd] = zext(M[x[rs1] + sext(imm[11:0])][15:0])
Invalid values: NONE
Exception raised: loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded. An exception is also raised if the memory address is not aligned on a 2-byte boundary.
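The signed loads (LB, LH) and the unsigned loads (LBU, LHU) differ only in how the loaded value is extended to 32 bits. The sketch assumes a hypothetical little-endian byte array `mem` (RISC-V memory is little-endian) indexed by the effective address, and it omits address and alignment checks.

```c
#include <stdint.h>

static uint32_t lb(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)(int32_t)(int8_t)mem[addr];            /* sign-extend */
}

static uint32_t lbu(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr];                              /* zero-extend */
}

static uint32_t lh(const uint8_t *mem, uint32_t addr) {
    uint16_t half = (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
    return (uint32_t)(int32_t)(int16_t)half;                 /* sign-extend */
}

static uint32_t lhu(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)(mem[addr] | (mem[addr + 1] << 8));     /* zero-extend */
}
```

Loading the byte 0x80 gives 0xFFFFFF80 with LB but 0x00000080 with LBU.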
SB: Store Byte
Format: sb rs2, imm(rs1)
Description: stores an 8-bit value from the low bits of register rs2 to memory. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: M[x[rs1] + sext(imm[11:0])][7:0] = x[rs2][7:0]
Invalid values: NONE
Exception raised: NONE
SH: Store Halfword
Format: sh rs2, imm(rs1)
Description: stores a 16-bit value from the low bits of register rs2 to memory. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: M[x[rs1] + sext(imm[11:0])][15:0] = x[rs2][15:0]
Invalid values: NONE
Exception raised: an exception is raised if the memory address isn’t aligned (2-byte boundary).
SW: Store Word
Format: sw rs2, imm(rs1)
Description: stores a 32-bit value from register rs2 to memory. The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.
Pseudocode: M[x[rs1] + sext(imm[11:0])][31:0] = x[rs2][31:0]
Invalid values: NONE
Exception raised: an exception is raised if the memory address isn’t aligned (4-byte boundary).
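The alignment conditions listed under "Exception raised" for the loads and stores reduce to a natural-alignment test: an access of N bytes must use an address that is a multiple of N. A hypothetical helper, together with the effective-address computation, could look like this (names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an access of `size` bytes (1, 2 or 4) at `addr` would raise
   an address-misaligned exception. */
static bool is_misaligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) != 0u;
}

/* Effective address: x[rs1] + sext(imm[11:0]), wrapping modulo 2^32. */
static uint32_t effective_address(uint32_t rs1, int32_t imm12) {
    return rs1 + (uint32_t)imm12;
}
```

Byte accesses can never be misaligned, which is why LB, LBU and SB list no alignment exception.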
Memory Ordering
FENCE: Fence Instruction
Format: fence pre, succ
Description: order device I/O and memory accesses as viewed by other RISC-V harts and external devices or coprocessors. Any combination of device input (I), device output (O), memory reads (R), and memory writes (W) may be ordered with respect to any combination of the same. Informally, no other RISC-V hart or external device can observe any operation in the successor set following a FENCE before any operation in the predecessor set preceding the FENCE. As the core supports a single hart, the FENCE instruction has no effect and can be considered a nop instruction.
Pseudocode: No operation (nop)
Invalid values: NONE
Exception raised: NONE
Environment Call and Breakpoints
ECALL: Environment Call
Format: ecall
Description: make a request to the supporting execution environment, which is usually an operating system. The ABI for the system will define how parameters for the environment request are passed, but usually these will be in defined locations in the integer register file.
Pseudocode: RaiseException(EnvironmentCall)
Invalid values: NONE
Exception raised: Raise an Environment Call exception.
EBREAK: Environment Break
Format: ebreak
Description: cause control to be transferred back to a debugging environment.
Pseudocode: RaiseException(Breakpoint)
Invalid values: NONE
Exception raised: Raise a Breakpoint exception.
RV32M Multiplication and Division Instructions
This chapter describes the standard integer multiplication and division instruction extension, which is named “M” and contains instructions that multiply or divide values held in two integer registers.
Multiplication Operations
MUL: Multiplication
Format: mul rd, rs1, rs2
Description: performs a 32-bit × 32-bit multiplication and places the lower 32 bits of the product in the destination register (both rs1 and rs2 are treated as signed numbers).
Pseudocode: x[rd] = x[rs1] * x[rs2]
Invalid values: NONE
Exception raised: NONE
MULH: Multiplication Higher
Format: mulh rd, rs1, rs2
Description: performs a 32-bit × 32-bit multiplication and places the upper 32 bits of the 64-bit product in the destination register (both rs1 and rs2 are treated as signed numbers).
Pseudocode: x[rd] = (x[rs1] s*s x[rs2]) >>s 32
Invalid values: NONE
Exception raised: NONE
MULHU: Multiplication Higher Unsigned
Format: mulhu rd, rs1, rs2
Description: performs a 32-bit × 32-bit multiplication and places the upper 32 bits of the 64-bit product in the destination register (both rs1 and rs2 are treated as unsigned numbers).
Pseudocode: x[rd] = (x[rs1] u*u x[rs2]) >>u 32
Invalid values: NONE
Exception raised: NONE
MULHSU: Multiplication Higher Signed Unsigned
Format: mulhsu rd, rs1, rs2
Description: performs a 32-bit × 32-bit multiplication and places the upper 32 bits of the 64-bit product in the destination register (rs1 is treated as a signed number, rs2 as an unsigned number).
Pseudocode: x[rd] = (x[rs1] s*u x[rs2]) >>s 32
Invalid values: NONE
Exception raised: NONE
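The four multiply instructions can be modelled with 64-bit host arithmetic: the full product is formed with the appropriate signedness for each operand, then either the lower or the upper 32 bits are kept. The function names below are illustrative only.

```c
#include <stdint.h>

/* Low 32 bits: identical for signed and unsigned operands. */
static uint32_t mul(uint32_t a, uint32_t b) {
    return a * b;
}

static uint32_t mulh(uint32_t a, uint32_t b) {      /* signed x signed */
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

static uint32_t mulhu(uint32_t a, uint32_t b) {     /* unsigned x unsigned */
    uint64_t p = (uint64_t)a * (uint64_t)b;
    return (uint32_t)(p >> 32);
}

static uint32_t mulhsu(uint32_t a, uint32_t b) {    /* signed x unsigned */
    /* |a| <= 2^31 and b < 2^32, so the product fits in int64_t. */
    int64_t p = (int64_t)(int32_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
```

With both operands equal to 0xFFFFFFFF, MULH returns 0 (since -1 × -1 = 1), MULHU returns 0xFFFFFFFE and MULHSU returns 0xFFFFFFFF.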
Division Operations
DIV: Division
Format: div rd, rs1, rs2
Description: perform signed integer division of 32 bits by 32 bits (rounding towards zero).
Pseudocode: x[rd] = x[rs1] /s x[rs2]
Invalid values: NONE
Exception raised: NONE
DIVU: Division Unsigned
Format: divu rd, rs1, rs2
Description: perform unsigned integer division of 32 bits by 32 bits (rounding towards zero).
Pseudocode: x[rd] = x[rs1] /u x[rs2]
Invalid values: NONE
Exception raised: NONE
REM: Remainder
Format: rem rd, rs1, rs2
Description: provide the remainder of the corresponding division operation DIV (the sign of rd equals the sign of rs1).
Pseudocode: x[rd] = x[rs1] %s x[rs2]
Invalid values: NONE
Exception raised: NONE
REMU: Remainder Unsigned
Format: remu rd, rs1, rs2
Description: provide the remainder of the corresponding division operation DIVU.
Pseudocode: x[rd] = x[rs1] %u x[rs2]
Invalid values: NONE
Exception raised: NONE
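The RISC-V M extension does not trap on division by zero or on signed overflow, which is why no exception is listed for DIV, DIVU, REM and REMU; instead, the RISC-V unprivileged specification defines the results. A C sketch of those rules (function names are illustrative):

```c
#include <stdint.h>

static uint32_t div_s(uint32_t a, uint32_t b) {
    int32_t x = (int32_t)a, y = (int32_t)b;
    if (y == 0) return 0xFFFFFFFFu;                        /* quotient = -1 */
    if (x == INT32_MIN && y == -1) return (uint32_t)INT32_MIN; /* overflow */
    return (uint32_t)(x / y);             /* C99 also rounds toward zero */
}

static uint32_t divu(uint32_t a, uint32_t b) {
    return (b == 0u) ? 0xFFFFFFFFu : a / b;               /* 2^32 - 1 */
}

static uint32_t rem_s(uint32_t a, uint32_t b) {
    int32_t x = (int32_t)a, y = (int32_t)b;
    if (y == 0) return a;                                  /* dividend */
    if (x == INT32_MIN && y == -1) return 0u;              /* overflow */
    return (uint32_t)(x % y);             /* sign follows the dividend */
}

static uint32_t remu(uint32_t a, uint32_t b) {
    return (b == 0u) ? a : a % b;
}
```

Dividing by zero therefore yields an all-ones quotient and returns the dividend as the remainder, and dividing -2^31 by -1 yields -2^31 with a remainder of 0.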
RV32A Atomic Instructions
The standard atomic instruction extension is denoted by instruction subset name “A”, and contains instructions that atomically read-modify-write memory to support synchronization between multiple RISC-V harts running in the same memory space. The two forms of atomic instruction provided are load-reserved/store-conditional instructions and atomic fetch-and-op memory instructions. Both types of atomic instruction support various memory consistency orderings including unordered, acquire, release, and sequentially consistent semantics.
Load-Reserved/Store-Conditional Instructions
LR.W: Load-Reserved Word
Format: lr.w rd, (rs1)
Description: LR loads a word from the address in rs1, places the sign-extended value in rd, and registers a reservation on the memory address.
Pseudocode: x[rd] = LoadReserved32(M[x[rs1]])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
SC.W: Store-Conditional Word
Format: sc.w rd, rs2, (rs1)
Description: SC writes a word in rs2 to the address in rs1, provided a valid reservation still exists on that address. SC writes zero to rd on success or a nonzero code on failure.
Pseudocode: x[rd] = StoreConditional32(M[x[rs1]], x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
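The LR.W/SC.W contract can be illustrated with a minimal single-hart model holding one reservation. This is a hypothetical sketch, not a description of how CVA6 tracks reservations: it returns 1 as the failure code (the architecture only guarantees a nonzero value), and it indexes a word array with the byte address divided by 4.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    uint32_t addr;
} reservation;

/* LR.W: read the word and register a reservation on its address. */
static uint32_t lr_w(reservation *rsv, const uint32_t *mem, uint32_t addr) {
    rsv->valid = true;
    rsv->addr = addr;
    return mem[addr / 4u];
}

/* SC.W: returns the value written to rd, 0 on success and 1 on failure. */
static uint32_t sc_w(reservation *rsv, uint32_t *mem, uint32_t addr, uint32_t value) {
    bool ok = rsv->valid && rsv->addr == addr;
    rsv->valid = false;    /* an SC invalidates the reservation either way */
    if (!ok) return 1u;
    mem[addr / 4u] = value;
    return 0u;
}
```

A second SC.W without an intervening LR.W fails and leaves memory unchanged, which is the behaviour a typical LR/SC retry loop relies on.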
Atomic Memory Operations
AMOADD.W: Atomic Memory Operation: Add Word
Format: amoadd.w rd, rs2, (rs1)
Description: AMOADD.W atomically loads a data value from the address in rs1, places the value into register rd, then adds the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] + x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOAND.W: Atomic Memory Operation: And Word
Format: amoand.w rd, rs2, (rs1)
Description: AMOAND.W atomically loads a data value from the address in rs1, places the value into register rd, then performs an AND between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] & x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOOR.W: Atomic Memory Operation: Or Word
Format: amoor.w rd, rs2, (rs1)
Description: AMOOR.W atomically loads a data value from the address in rs1, places the value into register rd, then performs an OR between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] | x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOXOR.W: Atomic Memory Operation: Xor Word
Format: amoxor.w rd, rs2, (rs1)
Description: AMOXOR.W atomically loads a data value from the address in rs1, places the value into register rd, then performs an XOR between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] ^ x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOSWAP.W: Atomic Memory Operation: Swap Word
Format: amoswap.w rd, rs2, (rs1)
Description: AMOSWAP.W atomically loads a data value from the address in rs1, places the value into register rd, then stores the original value in rs2 to the address in rs1, thereby swapping the memory word and the value in rs2.
Pseudocode: x[rd] = AMO32(M[x[rs1]] SWAP x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOMIN.W: Atomic Memory Operation: Minimum Word
Format: amomin.w rd, rs2, (rs1)
Description: AMOMIN.W atomically loads a data value from the address in rs1, places the value into register rd, then chooses the minimum (the values treated as signed) between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] MIN x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOMINU.W: Atomic Memory Operation: Minimum Word, Unsigned
Format: amominu.w rd, rs2, (rs1)
Description: AMOMINU.W atomically loads a data value from the address in rs1, places the value into register rd, then chooses the minimum (the values treated as unsigned) between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] MINU x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOMAX.W: Atomic Memory Operation: Maximum Word
Format: amomax.w rd, rs2, (rs1)
Description: AMOMAX.W atomically loads a data value from the address in rs1, places the value into register rd, then chooses the maximum (the values treated as signed) between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] MAX x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
AMOMAXU.W: Atomic Memory Operation: Maximum Word, Unsigned
Format: amomaxu.w rd, rs2, (rs1)
Description: AMOMAXU.W atomically loads a data value from the address in rs1, places the value into register rd, then chooses the maximum (the values treated as unsigned) between the loaded value and the original value in rs2, then stores the result back to the address in rs1.
Pseudocode: x[rd] = AMO32(M[x[rs1]] MAXU x[rs2])
Invalid values: NONE
Exception raised: If the address is not naturally aligned (4-byte boundary), a misaligned address exception will be generated.
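All the AMOs above follow the AMO32 pattern used in their pseudocode: the old memory value goes to rd and the combined value is written back. The C sketch below is hypothetical and purely sequential, so it shows the data flow only; atomicity and the acquire/release ordering bits are not modelled.

```c
#include <stdint.h>

typedef enum { AMO_SWAP, AMO_ADD, AMO_AND, AMO_OR, AMO_XOR,
               AMO_MIN, AMO_MAX, AMO_MINU, AMO_MAXU } amo_op;

/* Returns the old value of *word (written to rd) and stores op(old, rs2). */
static uint32_t amo32(uint32_t *word, amo_op op, uint32_t rs2) {
    uint32_t old = *word, result = old;
    switch (op) {
    case AMO_SWAP: result = rs2; break;
    case AMO_ADD:  result = old + rs2; break;
    case AMO_AND:  result = old & rs2; break;
    case AMO_OR:   result = old | rs2; break;
    case AMO_XOR:  result = old ^ rs2; break;
    case AMO_MIN:  result = ((int32_t)old < (int32_t)rs2) ? old : rs2; break;
    case AMO_MAX:  result = ((int32_t)old > (int32_t)rs2) ? old : rs2; break;
    case AMO_MINU: result = (old < rs2) ? old : rs2; break;
    case AMO_MAXU: result = (old > rs2) ? old : rs2; break;
    }
    *word = result;
    return old;
}
```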
RV32C Compressed Instructions
RVC uses a simple compression scheme that offers shorter 16-bit versions of common 32-bit RISC-V instructions when:
the immediate or address offset is small;
one of the registers is the zero register (x0), the ABI link register (x1), or the ABI stack pointer (x2);
the destination register and the first source register are identical;
the registers used are the 8 most popular ones.
The C extension is compatible with all other standard instruction extensions. The C extension allows 16-bit instructions to be freely intermixed with 32-bit instructions, with the latter now able to start on any 16-bit boundary. With the addition of the C extension, JAL and JALR instructions will no longer raise an instruction misaligned exception.
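The "8 most popular" registers reachable through the 3-bit rd’, rs1’ and rs2’ fields are x8 to x15, which is why the pseudocode of the compressed instructions writes x[8 + rd’]. A small illustrative mapping, with the standard ABI names of those registers (names of the helpers are hypothetical):

```c
#include <stdint.h>

/* Map a 3-bit compressed register field to the full register number. */
static unsigned creg_to_xreg(unsigned field3) {
    return 8u + (field3 & 0x7u);
}

/* ABI names of x8..x15, indexed by the 3-bit field. */
static const char *const creg_abi_name[8] = {
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5"
};
```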
Integer Computational Instructions
C.LI: Compressed Load Immediate
Format: c.li rd, imm[5:0]
Description: loads the sign-extended 6-bit immediate, imm, into register rd.
Pseudocode: x[rd] = sext(imm[5:0])
Invalid values: rd = x0
Exception raised: NONE
C.LUI: Compressed Load Upper Immediate
Format: c.lui rd, nzimm[17:12]
Description: loads the non-zero 6-bit immediate field into bits 17–12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination.
Pseudocode: x[rd] = sext(nzimm[17:12] << 12)
Invalid values: rd = x0 & rd = x2 & nzimm = 0
Exception raised: NONE
C.ADDI: Compressed Addition Immediate
Format: c.addi rd, nzimm[5:0]
Description: adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd.
Pseudocode: x[rd] = x[rd] + sext(nzimm[5:0])
Invalid values: rd = x0 & nzimm = 0
Exception raised: NONE
C.ADDI16SP: Addition Immediate Scaled by 16, to Stack Pointer
Format: c.addi16sp nzimm[9:4]
Description: adds the non-zero sign-extended 6-bit immediate to the value in the stack pointer (sp=x2), where the immediate is scaled to represent multiples of 16 in the range [-512, 496]. C.ADDI16SP is used to adjust the stack pointer in procedure prologues and epilogues. C.ADDI16SP shares the opcode with C.LUI, but has a destination field of x2.
Pseudocode: x[2] = x[2] + sext(nzimm[9:4])
Invalid values: rd != x2 & nzimm = 0
Exception raised: NONE
C.ADDI4SPN: Addition Immediate Scaled by 4, to Stack Pointer
Format: c.addi4spn rd’, nzimm[9:2]
Description: adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2, and writes the result to rd’. This instruction is used to generate pointers to stack-allocated variables.
Pseudocode: x[8 + rd’] = x[2] + zext(nzimm[9:2])
Invalid values: nzimm = 0
Exception raised: NONE
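The scaled immediates of C.ADDI16SP and C.ADDI4SPN can be checked with a short sketch. It assumes the immediate bits have already been extracted from the 16-bit encoding and packed into a contiguous field (the scattered bit layout is not shown); the function names are hypothetical.

```c
#include <stdint.h>

/* C.ADDI16SP: six bits nzimm[9:4], sign-extended, scaled by 16. */
static int32_t addi16sp_offset(uint32_t nzimm9_4) {
    int32_t v = (int32_t)(nzimm9_4 & 0x3Fu);
    if (v & 0x20) v -= 0x40;     /* sign-extend the 6-bit field */
    return v * 16;
}

/* C.ADDI4SPN: eight bits nzimm[9:2], zero-extended, scaled by 4. */
static uint32_t addi4spn_offset(uint32_t nzimm9_2) {
    return (nzimm9_2 & 0xFFu) * 4u;
}
```

The extreme signed fields 0x20 and 0x1F give -512 and 496, matching the C.ADDI16SP range, while C.ADDI4SPN reaches at most 1020 bytes.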
C.SLLI: Compressed Shift Left Logical Immediate
Format: c.slli rd, uimm[5:0]
Description: performs a logical left shift (zeros are shifted into the lower bits).
Pseudocode: x[rd] = x[rd] << uimm[5:0]
Invalid values: rd = x0 & uimm[5] = 1
Exception raised: NONE
C.SRLI: Compressed Shift Right Logical Immediate
Format: c.srli rd’, uimm[5:0]
Description: performs a logical right shift (zeros are shifted into the upper bits).
Pseudocode: x[8 + rd’] = x[8 + rd’] >> uimm[5:0]
Invalid values: uimm[5] = 1
Exception raised: NONE
C.SRAI: Compressed Shift Right Arithmetic Immediate
Format: c.srai rd’, uimm[5:0]
Description: performs an arithmetic right shift (sign bits are shifted into the upper bits).
Pseudocode: x[8 + rd’] = x[8 + rd’] >>s uimm[5:0]
Invalid values: uimm[5] = 1
Exception raised: NONE
C.ANDI: Compressed AND Immediate
Format: c.andi rd’, imm[5:0]
Description: computes the bitwise AND of the value in register rd’, and the sign-extended 6-bit immediate, then writes the result to rd’.
Pseudocode: x[8 + rd’] = x[8 + rd’] & sext(imm[5:0])
Invalid values: NONE
Exception raised: NONE
C.ADD: Compressed Addition
Format: c.add rd, rs2
Description: adds the values in registers rd and rs2 and writes the result to register rd.
Pseudocode: x[rd] = x[rd] + x[rs2]
Invalid values: rd = x0 & rs2 = x0
Exception raised: NONE
C.MV: Move
Format: c.mv rd, rs2
Description: copies the value in register rs2 into register rd.
Pseudocode: x[rd] = x[rs2]
Invalid values: rd = x0 & rs2 = x0
Exception raised: NONE
C.AND: Compressed AND
Format: c.and rd’, rs2’
Description: computes the bitwise AND of the value in register rd’ and register rs2’, then writes the result to rd’.
Pseudocode: x[8 + rd’] = x[8 + rd’] & x[8 + rs2’]
Invalid values: NONE
Exception raised: NONE
C.OR: Compressed OR
Format: c.or rd’, rs2’
Description: computes the bitwise OR of the value in register rd’ and register rs2’, then writes the result to rd’.
Pseudocode: x[8 + rd’] = x[8 + rd’] | x[8 + rs2’]
Invalid values: NONE
Exception raised: NONE
C.XOR: Compressed XOR
Format: c.xor rd’, rs2’
Description: computes the bitwise XOR of the value in register rd’ and register rs2’, then writes the result to rd’.
Pseudocode: x[8 + rd’] = x[8 + rd’] ^ x[8 + rs2’]
Invalid values: NONE
Exception raised: NONE
C.SUB: Compressed Subtraction
Format: c.sub rd’, rs2’
Description: subtracts the value in register rs2’ from the value in register rd’ and writes the result to register rd’.
Pseudocode: x[8 + rd’] = x[8 + rd’] - x[8 + rs2’]
Invalid values: NONE
Exception raised: NONE
C.EBREAK: Compressed Ebreak
Format: c.ebreak
Description: cause control to be transferred back to the debugging environment.
Pseudocode: RaiseException(Breakpoint)
Invalid values: NONE
Exception raised: Raise a Breakpoint exception.
Control Transfer Instructions
C.J: Compressed Jump
Format: c.j imm[11:1]
Description: performs an unconditional control transfer. The offset is sign-extended and added to the pc to form the jump target address.
Pseudocode: pc += sext(imm[11:1])
Invalid values: NONE
Exception raised: jumps to an incorrect instruction address will usually quickly raise an exception.
C.JAL: Compressed Jump and Link
Format: c.jal imm[11:1]
Description: performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1.
Pseudocode: x[1] = pc+2; pc += sext(imm[11:1])
Invalid values: NONE
Exception raised: jumps to an incorrect instruction address will usually quickly raise an exception.
C.JR: Compressed Jump Register
Format: c.jr rs1
Description: performs an unconditional control transfer to the address in register rs1.
Pseudocode: pc = x[rs1]
Invalid values: rs1 = x0
Exception raised: jumps to an incorrect instruction address will usually quickly raise an exception.
C.JALR: Compressed Jump and Link Register
Format: c.jalr rs1
Description: performs the same operation as C.JR, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1.
Pseudocode: t = pc+2; pc = x[rs1]; x[1] = t
Invalid values: rs1 = x0
Exception raised: jumps to an incorrect instruction address will usually quickly raise an exception.
C.BEQZ: Branch if Equal Zero
Format: c.beqz rs1’, imm[8:1]
Description: performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. C.BEQZ takes the branch if the value in register rs1’ is zero.
Pseudocode: if (x[8+rs1’] == 0) pc += sext(imm[8:1]) else pc += 2
Invalid values: NONE
Exception raised: No instruction fetch misaligned exception is generated for a conditional branch that is not taken.
C.BNEZ: Branch if Not Equal Zero
Format: c.bnez rs1’, imm[8:1]
Description: performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. C.BNEZ takes the branch if the value in register rs1’ is not zero.
Pseudocode: if (x[8+rs1’] != 0) pc += sext(imm[8:1]) else pc += 2
Invalid values: NONE
Exception raised: No instruction fetch misaligned exception is generated for a conditional branch that is not taken.
Load and Store Instructions
C.LWSP: Load Word Stack-Pointer
Format: c.lwsp rd, uimm(x2)
Description: loads a 32-bit value from memory into register rd. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2.
Pseudocode: x[rd] = M[x[2] + zext(uimm[7:2])][31:0]
Invalid values: rd = x0
Exception raised: an exception is raised if the memory address isn’t aligned (4-byte boundary).
C.SWSP: Store Word Stack-Pointer
Format: c.swsp rs2, uimm(x2)
Description: stores a 32-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2.
Pseudocode: M[x[2] + zext(uimm[7:2])][31:0] = x[rs2]
Invalid values: NONE
Exception raised: an exception is raised if the memory address isn’t aligned (4-byte boundary).
C.LW: Compressed Load Word
Format: c.lw rd’, uimm(rs1’)
Description: loads a 32-bit value from memory into register rd’. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1’.
Pseudocode: x[8+rd’] = M[x[8+rs1’] + zext(uimm[6:2])][31:0]
Invalid values: NONE
Exception raised: an exception is raised if the memory address isn’t aligned (4-byte boundary).
C.SW: Compressed Store Word
Format: c.sw rs2’, uimm(rs1’)
Description: stores a 32-bit value from register rs2’ to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1’.
Pseudocode: M[x[8+rs1’] + zext(uimm[6:2])][31:0] = x[8+rs2’]
Invalid values: NONE
Exception raised: an exception is raised if the memory address isn’t aligned (4-byte boundary).
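Because the compressed word loads and stores zero-extend their offsets and scale them by 4, the offset itself is always a non-negative multiple of 4, so only a misaligned base register can trigger the misalignment exception. A sketch of the effective-address computations, assuming the offset bits are already extracted into a contiguous field (function names are illustrative):

```c
#include <stdint.h>

/* C.LWSP / C.SWSP: uimm[7:2] scaled by 4, offset range 0..252. */
static uint32_t c_lwsp_addr(uint32_t sp, uint32_t uimm7_2) {
    return sp + ((uimm7_2 & 0x3Fu) << 2);
}

/* C.LW / C.SW: uimm[6:2] scaled by 4, offset range 0..124. */
static uint32_t c_lw_addr(uint32_t rs1, uint32_t uimm6_2) {
    return rs1 + ((uimm6_2 & 0x1Fu) << 2);
}
```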
RV32Zicsr Control and Status Register Instructions
All CSR instructions atomically read-modify-write a single CSR, whose CSR specifier is encoded in the 12-bit csr field of the instruction held in bits 31–20. The immediate forms use a 5-bit zero-extended immediate encoded in the rs1 field.
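The field positions described above can be extracted with plain shifts and masks. The sketch also extracts funct3 and rd, which sit at the standard I-type positions; the helper names are hypothetical.

```c
#include <stdint.h>

static uint32_t csr_addr(uint32_t insn)  { return insn >> 20; }           /* bits 31..20: csr */
static uint32_t rs1_uimm(uint32_t insn)  { return (insn >> 15) & 0x1Fu; } /* bits 19..15: rs1 or uimm[4:0] */
static uint32_t funct3_of(uint32_t insn) { return (insn >> 12) & 0x7u; }  /* bits 14..12 */
static uint32_t rd_of(uint32_t insn)     { return (insn >> 7) & 0x1Fu; }  /* bits 11..7: rd */

/* funct3: 1 CSRRW, 2 CSRRS, 3 CSRRC, 5 CSRRWI, 6 CSRRSI, 7 CSRRCI.
   Bit 2 of funct3 selects the immediate forms. */
static int is_imm_form(uint32_t insn) {
    return (funct3_of(insn) & 0x4u) != 0u;
}
```

For example, `csrr a0, mstatus` (an alias of `csrrs a0, mstatus, x0`) encodes as 0x30002573, giving csr = 0x300, rs1 = 0, funct3 = 2 and rd = 10.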
CSRRW: Control and Status Register Read and Write
Format: csrrw rd, csr, rs1
Description: reads the old value of the CSR, zero-extends the value to 32 bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.
Pseudocode: t = CSRs[csr]; CSRs[csr] = x[rs1]; x[rd] = t
Invalid values: NONE
Exception raised: Attempts to access a non-existent CSR raise an illegal instruction exception.
CSRRS: Control and Status Register Read and Set
Format: csrrs rd, csr, rs1
Description: reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs.
Pseudocode: t = CSRs[csr]; CSRs[csr] = t | x[rs1]; x[rd] = t
Invalid values: NONE
Exception raised: Attempts to access a non-existent CSR raise an illegal instruction exception.
CSRRC: Control and Status Register Read and Clear
Format: csrrc rd, csr, rs1
Description: reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs.
Pseudocode: t = CSRs[csr]; CSRs[csr] = t & ∼x[rs1]; x[rd] = t
Invalid values: NONE
Exception raised: Attempts to access a non-existent CSR raise an illegal instruction exception.
CSRRWI: Control and Status Register Read and Write Immediate
Format: csrrwi rd, csr, uimm[4:0]
Description: reads the old value of the CSR, zero-extends the value to 32 bits, then writes it to integer register rd. The zero-extended immediate is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.
Pseudocode: x[rd] = CSRs[csr]; CSRs[csr] = zext(uimm[4:0])
Invalid values: NONE
Exception raised: Attempts to access a non-existent CSR raise an illegal instruction exception.
CSRRSI: Control and Status Register Read and Set Immediate
Format: csrrsi rd, csr, uimm[4:0]
Description: reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The zero-extended immediate value is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in the zero-extended immediate will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write.
Pseudocode: t = CSRs[csr]; CSRs[csr] = t | zext(uimm[4:0]); x[rd] = t
Invalid values: NONE
Exception raised: Attempts to access a non-existent CSR raise an illegal instruction exception.
CSRRCI: Control and Status Register Read and Clear Immediate
Format: csrrci rd, csr, uimm[4:0]
Description: reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The zero-extended immediate value is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in the zero-extended immediate will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write.
Pseudocode: t = CSRs[csr]; CSRs[csr] = t & ∼zext(uimm[4:0]); x[rd] = t
Invalid values: NONE
Exception raised: Attempts to access a non-existent CSR raise an illegal instruction exception.
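The read-modify-write behaviour of the six CSR instructions, including the rule that the set and clear forms do not write when the rs1 or uimm field is zero, can be summarised in a hypothetical C model. It assumes a plain 32-bit CSR with every bit writable and does not model read or write side effects (so the CSRRW/CSRRWI rule about rd=x0 suppressing the read is not visible here).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CSR_RW, CSR_RS, CSR_RC } csr_op;

/* `src` is x[rs1] or the zero-extended uimm[4:0]; `src_field_zero` is
   true when the rs1/uimm field itself is 0. Returns the value for rd. */
static uint32_t csr_access(uint32_t *csr, csr_op op, uint32_t src, bool src_field_zero) {
    uint32_t old = *csr;
    switch (op) {
    case CSR_RW: *csr = src; break;                                 /* write */
    case CSR_RS: if (!src_field_zero) *csr = old | src; break;      /* set bits */
    case CSR_RC: if (!src_field_zero) *csr = old & ~src; break;     /* clear bits */
    }
    return old;
}
```

The immediate forms use the same model with `src` set to the zero-extended uimm[4:0].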
RV32Zifencei Instruction-Fetch Fence
FENCE.I: Instruction-Fetch Fence
Format: fence.i
Description: The FENCE.I instruction is used to synchronize the instruction and data streams. RISC-V does not guarantee that stores to instruction memory will be made visible to instruction fetches on the same RISC-V hart until a FENCE.I instruction is executed. A FENCE.I instruction only ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart.
Pseudocode: Fence(Store, Fetch)
Invalid values: NONE
Exception raised: NONE
CV32A6_CSR programmers view
Tip
This section was auto-generated by Register Manager from Jade Design Automation.
Register Summary
| Name | Address Offset | Width | Access Type | Reset Value | Display Name |
|---|---|---|---|---|---|
| | 0x1 | 32 | RW | 0x00000000 | Floating-Point Accrued Exceptions |
| | 0x2 | 32 | RW | 0x00000000 | Floating-Point Dynamic Rounding Mode |
| | 0x3 | 32 | RW | 0x00000000 | Floating-Point Control and Status Register |
| | 0x100 | 32 | RW | 0x00000000 | Supervisor Status |
| | 0x104 | 32 | RW | 0x00000000 | Supervisor Interrupt Enable |
| | 0x105 | 32 | RW | 0x00000000 | Supervisor Trap Vector Base Address |
| | 0x106 | 32 | RW | 0x00000000 | Supervisor Counter Enable |
| | 0x140 | 32 | RW | 0x00000000 | Supervisor Scratch |
| | 0x141 | 32 | RW | 0x00000000 | Supervisor Exception Program Counter |
| | 0x142 | 32 | RW | 0x00000000 | Supervisor Cause |
| | 0x143 | 32 | RW | 0x00000000 | Supervisor Trap Value |
| | 0x144 | 32 | RW | 0x00000000 | Supervisor Interrupt Pending |
| | 0x180 | 32 | RW | 0x00000000 | Supervisor Address Translation and Protection |
| | 0x300 | 32 | RW | 0x00000000 | Machine Status |
| | 0x301 | 32 | RW | 0x00000000 | Machine ISA |
| | 0x302 | 32 | RW | 0x00000000 | Machine Exception Delegation |
| | 0x303 | 32 | RW | 0x00000000 | Machine Interrupt Delegation |
| | 0x304 | 32 | RW | 0x00000000 | Machine Interrupt Enable |
| | 0x305 | 32 | RW | 0x00000000 | Machine Trap Vector |
| | 0x306 | 32 | RW | 0x00000000 | Machine Counter Enable |
| | 0x323 [+ i*0x1] | 32 | RW | 0x00000000 | Hardware Performance-Monitoring Event Selector |
| | 0x340 | 32 | RW | 0x00000000 | Machine Scratch |
| | 0x341 | 32 | RW | 0x00000000 | Machine Exception Program Counter |
| | 0x342 | 32 | RW | 0x00000000 | Machine Cause |
| | 0x343 | 32 | RW | 0x00000000 | Machine Trap Value |
| | 0x344 | 32 | RW | 0x00000000 | Machine Interrupt Pending |
| | 0x3A0 | 32 | RW | 0x00000000 | Physical Memory Protection Config 0 |
| | 0x3A1 | 32 | RW | 0x00000000 | Physical Memory Protection Config 1 |
| | 0x3A2 | 32 | RW | 0x00000000 | Physical Memory Protection Config 2 |
| | 0x3A3 | 32 | RW | 0x00000000 | Physical Memory Protection Config 3 |
| | 0x3B0 [+ i*0x1] | 32 | RW | 0x00000000 | Physical Memory Protection Address |
| | 0x700 | 32 | RW | 0x00000001 | Instruction Cache |
| | 0x701 | 32 | RW | 0x00000001 | Data Cache |
| | 0x7A0 | 32 | RW | 0x00000000 | Trigger Select |
| | 0x7A1 | 32 | RW | 0x00000000 | Trigger Data 1 |
| | 0x7A2 | 32 | RW | 0x00000000 | Trigger Data 2 |
| | 0x7A3 | 32 | RW | 0x00000000 | Trigger Data 3 |
| | 0x7A4 | 32 | RO | 0x00000000 | Trigger Info |
| | 0x7B0 | 32 | RW | 0x00000000 | Debug Control and Status |
| | 0x7B1 | 32 | RW | 0x00000000 | Debug PC |
| | 0x7B2 [+ i*0x1] | 32 | RW | 0x00000000 | Debug Scratch Register |
| | 0x800 | 32 | RW | 0x00000000 | |
| | 0xB00 | 32 | RW | 0x00000000 | M-mode Cycle counter |
| | 0xB02 | 32 | RW | 0x00000000 | Machine Instruction Retired counter |
| | 0xB03 | 32 | RW | 0x00000000 | L1 Inst Cache Miss |
| | 0xB04 | 32 | RW | 0x00000000 | L1 Data Cache Miss |
| | 0xB05 | 32 | RW | 0x00000000 | ITLB Miss |
| | 0xB06 | 32 | RW | 0x00000000 | DTLB Miss |
| | 0xB07 | 32 | RW | 0x00000000 | Loads |
| | 0xB08 | 32 | RW | 0x00000000 | Stores |
| | 0xB09 | 32 | RW | 0x00000000 | Taken Exceptions |
| | 0xB0A | 32 | RW | 0x00000000 | Exception Return |
| | 0xB0B | 32 | RW | 0x00000000 | Software Change of PC |
| | 0xB0C | 32 | RW | 0x00000000 | Procedure Call |
| | 0xB0D | 32 | RW | 0x00000000 | Procedure Return |
| | 0xB0E | 32 | RW | 0x00000000 | Branch mis-predicted |
| | 0xB0F | 32 | RW | 0x00000000 | Scoreboard Full |
| | 0xB10 | 32 | RW | 0x00000000 | Instruction Fetch Queue Empty |
| | 0xB80 | 32 | RW | 0x00000000 | Upper 32-bits of M-mode Cycle counter |
| | 0xB82 | 32 | RW | 0x00000000 | Upper 32-bits of Machine Instruction Retired counter |
| | 0xB83 [+ i*0x1] | 32 | RW | 0x00000000 | Upper 32-bits of Machine Hardware Performance Monitoring Counter |
| | 0xC00 | 32 | RO | 0x00000000 | Cycle counter |
| | 0xC01 | 32 | RO | 0x00000000 | Timer |
| | 0xC02 | 32 | RO | 0x00000000 | Instruction Retired counter |
| | 0xC03 | 32 | RO | 0x00000000 | L1 Inst Cache Miss |
| | 0xC04 | 32 | RO | 0x00000000 | L1 Data Cache Miss |
| | 0xC05 | 32 | RO | 0x00000000 | ITLB Miss |
| | 0xC06 | 32 | RO | 0x00000000 | DTLB Miss |
| | 0xC07 | 32 | RO | 0x00000000 | Loads |
| | 0xC08 | 32 | RO | 0x00000000 | Stores |
| | 0xC09 | 32 | RO | 0x00000000 | Taken Exceptions |
| | 0xC0A | 32 | RO | 0x00000000 | Exception Return |
| | 0xC0B | 32 | RO | 0x00000000 | Software Change of PC |
| | 0xC0C | 32 | RO | 0x00000000 | Procedure Call |
| | 0xC0D | 32 | RO | 0x00000000 | Procedure Return |
| | 0xC0E | 32 | RO | 0x00000000 | Branch mis-predicted |
| | 0xC0F | 32 | RO | 0x00000000 | Scoreboard Full |
| | 0xC10 | 32 | RO | 0x00000000 | Instruction Fetch Queue Empty |
| | 0xC80 | 32 | RO | 0x00000000 | Upper 32-bits of Cycle counter |
| | 0xC81 | 32 | RO | 0x00000000 | Upper 32-bits of Timer |
| | 0xC82 | 32 | RO | 0x00000000 | Upper 32-bits of Instruction Retired counter |
| | 0xF11 | 32 | RO | 0x00000000 | Machine Vendor ID |
| | 0xF12 | 32 | RO | 0x00000003 | Machine Architecture ID |
| | 0xF13 | 32 | RO | 0x00000000 | Machine Implementation ID |
| | 0xF14 | 32 | RO | 0x00000000 | Machine Hardware Thread ID |
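Many of the Access Type entries in the summary follow the standard RISC-V CSR address convention. Address bits [11:10] equal to 0b11 mark a read-only CSR. Bits [9:8] give the lowest privilege level allowed to access the register. The sketch below decodes those two fields; the function name is hypothetical. The convention only classifies the address space, so an individual register can still be read-only by definition (for example Trigger Info at 0x7A4, whose address lies in the read/write space).

```python
# csr[9:8] encodes the lowest privilege level that may access the CSR.
PRIV_LEVELS = {0b00: "U", 0b01: "S", 0b10: "H", 0b11: "M"}  # H = hypervisor


def csr_address_info(addr: int):
    """Return (lowest privilege level, read_only) for a 12-bit CSR address."""
    if not 0 <= addr <= 0xFFF:
        raise ValueError("CSR addresses are 12 bits wide")
    read_only = ((addr >> 10) & 0b11) == 0b11  # csr[11:10] == 0b11
    priv = PRIV_LEVELS[(addr >> 8) & 0b11]      # csr[9:8]
    return priv, read_only
```

For example, 0xC00 (Cycle counter) decodes as `("U", True)` and 0xB00 (M-mode Cycle counter) decodes as `("M", False)`. Both results match the RO and RW entries in the table.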
Register Descriptions
Floating-Point Accrued Exceptions (fflags)
- Address Offset: 0x1
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:5] | | Reserved | RO | 0b0 |
[4] | NV | Invalid Operation | RW | 0b0 |
[3] | DZ | Divide by Zero | RW | 0b0 |
[2] | OF | Overflow | RW | 0b0 |
[1] | UF | Underflow | RW | 0b0 |
[0] | NX | Inexact | RW | 0b0 |
- Invalid Operation (NV): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Divide by Zero (DZ): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Overflow (OF): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Underflow (UF): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Inexact (NX): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
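Because the flags accrue until software resets them, a typical pattern has three steps. Software reads fflags (for example with FRFLAGS), inspects the individual bits, and then writes back a value with the handled flags cleared. The helpers below decode and clear the bit positions from the table above. The function names are hypothetical, and the sketch only manipulates integer values; it does not access the CSR itself.

```python
# Bit positions of the accrued exception flags in fflags (and fcsr[4:0]).
FFLAGS_BITS = {"NV": 4, "DZ": 3, "OF": 2, "UF": 1, "NX": 0}


def decode_fflags(value: int) -> set:
    """Return the names of the accrued exception flags set in an fflags value."""
    return {name for name, pos in FFLAGS_BITS.items() if (value >> pos) & 1}


def clear_fflags(value: int, *names: str) -> int:
    """Return value with the named flags cleared; bits [31:5] are kept at 0."""
    for name in names:
        value &= ~(1 << FFLAGS_BITS[name])
    return value & 0x1F
```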
Floating-Point Dynamic Rounding Mode (frm)
- Address Offset: 0x2
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:3] | | Reserved | RO | 0b0 |
[2:0] | FRM | Floating-Point Rounding Mode | RW | 0b0 |
- Floating-Point Rounding Mode (FRM): Floating-point operations use either a static rounding mode encoded in the instruction, or a dynamic rounding mode held in frm. Rounding modes are encoded as shown in the following table. A value of 111 in the instruction’s rm field selects the dynamic rounding mode held in frm. If frm is set to an invalid value (101–111), any subsequent attempt to execute a floating-point operation with a dynamic rounding mode will raise an illegal instruction exception. Some instructions, including widening conversions, have the rm field but are nevertheless unaffected by the rounding mode; software should set their rm field to RNE (000).
Value | Name | Description |
---|---|---|
0b000 | RNE | Round to Nearest, ties to Even |
0b001 | RTZ | Round towards Zero |
0b010 | RDN | Round Down |
0b011 | RUP | Round Up |
0b100 | RMM | Round to Nearest, ties to Max Magnitude |
0b101 - 0b110 | INVALID | Reserved for future use. |
0b111 | DYN | In an instruction’s rm field, selects the dynamic rounding mode; in the Rounding Mode register, invalid. |
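The rules for choosing between a static and a dynamic rounding mode can be sketched as a small resolver. It is a hypothetical helper, not CVA6 RTL. It takes the instruction's 3-bit rm field and the current frm value, and returns the mnemonic of the rounding mode that applies. A stand-in exception is raised wherever the text above calls for an illegal instruction exception. This section does not say how reserved static encodings (101, 110) behave, so the sketch simply rejects them with a `ValueError`.

```python
ROUNDING_MODES = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM"}
DYN = 0b111  # rm field value that selects the dynamic mode held in frm


class IllegalInstruction(Exception):
    """Hypothetical stand-in for the illegal instruction exception."""


def effective_rounding_mode(rm_field: int, frm: int) -> str:
    """Resolve the rounding mode used by a floating-point instruction."""
    if rm_field == DYN:
        mode = frm & 0b111  # dynamic: use frm[2:0]
        if mode not in ROUNDING_MODES:  # 101-111 are invalid in frm
            raise IllegalInstruction(f"invalid frm value {mode:03b}")
        return ROUNDING_MODES[mode]
    if rm_field not in ROUNDING_MODES:
        # 101/110: reserved static encodings, simply rejected in this sketch
        raise ValueError(f"reserved rm encoding {rm_field:03b}")
    return ROUNDING_MODES[rm_field]  # static mode encoded in the instruction
```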
Floating-Point Control and Status Register (fcsr)
- Address Offset: 0x3
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The floating-point control and status register, fcsr, is a RISC-V control and status register (CSR). It is a read/write register that selects the dynamic rounding mode for floating-point arithmetic operations and holds the accrued exception flags. The fcsr register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the underlying CSR access instructions. FRCSR reads fcsr by copying it into integer register rd. FSCSR swaps the value in fcsr by copying the original value into integer register rd, and then writing a new value obtained from integer register rs1 into fcsr. The fields within the fcsr can also be accessed individually through different CSR addresses, and separate assembler pseudoinstructions are defined for these accesses. The FRRM instruction reads the Rounding Mode field frm and copies it into the least-significant three bits of integer register rd, with zero in all other bits. FSRM swaps the value in frm by copying the original value into integer register rd, and then writing a new value obtained from the three least-significant bits of integer register rs1 into frm. FRFLAGS and FSFLAGS are defined analogously for the Accrued Exception Flags field fflags.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:8] | | Reserved | RO | 0b0 |
[7:5] | FRM | Floating-Point Rounding Mode | RW | 0b0 |
[4] | NV | Invalid Operation | RW | 0b0 |
[3] | DZ | Divide by Zero | RW | 0b0 |
[2] | OF | Overflow | RW | 0b0 |
[1] | UF | Underflow | RW | 0b0 |
[0] | NX | Inexact | RW | 0b0 |
- Floating-Point Rounding Mode (FRM): Floating-point operations use either a static rounding mode encoded in the instruction, or a dynamic rounding mode held in frm. Rounding modes are encoded as shown in the following table. A value of 111 in the instruction’s rm field selects the dynamic rounding mode held in frm. If frm is set to an invalid value (101–111), any subsequent attempt to execute a floating-point operation with a dynamic rounding mode will raise an illegal instruction exception. Some instructions, including widening conversions, have the rm field but are nevertheless unaffected by the rounding mode; software should set their rm field to RNE (000).
Value | Name | Description |
---|---|---|
0b000 | RNE | Round to Nearest, ties to Even |
0b001 | RTZ | Round towards Zero |
0b010 | RDN | Round Down |
0b011 | RUP | Round Up |
0b100 | RMM | Round to Nearest, ties to Max Magnitude |
0b101 - 0b110 | INVALID | Reserved for future use. |
0b111 | DYN | In an instruction’s rm field, selects the dynamic rounding mode; in the Rounding Mode register, invalid. |
- Invalid Operation (NV): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Divide by Zero (DZ): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Overflow (OF): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Underflow (UF): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
- Inexact (NX): The accrued exception flags indicate the exception conditions that have arisen on any floating-point arithmetic instruction since the field was last reset by software. The base RISC-V ISA does not support generating a trap on the setting of a floating-point exception flag.
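Since frm and fflags are views onto fields of fcsr, the three CSRs are related by simple shifts and masks. The helpers below are a minimal sketch of composing and splitting an fcsr value according to the bit table above; the helper names are hypothetical.

```python
FRM_SHIFT = 5        # frm occupies fcsr[7:5]
FFLAGS_MASK = 0x1F   # fflags occupies fcsr[4:0]


def pack_fcsr(frm: int, fflags: int) -> int:
    """Compose an fcsr value from a rounding mode and accrued flags."""
    return ((frm & 0b111) << FRM_SHIFT) | (fflags & FFLAGS_MASK)


def unpack_fcsr(fcsr: int) -> tuple:
    """Split an fcsr value into (frm, fflags); bits [31:8] are reserved."""
    return (fcsr >> FRM_SHIFT) & 0b111, fcsr & FFLAGS_MASK
```

For example, the rounding mode RTZ (0b001) combined with only the NX flag set gives fcsr = 0x21.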
Supervisor Status (sstatus)
- Address Offset: 0x100
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The sstatus register keeps track of the processor’s current operating state. The sstatus register is a subset of the mstatus register.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31] | SD | State Dirty | RO | 0b0 |
[30:20] | | Reserved | RO | 0b0 |
[19] | MXR | Make eXecutable Readable | RW | 0b0 |
[18] | SUM | Supervisor User Memory | RW | 0b0 |
[17] | | Reserved | RO | 0b0 |
[16:15] | XS | Extension State | RO | 0b0 |
[14:13] | FS | Floating-point unit State | RW | 0b0 |
[12:9] | | Reserved | RO | 0b0 |
[8] | SPP | Supervisor mode Prior Privilege | RW | 0b0 |
[7:6] | | Reserved | RO | 0b0 |
[5] | SPIE | Supervisor mode Prior Interrupt Enable | RW | 0b0 |
[4] | UPIE | | RW | 0b0 |
[3:2] | | Reserved | RO | 0b0 |
[1] | SIE | Supervisor mode Interrupt Enable | RW | 0b0 |
[0] | UIE | | RW | 0b0 |
- State Dirty (SD): The SD bit is a read-only bit that summarizes whether any of the FS, VS, or XS fields signal the presence of some dirty state that will require saving extended user context to memory. If FS, XS, and VS are all read-only zero, then SD is also always zero.
- Make eXecutable Readable (MXR): The MXR bit modifies the privilege with which loads access virtual memory. When MXR=0, only loads from pages marked readable will succeed. When MXR=1, loads from pages marked either readable or executable (R=1 or X=1) will succeed. MXR has no effect when page-based virtual memory is not in effect.
- Supervisor User Memory (SUM): The SUM (permit Supervisor User Memory access) bit modifies the privilege with which S-mode loads and stores access virtual memory. When SUM=0, S-mode memory accesses to pages that are accessible by U-mode will fault. When SUM=1, these accesses are permitted. SUM has no effect when page-based virtual memory is not in effect. Note that, while SUM is ordinarily ignored when not executing in S-mode, it is in effect when MPRV=1 and MPP=S. SUM is read-only 0 if S-mode is not supported or if satp.MODE is read-only 0.
- Extension State (XS): The XS field is used to reduce the cost of context save and restore by setting and tracking the current state of the user-mode extensions. The XS field encodes the status of the additional user-mode extensions and associated state. This field can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process. The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b00 | Off | All off |
0b01 | Initial | None dirty or clean, some on |
0b10 | Clean | None dirty, some clean |
0b11 | Dirty | Some dirty |
- Floating-point unit State (FS): The FS field is used to reduce the cost of context save and restore by setting and tracking the current state of the floating-point unit. The FS field encodes the status of the floating-point unit state, including the floating-point registers f0–f31 and the CSRs fcsr, frm, and fflags. This field can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process. The following table shows the bitfield encoding:
Value | Name |
---|---|
0b00 | Off |
0b01 | Initial |
0b10 | Clean |
0b11 | Dirty |
- Supervisor mode Prior Privilege (SPP): The SPP bit indicates the privilege level at which a hart was executing before entering supervisor mode. When a trap is taken, SPP is set to 0 if the trap originated from user mode, or 1 otherwise. When an SRET instruction is executed to return from the trap handler, the privilege level is set to user mode if the SPP bit is 0, or supervisor mode if the SPP bit is 1; SPP is then set to 0.
- Supervisor mode Prior Interrupt Enable (SPIE): The SPIE bit indicates whether supervisor interrupts were enabled prior to trapping into supervisor mode. When a trap is taken into supervisor mode, SPIE is set to SIE, and SIE is set to 0. When an SRET instruction is executed, SIE is set to SPIE, then SPIE is set to 1.
- UPIE: When a URET instruction is executed, UIE is set to UPIE, and UPIE is set to 1.
- Supervisor mode Interrupt Enable (SIE): The SIE bit enables or disables all interrupts in supervisor mode. When SIE is clear, interrupts are not taken while in supervisor mode. When the hart is running in user mode, the value in SIE is ignored, and supervisor-level interrupts are enabled. The supervisor can disable individual interrupt sources using the sie CSR.
- UIE: The UIE bit enables or disables user-mode interrupts.
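The SIE, SPIE and SPP rules above form a small state machine, and the sketch below models it on a raw sstatus value. The helper names are hypothetical, and privilege is passed as the string "U" or "S". Other effects of traps and SRET are deliberately left out, including the updates to sepc and scause and the fields that exist only in mstatus.

```python
SIE_BIT, SPIE_BIT, SPP_BIT = 1, 5, 8  # bit positions in sstatus


def get_bit(value: int, pos: int) -> int:
    return (value >> pos) & 1


def set_bit(value: int, pos: int, b: int) -> int:
    return (value & ~(1 << pos)) | ((b & 1) << pos)


def trap_into_s_mode(sstatus: int, from_priv: str) -> int:
    """Update sstatus for a trap into S-mode taken from 'U' or 'S'."""
    sstatus = set_bit(sstatus, SPIE_BIT, get_bit(sstatus, SIE_BIT))  # SPIE <- SIE
    sstatus = set_bit(sstatus, SIE_BIT, 0)                           # SIE <- 0
    sstatus = set_bit(sstatus, SPP_BIT, 0 if from_priv == "U" else 1)
    return sstatus


def sret(sstatus: int):
    """Model SRET; returns (new sstatus, privilege level returned to)."""
    new_priv = "S" if get_bit(sstatus, SPP_BIT) else "U"
    sstatus = set_bit(sstatus, SIE_BIT, get_bit(sstatus, SPIE_BIT))  # SIE <- SPIE
    sstatus = set_bit(sstatus, SPIE_BIT, 1)                          # SPIE <- 1
    sstatus = set_bit(sstatus, SPP_BIT, 0)                           # SPP <- 0
    return sstatus, new_priv
```

For example, consider a trap from U-mode while SIE=1 (sstatus = 0x2). The trap leaves sstatus = 0x20, with SPIE=1, SIE=0 and SPP=0. A following SRET returns to U-mode with sstatus = 0x22.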
Supervisor Interrupt Enable (sie)
- Address Offset: 0x104
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The sie register contains the supervisor interrupt-enable bits.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:10] | | Reserved | RO | 0b0 |
[9] | SEIE | Supervisor-level External Interrupt Enable | RW | 0b0 |
[8] | UEIE | | RW | 0b0 |
[7:6] | | Reserved | RO | 0b0 |
[5] | STIE | Supervisor-level Timer Interrupt Enable | RW | 0b0 |
[4] | UTIE | | RW | 0b0 |
[3:2] | | Reserved | RO | 0b0 |
[1] | SSIE | Supervisor-level Software Interrupt Enable | RW | 0b0 |
[0] | USIE | | RW | 0b0 |
- Supervisor-level External Interrupt Enable (SEIE): SEIE is the interrupt-enable bit for supervisor-level external interrupts.
- UEIE: User-level external interrupts are disabled when the UEIE bit in the sie register is clear.
- Supervisor-level Timer Interrupt Enable (STIE): STIE is the interrupt-enable bit for supervisor-level timer interrupts.
- UTIE: User-level timer interrupts are disabled when the UTIE bit in the sie register is clear.
- Supervisor-level Software Interrupt Enable (SSIE): SSIE is the interrupt-enable bit for supervisor-level software interrupts.
- USIE: User-level software interrupts are disabled when the USIE bit in the sie register is clear.
Supervisor Trap Vector Base Address (stvec)
- Address Offset: 0x105
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The stvec register holds trap vector configuration, consisting of a vector base address (BASE) and a vector mode (MODE).
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:2] | BASE | | RW | 0b0 |
[1:0] | MODE | | RW | 0b0 |
- BASE: The BASE field in stvec is a WARL field that can hold any valid virtual or physical address, subject to the following alignment constraints: the address must be 4-byte aligned, and MODE settings other than Direct might impose additional alignment constraints on the value in the BASE field.
- MODE: When MODE=Direct, all traps into supervisor mode cause the pc to be set to the address in the BASE field. When MODE=Vectored, all synchronous exceptions into supervisor mode cause the pc to be set to the address in the BASE field, whereas interrupts cause the pc to be set to the address in the BASE field plus four times the interrupt cause number. The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b00 | Direct | All exceptions set pc to BASE. |
0b01 | Vectored | Asynchronous interrupts set pc to BASE+4×cause. |
0b10 - 0b11 | Reserved | Reserved |
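The MODE rules amount to a simple address calculation. The sketch below uses a hypothetical function name, assumes a 32-bit stvec value, and treats the reserved MODE encodings as an error.

```python
DIRECT, VECTORED = 0b00, 0b01


def trap_target_pc(stvec: int, is_interrupt: bool, cause: int) -> int:
    """Return the pc that a trap into S-mode jumps to, given the stvec value."""
    base = stvec & ~0b11 & 0xFFFF_FFFF  # BASE is stvec[31:2]; bits [1:0] hold MODE
    mode = stvec & 0b11
    if mode not in (DIRECT, VECTORED):
        raise ValueError("MODE values 0b10-0b11 are reserved")
    if mode == VECTORED and is_interrupt:
        return (base + 4 * cause) & 0xFFFF_FFFF  # BASE + 4 x cause
    return base  # Direct mode, or a synchronous exception in Vectored mode
```

For example, take stvec = 0x80000001, which means BASE 0x80000000 in Vectored mode. A supervisor timer interrupt (cause 5) vectors to 0x80000014. A load page fault still goes to 0x80000000.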
Supervisor Counter Enable (scounteren)
- Address Offset: 0x106
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The counter-enable register scounteren controls the availability of the hardware performance monitoring counters to U-mode.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:3] | HPMn | Hpmcountern | RW | 0b0 |
[2] | IR | Instret | RW | 0b0 |
[1] | TM | Time | RW | 0b0 |
[0] | CY | Cycle | RW | 0b0 |
- Hpmcountern (HPMn): When HPMn is clear, attempts to read the hpmcountern register while executing in U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted.
- Instret (IR): When IR is clear, attempts to read the instret register while executing in U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted.
- Time (TM): When TM is clear, attempts to read the time register while executing in U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted.
- Cycle (CY): When CY is clear, attempts to read the cycle register while executing in U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted.
Supervisor Scratch (sscratch)
- Address Offset: 0x140
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The sscratch register is dedicated for use by the supervisor.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | SSCRATCH | Supervisor Scratch | RW | 0b0 |
- Supervisor Scratch (SSCRATCH): Typically, sscratch is used to hold a pointer to the hart-local supervisor context while the hart is executing user code. At the beginning of a trap handler, sscratch is swapped with a user register to provide an initial working register.
Supervisor Exception Program Counter (sepc)
- Address Offset: 0x141
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: When a trap is taken into S-mode, sepc is written with the virtual address of the instruction that was interrupted or that encountered the exception. Otherwise, sepc is never written by the implementation, though it may be explicitly written by software.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | SEPC | Supervisor Exception Program Counter | RW | 0b0 |
- Supervisor Exception Program Counter (SEPC): The low bit of SEPC (SEPC[0]) is always zero. On implementations that support only IALIGN=32, the two low bits (SEPC[1:0]) are always zero.
Supervisor Cause (scause)
- Address Offset: 0x142
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: When a trap is taken into S-mode, scause is written with a code indicating the event that caused the trap. Otherwise, scause is never written by the implementation, though it may be explicitly written by software. Supervisor cause register (scause) values after trap are shown in the following table:
Interrupt | Exception Code | Description |
---|---|---|
1 | 0 | Reserved |
1 | 1 | Supervisor software interrupt |
1 | 2-4 | Reserved |
1 | 5 | Supervisor timer interrupt |
1 | 6-8 | Reserved |
1 | 9 | Supervisor external interrupt |
1 | 10-15 | Reserved |
1 | ≥16 | Designated for platform use |
0 | 0 | Instruction address misaligned |
0 | 1 | Instruction access fault |
0 | 2 | Illegal instruction |
0 | 3 | Breakpoint |
0 | 4 | Load address misaligned |
0 | 5 | Load access fault |
0 | 6 | Store/AMO address misaligned |
0 | 7 | Store/AMO access fault |
0 | 8 | Environment call from U-mode |
0 | 9 | Environment call from S-mode |
0 | 10-11 | Reserved |
0 | 12 | Instruction page fault |
0 | 13 | Load page fault |
0 | 14 | Reserved |
0 | 15 | Store/AMO page fault |
0 | 16-23 | Reserved |
0 | 24-31 | Designated for custom use |
0 | 32-47 | Reserved |
0 | 48-63 | Designated for custom use |
0 | ≥64 | Reserved |
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31] | Interrupt | | RW | 0b0 |
[30:0] | Exception_Code | Exception Code | RW | 0b0 |
- Interrupt: The Interrupt bit in the scause register is set if the trap was caused by an interrupt.
- Exception Code (Exception_Code): The Exception Code field contains a code identifying the last exception or interrupt.
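Before dispatching, a trap handler typically splits scause into the Interrupt bit (bit 31 on this 32-bit core) and the Exception Code. The sketch below shows a minimal decoder with a hypothetical function name. It covers only the codes named in the cause table above; every other code is reported as reserved, platform-defined or custom.

```python
INTERRUPT_CAUSES = {
    1: "Supervisor software interrupt",
    5: "Supervisor timer interrupt",
    9: "Supervisor external interrupt",
}
EXCEPTION_CAUSES = {
    0: "Instruction address misaligned",
    1: "Instruction access fault",
    2: "Illegal instruction",
    3: "Breakpoint",
    4: "Load address misaligned",
    5: "Load access fault",
    6: "Store/AMO address misaligned",
    7: "Store/AMO access fault",
    8: "Environment call from U-mode",
    9: "Environment call from S-mode",
    12: "Instruction page fault",
    13: "Load page fault",
    15: "Store/AMO page fault",
}


def decode_scause(scause: int) -> str:
    """Describe a 32-bit scause value using the cause table above."""
    is_interrupt = (scause >> 31) & 1   # Interrupt bit, scause[31]
    code = scause & 0x7FFF_FFFF         # Exception Code, scause[30:0]
    table = INTERRUPT_CAUSES if is_interrupt else EXCEPTION_CAUSES
    return table.get(code, "Reserved, platform-defined or custom")
```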
Supervisor Trap Value (stval)
- Address Offset: 0x143
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: When a trap is taken into S-mode, stval is written with exception-specific information to assist software in handling the trap. Otherwise, stval is never written by the implementation, though it may be explicitly written by software. The hardware platform will specify which exceptions must set stval informatively and which may unconditionally set it to zero.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | STVAL | Supervisor Trap Value | RW | 0b0 |
- Supervisor Trap Value (STVAL): If stval is written with a nonzero value when a breakpoint, address-misaligned, access-fault, or page-fault exception occurs on an instruction fetch, load, or store, then stval will contain the faulting virtual address.
If stval is written with a nonzero value when a misaligned load or store causes an access-fault or page-fault exception, then stval will contain the virtual address of the portion of the access that caused the fault.
If stval is written with a nonzero value when an instruction access-fault or page-fault exception occurs on a system with variable-length instructions, then stval will contain the virtual address of the portion of the instruction that caused the fault, while sepc will point to the beginning of the instruction.
The stval register can optionally also be used to return the faulting instruction bits on an illegal instruction exception (sepc points to the faulting instruction in memory). If stval is written with a nonzero value when an illegal-instruction exception occurs, then stval will contain the shortest of:
  - the actual faulting instruction
  - the first ILEN bits of the faulting instruction
  - the first SXLEN bits of the faulting instruction
The value loaded into stval on an illegal-instruction exception is right-justified and all unused upper bits are cleared to zero. For other traps, stval is set to zero, but a future standard may redefine stval’s setting for other traps.
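For illegal-instruction traps, the "shortest of" rule can be written as a mask on the instruction bits. The sketch below assumes the instruction bits have already been assembled into a single integer in RISC-V little-endian parcel order, so that the first bits are the least-significant ones. ILEN and SXLEN both default to 32, which is an assumption for this 32-bit core. The function name is hypothetical.

```python
def stval_for_illegal_instruction(insn_bits: int, insn_len: int,
                                  ilen: int = 32, sxlen: int = 32) -> int:
    """stval value for an illegal-instruction trap, when the bits are reported.

    insn_len is the instruction length in bits (16 for compressed, 32 for base).
    """
    width = min(insn_len, ilen, sxlen)     # "the shortest of" the three
    return insn_bits & ((1 << width) - 1)  # right-justified, upper bits zero
```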
Supervisor Interrupt Pending (sip)
- Address Offset: 0x144
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The sip register contains information on pending interrupts.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:10] | | Reserved | RO | 0b0 |
[9] | SEIP | Supervisor-level External Interrupt Pending | RO | 0b0 |
[8] | UEIP | | RW | 0b0 |
[7:6] | | Reserved | RO | 0b0 |
[5] | STIP | Supervisor-level Timer Interrupt Pending | RO | 0b0 |
[4] | UTIP | | RW | 0b0 |
[3:2] | | Reserved | RO | 0b0 |
[1] | SSIP | Supervisor-level Software Interrupt Pending | RO | 0b0 |
[0] | USIP | | RW | 0b0 |
- Supervisor-level External Interrupt Pending (SEIP): SEIP is the interrupt-pending bit for supervisor-level external interrupts.
- UEIP: UEIP may be written by S-mode software to indicate to U-mode that an external interrupt is pending.
- Supervisor-level Timer Interrupt Pending (STIP): STIP is the interrupt-pending bit for supervisor-level timer interrupts.
- UTIP: A user-level timer interrupt is pending if the UTIP bit in the sip register is set.
- Supervisor-level Software Interrupt Pending (SSIP): SSIP is the interrupt-pending bit for supervisor-level software interrupts.
- USIP: A user-level software interrupt is triggered on the current hart by writing 1 to its user software interrupt-pending (USIP) bit.
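Combining sip with sie and sstatus.SIE gives the set of supervisor interrupts that can actually be taken. The sketch below uses a hypothetical function name and follows the rules stated in this document: SIE gates S-level interrupts in S-mode and is ignored in U-mode. Two further rules are taken from the RISC-V privileged specification, as an assumption. S-level interrupts are treated as disabled while the hart runs in M-mode, and simultaneous interrupts are ranked external, then software, then timer. Delegation through mideleg is not modeled. The returned bit position equals the Exception Code that would be written to scause.

```python
SSI, STI, SEI = 1, 5, 9      # bit positions shared by sip and sie
PRIORITY = (SEI, SSI, STI)   # assumed order, from the RISC-V privileged spec


def s_interrupt_to_take(sip: int, sie: int, sstatus_sie: int, priv: str):
    """Return the bit of the S-level interrupt to take next, or None."""
    if priv == "M":
        return None  # S-level interrupts are disabled while running in M-mode
    if priv == "S" and not sstatus_sie:
        return None  # in S-mode, sstatus.SIE gates all S-level interrupts
    pending_and_enabled = sip & sie
    for pos in PRIORITY:
        if (pending_and_enabled >> pos) & 1:
            return pos
    return None
```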
Supervisor Address Translation and Protection (satp)
- Address Offset: 0x180
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The satp register controls supervisor-mode address translation and protection. The satp register is considered active when the effective privilege mode is S-mode or U-mode. Executions of the address-translation algorithm may only begin using a given value of satp when satp is active.
Note
Writing satp does not imply any ordering constraints between page-table updates and subsequent address translations, nor does it imply any invalidation of address-translation caches. If the new address space’s page tables have been modified, or if an ASID is reused, it may be necessary to execute an SFENCE.VMA instruction after, or in some cases before, writing satp.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31] | MODE | Mode | RW | 0b0 |
[30:22] | ASID | Address Space Identifier | RW | 0b0 |
[21:0] | PPN | Physical Page Number | RW | 0b0 |
- Mode (MODE): This bitfield selects the current address-translation scheme. When MODE=Bare, supervisor virtual addresses are equal to supervisor physical addresses, and there is no additional memory protection beyond the physical memory protection scheme. To select MODE=Bare, software must write zero to the remaining fields of satp (bits 30–0). Attempting to select MODE=Bare with a nonzero pattern in the remaining fields has an unspecified effect on the value that the remaining fields assume and an unspecified effect on address translation and protection behavior. The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | Bare | No translation or protection. |
1 | Sv32 | Page-based 32-bit virtual addressing. |
- Address Space Identifier (ASID): This bitfield facilitates address-translation fences on a per-address-space basis.
- Physical Page Number (PPN): This bitfield holds the physical page number of the root page table, i.e., its supervisor physical address divided by 4 KiB.
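The three satp fields can be combined into an Sv32 value from a root page table address and an ASID, and decoded back again. The sketch below uses hypothetical helper names. It only computes the register value and does not issue any SFENCE.VMA that the note above may require.

```python
PAGE_SIZE = 4096  # satp.PPN counts 4 KiB pages


def make_satp_sv32(root_table_pa: int, asid: int) -> int:
    """Build an Sv32 satp value (MODE=1) from a root page table address and ASID."""
    if root_table_pa % PAGE_SIZE:
        raise ValueError("root page table must be 4 KiB aligned")
    ppn = root_table_pa // PAGE_SIZE
    if ppn >= 1 << 22:
        raise ValueError("PPN does not fit in satp[21:0]")
    if not 0 <= asid < 1 << 9:
        raise ValueError("ASID does not fit in satp[30:22]")
    return (1 << 31) | (asid << 22) | ppn


def decode_satp(satp: int):
    """Split satp into (mode name, ASID, root page table physical address)."""
    mode = "Sv32" if (satp >> 31) & 1 else "Bare"
    asid = (satp >> 22) & 0x1FF
    ppn = satp & 0x3F_FFFF
    return mode, asid, ppn * PAGE_SIZE
```

For example, a root page table at physical address 0x80200000 with ASID 1 encodes as satp = 0x80480200.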
Machine Status (mstatus
)
- Address Offset
0x300
- Width (bits)
32
- Access Type
RW
- Reset Value
0x00000000
- Description
The
mstatus
register keeps track of and controls the hart’s current operating state.
Bits |
Name |
Display Name |
Access Type |
Reset |
---|---|---|---|---|
[31] |
|
State Dirty |
RO |
0b0 |
[30:23] |
|
Reserved |
RO |
0b0 |
[22] |
|
Trap SRET |
RW |
0b0 |
[21] |
|
Timeout Wait |
RW |
0b0 |
[20] |
|
Trap Virtual Memory |
RW |
0b0 |
[19] |
|
Make eXecutable Readable |
RW |
0b0 |
[18] |
|
Supervisor User Memory |
RW |
0b0 |
[17] |
|
Modify Privilege |
RW |
0b0 |
[16:15] |
|
Extension State |
RO |
0b0 |
[14:13] |
|
Floating-point unit State |
RW |
0b0 |
[12:11] |
|
Machine mode Prior Privilege |
RW |
0b0 |
[10:9] |
|
Reserved |
RO |
0b0 |
[8] |
|
Supervisor mode Prior Privilege |
RW |
0b0 |
[7] |
|
Machine mode Prior Interrupt Enable |
RW |
0b0 |
[6] |
|
Reserved |
RO |
0b0 |
[5] |
|
Supervisor mode Prior Interrupt Enable |
RW |
0b0 |
[4] |
|
RW |
0b0 |
|
[3] |
|
Machine mode Interrupt Enable |
RW |
0b0 |
[2] |
|
Reserved |
RO |
0b0 |
[1] |
|
Supervisor mode Interrupt Enable |
RW |
0b0 |
[0] |
|
RW |
0b0 |
- State Dirty (SD): The SD bit is a read-only bit that summarizes whether any of the FS, VS, or XS fields signals the presence of dirty state that will require saving extended user context to memory. If FS, XS, and VS are all read-only zero, then SD is also always zero.
- Trap SRET (TSR): The TSR bit supports intercepting the supervisor exception return instruction, SRET. When TSR=1, attempts to execute SRET while executing in S-mode will raise an illegal instruction exception. When TSR=0, this operation is permitted in S-mode.
- Timeout Wait (TW): The TW bit supports intercepting the WFI instruction. When TW=0, the WFI instruction may execute in lower privilege modes when not prevented for some other reason. When TW=1, if WFI is executed in any less-privileged mode and does not complete within an implementation-specific, bounded time limit, the WFI instruction causes an illegal instruction exception. The time limit may always be 0, in which case WFI always causes an illegal instruction exception in less-privileged modes when TW=1.
- Trap Virtual Memory (TVM): The TVM bit supports intercepting supervisor virtual-memory management operations. When TVM=1, attempts to read or write the satp CSR or execute an SFENCE.VMA or SINVAL.VMA instruction while executing in S-mode will raise an illegal instruction exception. When TVM=0, these operations are permitted in S-mode.
- Make eXecutable Readable (MXR): The MXR bit modifies the privilege with which loads access virtual memory. When MXR=0, only loads from pages marked readable will succeed. When MXR=1, loads from pages marked either readable or executable (R=1 or X=1) will succeed. MXR has no effect when page-based virtual memory is not in effect.
- Supervisor User Memory (SUM): The SUM (permit Supervisor User Memory access) bit modifies the privilege with which S-mode loads and stores access virtual memory. When SUM=0, S-mode memory accesses to pages that are accessible by U-mode will fault. When SUM=1, these accesses are permitted. SUM has no effect when page-based virtual memory is not in effect. Note that, while SUM is ordinarily ignored when not executing in S-mode, it is in effect when MPRV=1 and MPP=S.
- Modify Privilege (MPRV): The MPRV (Modify PRiVilege) bit modifies the effective privilege mode, i.e., the privilege level at which loads and stores execute. When MPRV=0, loads and stores behave as normal, using the translation and protection mechanisms of the current privilege mode. When MPRV=1, load and store memory addresses are translated and protected, and endianness is applied, as though the current privilege mode were set to MPP. Instruction address translation and protection are unaffected by the setting of MPRV.
- Extension State (XS): The XS field is used to reduce the cost of context save and restore by setting and tracking the current state of the user-mode extensions. The XS field encodes the status of the additional user-mode extensions and associated state.
This field can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b00 | Off | All off |
0b01 | Initial | None dirty or clean, some on |
0b10 | Clean | None dirty, some clean |
0b11 | Dirty | Some dirty |
- Floating-point unit State (FS): The FS field is used to reduce the cost of context save and restore by setting and tracking the current state of the floating-point unit. The FS field encodes the status of the floating-point unit state, including the floating-point registers f0–f31 and the CSRs fcsr, frm, and fflags.
This field can be checked by a context switch routine to quickly determine whether a state save or restore is required. If a save or restore is required, additional instructions and CSRs are typically required to effect and optimize the process.
The following table shows the bitfield encoding:
Value | Name |
---|---|
0b00 | Off |
0b01 | Initial |
0b10 | Clean |
0b11 | Dirty |
- Machine mode Prior Privilege (MPP): Holds the privilege mode the hart was in before the most recent trap into M-mode.
- Supervisor mode Prior Privilege (SPP): Holds the privilege mode the hart was in before the most recent trap into S-mode.
- Machine mode Prior Interrupt Enable (MPIE): Indicates whether machine interrupts were enabled prior to trapping into machine mode.
- Supervisor mode Prior Interrupt Enable (SPIE): Indicates whether supervisor interrupts were enabled prior to trapping into supervisor mode.
- UPIE: Indicates whether user-level interrupts were enabled prior to taking a user-level trap.
- Machine mode Interrupt Enable (MIE): Global interrupt-enable bit for Machine mode.
- Supervisor mode Interrupt Enable (SIE): Global interrupt-enable bit for Supervisor mode.
- UIE: Global interrupt-enable bit for User mode.
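The following Python sketch shows two common uses of the fields above: checking FS during a context switch and preparing MPP/MPIE before an MRET. It works on an mstatus value held in a Python integer; the bit positions come from the mstatus table, the privilege encodings (0 = U, 1 = S, 3 = M) and the MRET behavior (MIE takes the value of MPIE) follow the RISC-V privileged specification, and all helper names are hypothetical.

```python
# Hypothetical helpers operating on a 32-bit mstatus value.
MSTATUS_MPIE = 1 << 7
MSTATUS_MPP_SHIFT = 11   # MPP occupies bits [12:11]
MSTATUS_FS_SHIFT = 13    # FS occupies bits [14:13]
PRIV_U, PRIV_S, PRIV_M = 0, 1, 3
FS_DIRTY = 0b11
MASK32 = 0xFFFFFFFF


def get_field(value: int, shift: int, width: int) -> int:
    return (value >> shift) & ((1 << width) - 1)


def set_field(value: int, shift: int, width: int, field: int) -> int:
    mask = ((1 << width) - 1) << shift
    return (value & ~mask & MASK32) | ((field << shift) & mask)


def fp_state_needs_save(mstatus: int) -> bool:
    """In the usual scheme, only a Dirty FS forces saving f0-f31 and fcsr."""
    return get_field(mstatus, MSTATUS_FS_SHIFT, 2) == FS_DIRTY


def prepare_mret(mstatus: int, target_priv: int, enable_irqs: bool) -> int:
    """Set MPP and MPIE so that a subsequent MRET enters target_priv."""
    mstatus = set_field(mstatus, MSTATUS_MPP_SHIFT, 2, target_priv)
    if enable_irqs:
        mstatus |= MSTATUS_MPIE  # MRET copies MPIE into MIE
    else:
        mstatus &= ~MSTATUS_MPIE & MASK32
    return mstatus
```

For example, `prepare_mret(0x1800, PRIV_U, True)` clears MPP from 0b11 to 0b00 and sets MPIE, giving `0x80`.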
Machine ISA (misa)
- Address Offset: 0x301
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: The misa CSR reports the ISA supported by the hart.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:30] | MXL | Machine XLEN | RW | 0b0 |
[29:26] | | Reserved | RO | 0b0 |
[25:0] | Extensions | Extensions | RW | 0b0 |
- Machine XLEN (MXL): The MXL field encodes the native base integer ISA width.
The following table shows the bitfield encoding:
Value | Name |
---|---|
0b01 | XLEN_32 |
0b10 | XLEN_64 |
0b11 | XLEN_128 |
- Extensions (Extensions): The Extensions field encodes the presence of the standard extensions, with a single bit per letter of the alphabet.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b00000000000000000000000001 | A | Atomic extension. |
0b00000000000000000000000010 | B | Tentatively reserved for Bit-Manipulation extension. |
0b00000000000000000000000100 | C | Compressed extension. |
0b00000000000000000000001000 | D | Double-precision floating-point extension. |
0b00000000000000000000010000 | E | RV32E base ISA. |
0b00000000000000000000100000 | F | Single-precision floating-point extension. |
0b00000000000000000001000000 | G | Reserved. |
0b00000000000000000010000000 | H | Hypervisor extension. |
0b00000000000000000100000000 | I | RV32I/64I/128I base ISA. |
0b00000000000000001000000000 | J | Tentatively reserved for Dynamically Translated Languages extension. |
0b00000000000000010000000000 | K | Reserved. |
0b00000000000000100000000000 | L | Reserved. |
0b00000000000001000000000000 | M | Integer Multiply/Divide extension. |
0b00000000000010000000000000 | N | Tentatively reserved for User-Level Interrupts extension. |
0b00000000000100000000000000 | O | Reserved. |
0b00000000001000000000000000 | P | Tentatively reserved for Packed-SIMD extension. |
0b00000000010000000000000000 | Q | Quad-precision floating-point extension. |
0b00000000100000000000000000 | R | Reserved. |
0b00000001000000000000000000 | S | Supervisor mode implemented. |
0b00000010000000000000000000 | T | Reserved. |
0b00000100000000000000000000 | U | User mode implemented. |
0b00001000000000000000000000 | V | Tentatively reserved for Vector extension. |
0b00010000000000000000000000 | W | Reserved. |
0b00100000000000000000000000 | X | Non-standard extensions present. |
0b01000000000000000000000000 | Y | Reserved. |
0b10000000000000000000000000 | Z | Reserved. |
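Because each letter has its own bit, software can recover the usual ISA string by scanning the Extensions field. The Python sketch below decodes a misa value; the helper name and the example value are made up for illustration and are not claimed to be the value a particular CVA6 configuration reports.

```python
# Hypothetical decoder for a 32-bit misa value.
# Bit i of the Extensions field [25:0] stands for the letter chr(ord("A") + i).
MXL_NAMES = {1: "RV32", 2: "RV64", 3: "RV128"}


def decode_misa(misa: int) -> tuple:
    mxl = (misa >> 30) & 0x3              # MXL field [31:30]
    extensions = misa & ((1 << 26) - 1)   # Extensions field [25:0]
    letters = "".join(
        chr(ord("A") + i) for i in range(26) if (extensions >> i) & 1
    )
    return MXL_NAMES.get(mxl, "unknown"), letters


# Example value: MXL=0b01 with the A, C, I, M, S and U bits set.
print(decode_misa(0x40141105))  # ('RV32', 'ACIMSU')
```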
Machine Exception Delegation (medeleg)
- Address Offset: 0x302
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Provides individual read/write bits to indicate that certain exceptions should be processed directly by a lower privilege level.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | Synchronous_Exceptions | Synchronous Exceptions | RW | 0b0 |
- Synchronous Exceptions (Synchronous_Exceptions): There is a bit position allocated for every synchronous exception, with the index of the bit position equal to the value returned in the mcause register.
Machine Interrupt Delegation (mideleg)
- Address Offset: 0x303
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Provides individual read/write bits to indicate that certain interrupts should be processed directly by a lower privilege level.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | Interrupts | Interrupts | RW | 0b0 |
- Interrupts (Interrupts): This bitfield holds trap delegation bits for individual interrupts, with the layout of bits matching those in the mip register.
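Since medeleg bit positions equal mcause exception codes and mideleg bit positions match the mip layout, both values are just ORs of shifted ones. The Python sketch below builds them; which traps to hand to S-mode is a hypothetical choice of the kind an S-mode kernel might make, and the constant and helper names are illustrative only.

```python
# Exception codes as listed in the mcause table of this section.
EXC_BREAKPOINT = 3
EXC_ECALL_U = 8
EXC_INST_PAGE_FAULT = 12
EXC_LOAD_PAGE_FAULT = 13
EXC_STORE_PAGE_FAULT = 15

# Supervisor interrupt bit positions, as in the mip/mie tables.
IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_EXT = 1, 5, 9


def delegation_mask(codes) -> int:
    """Set one bit per delegated exception code or interrupt number."""
    mask = 0
    for code in codes:
        mask |= 1 << code
    return mask


# Hypothetical policy: let S-mode handle its own page faults, user ecalls,
# breakpoints and the three supervisor interrupts.
medeleg = delegation_mask([EXC_BREAKPOINT, EXC_ECALL_U, EXC_INST_PAGE_FAULT,
                           EXC_LOAD_PAGE_FAULT, EXC_STORE_PAGE_FAULT])
mideleg = delegation_mask([IRQ_S_SOFT, IRQ_S_TIMER, IRQ_S_EXT])
print(hex(medeleg), hex(mideleg))  # 0xb108 0x222
```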
Machine Interrupt Enable (mie)
- Address Offset: 0x304
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register contains the machine interrupt-enable bits.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:12] | | Reserved | RO | 0b0 |
[11] | MEIE | M-mode External Interrupt Enable | RW | 0b0 |
[10] | | Reserved | RO | 0b0 |
[9] | SEIE | S-mode External Interrupt Enable | RW | 0b0 |
[8] | UEIE | | RW | 0b0 |
[7] | MTIE | M-mode Timer Interrupt Enable | RW | 0b0 |
[6] | | Reserved | RO | 0b0 |
[5] | STIE | S-mode Timer Interrupt Enable | RW | 0b0 |
[4] | UTIE | | RW | 0b0 |
[3] | MSIE | M-mode Software Interrupt Enable | RW | 0b0 |
[2] | | Reserved | RO | 0b0 |
[1] | SSIE | S-mode Software Interrupt Enable | RW | 0b0 |
[0] | USIE | | RW | 0b0 |
- M-mode External Interrupt Enable (MEIE): Enables machine mode external interrupts.
- S-mode External Interrupt Enable (SEIE): Enables supervisor mode external interrupts.
- UEIE: Enables user mode external interrupts.
- M-mode Timer Interrupt Enable (MTIE): Enables machine mode timer interrupts.
- S-mode Timer Interrupt Enable (STIE): Enables supervisor mode timer interrupts.
- UTIE: Enables user mode timer interrupts.
- M-mode Software Interrupt Enable (MSIE): Enables machine mode software interrupts.
- S-mode Software Interrupt Enable (SSIE): Enables supervisor mode software interrupts.
- USIE: Enables user mode software interrupts.
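While the hart runs in M-mode, a machine-level interrupt is only taken when both its per-source bit in mie and the global MIE bit in mstatus are set. The Python sketch below computes both register values; bit positions come from the mie and mstatus tables, and `enable_machine_irqs` is a hypothetical helper working on integers rather than on real CSRs.

```python
# Bit positions from the mie table and mstatus.MIE (bit 3) from the mstatus table.
MIE_MSIE = 1 << 3
MIE_MTIE = 1 << 7
MIE_MEIE = 1 << 11
MSTATUS_MIE = 1 << 3


def enable_machine_irqs(mie: int, mstatus: int, *, software: bool = False,
                        timer: bool = False, external: bool = False) -> tuple:
    """Return updated (mie, mstatus) with the requested M-mode sources enabled."""
    if software:
        mie |= MIE_MSIE
    if timer:
        mie |= MIE_MTIE
    if external:
        mie |= MIE_MEIE
    # In M-mode, enabled sources are only taken while the global MIE bit is set.
    mstatus |= MSTATUS_MIE
    return mie, mstatus


print([hex(v) for v in enable_machine_irqs(0, 0, timer=True, external=True)])
# ['0x880', '0x8']
```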
Machine Trap Vector (mtvec)
- Address Offset: 0x305
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register holds the trap vector configuration, consisting of a vector base address and a vector mode.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:2] | BASE | Vector Base Address | RW | 0b0 |
[1:0] | MODE | Vector Mode | RW | 0b0 |
- Vector Base Address (BASE): Holds the vector base address. The value in the BASE field must always be aligned on a 4-byte boundary.
- Vector Mode (MODE): Selects how traps are vectored. The MODE value may impose additional alignment constraints on the value in the BASE field.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b00 | Direct | All exceptions set pc to BASE. |
0b01 | Vectored | Asynchronous interrupts set pc to BASE+4×cause; synchronous exceptions set pc to BASE. |
0b10-0b11 | Reserved | Reserved. |
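The MODE encoding determines where the hart jumps on a trap. The following Python sketch computes that target from an mtvec value and the trap cause; it is a host-side model of the encoding table, and `trap_target_pc` is a hypothetical name.

```python
def trap_target_pc(mtvec: int, is_interrupt: bool, cause_code: int) -> int:
    """Return the pc a trap would jump to for the given mtvec value."""
    base = mtvec & ~0x3 & 0xFFFFFFFF  # BASE occupies bits [31:2]
    mode = mtvec & 0x3                # MODE occupies bits [1:0]
    if mode == 0b01 and is_interrupt:
        return base + 4 * cause_code  # Vectored: interrupts use BASE + 4*cause
    if mode in (0b00, 0b01):
        return base                   # Direct, or a synchronous exception
    raise ValueError("reserved MODE value")


# Vectored table at 0x8000_0000: a machine timer interrupt (cause 7) goes to +0x1C.
print(hex(trap_target_pc(0x8000_0001, True, 7)))  # 0x8000001c
```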
Machine Counter Enable (mcounteren)
- Address Offset: 0x306
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register controls the availability of the hardware performance-monitoring counters to the next-lowest privileged mode.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:3] | HPMn | Hpmcountern | RW | 0b0 |
[2] | IR | Instret | RW | 0b0 |
[1] | TM | Time | RW | 0b0 |
[0] | CY | Cycle | RW | 0b0 |
- Hpmcountern (HPMn): When HPMn is clear, attempts to read the hpmcountern register while executing in S-mode or U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted in the next implemented privilege mode (S-mode if implemented, otherwise U-mode).
- Instret (IR): When IR is clear, attempts to read the instret register while executing in S-mode or U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted in the next implemented privilege mode.
- Time (TM): When TM is clear, attempts to read the time register while executing in S-mode or U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted in the next implemented privilege mode.
- Cycle (CY): When CY is clear, attempts to read the cycle register while executing in S-mode or U-mode will cause an illegal instruction exception. When this bit is set, access to the corresponding register is permitted in the next implemented privilege mode.
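The following Python sketch builds an mcounteren value that exposes selected counters to the next-lower privilege mode, using the bit positions from the table above (CY = 0, TM = 1, IR = 2, HPMn = n for n from 3 to 31). The helper names are hypothetical, and the sketch only models the register value, not the exception raised on a disallowed read.

```python
MCOUNTEREN_CY = 1 << 0
MCOUNTEREN_TM = 1 << 1
MCOUNTEREN_IR = 1 << 2


def mcounteren_value(cycle: bool = False, time: bool = False,
                     instret: bool = False, hpm: tuple = ()) -> int:
    """Build an mcounteren value from the counters to expose."""
    value = 0
    if cycle:
        value |= MCOUNTEREN_CY
    if time:
        value |= MCOUNTEREN_TM
    if instret:
        value |= MCOUNTEREN_IR
    for n in hpm:
        if not 3 <= n <= 31:
            raise ValueError(f"hpmcounter{n} does not exist")
        value |= 1 << n
    return value


def lower_mode_may_read(mcounteren: int, bit: int) -> bool:
    """True if the next-lower mode may read the counter controlled by `bit`."""
    return bool((mcounteren >> bit) & 1)


print(hex(mcounteren_value(cycle=True, instret=True, hpm=(3, 4))))  # 0x1d
```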
Hardware Performance-Monitoring Event Selector (hpmevent[6])
- Address Offset: 0x323 [+ i*0x1]
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register controls which event causes the corresponding counter to increment.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:5] | | Reserved | RO | 0b0 |
[4:0] | mhpmevent | Event Selector | RW | 0b0 |
- Event Selector (mhpmevent): Selects the event that increments the corresponding hardware performance-monitoring counter.
Machine Scratch (mscratch)
- Address Offset: 0x340
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register is typically used to hold a pointer to a machine-mode hart-local context space and is swapped with a user register upon entry to an M-mode trap handler.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | mscratch | Machine Scratch | RW | 0b0 |
- Machine Scratch (mscratch): Holds a pointer to a machine-mode hart-local context space; it is swapped with a user register upon entry to an M-mode trap handler.
Machine Exception Program Counter (mepc)
- Address Offset: 0x341
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Holds the exception program counter for traps taken into M-mode. This register must be able to hold all valid virtual addresses.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | mepc | Machine Exception Program Counter | RW | 0b0 |
- Machine Exception Program Counter (mepc): When a trap is taken into M-mode, mepc is written with the virtual address of the instruction that was interrupted or that encountered the exception.
Machine Cause (mcause)
- Address Offset: 0x342
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: When a trap is taken into M-mode, mcause is written with a code indicating the event that caused the trap.
Machine cause register (mcause) values after a trap are shown in the following table.
Interrupt | Exception Code | Description |
---|---|---|
1 | 0 | Reserved |
1 | 1 | Supervisor software interrupt |
1 | 2 | Reserved |
1 | 3 | Machine software interrupt |
1 | 4 | Reserved |
1 | 5 | Supervisor timer interrupt |
1 | 6 | Reserved |
1 | 7 | Machine timer interrupt |
1 | 8 | Reserved |
1 | 9 | Supervisor external interrupt |
1 | 10 | Reserved |
1 | 11 | Machine external interrupt |
1 | 12-15 | Reserved |
1 | ≥16 | Designated for platform use |
0 | 0 | Instruction address misaligned |
0 | 1 | Instruction access fault |
0 | 2 | Illegal instruction |
0 | 3 | Breakpoint |
0 | 4 | Load address misaligned |
0 | 5 | Load access fault |
0 | 6 | Store/AMO address misaligned |
0 | 7 | Store/AMO access fault |
0 | 8 | Environment call from U-mode |
0 | 9 | Environment call from S-mode |
0 | 10 | Reserved |
0 | 11 | Environment call from M-mode |
0 | 12 | Instruction page fault |
0 | 13 | Load page fault |
0 | 14 | Reserved |
0 | 15 | Store/AMO page fault |
0 | 16-23 | Reserved |
0 | 24-31 | Designated for custom use |
0 | 32-47 | Reserved |
0 | 48-63 | Designated for custom use |
0 | ≥64 | Reserved |
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31] | Interrupt | Interrupt | RW | 0b0 |
[30:0] | exception_code | Exception Code | RW | 0b0 |
- Interrupt (Interrupt): This bit is set if the trap was caused by an interrupt.
- Exception Code (exception_code): This field contains a code identifying the last exception or interrupt.
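A trap handler typically splits mcause into the Interrupt bit and the Exception Code before dispatching. The Python sketch below does this on a host; the labels follow the cause encodings of the RISC-V privileged specification, codes not listed are reported generically, and `decode_mcause` is a hypothetical helper.

```python
# Labels follow the RISC-V privileged specification's cause encodings.
INTERRUPTS = {
    1: "Supervisor software interrupt", 3: "Machine software interrupt",
    5: "Supervisor timer interrupt", 7: "Machine timer interrupt",
    9: "Supervisor external interrupt", 11: "Machine external interrupt",
}
EXCEPTIONS = {
    0: "Instruction address misaligned", 1: "Instruction access fault",
    2: "Illegal instruction", 3: "Breakpoint",
    4: "Load address misaligned", 5: "Load access fault",
    6: "Store/AMO address misaligned", 7: "Store/AMO access fault",
    8: "Environment call from U-mode", 9: "Environment call from S-mode",
    11: "Environment call from M-mode", 12: "Instruction page fault",
    13: "Load page fault", 15: "Store/AMO page fault",
}


def decode_mcause(mcause: int) -> tuple:
    """Return (is_interrupt, code, label) for a 32-bit mcause value."""
    is_interrupt = bool((mcause >> 31) & 1)  # Interrupt bit [31]
    code = mcause & 0x7FFFFFFF               # Exception Code [30:0]
    table = INTERRUPTS if is_interrupt else EXCEPTIONS
    return is_interrupt, code, table.get(code, "reserved, platform or custom")


print(decode_mcause(0x8000_0007))  # (True, 7, 'Machine timer interrupt')
```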
Machine Trap Value (mtval)
- Address Offset: 0x343
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: When a trap is taken into M-mode, mtval is either set to zero or written with exception-specific information to assist software in handling the trap.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | mtval | Machine Trap Value | RW | 0b0 |
- Machine Trap Value (mtval): If mtval is written with a nonzero value when a breakpoint, address-misaligned, access-fault, or page-fault exception occurs on an instruction fetch, load, or store, then mtval will contain the faulting virtual address.
If mtval is written with a nonzero value when a misaligned load or store causes an access-fault or page-fault exception, then mtval will contain the virtual address of the portion of the access that caused the fault.
If mtval is written with a nonzero value when an instruction access-fault or page-fault exception occurs on a system with variable-length instructions, then mtval will contain the virtual address of the portion of the instruction that caused the fault, while mepc will point to the beginning of the instruction.
Machine Interrupt Pending (mip)
- Address Offset: 0x344
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register contains the machine interrupt-pending bits.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:12] | | Reserved | RO | 0b0 |
[11] | MEIP | M-mode External Interrupt Pending | RO | 0b0 |
[10] | | Reserved | RO | 0b0 |
[9] | SEIP | S-mode External Interrupt Pending | RW | 0b0 |
[8] | UEIP | | RW | 0b0 |
[7] | MTIP | M-mode Timer Interrupt Pending | RO | 0b0 |
[6] | | Reserved | RO | 0b0 |
[5] | STIP | S-mode Timer Interrupt Pending | RW | 0b0 |
[4] | UTIP | | RW | 0b0 |
[3] | MSIP | M-mode Software Interrupt Pending | RO | 0b0 |
[2] | | Reserved | RO | 0b0 |
[1] | SSIP | S-mode Software Interrupt Pending | RW | 0b0 |
[0] | USIP | | RW | 0b0 |
- M-mode External Interrupt Pending (MEIP): The interrupt-pending bit for machine-level external interrupts.
- S-mode External Interrupt Pending (SEIP): The interrupt-pending bit for supervisor-level external interrupts.
- UEIP: The interrupt-pending bit for user-level external interrupts.
- M-mode Timer Interrupt Pending (MTIP): The interrupt-pending bit for machine-level timer interrupts.
- S-mode Timer Interrupt Pending (STIP): The interrupt-pending bit for supervisor-level timer interrupts.
- UTIP: The interrupt-pending bit for user-level timer interrupts.
- M-mode Software Interrupt Pending (MSIP): The interrupt-pending bit for machine-level software interrupts.
- S-mode Software Interrupt Pending (SSIP): The interrupt-pending bit for supervisor-level software interrupts.
- USIP: The interrupt-pending bit for user-level software interrupts. A hart can write its own USIP bit directly when running at the appropriate privilege level.
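An interrupt is a candidate for servicing only when its bit is set in both mip and mie. The Python sketch below picks the next interrupt using the standard priority order from the RISC-V privileged specification (MEI, MSI, MTI, SEI, SSI, STI). It deliberately ignores mstatus.MIE/SIE, mideleg and the current privilege mode, and `highest_pending_irq` is a hypothetical name.

```python
from typing import Optional

# Priority order MEI > MSI > MTI > SEI > SSI > STI, as mip/mie bit positions.
PRIORITY = (11, 3, 7, 9, 1, 5)


def highest_pending_irq(mip: int, mie: int) -> Optional[int]:
    """Return the bit position (equal to the mcause code) to service next."""
    active = mip & mie  # pending and individually enabled
    for bit in PRIORITY:
        if (active >> bit) & 1:
            return bit
    return None


print(highest_pending_irq(0xA0, 0xFFF))  # 7: MTI wins over STI
```

Returning the bit position is convenient because it equals the Exception Code that mcause reports for the same interrupt.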
Physical Memory Protection Config 0 (pmpcfg0)
- Address Offset: 0x3A0
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Holds the configuration of PMP entries 0-3.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:24] | pmp3cfg | Physical Memory Protection 3 Config | RW | 0b0 |
[23:16] | pmp2cfg | Physical Memory Protection 2 Config | RW | 0b0 |
[15:8] | pmp1cfg | Physical Memory Protection 1 Config | RW | 0b0 |
[7:0] | pmp0cfg | Physical Memory Protection 0 Config | RW | 0b0 |
- Physical Memory Protection 3 Config (pmp3cfg): Holds the configuration of PMP entry 3.
- Physical Memory Protection 2 Config (pmp2cfg): Holds the configuration of PMP entry 2.
- Physical Memory Protection 1 Config (pmp1cfg): Holds the configuration of PMP entry 1.
- Physical Memory Protection 0 Config (pmp0cfg): Holds the configuration of PMP entry 0.
Physical Memory Protection Config 1 (pmpcfg1)
- Address Offset: 0x3A1
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Holds the configuration of PMP entries 4-7.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:24] | pmp7cfg | Physical Memory Protection 7 Config | RW | 0b0 |
[23:16] | pmp6cfg | Physical Memory Protection 6 Config | RW | 0b0 |
[15:8] | pmp5cfg | Physical Memory Protection 5 Config | RW | 0b0 |
[7:0] | pmp4cfg | Physical Memory Protection 4 Config | RW | 0b0 |
- Physical Memory Protection 7 Config (pmp7cfg): Holds the configuration of PMP entry 7.
- Physical Memory Protection 6 Config (pmp6cfg): Holds the configuration of PMP entry 6.
- Physical Memory Protection 5 Config (pmp5cfg): Holds the configuration of PMP entry 5.
- Physical Memory Protection 4 Config (pmp4cfg): Holds the configuration of PMP entry 4.
Physical Memory Protection Config 2 (pmpcfg2)
- Address Offset: 0x3A2
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Holds the configuration of PMP entries 8-11.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:24] | pmp11cfg | Physical Memory Protection 11 Config | RW | 0b0 |
[23:16] | pmp10cfg | Physical Memory Protection 10 Config | RW | 0b0 |
[15:8] | pmp9cfg | Physical Memory Protection 9 Config | RW | 0b0 |
[7:0] | pmp8cfg | Physical Memory Protection 8 Config | RW | 0b0 |
- Physical Memory Protection 11 Config (pmp11cfg): Holds the configuration of PMP entry 11.
- Physical Memory Protection 10 Config (pmp10cfg): Holds the configuration of PMP entry 10.
- Physical Memory Protection 9 Config (pmp9cfg): Holds the configuration of PMP entry 9.
- Physical Memory Protection 8 Config (pmp8cfg): Holds the configuration of PMP entry 8.
Physical Memory Protection Config 3 (pmpcfg3)
- Address Offset: 0x3A3
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Holds the configuration of PMP entries 12-15.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:24] | pmp15cfg | Physical Memory Protection 15 Config | RW | 0b0 |
[23:16] | pmp14cfg | Physical Memory Protection 14 Config | RW | 0b0 |
[15:8] | pmp13cfg | Physical Memory Protection 13 Config | RW | 0b0 |
[7:0] | pmp12cfg | Physical Memory Protection 12 Config | RW | 0b0 |
- Physical Memory Protection 15 Config (pmp15cfg): Holds the configuration of PMP entry 15.
- Physical Memory Protection 14 Config (pmp14cfg): Holds the configuration of PMP entry 14.
- Physical Memory Protection 13 Config (pmp13cfg): Holds the configuration of PMP entry 13.
- Physical Memory Protection 12 Config (pmp12cfg): Holds the configuration of PMP entry 12.
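The four pmpcfg registers above pack one 8-bit configuration byte per PMP entry, four entries per register. The Python sketch below locates the byte for a given entry and replaces it inside a register value; it treats the byte as opaque, because the meaning of the individual configuration bits is not described in this section, and both helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pmpcfg_location(entry: int) -> tuple:
    """Return (CSR address, bit offset) holding pmp<entry>cfg on RV32."""
    if not 0 <= entry < 16:
        raise ValueError("entry must be in 0..15")
    return 0x3A0 + entry // 4, (entry % 4) * 8


def set_pmp_cfg(pmpcfg: int, entry: int, cfg: int) -> int:
    """Replace the 8-bit pmp<entry>cfg field inside its pmpcfg register value."""
    _, shift = pmpcfg_location(entry)
    return (pmpcfg & ~(0xFF << shift) & MASK32) | ((cfg & 0xFF) << shift)


csr, shift = pmpcfg_location(6)
print(hex(csr), shift)  # 0x3a1 16 -> pmp6cfg is bits [23:16] of pmpcfg1
```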
Physical Memory Protection Address (pmpaddr[16])
- Address Offset: 0x3B0 [+ i*0x1]
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Address register for Physical Memory Protection.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | address | Address | RW | 0b0 |
- Address (address): Encodes bits 33-2 of a 34-bit physical address.
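Since pmpaddr stores bits 33-2 of a 34-bit physical address, converting between the two is a shift by two. The Python sketch below shows the round trip; the helper names are hypothetical, and the sketch simply drops address bits 1:0, which the register cannot represent.

```python
MASK32 = 0xFFFFFFFF


def to_pmpaddr(paddr: int) -> int:
    """Encode a 34-bit physical address as a pmpaddr value (bits 1:0 dropped)."""
    if not 0 <= paddr < (1 << 34):
        raise ValueError("address does not fit in 34 bits")
    return (paddr >> 2) & MASK32


def from_pmpaddr(pmpaddr: int) -> int:
    """Recover the physical address (with bits 1:0 equal to zero)."""
    return (pmpaddr & MASK32) << 2


print(hex(to_pmpaddr(0x2_8000_0000)))  # 0xa0000000
```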
Instruction Cache (icache)
- Address Offset: 0x700
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000001
- Description: Custom register to enable or disable the instruction cache (bit 0).
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:1] | | Reserved | RO | 0b0 |
[0] | icache | Instruction Cache | RW | 0b1 |
- Instruction Cache (icache): Custom bit that enables the instruction cache when set and disables it when clear.
Data Cache (dcache)
- Address Offset: 0x701
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000001
- Description: Custom register to enable or disable the data cache (bit 0).
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:1] | | Reserved | RO | 0b0 |
[0] | dcache | Data Cache | RW | 0b1 |
- Data Cache (dcache): Custom bit that enables the data cache when set and disables it when clear.
Trigger Select (tselect)
- Address Offset: 0x7A0
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: This register determines which trigger is accessible through the other trigger registers.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | index | Index | RW | 0b0 |
- Index (index): The set of accessible triggers must start at 0 and be contiguous.
Writes of values greater than or equal to the number of supported triggers may result in a different value in this register than what was written. To verify that what they wrote is a valid index, debuggers can read back the value and check that tselect holds what they wrote.
Since triggers can be used both by Debug Mode and M-mode, the debugger must restore this register if it modifies it.
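The write-and-read-back check described above is enough to count triggers. The Python sketch below models it against a hypothetical `FakeHart` object that clamps out-of-range writes (one possible behavior the text allows); a real debugger reaches tselect through the Debug Module instead. A complete enumeration would also check tdata1 or tinfo for `no_trigger`, which this sketch omits.

```python
class FakeHart:
    """Hypothetical stand-in for a hart exposing `n` triggers."""

    def __init__(self, n: int):
        self.n = n
        self.tselect = 0

    def write_tselect(self, value: int) -> None:
        # Out-of-range writes leave the last valid index (one legal behavior).
        self.tselect = value if value < self.n else max(self.n - 1, 0)

    def read_tselect(self) -> int:
        return self.tselect


def count_triggers(hart, limit: int = 32) -> int:
    """Count triggers by writing each index and checking the read-back value."""
    saved = hart.read_tselect()
    count = 0
    for index in range(limit):
        hart.write_tselect(index)
        if hart.read_tselect() != index:
            break
        count += 1
    hart.write_tselect(saved)  # tselect is shared with M-mode, so restore it
    return count


print(count_triggers(FakeHart(4)))  # 4
```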
Trigger Data 1 (tdata1)
- Address Offset: 0x7A1
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Trigger-specific data.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:28] | type | Type | RW | 0b0 |
[27] | dmode | Debug Mode | RW | 0b0 |
[26:0] | data | Data | RW | 0b0 |
- Type (type): Type of trigger.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b0000 | no_trigger | There is no trigger at this tselect. |
0b0001 | legacy_address_match_trigger | The trigger is a legacy SiFive address match trigger. These should not be implemented and aren’t further documented here. |
0b0010 | address_data_match_trigger | The trigger is an address/data match trigger. The remaining bits in this register act as described in mcontrol. |
0b0011 | instruction_count_trigger | The trigger is an instruction count trigger. The remaining bits in this register act as described in icount. |
0b0100 | interrupt_trigger | The trigger is an interrupt trigger. The remaining bits in this register act as described in itrigger. |
0b0101 | exception_trigger | The trigger is an exception trigger. The remaining bits in this register act as described in etrigger. |
0b0110-0b1110 | Reserved | Reserved. |
0b1111 | trigger_exists | This trigger exists (so enumeration shouldn’t terminate), but is not currently available. |
- Debug Mode (dmode): This bit is only writable from Debug Mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | D_and_M_mode | Both Debug and M-mode can write the tdata registers at the selected tselect. |
1 | M_mode_only | Only Debug Mode can write the tdata registers at the selected tselect. Writes from other modes are ignored. |
- Data (data): Trigger-specific data.
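The Python sketch below splits a tdata1 value into the type, dmode and data fields shown in the table and names the type using the encoding table. It is a host-side illustration with a hypothetical helper name; it does not interpret the trigger-specific data bits.

```python
TRIGGER_TYPES = {
    0: "no_trigger", 1: "legacy_address_match_trigger",
    2: "address_data_match_trigger", 3: "instruction_count_trigger",
    4: "interrupt_trigger", 5: "exception_trigger", 15: "trigger_exists",
}


def decode_tdata1(tdata1: int) -> tuple:
    """Return (type name, dmode, data) for a 32-bit tdata1 value."""
    ttype = (tdata1 >> 28) & 0xF        # type [31:28]
    dmode = bool((tdata1 >> 27) & 0x1)  # dmode [27]: True = Debug Mode only
    data = tdata1 & ((1 << 27) - 1)     # data [26:0]
    return TRIGGER_TYPES.get(ttype, "reserved"), dmode, data


print(decode_tdata1(0x2800_0044))  # ('address_data_match_trigger', True, 68)
```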
Trigger Data 2 (tdata2)
- Address Offset: 0x7A2
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Trigger-specific data.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | data | Data | RW | 0b0 |
- Data (data): Trigger-specific data.
Trigger Data 3 (tdata3)
- Address Offset: 0x7A3
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Trigger-specific data.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | data | Data | RW | 0b0 |
- Data (data): Trigger-specific data.
Trigger Info (tinfo)
- Address Offset: 0x7A4
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Shows trigger information.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:16] | | Reserved | RO | 0b0 |
[15:0] | info | Info | RO | 0b0 |
- Info (info): One bit for each possible type enumerated in tdata1. Bit N corresponds to type N. If the bit is set, then that type is supported by the currently selected trigger. If the currently selected trigger doesn’t exist, this field contains 1.
If type is not writable, this register may be unimplemented, in which case reading it causes an illegal instruction exception. In this case the debugger can read the only supported type from tdata1.
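The info field is a plain bitmap over tdata1 type values, with the single value 1 reserved for a trigger that does not exist. The Python sketch below turns a tinfo value into a list of supported types; `supported_trigger_types` is a hypothetical helper name.

```python
def supported_trigger_types(tinfo: int) -> list:
    """List the tdata1 type values supported by the selected trigger."""
    info = tinfo & 0xFFFF  # info field [15:0]
    if info == 1:          # only bit 0 set: the selected trigger does not exist
        return []
    return [n for n in range(16) if (info >> n) & 1]


print(supported_trigger_types(0x24))  # [2, 5]: address/data match and exception
```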
Debug Control and Status (dcsr)
- Address Offset: 0x7B0
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Debug control and status register.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:28] | xdebugver | Debug Version | RO | 0b0 |
[27:16] | | Reserved | RO | 0b0 |
[15] | ebreakm | Environment Breakpoint M-mode | RW | 0b0 |
[14] | | Reserved | RO | 0b0 |
[13] | ebreaks | Environment Breakpoint S-mode | RW | 0b0 |
[12] | ebreaku | Environment Breakpoint U-mode | RW | 0b0 |
[11] | stepie | Stepping Interrupt Enable | RW | 0b0 |
[10] | stopcount | Stop Counters | RW | 0b0 |
[9] | stoptime | Stop Timers | RW | 0b0 |
[8:6] | cause | Cause | RW | 0b0 |
[5] | | Reserved | RO | 0b0 |
[4] | mprven | Modify Privilege Enable | RW | 0b0 |
[3] | nmip | Non-Maskable Interrupt Pending | RO | 0b0 |
[2] | step | Step | RW | 0b0 |
[1:0] | prv | Privilege level | RW | 0b0 |
- Debug Version (xdebugver): Shows the version of the debug support.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b0000 | no_ext_debug | There is no external debug support. |
0b0100 | ext_debug_spec | External debug support exists as it is described in the riscv-debug-release document. |
0b1111 | ext_debug_no_spec | There is external debug support, but it does not conform to any available version of the riscv-debug-release spec. |
- Environment Breakpoint M-mode (ebreakm): Controls the behavior of the ebreak instruction in machine mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | break_as_spec | ebreak instructions in M-mode behave as described in the Privileged Spec. |
1 | break_to_debug | ebreak instructions in M-mode enter Debug Mode. |
- Environment Breakpoint S-mode (ebreaks): Controls the behavior of the ebreak instruction in supervisor mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | break_as_spec | ebreak instructions in S-mode behave as described in the Privileged Spec. |
1 | break_to_debug | ebreak instructions in S-mode enter Debug Mode. |
- Environment Breakpoint U-mode (ebreaku): Controls the behavior of the ebreak instruction in user mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | break_as_spec | ebreak instructions in U-mode behave as described in the Privileged Spec. |
1 | break_to_debug | ebreak instructions in U-mode enter Debug Mode. |
- Stepping Interrupt Enable (stepie): Enables or disables interrupts during single stepping. The debugger must not change the value of this bit while the hart is running.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | disabled | Interrupts are disabled during single stepping. |
1 | enabled | Interrupts are enabled during single stepping. |
- Stop Counters (stopcount): Controls whether counters keep incrementing in Debug Mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | increment_counters | Increment counters as usual. |
1 | dont_increment_counters | Don’t increment any counters while in Debug Mode or on ebreak instructions that cause entry into Debug Mode. |
- Stop Timers (stoptime): Controls whether hart-local timers keep incrementing in Debug Mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | increment_timers | Increment timers as usual. |
1 | dont_increment_timers | Don’t increment any hart-local timers while in Debug Mode. |
- Cause (cause): Explains why Debug Mode was entered. When there are multiple reasons to enter Debug Mode in a single cycle, hardware sets cause to the cause with the highest priority.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0b001 | ebreak_instruction | An ebreak instruction was executed. (priority 3) |
0b010 | trigger_module | The Trigger Module caused a breakpoint exception. (priority 4, highest) |
0b011 | debugger_request | The debugger requested entry to Debug Mode using haltreq. (priority 1) |
0b100 | single_step | The hart single stepped because step was set. (priority 0, lowest) |
0b101 | reset_halt | The hart halted directly out of reset due to resethaltreq. It is also acceptable to report 3 when this happens. (priority 2) |
- Modify Privilege Enable (mprven): Controls whether the MPRV setting in mstatus applies in Debug Mode.
The following table shows the bitfield encoding:
Value | Name | Description |
---|---|---|
0 | disable_mprv | MPRV in mstatus is ignored in Debug Mode. |
1 | enable_mprv | MPRV in mstatus takes effect in Debug Mode. |
- Non-Maskable Interrupt Pending (nmip): When set, there is a Non-Maskable Interrupt (NMI) pending for the hart.
- Step (step): When set and not in Debug Mode, the hart will only execute a single instruction and then enter Debug Mode. If the instruction does not complete due to an exception, the hart will immediately enter Debug Mode before executing the trap handler, with appropriate exception registers set. The debugger must not change the value of this bit while the hart is running.
- Privilege level (prv): Contains the privilege level the hart was operating in when Debug Mode was entered. A debugger can change this value to change the hart’s privilege level when exiting Debug Mode.
The following table shows the bitfield encoding:
Value | Name |
---|---|
0b00 | User |
0b01 | Supervisor |
0b11 | Machine |
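A debugger usually reads dcsr on halt to learn why the hart stopped and in which mode it was running, and may rewrite prv before resuming. The Python sketch below models those steps on an integer value, using the field positions and cause priorities from the tables above; all helper names are hypothetical.

```python
# Cause encodings with their priorities (higher number = higher priority).
DCSR_CAUSES = {
    1: ("ebreak_instruction", 3),
    2: ("trigger_module", 4),
    3: ("debugger_request", 1),
    4: ("single_step", 0),
    5: ("reset_halt", 2),
}
PRV_NAMES = {0: "User", 1: "Supervisor", 3: "Machine"}


def decode_dcsr(dcsr: int) -> tuple:
    """Return (cause name, privilege name, step flag)."""
    cause = (dcsr >> 6) & 0x7     # cause [8:6]
    step = bool((dcsr >> 2) & 1)  # step [2]
    prv = dcsr & 0x3              # prv [1:0]
    name = DCSR_CAUSES.get(cause, ("reserved", None))[0]
    return name, PRV_NAMES.get(prv, "reserved"), step


def reported_cause(simultaneous_causes) -> int:
    """Cause value reported when several reasons occur in the same cycle."""
    return max(simultaneous_causes, key=lambda c: DCSR_CAUSES[c][1])


def with_prv(dcsr: int, prv: int) -> int:
    """Change the privilege level the hart will resume in."""
    return (dcsr & ~0x3) | (prv & 0x3)


print(decode_dcsr(0x107))  # ('single_step', 'Machine', True)
```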
Debug PC (dpc)
- Address Offset: 0x7B1
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Upon entry to Debug Mode, dpc is updated with the virtual address of the next instruction to be executed. When resuming, the hart’s PC is updated to the virtual address stored in dpc. A debugger may write dpc to change where the hart resumes.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | dpc | Debug PC | RW | 0b0 |
- Debug PC (dpc): The dpc behavior is described in more detail in the table below.
Cause | Virtual Address in DPC |
---|---|
ebreak | Address of the ebreak instruction. |
single step | Address of the instruction that would be executed next if no debugging was going on, i.e., pc + 4 for 32-bit instructions that don’t change program flow, the destination PC on taken jumps/branches, etc. |
trigger module | If timing is 0, the address of the instruction which caused the trigger to fire. If timing is 1, the address of the next instruction to be executed at the time that Debug Mode was entered. |
halt request | Address of the next instruction to be executed at the time that Debug Mode was entered. |
Debug Scratch Register (dscratch[2])
- Address Offset: 0x7B2 [+ i*0x1]
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Optional scratch register. A debugger must not write to this register unless hartinfo explicitly mentions it.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | dscratch | Debug Scratch Register | RW | 0b0 |
Floating Point Custom CSR (ftran)
- Address Offset: 0x800
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Floating Point Custom CSR.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:7] | | Reserved | RO | 0b0 |
[6:0] | ftran | | RW | 0b0 |
- ftran: Floating Point Custom CSR field.
M-mode Cycle counter (mcycle)
- Address Offset: 0xB00
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Counts the number of clock cycles executed by the processor core on which the hart is running.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
- Count (count): Counts the number of clock cycles executed by the processor core.
Machine Instruction Retired counter (minstret)
- Address Offset: 0xB02
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Counts the number of instructions the hart has retired.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
- Count (count): Counts the number of instructions the hart has retired.
L1 Inst Cache Miss (ml1_icache_miss)
- Address Offset: 0xB03
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
L1 Data Cache Miss (ml1_dcache_miss)
- Address Offset: 0xB04
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
ITLB Miss (mitlb_miss)
- Address Offset: 0xB05
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
DTLB Miss (mdtlb_miss)
- Address Offset: 0xB06
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Loads (mload)
- Address Offset: 0xB07
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Stores (mstore)
- Address Offset: 0xB08
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Taken Exceptions (mexception)
- Address Offset: 0xB09
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Exception Return (mexception_ret)
- Address Offset: 0xB0A
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Software Change of PC (mbranch_jump)
- Address Offset: 0xB0B
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Procedure Call (mcall)
- Address Offset: 0xB0C
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Procedure Return (mret)
- Address Offset: 0xB0D
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Branch mis-predicted (mmis_predict)
- Address Offset: 0xB0E
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Scoreboard Full (msb_full)
- Address Offset: 0xB0F
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Instruction Fetch Queue Empty (mif_empty)
- Address Offset: 0xB10
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Upper 32 bits of M-mode Cycle counter (mcycleh)
- Address Offset: 0xB80
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Upper 32 bits of the count of clock cycles executed by the processor core on which the hart is running.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
- Count (count): Upper 32 bits of the number of clock cycles executed by the processor core.
Upper 32 bits of Machine Instruction Retired counter (minstreth)
- Address Offset: 0xB82
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Upper 32 bits of the count of instructions the hart has retired.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
- Count (count): Upper 32 bits of the number of instructions the hart has retired.
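Because mcycle and mcycleh (and likewise minstret and minstreth) are separate 32-bit CSRs, software that loads a 64-bit value must write the two halves with separate instructions, and the old low half may carry into the high half in between. A common approach, sketched below, is to clear the low half first, then write the high half, then write the low half. The write helpers are hypothetical function pointers so the logic can be exercised off-target; on the core they would wrap `csrw` accesses to the corresponding CSRs. The `sim_*` counter model is likewise hypothetical.

```c
#include <stdint.h>

/* Hypothetical accessor: on target, would wrap "csrw mcycle" or "csrw mcycleh". */
typedef void (*csr_write_fn)(uint32_t value);

/* Load a 64-bit value into a counter split across lo/hi 32-bit CSRs.
 * Clearing the low half first prevents a carry out of the old low value
 * from corrupting the newly written high half. */
static void write_counter64(csr_write_fn write_lo, csr_write_fn write_hi,
                            uint64_t value)
{
    write_lo(0u);
    write_hi((uint32_t)(value >> 32));
    write_lo((uint32_t)value);
}

/* Off-target model of a free-running 64-bit counter that advances by one
 * before every CSR write. */
static uint64_t sim_counter;

static void sim_write_lo(uint32_t v)
{
    sim_counter++;
    sim_counter = (sim_counter & 0xFFFFFFFF00000000ull) | v;
}

static void sim_write_hi(uint32_t v)
{
    sim_counter++;
    sim_counter = ((uint64_t)v << 32) | (uint32_t)sim_counter;
}
```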
Upper 32 bits of Machine Hardware Performance Monitoring Counter (mhpmcounterh[6])
- Address Offset: 0xB83 [+ i*0x1]
- Width (bits): 32
- Access Type: RW
- Reset Value: 0x00000000
- Description: Upper 32 bits of a hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RW | 0b0 |
Cycle counter (cycle)
- Address Offset: 0xC00
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Cycle counter for the RDCYCLE instruction.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Timer (time)
- Address Offset: 0xC01
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Timer for the RDTIME instruction.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Instruction Retired counter (instret)
- Address Offset: 0xC02
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Instructions-retired counter for the RDINSTRET instruction.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
L1 Inst Cache Miss (l1_icache_miss)
- Address Offset: 0xC03
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
L1 Data Cache Miss (l1_dcache_miss)
- Address Offset: 0xC04
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
ITLB Miss (itlb_miss)
- Address Offset: 0xC05
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
DTLB Miss (dtlb_miss)
- Address Offset: 0xC06
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Loads (load)
- Address Offset: 0xC07
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Stores (store)
- Address Offset: 0xC08
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Taken Exceptions (exception)
- Address Offset: 0xC09
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Exception Return (exception_ret)
- Address Offset: 0xC0A
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Software Change of PC (branch_jump)
- Address Offset: 0xC0B
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Procedure Call (call)
- Address Offset: 0xC0C
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Procedure Return (ret)
- Address Offset: 0xC0D
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Branch mis-predicted (mis_predict)
- Address Offset: 0xC0E
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Scoreboard Full (sb_full)
- Address Offset: 0xC0F
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Instruction Fetch Queue Empty (if_empty)
- Address Offset: 0xC10
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Hardware performance event counter.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
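The read-only counters at 0xC03 to 0xC10 pair with the read/write M-mode counters at 0xB03 to 0xB10 listed earlier. Each counts the same event at an address exactly 0x100 higher, matching the standard RISC-V layout where hpmcounterN sits at 0xC00 + N and mhpmcounterN at 0xB00 + N. The table-driven C sketch below captures that relationship. The array and helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *event;  /* event name as used in this document */
    uint16_t    m_addr; /* M-mode read/write counter CSR address */
} hpm_event_t;

/* Event counters as listed in this document (0xB03..0xB10). */
static const hpm_event_t hpm_events[] = {
    {"l1_icache_miss", 0xB03}, {"l1_dcache_miss", 0xB04},
    {"itlb_miss",      0xB05}, {"dtlb_miss",      0xB06},
    {"load",           0xB07}, {"store",          0xB08},
    {"exception",      0xB09}, {"exception_ret",  0xB0A},
    {"branch_jump",    0xB0B}, {"call",           0xB0C},
    {"ret",            0xB0D}, {"mis_predict",    0xB0E},
    {"sb_full",        0xB0F}, {"if_empty",       0xB10},
};

/* User-level read-only counterpart of an M-mode counter address. */
static uint16_t user_counter_addr(uint16_t m_addr)
{
    return (uint16_t)(m_addr + 0x100u);
}

/* User-level CSR address for an event name; returns 0 if unknown. */
static uint16_t user_addr_for_event(const char *event)
{
    for (size_t i = 0; i < sizeof hpm_events / sizeof hpm_events[0]; i++) {
        if (strcmp(hpm_events[i].event, event) == 0)
            return user_counter_addr(hpm_events[i].m_addr);
    }
    return 0;
}
```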
Upper 32 bits of Cycle counter (cycleh)
- Address Offset: 0xC80
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Upper 32 bits of the cycle counter, read by the RDCYCLEH instruction.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Upper 32 bits of Timer (timeh)
- Address Offset: 0xC81
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Upper 32 bits of the timer, read by the RDTIMEH instruction.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
Upper 32 bits of Instruction Retired counter (instreth)
- Address Offset: 0xC82
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Upper 32 bits of the instructions-retired counter, read by the RDINSTRETH instruction.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Count | RO | 0b0 |
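On this 32-bit configuration, a 64-bit count has to be assembled from a low register (cycle, time or instret) and its high counterpart (cycleh, timeh or instreth). Because the low half can wrap between the two reads, the usual approach, also given in the RISC-V unprivileged specification, is to read high, low, high and retry if the high half changed. The C sketch below takes the readers as function pointers so it can run off-target; on the core they would wrap `csrr` reads (for example of cycleh and cycle). The function names and the `sim_*` model are hypothetical.

```c
#include <stdint.h>

typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter exposed as two 32-bit CSRs without tearing:
 * if the high half changes while the low half is read, the low half
 * wrapped, so the pair is read again. */
static uint64_t read_counter64(csr_read_fn read_hi, csr_read_fn read_lo)
{
    uint32_t hi, lo, hi_again;
    do {
        hi       = read_hi();
        lo       = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Off-target model: a counter that advances by one on every CSR read. */
static uint64_t sim_counter;

static uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter++ >> 32); }
static uint32_t sim_read_lo(void) { return (uint32_t)sim_counter++; }
```

Starting the model just below a 32-bit wrap forces one retry, after which a consistent pair is returned.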
Machine Vendor ID (mvendorid)
- Address Offset: 0xF11
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: This register provides the JEDEC manufacturer ID of the provider of the core.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:7] | | Bank | RO | 0b0 |
[6:0] | | Offset | RO | 0b0 |
- Bank (bank): Contains the number of one-byte continuation codes in the JEDEC manufacturer ID, discarding the parity bit.
- Offset (offset): Contains the final byte of the JEDEC manufacturer ID, discarding the parity bit.
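These two fields follow the standard RISC-V mvendorid layout for a JEDEC JEP106 manufacturer ID. Bank counts the 0x7F continuation codes that precede the final byte, and Offset holds that final byte with its parity bit (bit 7) dropped. The helpers below sketch the packing and unpacking. The function names are hypothetical, and the values in the test are arbitrary and do not identify any real vendor.

```c
#include <stdint.h>

/* Pack a JEDEC manufacturer ID into the mvendorid layout:
 * bank    = number of 0x7F continuation codes before the final byte,
 * id_byte = final byte of the ID, including its parity bit (bit 7). */
static uint32_t mvendorid_encode(uint32_t bank, uint8_t id_byte)
{
    return (bank << 7) | (uint32_t)(id_byte & 0x7Fu);
}

static uint32_t mvendorid_bank(uint32_t mvendorid)
{
    return mvendorid >> 7;                /* bits [31:7] */
}

static uint8_t mvendorid_offset(uint32_t mvendorid)
{
    return (uint8_t)(mvendorid & 0x7Fu);  /* bits [6:0] */
}
```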
Machine Architecture ID (marchid)
- Address Offset: 0xF12
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000003
- Description: This register encodes the base microarchitecture of the hart.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Architecture ID | RO | 0b11 |
- Architecture ID (architecture_id): Encodes the base microarchitecture of the hart.
Machine Implementation ID (mimpid)
- Address Offset: 0xF13
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: Provides a unique encoding of the version of the processor implementation.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Implementation | RO | 0b0 |
- Implementation (implementation): Provides a unique encoding of the version of the processor implementation.
Machine Hardware Thread ID (mhartid)
- Address Offset: 0xF14
- Width (bits): 32
- Access Type: RO
- Reset Value: 0x00000000
- Description: This register contains the integer ID of the hardware thread running the code.
Bits | Name | Display Name | Access Type | Reset |
---|---|---|---|---|
[31:0] | | Hart ID | RO | 0b0 |
- Hart ID (hart_id): Contains the integer ID of the hardware thread running the code.
AXI
Introduction
In this chapter, we describe in detail the restrictions that apply to the supported features.
In order to understand how the AXI memory interface behaves in CVA6, it is necessary to read the AMBA AXI and ACE Protocol Specification (https://developer.arm.com/documentation/ihi0022/hc) and this chapter.
About the AXI4 protocol
The AMBA AXI protocol supports high-performance, high-frequency system designs for communication between Manager and Subordinate components.
The AXI protocol features are:
It is suitable for high-bandwidth and low-latency designs.
High-frequency operation is provided, without using complex bridges.
The protocol meets the interface requirements of a wide range of components.
It is suitable for memory controllers with high initial access latency.
Flexibility in the implementation of interconnect architectures is provided.
It is backward-compatible with AHB and APB interfaces.
The key features of the AXI protocol are:
Separate address/control and data phases.
Support for unaligned data transfers, using byte strobes.
Uses burst-based transactions with only the start address issued.
Separate read and write data channels, which can provide low-cost Direct Memory Access (DMA).
Support for issuing multiple outstanding addresses.
Support for out-of-order transaction completion.
Permits easy addition of register stages to provide timing closure.
The present specification is based on the AMBA AXI and ACE Protocol Specification (https://developer.arm.com/documentation/ihi0022/hc).
AXI4 and CVA6
The AXI bus protocol is used as the memory interface of the CVA6 processor. Since the processor initiates the transactions with the memory, CVA6 has a Manager interface that sends requests to the Subordinate, which is the memory.
CVA6 supports the features of the AMBA AXI4 specification and the Atomic Operation feature from AXI5, with restrictions that apply to some features.
This does not mean that CVA6 supports the full set of signals available on an AXI interface. Nevertheless, all required AXI signals are implemented.
Supported AXI4 features are defined in AXI Protocol Specification sections: A3, A4, A5, A6 and A7.
The supported AXI5 feature is defined in AXI Protocol Specification section: E1.1.
Signal Description (Section A2)
This section introduces the AXI memory interface signals of CVA6. Most of the signals are supported by CVA6; the tables summarizing the signals identify the exceptions.
In the following tables, the Src column tells whether the signal is driven by the Manager (M) or the Subordinate (S).
The AXI required and optional signals, and the default signals values that apply when an optional signal is not implemented are defined in AXI Protocol Specification section A9.3.
Global signals (Section A2.1)
Table 2.1 shows the global AXI memory interface signals.
Signal | Src | Description |
---|---|---|
ACLK | Clock source | Global clock signal. Synchronous signals are sampled on the rising edge of the global clock. |
ARESETn | Reset source | Global reset signal. This signal is active-LOW. |
Write address channel signals (Section A2.2)
Table 2.2 shows the AXI memory interface write address channel signals. Unless the description indicates otherwise, a supported signal can take any value allowed by the AXI specification.
Signal | Src | Support | Description |
---|---|---|---|
AWID | M | Yes (optional) | Identification tag for a write transaction. CVA6 gives the ID depending on the type of transaction. |
AWADDR | M | Yes | The address of the first transfer in a write transaction. |
AWLEN | M | Yes (optional) | Length, the exact number of data transfers in a write transaction. This information determines the number of data transfers associated with the address. All write transactions performed by CVA6 are of length 1 (AWLEN = 0b00000000). |
AWSIZE | M | Yes (optional) | Size, the number of bytes in each data transfer in a write transaction. |
AWBURST | M | Yes (optional) | Burst type, indicates how the address changes between each transfer in a write transaction. All write transactions performed by CVA6 are of burst type INCR (AWBURST = 0b01). |
AWLOCK | M | Yes (optional) | Provides information about the atomic characteristics of a write transaction. |
AWCACHE | M | Yes (optional) | Indicates how a write transaction is required to progress through a system. The subordinate is always of type Device Non-bufferable (AWCACHE = 0b0000). |
AWPROT | M | Yes | Protection attributes of a write transaction: privilege, security level, and access type. The value of AWPROT is always 0b000. |
AWQOS | M | No (optional) | Quality of Service identifier for a write transaction. AWQOS = 0b0000. |
AWREGION | M | No (optional) | Region indicator for a write transaction. AWREGION = 0b0000. |
AWUSER | M | No (optional) | User-defined extension for the write address channel. AWUSER = 0b00. |
AWATOP | M | Yes (optional) | Indicates the properties of the Atomic Operation used for a write transaction. |
AWVALID | M | Yes | Indicates that the write address channel signals are valid. |
AWREADY | S | Yes | Indicates that a transfer on the write address channel can be accepted. |
Write data channel signals (Section A2.3)
Table 2.3 shows the AXI write data channel signals. Unless the description indicates otherwise, a supported signal can take any value allowed by the AXI specification.
Signal | Src | Support | Description |
---|---|---|---|
WID | M | Yes (optional) | The ID tag of the write data transfer. CVA6 gives the ID depending on the type of transaction. |
WDATA | M | Yes | Write data. |
WSTRB | M | Yes (optional) | Write strobes, indicate which byte lanes hold valid data. |
WLAST | M | Yes | Indicates whether this is the last data transfer in a write transaction. |
WUSER | M | Yes (optional) | User-defined extension for the write data channel. |
WVALID | M | Yes | Indicates that the write data channel signals are valid. |
WREADY | S | Yes | Indicates that a transfer on the write data channel can be accepted. |
Write Response Channel signals (Section A2.4)
Table 2.4 shows the AXI write response channel signals. Unless the description indicates otherwise, a supported signal can take any value allowed by the AXI specification.
Signal | Src | Support | Description |
---|---|---|---|
BID | S | Yes (optional) | Identification tag for a write response. CVA6 gives the ID depending on the type of transaction. |
BRESP | S | Yes | Write response, indicates the status of a write transaction. |
BUSER | S | No (optional) | User-defined extension for the write response channel. BUSER = 0b00. |
BVALID | S | Yes | Indicates that the write response channel signals are valid. |
BREADY | M | Yes | Indicates that a transfer on the write response channel can be accepted. |
Read address channel signals (Section A2.5)
Table 2.5 shows the AXI read address channel signals. Unless the description indicates otherwise, a supported signal can take any value allowed by the AXI specification.
Signal | Src | Support | Description |
---|---|---|---|
ARID | M | Yes (optional) | Identification tag for a read transaction. CVA6 gives the ID depending on the type of transaction. |
ARADDR | M | Yes | The address of the first transfer in a read transaction. |
ARLEN | M | Yes (optional) | Length, the exact number of data transfers in a read transaction. This information determines the number of data transfers associated with the address. All read transactions performed by CVA6 are of length less than or equal to ICACHE_LINE_WIDTH/64. |
ARSIZE | M | Yes (optional) | Size, the number of bytes in each data transfer in a read transaction. |
ARBURST | M | Yes (optional) | Burst type, indicates how the address changes between each transfer in a read transaction. All read transactions performed by CVA6 are of burst type INCR (ARBURST = 0b01). |
ARLOCK | M | Yes (optional) | Provides information about the atomic characteristics of a read transaction. |
ARCACHE | M | Yes (optional) | Indicates how a read transaction is required to progress through a system. The memory is always of type Device Non-bufferable (ARCACHE = 0b0000). |
ARPROT | M | Yes | Protection attributes of a read transaction: privilege, security level, and access type. The value of ARPROT is always 0b000. |
ARQOS | M | No (optional) | Quality of Service identifier for a read transaction. ARQOS = 0b0000. |
ARREGION | M | No (optional) | Region indicator for a read transaction. ARREGION = 0b0000. |
ARUSER | M | No (optional) | User-defined extension for the read address channel. ARUSER = 0b00. |
ARVALID | M | Yes | Indicates that the read address channel signals are valid. |
ARREADY | S | Yes | Indicates that a transfer on the read address channel can be accepted. |
Read data channel signals (Section A2.6)
Table 2.6 shows the AXI read data channel signals. Unless the description indicates otherwise, a supported signal can take any value allowed by the AXI specification.
Signal | Src | Support | Description |
---|---|---|---|
RID | S | Yes (optional) | The ID tag of the read data transfer. CVA6 gives the ID depending on the type of transaction. |
RDATA | S | Yes | Read data. |
RLAST | S | Yes | Indicates whether this is the last data transfer in a read transaction. |
RUSER | S | No (optional) | User-defined extension for the read data channel. Not supported (RUSER = 0b00). |
RVALID | S | Yes | Indicates that the read data channel signals are valid. |
RREADY | M | Yes | Indicates that a transfer on the read data channel can be accepted. |
Single Interface Requirements: Transaction structure (Section A3.4)
This section describes the structure of transactions. The following sections define the address, data, and response structures.
Address structure (Section A3.4.1)
The AXI protocol is burst-based. The Manager begins each burst by driving control information and the address of the first byte in the transaction to the Subordinate. As the burst progresses, the Subordinate must calculate the addresses of subsequent transfers in the burst.
Burst length
The burst length is specified by:
ARLEN[7:0], for read transfers
AWLEN[7:0], for write transfers
The burst length for AXI4 is defined as:
Burst_Length = AxLEN[7:0] + 1
CVA6 places the following limitations on the use of bursts:
All read transactions performed by CVA6 are of burst length less than or equal to ICACHE_LINE_WIDTH/64.
All write transactions performed by CVA6 are of burst length equal to 1.
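As a worked example of these limits: with the 64-bit data bus, a full instruction-cache line refill takes ICACHE_LINE_WIDTH/64 beats, and the value driven on ARLEN is one less than the beat count. The sketch below assumes ICACHE_LINE_WIDTH is a non-zero multiple of 64. The function names are hypothetical.

```c
#include <stdint.h>

#define AXI_DATA_WIDTH 64u /* data bus width used by CVA6 */

/* AXI4: Burst_Length = AxLEN[7:0] + 1, so AxLEN = beats - 1 (1..256 beats). */
static uint8_t axlen_for_beats(uint32_t beats)
{
    return (uint8_t)(beats - 1u);
}

/* Beats needed to transfer one instruction-cache line over the 64-bit bus.
 * Assumes line_width_bits is a non-zero multiple of AXI_DATA_WIDTH. */
static uint32_t icache_refill_beats(uint32_t line_width_bits)
{
    return line_width_bits / AXI_DATA_WIDTH;
}
```

For a 128-bit line, this gives 2 beats and ARLEN = 1; every CVA6 write uses a single beat, hence AWLEN = 0.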
Burst size
The maximum number of bytes to transfer in each data transfer, or beat, in a burst, is specified by:
ARSIZE[2:0], for read transfers
AWSIZE[2:0], for write transfers
AXI DATA WIDTH used by CVA6 is 64-bit. Therefore, the maximum value that AxSIZE can take is 3 (8 bytes per transfer).
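AxSIZE encodes the bytes per beat as a power of two (bytes = 2^AxSIZE), so on the 64-bit bus the usable values run from 0 (1 byte) to 3 (8 bytes). A small sketch of the conversion in both directions, with hypothetical helper names:

```c
#include <stdint.h>

/* Bytes per beat for a given AxSIZE value: 2^AxSIZE. */
static uint32_t axsize_to_bytes(uint8_t axsize)
{
    return 1u << axsize;
}

/* AxSIZE for an access of 1, 2, 4 or 8 bytes; returns -1 for sizes the
 * 64-bit CVA6 bus cannot carry in a single beat. */
static int bytes_to_axsize(uint32_t bytes)
{
    switch (bytes) {
    case 1:  return 0;
    case 2:  return 1;
    case 4:  return 2;
    case 8:  return 3;
    default: return -1;
    }
}
```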
Burst type
The AXI protocol defines three burst types:
FIXED
INCR
WRAP
The burst type is specified by:
ARBURST[1:0], for read transfers
AWBURST[1:0], for write transfers
All transactions performed by CVA6 are of burst type INCR (AxBURST = 0b01).
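For an INCR burst, the Subordinate derives each beat's address from the start address. The first beat uses AxADDR as issued, and every later beat uses Aligned_Address + (N - 1) * Number_Bytes, where Number_Bytes = 2^AxSIZE, following the address formula in the AXI specification. The sketch below implements that formula. The function name is hypothetical, and 4KB boundary checks are omitted.

```c
#include <stdint.h>

/* Address of beat n (1-based) in an INCR burst:
 * beat 1 uses the start address, later beats use
 * Aligned_Address + (n - 1) * Number_Bytes. */
static uint64_t incr_beat_addr(uint64_t start_addr, uint8_t axsize, uint32_t n)
{
    uint64_t number_bytes = 1ull << axsize;
    uint64_t aligned = start_addr & ~(number_bytes - 1u);
    if (n <= 1u)
        return start_addr;
    return aligned + (uint64_t)(n - 1u) * number_bytes;
}
```

An unaligned start address only affects the first beat; from the second beat on, the addresses are aligned to the transfer size.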
Data read and write structure: Write strobes (Section A3.4.4)
The WSTRB[n:0] signals when HIGH, specify the byte lanes of the data bus that contain valid information. There is one write strobe for each 8 bits of the write data bus, therefore WSTRB[n] corresponds to WDATA[(8n)+7: (8n)].
AXI DATA WIDTH used by CVA6 is 64-bit. Therefore, the write strobe is eight bits wide (WSTRB[7:0], n = 7).
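Because WSTRB[n] qualifies WDATA[(8n)+7:(8n)], the strobe for a store is a run of ones as wide as the access, shifted to the byte lane selected by the low address bits. The sketch below computes it for the 8-lane CVA6 bus. It assumes the access is naturally aligned, so it never crosses an 8-byte boundary. The function name is hypothetical.

```c
#include <stdint.h>

/* WSTRB for a naturally aligned store of 1, 2, 4 or 8 bytes on the
 * 64-bit bus: one bit per byte lane, starting lane = address bits [2:0]. */
static uint8_t wstrb_for_store(uint64_t addr, uint32_t size_bytes)
{
    uint32_t lane = (uint32_t)(addr & 0x7u);
    uint32_t ones = (size_bytes >= 8u) ? 0xFFu : ((1u << size_bytes) - 1u);
    return (uint8_t)(ones << lane);
}
```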
Read and write response structure (Section A3.4.5)
The AXI protocol provides response signaling for both read and write transactions:
For read transactions, the response information from the Subordinate is signaled on the read data channel.
For write transactions, the response information is signaled on the write response channel.
CVA6 ignores the responses sent by the memory, except for exclusive accesses, where it checks for the EXOKAY response (xRESP[1:0] = 0b01).
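The AXI specification defines the 2-bit xRESP encodings: OKAY (0b00), EXOKAY (0b01), SLVERR (0b10) and DECERR (0b11). For an exclusive access, only EXOKAY indicates success. An OKAY response to an exclusive request means the exclusive access failed or is not supported. A sketch of that check, with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* xRESP[1:0] encodings from the AXI specification. */
enum axi_resp {
    AXI_RESP_OKAY   = 0x0,
    AXI_RESP_EXOKAY = 0x1,
    AXI_RESP_SLVERR = 0x2,
    AXI_RESP_DECERR = 0x3
};

/* An exclusive access succeeded only if the Subordinate answered EXOKAY. */
static bool exclusive_succeeded(uint8_t xresp)
{
    return (xresp & 0x3u) == AXI_RESP_EXOKAY;
}
```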
Transaction Attributes: Memory types (Section A4)
This section describes the attributes that determine how a transaction should be treated by the AXI subordinate that is connected to the CVA6.
AxCACHE is always 0b0000, so the subordinate should treat every transaction as Device Non-bufferable.
The required behavior for Device Non-bufferable memory is:
The write response must be obtained from the final destination.
Read data must be obtained from the final destination.
Transactions are Non-modifiable.
Reads must not be prefetched. Writes must not be merged.
Transaction Identifiers (Section A5)
The AXI protocol includes AXI ID transaction identifiers. A Manager can use these to identify separate transactions that must be returned in order.
CVA6 identifies each type of transaction with a specific ID:
For read transactions, the ID can be 0 or 1.
For write transactions, the ID is 1.
For Atomic operations, the ID is 3. This ID must be sent on the write channels, and also on the read channel if the transaction performed requires response data.
AXI Ordering Model (Section A6)
AXI ordering model overview (Section A6.1)
The AXI ordering model is based on the use of the transaction identifier, which is signaled on ARID or AWID.
Transaction requests on the same channel, with the same ID and destination are guaranteed to remain in order.
Transaction responses with the same ID are returned in the same order as the requests were issued.
Write transaction requests with the same destination are guaranteed to remain in order, because all write transactions performed by CVA6 have the same ID.
CVA6 can issue multiple outstanding write address transactions.
CVA6 cannot perform a read transaction and a write transaction at the same time. Therefore, there are no ordering problems between read and write transactions.
The ordering model does not give any ordering guarantees between:
Transactions from different Managers
Read Transactions with different IDs
Transactions to different Memory locations
If CVA6 requires ordering between transactions that have no ordering guarantee, then, as the Manager, it must wait to receive a response to the first transaction before issuing the second transaction.
Memory locations and Peripheral regions (Section A6.2)
The address map in AMBA is made up of Memory locations and Peripheral regions. In CVA6, however, the AXI interface is used as the memory interface.
A Memory location has all of the following properties:
A read of a byte from a Memory location returns the last value that was written to that byte location.
A write to a byte of a Memory location updates the value at that location to a new value that is obtained by a subsequent read of that location.
Reading or writing to a Memory location has no side-effects on any other Memory location.
Observation guarantees for Memory are given for each location.
The size of a Memory location is equal to the single-copy atomicity size for that component.
Transactions and ordering (Section A6.3)
A transaction is a read or a write to one or more address locations. The locations are determined by AxADDR and any relevant qualifiers such as the Non-secure bit in AxPROT.
Ordering guarantees are given only between accesses to the same Memory location or Peripheral region.
A transaction to a Peripheral region must be entirely contained within that region.
A transaction that spans multiple Memory locations has multiple ordering guarantees.
Transactions performed by CVA6 are of type Device, because AxCACHE[1] is deasserted.
Device transactions can be used to access Peripheral regions or Memory locations.
Write transactions performed by CVA6 are Non-bufferable, because AxCACHE[0] is deasserted. (An early response is only permitted for a Bufferable write.)
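The two AxCACHE bits discussed above can be decoded as shown below. Bit 0 is the Bufferable bit and bit 1 the Modifiable bit. The Device memory types are the encodings with AxCACHE[3:1] = 0, so the value 0b0000 driven by CVA6 decodes to Device Non-bufferable. This sketch only distinguishes the Device cases. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    AXCACHE_DEVICE_NONBUFFERABLE, /* 0b0000 */
    AXCACHE_DEVICE_BUFFERABLE,    /* 0b0001 */
    AXCACHE_NOT_DEVICE            /* any Normal (Modifiable) encoding */
} axcache_device_t;

/* AxCACHE[0]: Bufferable. AxCACHE[1]: Modifiable. */
static bool axcache_bufferable(uint8_t axcache) { return (axcache & 0x1u) != 0; }
static bool axcache_modifiable(uint8_t axcache) { return (axcache & 0x2u) != 0; }

/* Device types are the Non-modifiable encodings with AxCACHE[3:2] = 0. */
static axcache_device_t axcache_device_kind(uint8_t axcache)
{
    if ((axcache & 0xEu) != 0)
        return AXCACHE_NOT_DEVICE;
    return axcache_bufferable(axcache) ? AXCACHE_DEVICE_BUFFERABLE
                                       : AXCACHE_DEVICE_NONBUFFERABLE;
}
```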
Ordered write observation (Section A6.8)
To improve compatibility with interface protocols that support a different ordering model, a Subordinate interface can give stronger ordering guarantees for write transactions. A stronger ordering guarantee is known as Ordered Write Observation.
The CVA6 AXI interface exhibits Ordered Write Observation, so the Ordered_Write_Observation property is True.
An interface that exhibits Ordered Write Observation gives guarantees for write transactions that are not dependent on the destination or address:
A write W1 is guaranteed to be observed by a write W2, where W2 is issued after W1, from the same Manager, with the same ID.
Atomic transactions (Section E1.1)
AMBA 5 introduces Atomic transactions, which perform more than just a single access and have an operation that is associated with the transaction. Atomic transactions enable sending the operation to the data, permitting the operation to be performed closer to where the data is located. Atomic transactions are suited to situations where the data is located a significant distance from the agent that must perform the operation.
CVA6 supports only the AtomicLoad and AtomicSwap transactions, so AWATOP[5:4] can be 0b00, 0b10 or 0b11.
CVA6 performs only little-endian operations, so AWATOP[3] = 0.
For AtomicLoad, CVA6 supports all arithmetic operations encoded on the lower-order AWATOP[2:0] signals.
Glossary
VLEN: Virtual address length
XLEN: RISC-V processor data length
ALU: Arithmetic/Logic Unit
ASIC: Application-Specific Integrated Circuit
Byte: 8-bit data item
CPU: Central Processing Unit, processor
CSR: Control and Status Register
Custom extension: Non-Standard extension to the RISC-V base instruction set (RISC-V Instruction Set Manual, Volume I: User-Level ISA)
EXE: Instruction Execute
FPGA: Field Programmable Gate Array
FPU: Floating Point Unit
Halfword: 16-bit data item
Halfword aligned address: An address is halfword aligned if it is divisible by 2
ID: Instruction Decode
IF: Instruction Fetch
ISA: Instruction Set Architecture
KGE: kilo gate equivalents (NAND2)
LSU: Load Store Unit
M-Mode: Machine Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
OBI: Open Bus Interface
PC: Program Counter
PULP platform: Parallel Ultra Low Power Platform (<https://pulp-platform.org>)
RV32C: RISC-V Compressed (C extension)
RV32F: RISC-V Floating Point (F extension)
SIMD: Single Instruction/Multiple Data
Standard extension: Standard extension to the RISC-V base instruction set (RISC-V Instruction Set Manual, Volume I: User-Level ISA)
WARL: Write Any Values, Reads Legal Values
WB: Write Back of instruction results
WLRL: Write/Read Only Legal Values
Word: 32-bit data item
Word aligned address: An address is word aligned if it is divisible by 4
WPRI: Reserved Writes Preserve Values, Reads Ignore Values