Pipeline Details
CV32E40X has a 4-stage in-order completion pipeline. The 4 stages are:
- Instruction Fetch (IF)
Fetches instructions from memory via an aligning prefetch buffer, capable of fetching 1 instruction per cycle if the instruction-side memory system allows. The IF stage also pre-decodes RVC instructions into RV32I base instructions. See Instruction Fetch for details.
- Instruction Decode (ID)
Decodes the fetched instruction and performs the required register file reads. Jumps are taken from the ID stage.
- Execute (EX)
Executes the instructions. The EX stage contains the ALU, Multiplier and Divider. Branches (with their condition met) are taken from the EX stage. Multi-cycle instructions will stall this stage until they are complete. The address generation part of the load-store unit (LSU) is contained in EX as well.
- Writeback (WB)
Writes the result of ALU, Multiplier, Divider, or Load instructions back to the register file.
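The stage from which a control transfer is taken determines how many younger stages are flushed. As a rough sketch (an interpretation, not something the CV32E40X documentation states), the base cycle counts listed for jumps, mret and taken branches in the cycle table of this section are consistent with one cycle for the instruction itself plus one cycle per flushed stage. The names `FLUSHED_STAGES` and `control_transfer_cycles` are hypothetical, and the model ignores stalls on the memory interfaces.

```python
# Illustrative model only: these names are hypothetical and not part of any
# CV32E40X tooling. Flushed stages follow the descriptions in the cycle table.
FLUSHED_STAGES = {
    "jump": ("IF",),               # jumps are taken from ID, so only IF is flushed
    "mret": ("IF",),               # mret is also performed in ID
    "branch_taken": ("IF", "ID"),  # taken branches are resolved in EX
    "branch_not_taken": (),        # no flush, no stall
}


def control_transfer_cycles(kind: str, target_addr: int = 0,
                            target_is_rvc: bool = True) -> int:
    """Estimate cycles, ASSUMING 1 cycle plus 1 per flushed stage."""
    flushed = FLUSHED_STAGES[kind]
    cycles = 1 + len(flushed)
    # The table lists one extra cycle when the target is a non-word-aligned,
    # non-RVC (32-bit) instruction.
    misaligned_32bit_target = (target_addr % 4 != 0) and not target_is_rvc
    if flushed and misaligned_32bit_target:
        cycles += 1
    return cycles
```

For example, `control_transfer_cycles("branch_taken", 0x102, target_is_rvc=False)` evaluates to 4 under this model, matching the table entry for a taken branch to a non-word-aligned non-RVC target.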
Multi- and Single-Cycle Instructions
Table 4 shows the cycle count per instruction type. Some instructions take a variable number of cycles; this is indicated as a range, e.g. 1..32 means that the instruction takes a minimum of 1 cycle and a maximum of 32 cycles. The cycle counts assume zero stalls on the instruction-side memory interface and zero stalls on the data-side memory interface.
| Instruction Type | Cycles | Description |
|---|---|---|
| Integer Computational | 1 | Integer Computational Instructions are defined in the RISC-V RV32I Base Integer Instruction Set. |
| CSR Access | 4 ( 1 (all the other CSRs) | CSR Access Instructions are defined in ‘Zicsr’ of the RISC-V specification. |
| Load/Store | 1; 2 (non-word-aligned word transfer); 2 (halfword transfer crossing word boundary) | A load/store is handled in 1 bus transaction, using the EX and WB stages for 1 cycle each. For misaligned word transfers and for halfword transfers that cross a word boundary, 2 bus transactions are performed, using the EX and WB stages for 2 cycles each. |
| Multiplication | 1 ( 4 ( | CV32E40X uses a single-cycle 32-bit x 32-bit multiplier with a 32-bit result. The multiplications with upper-word result take 4 cycles to compute. |
| Division; Remainder | 3 - 35; 3 - 35 | The number of cycles depends on the divisor value (operand b), i.e. on its number of leading zero bits. The minimum number of cycles is 3, when the divisor has no leading zero bits (e.g., 0x80000000). The maximum number of cycles is 35, when the divisor is 0. |
| Jump | 2; 3 (target is a non-word-aligned non-RVC instruction) | Jumps are performed in the ID stage. Upon a jump the IF stage (including prefetch buffer) is flushed. The new PC request will appear on the instruction-side memory interface in the same cycle the jump instruction is in the ID stage. |
| mret | 2; 3 (target is a non-word-aligned non-RVC instruction) | An mret is performed in the ID stage. Upon an mret the IF stage (including prefetch buffer) is flushed. The new PC request will appear on the instruction-side memory interface in the same cycle the mret instruction is in the ID stage. |
| Branch (Not-Taken) | 1 | Any branch where the condition is not met will not stall. |
| Branch (Taken) | 3; 4 (target is a non-word-aligned non-RVC instruction) | The EX stage is used to compute the branch decision. Any branch where the condition is met will be taken from the EX stage and will cause a flush of the IF stage (including prefetch buffer) and ID stage. |
| | 5; 6 (target is a non-word-aligned non-RVC instruction) | The |
| | 5; 6 (target is a non-word-aligned non-RVC instruction) | The |
| Zba, Zbb, Zbc, Zbs | 1 | All instructions from Zba, Zbb, Zbc, Zbs take 1 cycle. |
| Zcmt | 2 | Table jumps take 2 cycles. |
| Zcmp | 2 - 18 | The number of cycles depends on the number of registers saved or restored by the instructions. |
| Zca, Zcb | 1 | Instructions from Zca and Zcb take 1 cycle. |
| | 2 - | Instructions causing sleep will not retire until wakeup. |
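Two of the variable entries in the table can be sketched in a few lines. The first function counts bus transactions for a load/store from its address and size, following the two misalignment cases listed in the table. The second estimates division latency: the table only gives the endpoints (3 cycles when operand b has no leading zero bits, 35 cycles when operand b is 0), so the linear "3 plus leading zeros" relationship is an assumption that merely matches both endpoints, and operand b is treated as a raw 32-bit pattern. All function names are hypothetical.

```python
def lsu_bus_transactions(addr: int, size_bytes: int) -> int:
    """Bus transactions for a load/store of 1, 2 or 4 bytes at addr."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("size_bytes must be 1, 2 or 4")
    offset = addr % 4  # byte offset within the 32-bit word
    # A second transaction is needed when the access spills into the next word:
    # any non-word-aligned word transfer, or a halfword starting at offset 3.
    return 2 if offset + size_bytes > 4 else 1


def leading_zeros32(value: int) -> int:
    """Leading zero bits of value viewed as an unsigned 32-bit pattern."""
    return 32 - (value & 0xFFFFFFFF).bit_length()


def div_cycles_estimate(operand_b: int) -> int:
    """ASSUMED model: 3 cycles plus one per leading zero bit of operand b."""
    return 3 + leading_zeros32(operand_b)
```

Under these assumptions, `lsu_bus_transactions(0x1003, 2)` returns 2 (a halfword crossing a word boundary), and `div_cycles_estimate(0)` returns 35, the documented maximum.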
Hazards
The CV32E40X experiences a 1-cycle penalty on the following hazards.
- Load data hazard (in case the instruction immediately following a load uses the result of that load).
- Jump register (jalr) data hazard (in case a jalr depends on the result of an immediately preceding non-load instruction).
- An instruction causing an implicit CSR read in ID (mret or table jump) while a CSR access instruction or an instruction causing an implicit CSR access is in the WB stage.
- An instruction causing an implicit CSR read in EX while a CSR access instruction or an instruction causing an implicit CSR access is in the WB stage.
- An instruction causing an explicit CSR read in EX while an instruction causing an implicit CSR write is in the WB stage.
- An instruction causing an explicit CSR read in EX while there is a RAW hazard with an explicit CSR write in WB.
The CV32E40X experiences a 2-cycle penalty on the following hazards.
- Jump register (jalr) data hazard (in case a jalr depends on the result of an immediately preceding load instruction).
- An instruction causing an implicit CSR read in ID (mret or table jump) while a CSR access instruction or an instruction causing an implicit CSR access is in the EX stage.
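The register-related hazards in the two lists above can be expressed as a small lookup. This is a minimal sketch: the `Instr` record and `data_hazard_penalty` function are hypothetical names, only the load-use and jalr cases are modeled (the CSR hazards are omitted), and register x0 is assumed never to create a dependency because it is hard-wired to zero in RISC-V.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical, simplified instruction record used only for this sketch."""
    op: str                       # e.g. "load", "jalr", "alu"
    rd: Optional[int] = None      # destination register index, if any
    srcs: Tuple[int, ...] = ()    # source register indices


def data_hazard_penalty(prev: Instr, cur: Instr) -> int:
    """Stall cycles for cur when it immediately follows prev."""
    # Assumption: x0 never carries a dependency (hard-wired to zero).
    depends = prev.rd is not None and prev.rd != 0 and prev.rd in cur.srcs
    if not depends:
        return 0
    if cur.op == "jalr":
        # jalr after a load: 2 cycles; jalr after a non-load: 1 cycle.
        return 2 if prev.op == "load" else 1
    # Any other consumer only stalls on a load-use dependency.
    return 1 if prev.op == "load" else 0
```

For instance, a load into x1 immediately followed by a jalr that reads x1 yields a penalty of 2, while the same jalr after an ALU instruction writing x1 yields 1.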
Note
Implicit CSR reads are reads performed by non-CSR instructions or CSR instructions reading CSR values from another CSR. Explicit CSR reads and writes are CSR instructions accessing the CSR encoded in the instruction word.