
Figure 2 CV32E40X Pipeline
Pipeline Details
CV32E40X has a 4-stage in-order completion pipeline, the 4 stages are:
- Instruction Fetch (IF)
Fetches instructions from memory via an aligning prefetch buffer, capable of fetching 1 instruction per cycle if the instruction side memory system allows. The IF stage also pre-decodes RVC instructions into RV32I base instructions. See Instruction Fetch for details.
- Instruction Decode (ID)
Decodes fetched instruction and performs required register file reads. Jumps are taken from the ID stage.
- Execute (EX)
Executes the instructions. The EX stage contains the ALU, Multiplier and Divider. Branches (with their condition met) are taken from the EX stage. Multi-cycle instructions will stall this stage until they are complete. The address generation part of the load-store-unit (LSU) is contained in EX as well.
- Writeback (WB)
Writes the result of ALU, Multiplier, Divider, or Load instructions instructions back to the register file.
Multi- and Single-Cycle Instructions
Table 4 shows the cycle count per instruction type. Some instructions have a variable time, this is indicated as a range e.g. 1..32 means that the instruction takes a minimum of 1 cycle and a maximum of 32 cycles. The cycle counts assume zero stall on the instruction-side interface and zero stall on the data-side memory interface.
Instruction Type |
Cycles |
Description |
---|---|---|
Integer Computational |
1 |
Integer Computational Instructions are defined in the RISCV-V RV32I Base Integer Instruction Set. |
CSR Access |
4 (mstatus, mepc, mtvec, mcause, mcycle, minstret, mhpmcounter*, mcycleh, minstreth, mhpmcounter*h, mcountinhibit, mhpmevent*, dscr, dpc, dscratch0, dscratch1, privlv) 1 (all the other CSRs) |
CSR Access Instruction are defined in ‘Zicsr’ of the RISC-V specification. |
Load/Store |
1 2 (non-word aligned word transfer) 2 (halfword transfer crossing word boundary) |
Load/Store is handled in 1 bus transaction using both EX and WB stages for 1 cycle each. For misaligned word transfers and for halfword transfers that cross a word boundary 2 bus transactions are performed using EX and WB stages for 2 cycles each. |
Multiplication |
1 (mul) 4 (mulh, mulhsu, mulhu) |
CV32E40X uses a single-cycle 32-bit x 32-bit multiplier with a 32-bit result. The multiplications with upper-word result take 4 cycles to compute. |
Division Remainder |
3 - 35 3 - 35 |
The number of cycles depends on the divider operand value (operand b), i.e. in the number of leading bits at 0. The minimum number of cycles is 3 when the divider has zero leading bits at 0 (e.g., 0x8000000). The maximum number of cycles is 35 when the divider is 0 |
Jump |
2 3 (target is a non-word-aligned non-RVC instruction) |
Jumps are performed in the ID stage. Upon a jump the IF stage (including prefetch buffer) is flushed. The new PC request will appear on the instruction-side memory interface the same cycle the jump instruction is in the ID stage. |
mret |
2 3 (target is a non-word-aligned non-RVC instruction) |
Mret is performed in the ID stage. Upon an mret the IF stage (including prefetch buffer) is flushed. The new PC request will appear on the instruction-side memory interface the same cycle the mret instruction is in the ID stage. |
Branch (Not-Taken) |
1 |
Any branch where the condition is not met will not stall. |
Branch (Taken) |
3 4 (target is a non-word-aligned non-RVC instruction) |
The EX stage is used to compute the branch decision. Any branch where the condition is met will be taken from the EX stage and will cause a flush of the IF stage (including prefetch buffer) and ID stage. |
Instruction Fence |
5 6 (target is a non-word-aligned non-RVC instruction) |
The FENCE.I instruction as defined in ‘Zifencei’ of the RISC-V specification. Internally it is implemented as a jump to the instruction following the fence. The jump performs the required flushing as described above. |
Hazards
The CV32E40X experiences a 1 cycle penalty on the following hazards.
Load data hazard (in case the instruction immediately following a load uses the result of that load)
Jump register (jalr) data hazard (in case that a jalr depends on the result of an immediately preceding non-load instruction)
The CV32E40X experiences a 2 cycle penalty on the following hazards.
Jump register (jalr) data hazard (in case that a jalr depends on the result of an immediately preceding load instruction)