CV32A65X DESIGN DOCUMENT

Design Documentation for CV32A65X architecture

Editor: Jean Roch Coulon

1. Introduction

The OpenHW Group uses semantic versioning to describe the release status of its IP. This document describes the CV32A65X configuration version of CVA6. This intends to be the first formal release of CVA6.

CVA6 is a 6-stage in-order and single issue processor core which implements the RISC-V instruction set. CVA6 can be configured as a 32- or 64-bit core (RV32 or RV64), called CV32A6 or CV64A6.

The objective of this document is to provide enough information to allow the RTL modification (by designers) and the RTL verification (by verificators). This document is not dedicated to CVA6 users looking for information to develop software like instructions or registers.

The CVA6 architecture is illustrated in the following figure.

CVA6 Architecture

1.1. License

Copyright 2022 Thales
Copyright 2018 ETH Zürich and University of Bologna
SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
Licensed under the Solderpad Hardware License v 2.1 (the “License”); you may not use this file except in compliance with the License, or, at your option, the Apache License version 2.0. You may obtain a copy of the License at https://solderpad.org/licenses/SHL-2.1/.
Unless required by applicable law or agreed to in writing, any work distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

1.2. Standards Compliance

To ease the reading, the reference to these specifications can be implicit in the requirements below. For the sake of precision, the requirements identify the versions of RISC-V extensions from these specifications.

[CVA6req] “CVA6 requirement specification”, https://github.com/openhwgroup/cva6/blob/master/docs/specifications/cva6_requirement_specification.rst, HASH#767c465.
[RVunpriv] “The RISC-V Instruction Set Manual for CV32A60X, Volume I: User-Level ISA”, https://docs.openhwgroup.org/projects/cva6-user-manual/07_cv32a60x/riscv/unpriv.html.
[RVpriv] “The RISC-V Instruction Set Manual for CV32A60X, Volume II: Privileged Architecture”, https://docs.openhwgroup.org/projects/cva6-user-manual/07_cv32a60x/riscv/priv.html.
[RVcompat] “RISC-V Architectural Compatibility Test Framework”, https://github.com/riscv-non-isa/riscv-arch-test.
[OBI] OBI Specification v1.6.0, https://github.com/openhwgroup/obi/blob/072d9173c1f2d79471d6f2a10eae59ee387d4c6f/OBI-v1.6.0.pdf
[CV-X-IF] Core-V eXtension interface (CV-X-IF) v1.0.0, https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/en/v1.0.0/.

CV32A65X is a standards-compliant 32-bit processor fully compliant with above specifications. In the meanwhile, the conformity of CV32A65X versus [RVcompat] has not been tested.

1.3. Documentation framework

The framework of this document is inspired by the Common Criteria. The Common Criteria for Information Technology Security Evaluation (referred to as Common Criteria or CC) is an international standard (ISO/IEC 15408) for computer security certification.

Description of the framework:

Processor is split into modules corresponding to the main modules of the design
Modules can contain several modules
Each module is described in a chapter, which contains the following subchapters: Description, Functionalities, Architecture and Modules and Registers (if any)
The subchapter Description describes the main features of the submodule, the interconnections between the current module and the others and the inputs/outputs interface.
The subchapter Functionality lists in details the module functionalities. Please avoid using the RTL signal names to explain the functionalities.
The subchapter Architecture and Modules provides a drawing to present the module hierarchy, then the functionalities covered by the module
The subchapter Registers specifies the module registers if any

1.4. Contributors

Jean-Roch Coulon - Thales
André Sintzoff - Thales
Yannick Casamatta - Thales
Guillaume Chauvon - Thales
Ayoub Jalali ([email protected])
Alae Eddine Ezzejjari ([email protected])

2. Subsystem

2.1. Global functionality

The CV32A65X is a subsystem composed of the modules and protocol interfaces as illustrated. The processor is a Harvard-based modern architecture. Instructions are issued in-order through the DECODE stage and executed out-of-order but committed in-order. The processor is Single issue, that means that at maximum one instruction per cycle can be issued to the EXECUTE stage.

The CVA6 implements a 6-stage pipeline composed of PC Generation, Instruction Fetch, Instruction Decode, Issue stage, Execute stage and Commit stage. At least 6 cycles are needed to execute one instruction.

2.2. Connection with other sub-systems

The submodule is connected to:

OBI interconnect gives access to memory content
COPROCESSOR connects through CV-X-IF coprocessor interface protocol
TRAP provides traps inputs

2.3. Parameter configuration

Table 1. cv32a65x parameter configuration
Name	description	description
XLEN	General Purpose Register Size (in bits)	32
VLEN	Virtual address Size (in bits)	32
RVA	Atomic RISC-V extension	False
RVB	Bit manipulation RISC-V extension	True
ZKN	Scalar Cryptography RISC-V entension	False
RVV	Vector RISC-V extension	False
RVC	Compress RISC-V extension	True
RVH	Hypervisor RISC-V extension	False
RVZCB	Zcb RISC-V extension	True
RVZCMP	Zcmp RISC-V extension	False
RVZCMT	Zcmt RISC-V extension	False
RVZiCond	Zicond RISC-V extension	False
RVZicntr	Zicntr RISC-V extension	False
RVZifencei	Zifencei RISC-V extension	False
RVZihpm	Zihpm RISC-V extension	False
RVF	Floating Point	False
RVD	Floating Point	False
XF16	Non standard 16bits Floating Point extension	False
XF16ALT	Non standard 16bits Floating Point Alt extension	False
XF8	Non standard 8bits Floating Point extension	False
XFVec	Non standard Vector Floating Point extension	False
PerfCounterEn	Perf counters	False
MmuPresent	MMU	False
RVS	Supervisor mode	False
RVU	User mode	False
SoftwareInterruptEn	Software interrupts are enabled	False
DebugEn	Debug support	False
DmBaseAddress	Base address of the debug module	0x0
HaltAddress	Address to jump when halt request	0x800
ExceptionAddress	Address to jump when exception	0x808
TvalEn	Tval Support Enable	False
DirectVecOnly	MTVEC CSR supports only direct mode	True
NrPMPEntries	PMP entries number	8
PMPCfgRstVal	PMP CSR configuration reset values	[0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
PMPAddrRstVal	PMP CSR address reset values	[0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
PMPEntryReadOnly	PMP CSR read-only bits	0
PMPNapotEn	PMP NA4 and NAPOT mode enable	False
NrNonIdempotentRules	PMA non idempotent rules number	0
NonIdempotentAddrBase	PMA NonIdempotent region base address	[0b0, 0b0]
NonIdempotentLength	PMA NonIdempotent region length	[0b0, 0b0]
NrExecuteRegionRules	PMA regions with execute rules number	0
ExecuteRegionAddrBase	PMA Execute region base address	[0x80000000, 0x10000, 0x0]
ExecuteRegionLength	PMA Execute region address base	[0x40000000, 0x10000, 0x1000]
NrCachedRegionRules	PMA regions with cache rules number	1
CachedRegionAddrBase	PMA cache region base address	[0x80000000]
CachedRegionLength	PMA cache region rules	[0x40000000]
CvxifEn	CV-X-IF coprocessor interface enable	True
CoproType	Coprocessor type	config_pkg::COPRO_EXAMPLE
NOCType	NOC bus type	config_pkg::NOC_TYPE_AXI4_ATOP
AxiAddrWidth	AXI address width	64
AxiDataWidth	AXI data width	64
AxiIdWidth	AXI ID width	4
AxiUserWidth	AXI User width	32
AxiBurstWriteEn	AXI burst in write	False
MemTidWidth	Transaction ID	4
IcacheByteSize	Instruction cache size (in bytes)	2048
IcacheSetAssoc	Instruction cache associativity (number of ways)	2
IcacheLineWidth	Instruction cache line width	128
DCacheType	Cache Type	config_pkg::HPDCACHE_WT
DcacheIdWidth	Data cache ID	1
DcacheByteSize	Data cache size (in bytes)	2028
DcacheSetAssoc	Data cache associativity (number of ways)	2
DcacheLineWidth	Data cache line width	128
DcacheFlushOnFence	Data cache flush on fence	False
DcacheInvalidateOnFlush	Data cache invalidate on flush	False
DataUserEn	User field on data bus enable	1
WtDcacheWbufDepth	Write-through data cache write buffer depth	8
FetchUserEn	User field on fetch bus enable	1
FetchUserWidth	Width of fetch user field	32
FpgaEn	Is FPGA optimization of CV32A6 for Xilinx and Altera	False
FpgaAlteraEn	Is FPGA optimization for Altera FPGA	False
TechnoCut	Is Techno Cut instanciated	True
SuperscalarEn	Enable superscalar* with 2 issue ports and 2 commit ports.	True
NrCommitPorts	Number of commit ports. Forced to 2 if SuperscalarEn.	1
NrLoadPipeRegs	Load cycle latency number	0
NrStorePipeRegs	Store cycle latency number	0
NrScoreboardEntries	Scoreboard length	8
NrLoadBufEntries	Load buffer entry buffer	2
MaxOutstandingStores	Maximum number of outstanding stores	7
RASDepth	Return address stack depth	2
BTBEntries	Branch target buffer entries	0
BHTEntries	Branch history entries	32
InstrTlbEntries	MMU instruction TLB entries	2
DataTlbEntries	MMU data TLB entries	2
UseSharedTlb	MMU option to use shared TLB	True
SharedTlbDepth	MMU depth of shared TLB	64
ObiVersion	OBI version compliance, 0 means not compliant → best performance	config_pkg::OBI_V1_6
PipelineOnly	Configuration defines cva6_pipeline module as top instead of cva6 (no cache and OBI instead of AXI)	True

2.4. IO ports

Table 2. **cva6_pipeline module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`boot_addr_i`	in	Reset boot address	SUBSYSTEM	logic[CVA6Cfg.VLEN-1:0]
`hart_id_i`	in	Hard ID reflected as CSR	SUBSYSTEM	logic[CVA6Cfg.XLEN-1:0]
`irq_i`	in	Level sensitive (async) interrupts	SUBSYSTEM	logic[1:0]
`ipi_i`	in	Inter-processor (async) interrupt	SUBSYSTEM	logic
`time_irq_i`	in	Timer (async) interrupt	SUBSYSTEM	logic
`cvxif_req_o`	out	CVXIF request	SUBSYSTEM	cvxif_req_t
`cvxif_resp_i`	in	CVXIF response	SUBSYSTEM	cvxif_resp_t
`obi_fetch_req_o`	out	Fetch OBI request ports	FRONTEND	obi_fetch_req_t
`obi_fetch_rsp_i`	in	Fetch OBI response ports	FRONTEND	obi_fetch_rsp_t
`obi_store_req_o`	out	Store OBI request ports	EX_STAGE	obi_store_req_t
`obi_store_rsp_i`	in	Store OBI response ports	EX_STAGE	obi_store_rsp_t
`obi_load_req_o`	out	Load OBI request ports	EX_STAGE	obi_load_req_t
`obi_load_rsp_i`	in	Load OBI response ports	EX_STAGE	obi_load_rsp_t

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As DebugEn = False,

debug_req_i input is tied to 0

As IsRVFI = 0,

rvfi_probes_o output is tied to 0

As PipelineOnly = True,

icache_enable_o output is tied to 0
icache_flush_o output is tied to 0
dcache_enable_o output is tied to 0
dcache_flush_o output is tied to 0
dcache_flush_ack_i input is tied to 0
fetch_req_o output is tied to 0
fetch_rsp_i input is tied to 0
load_req_o output is tied to 0
load_rsp_i input is tied to 0
dcache_wbuffer_empty_i input is tied to 0
dcache_wbuffer_not_ni_i input is tied to 0

As PerfCounterEn = 0,

icache_miss_i input is tied to 0
dcache_miss_i input is tied to 0

As RVA = False,

obi_amo_req_o output is tied to 0
obi_amo_rsp_i input is tied to 0

As MMUPresent = 0,

obi_mmu_ptw_req_o output is tied to 0
obi_mmu_ptw_rsp_i input is tied to 0

As RVZCMT = False,

obi_zcmt_req_o output is tied to 0
obi_zcmt_rsp_i input is tied to 0

3. Functionality

3.1. Instructions

The next subchapter lists the extensions implemented in CV32A65X. By configuration, we can enable/disable the extensions. CV32A65X supports the extensions described in the next subchapters.

3.2. isa

3.2.1. Instructions

Subset Name

Name

Description

I

RV32I Base Integer Instructions

The base integer instruction set, also known as the 'RV32I' or 'RV64I' instruction set, depending on the address space size, provides the core functionality required for general-purpose computing. It includes instructions for arithmetic, logical, and control operations, as well as memory access and manipulation

M

RV32M Multiplication and Division Instructions

The standard integer multiplication and division instruction extension, which is named “M” and contains instructions that multiply or divide values held in two integer registers.

C

RV32C Compressed Instructions

RVC uses a simple compression scheme that offers shorter 16-bit versions of common 32-bit RISC-V instructions when: the immediate or address offset is small; one of the registers is the zero register (x0), the ABI link register (x1), or the ABI stack pointer (x2); the destination register and the first source register are identical; the registers used are the 8 most popular ones. The C extension is compatible with all other standard instruction extensions. The C extension allows 16-bit instructions to be freely intermixed with 32-bit instructions, with the latter now able to start on any 16-bit boundary. With the addition of the C extension, JAL and JALR instructions will no longer raise an instruction misaligned exception

Zicsr

RV32Zicsr Control and Status Register Instructions

All CSR instructions atomically read-modify-write a single CSR, whose CSR specifier is encoded in the 12-bit csr field of the instruction held in bits 31–20. The immediate forms use a 5-bit zero-extended immediate encoded in the rs1 field.

Zcb

RV32Zcb Code Size Reduction Instructions

Zcb belongs to the group of extensions called RISC-V Code Size Reduction Extension (Zc*). Zc* has become the superset of the Standard C extension adding more 16-bit instructions to the ISA. Zcb includes the 16-bit version of additional Integer (I), Multiply (M), and Bit-Manipulation (Zbb) Instructions. All the Zcb instructions require at least standard C extension support as a prerequisite, along with M and Zbb extensions for the 16-bit version of the respective instructions.

Zba

RVZba Address generation instructions

The Zba instructions can be used to accelerate the generation of addresses that index into arrays of basic types (halfword, word, doubleword) using both unsigned word-sized and XLEN-sized indices: a shifted index is added to a base address. The shift and add instructions do a left shift of 1, 2, or 3 because these are commonly found in real-world code and because they can be implemented with a minimal amount of additional hardware beyond that of the simple adder. This avoids lengthening the critical path in implementations. While the shift and add instructions are limited to a maximum left shift of 3, the slli instruction (from the base ISA) can be used to perform similar shifts for indexing into arrays of wider elements. The slli.uw added in this extension can be used when the index is to be interpreted as an unsigned word.

Zbb

RVZbb Basic bit-manipulation

The bit-manipulation (bitmanip) extension collection is comprised of several component extensions to the base RISC-V architecture that are intended to provide some combination of code size reduction, performance improvement, and energy reduction. While the instructions are intended to have general use, some instructions are more useful in some domains than others. Hence, several smaller bitmanip extensions are provided. Each of these smaller extensions is grouped by common function and use case, and each has its own Zb*-extension name.

Zbc

RVZbc Carry-less multiplication

Carry-less multiplication is the multiplication in the polynomial ring over GF(2). clmul produces the lower half of the carry-less product and clmulh produces the upper half of the 2✕XLEN carry-less product. clmulr produces bits 2✕XLEN−2:XLEN-1 of the 2✕XLEN carry-less product.

Zbs

RVZbs Single bit Instructions

The single-bit instructions provide a mechanism to set, clear, invert, or extract a single bit in a register. The bit is specified by its index.

Xcvxif

No info found yet for extension Xcvxif

3.2.2. RV32I Base Integer Instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

ADDI

addi rd, rs1, imm[11:0]

x[rd] = x[rs1] + sext(imm[11:0])

NONE

add sign-extended 12-bit immediate to register rs1, and store the result in register rd.

Integer_Register_Immediate_Operations

ANDI

andi rd, rs1, imm[11:0]

x[rd] = x[rs1] & sext(imm[11:0])

NONE

perform bitwise AND on register rs1 and the sign-extended 12-bit immediate and place the result in rd.

Integer_Register_Immediate_Operations

ORI

ori rd, rs1, imm[11:0]

x[rd] = x[rs1] | sext(imm[11:0])

NONE

perform bitwise OR on register rs1 and the sign-extended 12-bit immediate and place the result in rd.

Integer_Register_Immediate_Operations

XORI

xori rd, rs1, imm[11:0]

x[rd] = x[rs1] ^ sext(imm[11:0])

NONE

perform bitwise XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd.

Integer_Register_Immediate_Operations

SLTI

slti rd, rs1, imm[11:0]

if (x[rs1] < sext(imm[11:0])) x[rd] = 1 else x[rd] = 0

NONE

set register rd to 1 if register rs1 is less than the sign extended immediate when both are treated as signed numbers, else 0 is written to rd.

Integer_Register_Immediate_Operations

SLTIU

sltiu rd, rs1, imm[11:0]

if (x[rs1] <u sext(imm[11:0])) x[rd] = 1 else x[rd] = 0

NONE

set register rd to 1 if register rs1 is less than the sign extended immediate when both are treated as unsigned numbers, else 0 is written to rd."

Integer_Register_Immediate_Operations

SLLI

slli rd, rs1, imm[4:0]

x[rd] = x[rs1] << imm[4:0]

NONE

logical left shift (zeros are shifted into the lower bits).

Integer_Register_Immediate_Operations

SRLI

srli rd, rs1, imm[4:0]

x[rd] = x[rs1] >> imm[4:0]

NONE

logical right shift (zeros are shifted into the upper bits).

Integer_Register_Immediate_Operations

SRAI

srai rd, rs1, imm[4:0]

x[rd] = x[rs1] >>s imm[4:0]

NONE

arithmetic right shift (the original sign bit is copied into the vacated upper bits).

Integer_Register_Immediate_Operations

LUI

lui rd, imm[19:0]

x[rd] = sext(imm[31:12] << 12)

NONE

place the immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros.

Integer_Register_Immediate_Operations

AUIPC

auipc rd, imm[19:0]

x[rd] = pc + sext(immediate[31:12] << 12)

NONE

form a 32-bit offset from the 20-bit immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then place the result in register rd.

Integer_Register_Immediate_Operations

ADD

add rd, rs1, rs2

x[rd] = x[rs1] + x[rs2]

NONE

add rs2 to register rs1, and store the result in register rd.

Integer_Register_Register_Operations

SUB

sub rd, rs1, rs2

x[rd] = x[rs1] - x[rs2]

NONE

subtract rs2 from register rs1, and store the result in register rd.

Integer_Register_Register_Operations

AND

and rd, rs1, rs2

x[rd] = x[rs1] & x[rs2]

NONE

perform bitwise AND on register rs1 and rs2 and place the result in rd.

Integer_Register_Register_Operations

OR

or rd, rs1, rs2

x[rd] = x[rs1] | x[rs2]

NONE

perform bitwise OR on register rs1 and rs2 and place the result in rd.

Integer_Register_Register_Operations

XOR

xor rd, rs1, rs2

x[rd] = x[rs1] ^ x[rs2]

NONE

perform bitwise XOR on register rs1 and rs2 and place the result in rd.

Integer_Register_Register_Operations

SLT

slt rd, rs1, rs2

if (x[rs1] < x[rs2]) x[rd] = 1 else x[rd] = 0

NONE

set register rd to 1 if register rs1 is less than rs2 when both are treated as signed numbers, else 0 is written to rd.

Integer_Register_Register_Operations

SLTU

sltu rd, rs1, rs2

if (x[rs1] <u x[rs2]) x[rd] = 1 else x[rd] = 0

NONE

set register rd to 1 if register rs1 is less than rs2 when both are treated as unsigned numbers, else 0 is written to rd.

Integer_Register_Register_Operations

SLL

sll rd, rs1, rs2

x[rd] = x[rs1] << x[rs2]

NONE

logical left shift (zeros are shifted into the lower bits).

Integer_Register_Register_Operations

SRL

srl rd, rs1, rs2

x[rd] = x[rs1] >> x[rs2]

NONE

logical right shift (zeros are shifted into the upper bits).

Integer_Register_Register_Operations

SRA

sra rd, rs1, rs2

x[rd] = x[rs1] >>s x[rs2]

NONE

arithmetic right shift (the original sign bit is copied into the vacated upper bits).

Integer_Register_Register_Operations

JAL

jal rd, imm[20:1]

x[rd] = pc+4; pc += sext(imm[20:1])

NONE

jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception.

offset is sign-extended and added to the pc to form the jump target address (pc is calculated using signed arithmetic), then setting the least-significant bit of the result to zero, and store the address of instruction following the jump (pc+4) into register rd.

Control_Transfer_Operations-Unconditional_Jumps

JALR

jalr rd, rs1, imm[11:0]

t = pc+4; pc = (x[rs1]+sext(imm[11:0]))&∼1 ; x[rd] = t

NONE

jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception.

target address is obtained by adding the 12-bit signed immediate to the register rs1 (pc is calculated using signed arithmetic), then setting the least-significant bit of the result to zero, and store the address of instruction following the jump (pc+4) into register rd.

Control_Transfer_Operations-Unconditional_Jumps

BEQ

beq rs1, rs2, imm[12:1]

if (x[rs1] == x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

takes the branch (pc is calculated using signed arithmetic) if registers rs1 and rs2 are equal.

Control_Transfer_Operations-Conditional_Branches

BNE

bne rs1, rs2, imm[12:1]

if (x[rs1] != x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

takes the branch (pc is calculated using signed arithmetic) if registers rs1 and rs2 are not equal.

Control_Transfer_Operations-Conditional_Branches

BLT

blt rs1, rs2, imm[12:1]

if (x[rs1] < x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

takes the branch (pc is calculated using signed arithmetic) if registers rs1 less than rs2 (using signed comparison).

Control_Transfer_Operations-Conditional_Branches

BLTU

bltu rs1, rs2, imm[12:1]

if (x[rs1] <u x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

takes the branch (pc is calculated using signed arithmetic) if registers rs1 less than rs2 (using unsigned comparison).

Control_Transfer_Operations-Conditional_Branches

BGE

bge rs1, rs2, imm[12:1]

if (x[rs1] >= x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

takes the branch (pc is calculated using signed arithmetic) if registers rs1 is greater than or equal rs2 (using signed comparison).

Control_Transfer_Operations-Conditional_Branches

BGEU

bgeu rs1, rs2, imm[12:1]

if (x[rs1] >=u x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

takes the branch (pc is calculated using signed arithmetic) if registers rs1 is greater than or equal rs2 (using unsigned comparison).

Control_Transfer_Operations-Conditional_Branches

LB

lb rd, imm(rs1)

x[rd] = sext(M[x[rs1] + sext(imm[11:0])][7:0])

NONE

loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded.

loads a 8-bit value from memory, then sign-extends to 32-bit before storing in rd (rd is calculated using signed arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

LH

lh rd, imm(rs1)

x[rd] = sext(M[x[rs1] + sext(imm[11:0])][15:0])

NONE

loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded, also an exception is raised if the memory address isn’t aligned (2-byte boundary).

loads a 16-bit value from memory, then sign-extends to 32-bit before storing in rd (rd is calculated using signed arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

LW

lw rd, imm(rs1)

x[rd] = sext(M[x[rs1] + sext(imm[11:0])][31:0])

NONE

loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded, also an exception is raised if the memory address isn’t aligned (4-byte boundary).

loads a 32-bit value from memory, then storing in rd (rd is calculated using signed arithmetic). The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

LBU

lbu rd, imm(rs1)

x[rd] = zext(M[x[rs1] + sext(imm[11:0])][7:0])

NONE

loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded.

loads a 8-bit value from memory, then zero-extends to 32-bit before storing in rd (rd is calculated using unsigned arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

LHU

lhu rd, imm(rs1)

x[rd] = zext(M[x[rs1] + sext(imm[11:0])][15:0])

NONE

loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded, also an exception is raised if the memory address isn’t aligned (2-byte boundary).

loads a 16-bit value from memory, then zero-extends to 32-bit before storing in rd (rd is calculated using unsigned arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

SB

sb rs2, imm(rs1)

M[x[rs1] + sext(imm[11:0])][7:0] = x[rs2][7:0]

NONE

stores a 8-bit value from the low bits of register rs2 to memory, the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

SH

sh rs2, imm(rs1)

M[x[rs1] + sext(imm[11:0])][15:0] = x[rs2][15:0]

NONE

an exception is raised if the memory address isn’t aligned (2-byte boundary).

stores a 16-bit value from the low bits of register rs2 to memory, the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

SW

sw rs2, imm(rs1)

M[x[rs1] + sext(imm[11:0])][31:0] = x[rs2][31:0]

NONE

an exception is raised if the memory address isn’t aligned (4-byte boundary).

stores a 32-bit value from register rs2 to memory, the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset.

Load_and_Store_Instructions

FENCE

fence pre, succ

No operation (nop)

NONE

order device I/O and memory accesses as viewed by other RISC-V harts and external devices or coprocessors. Any combination of device input (I), device output (O), memory reads ®, and memory writes (W) may be ordered with respect to any combination of the same. Informally, no other RISC-V hart or external device can observe any operation in the successor set following a FENCE before any operation in the predecessor set preceding the FENCE, as the core support 1 hart, the fence instruction has no effect so we can considerate it as a nop instruction.

Memory_Ordering

ECALL

ecall

RaiseException(EnvironmentCall)

NONE

Raise an Environment Call exception.

make a request to the supporting execution environment, which is usually an operating system. The ABI for the system will define how parameters for the environment request are passed, but usually these will be in defined locations in the integer register file.

Environment_Call_and_Breakpoints

EBREAK

ebreak

RaiseException(Breakpoint)

NONE

Raise a Breakpoint exception.

cause control to be transferred back to a debugging environment.

Environment_Call_and_Breakpoints

3.2.3. RV32M Multiplication and Division Instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

MUL

mul rd, rs1, rs2

x[rd] = x[rs1] * x[rs2]

NONE

performs a 32-bit × 32-bit multiplication and places the lower 32 bits in the destination register (Both rs1 and rs2 treated as signed numbers).

Multiplication Operations

MULH

mulh rd, rs1, rs2

x[rd] = (x[rs1] s*s x[rs2]) >>s 32

NONE

performs a 32-bit × 32-bit multiplication and places the upper 32 bits in the destination register of the 64-bit product (Both rs1 and rs2 treated as signed numbers).

Multiplication Operations

MULHU

mulhu rd, rs1, rs2

x[rd] = (x[rs1] u*u x[rs2]) >>u 32

NONE

performs a 32-bit × 32-bit multiplication and places the upper 32 bits in the destination register of the 64-bit product (Both rs1 and rs2 treated as unsigned numbers).

Multiplication Operations

MULHSU

mulhsu rd, rs1, rs2

x[rd] = (x[rs1] s*u x[rs2]) >>s 32

NONE

performs a 32-bit × 32-bit multiplication and places the upper 32 bits in the destination register of the 64-bit product (rs1 treated as signed number, rs2 treated as unsigned number).

Multiplication Operations

DIV

div rd, rs1, rs2

x[rd] = x[rs1] /s x[rs2]

NONE

perform signed integer division of 32 bits by 32 bits (rounding towards zero).

Division Operations

DIVU

divu rd, rs1, rs2

x[rd] = x[rs1] /u x[rs2]

NONE

perform unsigned integer division of 32 bits by 32 bits (rounding towards zero).

Division Operations

REM

rem rd, rs1, rs2

x[rd] = x[rs1] %s x[rs2]

NONE

provide the remainder of the corresponding division operation DIV (the sign of rd equals the sign of rs1).

Division Operations

REMU

rem rd, rs1, rs2

x[rd] = x[rs1] %u x[rs2]

NONE

provide the remainder of the corresponding division operation DIVU.

Division Operations

3.2.4. RV32C Compressed Instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

C.LI

c.li rd, imm[5:0]

x[rd] = sext(imm[5:0])

rd = x0

NONE

loads the sign-extended 6-bit immediate, imm, into register rd.

Integer Computational Instructions

C.LUI

c.lui rd, nzimm[17:12]

x[rd] = sext(nzimm[17:12] << 12)

rd = x0 & rd = x2 & nzimm = 0

NONE

loads the non-zero 6-bit immediate field into bits 17–12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination.

Integer Computational Instructions

C.ADDI

c.addi rd, nzimm[5:0]

x[rd] = x[rd] + sext(nzimm[5:0])

rd = x0 & nzimm = 0

NONE

adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd.

Integer Computational Instructions

C.ADDI16SP

c.addi16sp nzimm[9:4]

x[2] = x[2] + sext(nzimm[9:4])

rd != x2 & nzimm = 0

NONE

adds the non-zero sign-extended 6-bit immediate to the value in the stack pointer (sp=x2), where the immediate is scaled to represent multiples of 16 in the range (-512,496). C.ADDI16SP is used to adjust the stack pointer in procedure prologues and epilogues. C.ADDI16SP shares the opcode with C.LUI, but has a destination field of x2.

Integer Computational Instructions

C.ADDI4SPN

c.addi4spn rd', nzimm[9:2]

x[8 + rd'] = x[2] + zext(nzimm[9:2])

nzimm = 0

NONE

adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2, and writes the result to rd'. This instruction is used to generate pointers to stack-allocated variables.

Integer Computational Instructions

C.SLLI

c.slli rd, uimm[5:0]

x[rd] = x[rd] << uimm[5:0]

rd = x0 & uimm[5] = 0

NONE

performs a logical left shift (zeros are shifted into the lower bits).

Integer Computational Instructions

C.SRLI

c.srli rd', uimm[5:0]

x[8 + rd'] = x[8 + rd'] >> uimm[5:0]

uimm[5] = 0

NONE

performs a logical right shift (zeros are shifted into the upper bits).

Integer Computational Instructions

C.SRAI

c.srai rd', uimm[5:0]

x[8 + rd'] = x[8 + rd'] >>s uimm[5:0]

uimm[5] = 0

NONE

performs an arithmetic right shift (sign bits are shifted into the upper bits).

Integer Computational Instructions

C.ANDI

c.andi rd', imm[5:0]

x[8 + rd'] = x[8 + rd'] & sext(imm[5:0])

NONE

computes the bitwise AND of the value in register rd', and the sign-extended 6-bit immediate, then writes the result to rd'.

Integer Computational Instructions

C.ADD

c.add rd, rs2

x[rd] = x[rd] + x[rs2]

rd = x0 & rs2 = x0

NONE

adds the values in registers rd and rs2 and writes the result to register rd.

Integer Computational Instructions

C.MV

c.mv rd, rs2

x[rd] = x[rs2]

rd = x0 & rs2 = x0

NONE

copies the value in register rs2 into register rd.

Integer Computational Instructions

C.AND

c.and rd', rs2'

x[8 + rd'] = x[8 + rd'] & x[8 + rs2']

NONE

computes the bitwise AND of of the value in register rd', and register rs2', then writes the result to rd'.

Integer Computational Instructions

C.OR

c.or rd', rs2'

x[8 + rd'] = x[8 + rd'] | x[8 + rs2']

NONE

computes the bitwise OR of of the value in register rd', and register rs2', then writes the result to rd'.

Integer Computational Instructions

C.XOR

c.and rd', rs2'

x[8 + rd'] = x[8 + rd'] ^ x[8 + rs2']

NONE

computes the bitwise XOR of of the value in register rd', and register rs2', then writes the result to rd'.

Integer Computational Instructions

C.SUB

c.sub rd', rs2'

x[8 + rd'] = x[8 + rd'] - x[8 + rs2']

NONE

subtracts the value in registers rs2' from value in rd' and writes the result to register rd'.

Integer Computational Instructions

C.EBREAK

c.ebreak

RaiseException(Breakpoint)

NONE

Raise a Breakpoint exception.

cause control to be transferred back to the debugging environment.

Integer Computational Instructions

C.J

c.j imm[11:1]

pc += sext(imm[11:1])

NONE

jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception.

performs an unconditional control transfer. The offset is sign-extended and added to the pc to form the jump target address.

Control Transfer Instructions

C.JAL

c.jal imm[11:1]

x[1] = pc+2; pc += sext(imm[11:1])

NONE

jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception.

performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1.

Control Transfer Instructions

C.JR

c.jr rs1

pc = x[rs1]

rs1 = x0

jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception.

performs an unconditional control transfer to the address in register rs1.

Control Transfer Instructions

C.JALR

c.jalr rs1

t = pc+2; pc = x[rs1]; x[1] = t

rs1 = x0

jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception.

performs the same operation as C.JR, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1.

Control Transfer Instructions

C.BEQZ

c.beqz rs1', imm[8:1]

if (x[8+rs1'] == 0) pc += sext(imm[8:1])

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. C.BEQZ takes the branch if the value in register rs1' is zero.

Control Transfer Instructions

C.BNEZ

c.bnez rs1', imm[8:1]

if (x[8+rs1'] != 0) pc += sext(imm[8:1])

NONE

no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions.

performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. C.BEQZ takes the branch if the value in register rs1' isn’t zero.

Control Transfer Instructions

C.LWSP

c.lwsp rd, uimm(x2)

x[rd] = M[x[2] + zext(uimm[7:2])][31:0]

rd = x0

loads with a destination of x0 must still raise any exceptions, also an exception if the memory address isn’t aligned (4-byte boundary).

loads a 32-bit value from memory into register rd. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2.

Load and Store Instructions

C.SWSP

c.swsp rd, uimm(x2)

M[x[2] + zext(uimm[7:2])][31:0] = x[rs2]

NONE

an exception raised if the memory address isn’t aligned (4-byte boundary).

stores a 32-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2.

Load and Store Instructions

C.LW

c.lw rd', uimm(rs1')

x[8+rd'] = M[x[8+rs1'] + zext(uimm[6:2])][31:0])

NONE

an exception raised if the memory address isn’t aligned (4-byte boundary).

loads a 32-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'.

Load and Store Instructions

C.SW

c.sw rs2', uimm(rs1')

M[x[8+rs1'] + zext(uimm[6:2])][31:0] = x[8+rs2']

NONE

an exception raised if the memory address isn’t aligned (4-byte boundary).

stores a 32-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'.

Load and Store Instructions

3.2.5. RV32Zicsr Control and Status Register Instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

CSRRW

csrrw rd, csr, rs1

t = CSRs[csr]; CSRs[csr] = x[rs1]; x[rd] = t

NONE

Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions.

Reads the old value of the CSR, zero-extends the value to 32 bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side-effects that might occur on a CSR read.

Control and Status Register Operations

CSRRS

csrrs rd, csr, rs1

t = CSRs[csr]; CSRs[csr] = t | x[rs1]; x[rd] = t

NONE

Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions.

Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs.

Control and Status Register Operations

CSRRC

csrrc rd, csr, rs1

t = CSRs[csr]; CSRs[csr] = t & ∼x[rs1]; x[rd] = t

NONE

Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions.

Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs.

Control and Status Register Operations

CSRRWI

csrrwi rd, csr, uimm[4:0]

x[rd] = CSRs[csr]; CSRs[csr] = zext(uimm[4:0])

NONE

Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions.

Reads the old value of the CSR, zero-extends the value to 32 bits, then writes it to integer register rd. The zero-extends immediate is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side-effects that might occur on a CSR read.

Control and Status Register Operations

CSRRSI

csrrsi rd, csr, uimm[4:0]

t = CSRs[csr]; CSRs[csr] = t | zext(uimm[4:0]); x[rd] = t

NONE

Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions.

Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The zero-extends immediate value is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in zero-extends immediate will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write.

Control and Status Register Operations

CSRRCI

csrrci rd, csr, uimm[4:0]

t = CSRs[csr]; CSRs[csr] = t & ∼zext(uimm[4:0]); x[rd] = t

NONE

Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions.

Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The zero-extends immediate value is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in zero-extends immediate will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write.

Control and Status Register Operations

3.2.6. RV32Zcb Code Size Reduction Instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

C.ZEXT.B

c.zext.b rd'

x[8 + rd'] = zext(x[8 + rd'][7:0])

NONE

This instruction takes a single source/destination operand. It zero-extends the least-significant byte of the operand by inserting zeros into all of the bits more significant than 7.

Code Size Reduction Operations

C.SEXT.B

c.sext.b rd'

x[8 + rd'] = sext(x[8 + rd'][7:0])

NONE

This instruction takes a single source/destination operand. It sign-extends the least-significant byte in the operand by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits. It also requires Bit-Manipulation (Zbb) extension support.

Code Size Reduction Operations

C.ZEXT.H

c.zext.h rd'

x[8 + rd'] = zext(x[8 + rd'][15:0])

NONE

This instruction takes a single source/destination operand. It zero-extends the least-significant halfword of the operand by inserting zeros into all of the bits more significant than 15. It also requires Bit-Manipulation (Zbb) extension support.

Code Size Reduction Operations

C.SEXT.H

c.sext.h rd'

x[8 + rd'] = sext(x[8 + rd'][15:0])

NONE

This instruction takes a single source/destination operand. It sign-extends the least-significant halfword in the operand by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits. It also requires Bit-Manipulation (Zbb) extension support.

Code Size Reduction Operations

C.NOT

c.not rd'

x[8 + rd'] = x[8 + rd'] ^ -1

NONE

This instruction takes the one’s complement of rd'/rs1' and writes the result to the same register.

Code Size Reduction Operations

C.MUL

c.mul rd', rs2'

x[8 + rd'] = (x[8 + rd'] * x[8 + rs2'])[31:0]

NONE

performs a 32-bit × 32-bit multiplication and places the lower 32 bits in the destination register (Both rd' and rs2' treated as signed numbers). It also requires M extension support.

Code Size Reduction Operations

C.LHU

c.lhu rd', uimm(rs1')

x[8+rd'] = zext(M[x[8+rs1'] + zext(uimm[1])][15:0])

NONE

an exception raised if the memory address isn’t aligned (2-byte boundary).

This instruction loads a halfword from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting halfword is zero extended and is written to rd'.

Code Size Reduction Operations

C.LH

c.lh rd', uimm(rs1')

x[8+rd'] = sext(M[x[8+rs1'] + zext(uimm[1])][15:0])

NONE

an exception raised if the memory address isn’t aligned (2-byte boundary).

This instruction loads a halfword from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting halfword is sign extended and is written to rd'.

Code Size Reduction Operations

C.LBU

c.lbu rd', uimm(rs1')

x[8+rd'] = zext(M[x[8+rs1'] + zext(uimm[1:0])][7:0])

NONE

This instruction loads a byte from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting byte is zero extended and is written to rd'.

Code Size Reduction Operations

C.SH

c.sh rs2', uimm(rs1')

M[x[8+rs1'] + zext(uimm[1])][15:0] = x[8+rs2']

NONE

an exception raised if the memory address isn’t aligned (2-byte boundary).

This instruction stores the least significant halfword of rs2' to the memory address formed by adding rs1' to the zero extended immediate uimm.

Code Size Reduction Operations

C.SB

c.sb rs2', uimm(rs1')

M[x[8+rs1'] + zext(uimm[1:0])][7:0] = x[8+rs2']

NONE

This instruction stores the least significant byte of rs2' to the memory address formed by adding rs1' to the zero extended immediate uimm.

Code Size Reduction Operations

3.2.7. RVZba Address generation instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

ADD.UW

add.uw rd, rs1, rs2

X(rd) = rs2 + EXTZ(X(rs1)[31..0])

NONE

This instruction performs an XLEN-wide addition between rs2 and the zero-extended least-significant word of rs1.

Address generation instructions

SH1ADD

sh1add rd, rs1, rs2

X(rd) = X(rs2) + (X(rs1) << 1)

NONE

This instruction shifts rs1 to the left by 1 bit and adds it to rs2.

Address generation instructions

SH1ADD.UW

sh1add.uw rd, rs1, rs2

X(rd) = rs2 + (EXTZ(X(rs1)[31..0]) << 1)

NONE

This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 1 place.

Address generation instructions

SH2ADD

sh2add rd, rs1, rs2

X(rd) = X(rs2) + (X(rs1) << 2)

NONE

This instruction shifts rs1 to the left by 2 bit and adds it to rs2.

Address generation instructions

SH2ADD.UW

sh2add.uw rd, rs1, rs2

X(rd) = rs2 + (EXTZ(X(rs1)[31..0]) << 2)

NONE

This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 2 places.

Address generation instructions

SH3ADD

sh3add rd, rs1, rs2

X(rd) = X(rs2) + (X(rs1) << 3)

NONE

This instruction shifts rs1 to the left by 3 bit and adds it to rs2.

Address generation instructions

SH3ADD.UW

sh3add.uw rd, rs1, rs2

X(rd) = rs2 + (EXTZ(X(rs1)[31..0]) << 3)

NONE

This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 3 places.

Address generation instructions

SLLI.UW

slli.uw rd, rs1, imm

X(rd) = (EXTZ(X(rs)[31..0]) << imm)

NONE

This instruction takes the least-significant word of rs1, zero-extends it, and shifts it left by the immediate.

Address generation instructions

3.2.8. RVZbb Basic bit-manipulation

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

ANDN

andn rd, rs1, rs2

X(rd) = X(rs1) & ~X(rs2)

NONE

Performs bitwise AND operation between rs1 and bitwise inversion of rs2.

Logical_with_negate

ORN

orn rd, rs1, rs2

X(rd) = X(rs1) | ~X(rs2)

NONE

Performs bitwise OR operation between rs1 and bitwise inversion of rs2.

Logical_with_negate

XNOR

xnor rd, rs1, rs2

X(rd) = ~(X(rs1) ^ X(rs2))

NONE

Performs bitwise XOR operation between rs1 and rs2, then complements the result.

Logical_with_negate

CLZ

clz rd, rs

if [x[i]] == 1 then return(i) else return -1

NONE

Counts leading zero bits in rs.

Count_leading_trailing_zero_bits

CTZ

ctz rd, rs

if [x[i]] == 1 then return(i) else return xlen;

NONE

Counts trailing zero bits in rs.

Count_leading_trailing_zero_bits

CLZW

clzw rd, rs

if [x[i]] == 1 then return(i) else return -1

NONE

Counts leading zero bits in the least-significant word of rs.

Count_leading_trailing_zero_bits

CTZW

ctzw rd, rs

if [x[i]] == 1 then return(i) else return 32;

NONE

Counts trailing zero bits in the least-significant word of rs.

Count_leading_trailing_zero_bits

CPOP

cpop rd, rs

if rs[i] == 1 then bitcount = bitcount + 1 else ()

NONE

Counts set bits in rs.

Count_population

CPOPW

cpopw rd, rs

if rs[i] == 0b1 then bitcount = bitcount + 1 else ()

NONE

Counts set bits in the least-significant word of rs.

Count_population

MAX

max rd, rs1, rs2

if rs1_val <_s rs2_val then rs2_val else rs1_val

NONE

Returns the larger of two signed integers.

Integer_minimum_maximum

MAXU

maxu rd, rs1, rs2

if rs1_val <_u rs2_val then rs2_val else rs1_val

NONE

Returns the larger of two unsigned integers.

Integer_minimum_maximum

MIN

min rd, rs1, rs2

if rs1_val <_s rs2_val then rs1_val else rs2_val

NONE

Returns the smaller of two signed integers.

Integer_minimum_maximum

MINU

minu rd, rs1, rs2

if rs1_val <_u rs2_val then rs1_val else rs2_val

NONE

Returns the smaller of two unsigned integers.

Integer_minimum_maximum

SEXT.B

sext.b rd, rs

X(rd) = EXTS(X(rs)[7..0])

NONE

Sign-extends the least-significant byte in the source to XLEN.

Sign_and_zero_extension

SEXT.H

sext.h rd, rs

X(rd) = EXTS(X(rs)[15..0])

NONE

Sign-extends the least-significant halfword in rs to XLEN.

Sign_and_zero_extension

ZEXT.H

zext.h rd, rs

X(rd) = EXTZ(X(rs)[15..0])

NONE

Zero-extends the least-significant halfword of the source to XLEN.

Sign_and_zero_extension

ROL

rol rd, rs1, rs2

(X(rs1) << log2(XLEN)) | (X(rs1) >> (xlen - log2(XLEN)))

NONE

Performs a rotate left of rs1 by the amount in least-significant log2(XLEN) bits of rs2.

Bitwise_rotation

ROR

ror rd, rs1, rs2

(X(rs1) >> log2(XLEN)) | (X(rs1) << (xlen - log2(XLEN)))

NONE

Performs a rotate right of rs1 by the amount in least-significant log2(XLEN) bits of rs2.

Bitwise_rotation

RORI

rori rd, rs1, shamt

(X(rs1) >> log2(XLEN)) | (X(rs1) << (xlen - log2(XLEN)))

NONE

Performs a rotate right of rs1 by the amount in least-significant log2(XLEN) bits of shamt.

Bitwise_rotation

ROLW

rolw rd, rs1, rs2

EXTSrs1 << X(rs2)[4..0]) | (rs1)

NONE

Performs a rotate left on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2.

Bitwise_rotation

RORIW

roriw rd, rs1, shamt

(rs1_data >> shamt[4..0]) | (rs1_data << (32 - shamt[4..0]))

NONE

Performs a rotate right on the least-significant word of rs1 by the amount in least-significant log2(XLEN) bits of shamt.

Bitwise_rotation

RORW

rorw rd, rs1, rs2

(rs1 >> X(rs2)[4..0]) | (rs1 << (32 - X(rs2)[4..0]))

NONE

Performs a rotate right on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2.

Bitwise_rotation

ORC.b

orc.b rd, rs

if { input[(i + 7)..i] == 0 then 0b00000000 else 0b11111111

NONE

Sets the bits of each byte in rd to all zeros if no bit within the respective byte of rs is set, or to all ones if any bit within the respective byte of rs is set.

OR_Combine

REV8

rev8 rd, rs

output[i..(i + 7)] = input[(j - 7)..j]

NONE

Reverses the order of the bytes in rs.

Byte_reverse

3.2.9. RVZbc Carry-less multiplication

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

CLMUL

clmul rd, rs1, rs2

foreach (i from 1 to xlen by 1) { output = if ((rs2 >> i) & 1) then output ^ (rs1 << i); else output; }

NONE

clmul produces the lower half of the 2.XLEN carry-less product.

Carry-less multiplication Operations

CLMULH

clmulh rd, rs1, rs2

foreach (i from 1 to xlen by 1) { output = if rs2_val else output }

NONE

clmulh produces the upper half of the 2.XLEN carry-less product.

Carry-less multiplication Operations

CLMULR

clmulr rd, rs1, rs2

foreach (i from 0 to (xlen - 1) by 1) { output = if rs2_val else output }

NONE

clmulr produces bits 2.XLEN-2:XLEN-1 of the 2.XLEN carry-less product.

Carry-less multiplication Operations

3.2.10. RVZbs Single bit Instructions

Name

Format

Pseudocode

Invalid_values

Instruction_exception

Instruction_description

Op Name

BCLR

bclr rd, rs1, rs2

X(rd) = X(rs1) & ~(1 << (X(rs2) & (XLEN - 1)))

NONE

This instruction returns rs1 with a single bit cleared at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.

Single_bit_Operations

BCLRI

bclri rd, rs1, shamt

X(rd) = X(rs1) & ~(1 << (shamt & (XLEN - 1)))

NONE

This instruction returns rs1 with a single bit cleared at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.

Single_bit_Operations

BEXT

bext rd, rs1, rs2

X(rd) = (X(rs1) >> (X(rs2) & (XLEN - 1))) & 1

NONE

This instruction returns a single bit extracted from rs1 at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.

Single_bit_Operations

BEXTI

bexti rd, rs1, shamt

X(rd) = (X(rs1) >> (shamt & (XLEN - 1))) & 1

NONE

This instruction returns a single bit extracted from rs1 at the index specified in rs2. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.

Single_bit_Operations

BINV

binv rd, rs1, rs2

X(rd) = X(rs1) ^ (1 << (X(rs2) & (XLEN - 1)))

NONE

This instruction returns rs1 with a single bit inverted at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.

Single_bit_Operations

BINVI

binvi rd, rs1, shamt

X(rd) = X(rs1) ^ (1 << (shamt & (XLEN - 1)))

NONE

This instruction returns rs1 with a single bit inverted at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.

Single_bit_Operations

BSET

bset rd, rs1, rs2

X(rd) = X(rs1) | (1 << (X(rs2) & (XLEN - 1)))

NONE

This instruction returns rs1 with a single bit set at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2.

Single_bit_Operations

BSETI

bseti rd, rs1, shamt

X(rd) = X(rs1) | (1 << (shamt & (XLEN - 1)))

NONE

This instruction returns rs1 with a single bit set at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved.

Single_bit_Operations

3.3. Traps, Interrupts, Exceptions

Traps are composed of interrupts and exceptions. Interrupts are asynchronous events whereas exceptions are synchronous ones. On one hand, interrupts are occuring independently of the instructions (mainly raised by peripherals ). On the other hand, an instruction may raise exceptions synchronously.

3.3.1. Raising Traps

When a trap is raised, the behaviour of the CVA6 core depends on several CSRs and some CSRs are modified.

3.3.1.1. Configuration CSRs

CSRs having an effect on the core behaviour when a trap occurs are:

mstatus : several fields control the core behaviour like interrupt enable (MIE )
mtvec : specifies the address of trap handler.

3.3.1.2. Modified CSRs

CSRs (or fields) updated by the core when a trap occurs are:

mstatus : several fields are updated like previous privilege mode (MPP ), previous interrupt enabled (MPIE )
mepc : updated with the address of the interrupted instruction or the instruction raising the exception.
mcause : updated with a code indicating the event causing the trap.

3.3.1.3. Supported exceptions

The following exceptions are supported by the CV32A65X:

instruction address misaligned
- control flow instruction with misaligned target
instruction access fault
- access to PMP region without execute permissions
illegal instruction:
- unimplemented CSRs
- unsupported extensions
breakpoint (EBREAK)
load address misaligned:
- LH at 2n+1 address
- LW at 4n+1, 4n+2, 4n+3 address
load access fault
- access to PMP region without read permissions
store/AMO address misaligned
- SH at 2n+1 address
- SW at 4n+1, 4n+2, 4n+3 address
store/AMO access fault
- access to PMP region without write permissions
environment call (ECALL) from M-mode

3.3.2. Trap return

Trap handler ends with trap return instruction (MRET ). The behaviour of the CV32A65X core depends on several CSRs.

3.3.2.1. Configuration CSRs

CSRs having an effect on the core behaviour when returning from a trap are:

mstatus: several fields control the core behaviour like previous privilege mode (MPP ), previous interrupt enabled (MPIE )

3.3.2.2. Modified CSRs

CSRs (or fields) updated by the core when returning from a trap are:

mstatus: several fields are updated like interrupt enable (MIE ), modify privilege (MPRV)

3.3.3. Interrupts

external interrupt: irq_i signal
timer interrupt: time_irq_i signal

These signals are level sensitive. It means the interrupt is raised until it is cleared.

The exception code field (mcause CSR) depends on the interrupt source.

3.3.4. Wait for Interrupt

CVA6 implementation: WFI stalls the core until an interrupt occurs.

3.4. csr

3.4.1. Conventions

In the subsequent sections, register fields are labeled with one of the following abbreviations:

WPRI (Writes Preserve Values, Reads Ignore Values): read/write field reserved for future use. For forward compatibility, implementations that do not furnish these fields must make them read-only zero.
WLRL (Write/Read Only Legal Values): read/write CSR field that specifies behavior for only a subset of possible bit encodings, with other bit encodings reserved.
WARL (Write Any Values, Reads Legal Values): read/write CSR fields which are only defined for a subset of bit encodings, but allow any value to be written while guaranteeing to return a legal value whenever read.
ROCST (Read-Only Constant): A special case of WARL field which admits only one legal value, and therefore, behaves as a constant field that silently ignores writes.
ROVAR (Read-Only Variable): A special case of WARL field which can take multiple legal values but cannot be modified by software and depends only on the architectural state of the hart.

In particular, a register that is not internally divided into multiple fields can be considered as containing a single field of XLEN bits. This allows to clearly represent read-write registers holding a single legal value (typically zero).

3.4.2. Register Summary

Address Register Name Privilege Description

0x300

MSTATUS

MRW

The mstatus register keeps track of and controls the hart’s current operating state.

0x301

MISA

MRW

misa is a read-write register reporting the ISA supported by the hart.

0x304

MIE

MRW

The mie register is an MXLEN-bit read/write register containing interrupt enable bits.

0x305

MTVEC

MRW

MXLEN-bit read/write register that holds trap vector configuration.

0x310

MSTATUSH

MRW

The mstatush register keeps track of and controls the hart’s current operating state.

0x323-0x33f

MHPMEVENT[3-31]

MRW

The mhpmevent is a MXLEN-bit event register which controls mhpmcounter3.

0x340

MSCRATCH

MRW

The mscratch register is an MXLEN-bit read/write register dedicated for use by machine mode.

0x341

MEPC

MRW

The mepc is a warl register that must be able to hold all valid physical and virtual addresses.

0x342

MCAUSE

MRW

The mcause register stores the information regarding the trap.

0x343

MTVAL

MRW

The mtval is a warl register that holds the address of the instruction which caused the exception.

0x344

MIP

MRW

The mip register is an MXLEN-bit read/write register containing information on pending interrupts.

0x3a0-0x3af

PMPCFG[0-15]

MRW

PMP configuration register

0x3b0-0x3ef

PMPADDR[0-63]

MRW

Physical memory protection address register

0x7c0

ICACHE

MRW

the register controls the operation of the i-cache unit.

0x7c1

DCACHE

MRW

the register controls the operation of the d-cache unit.

0xb00

MCYCLE

MRW

Counts the number of clock cycles executed from an arbitrary point in time.

0xb02

MINSTRET

MRW

Counts the number of instructions completed from an arbitrary point in time.

0xb03-0xb1f

MHPMCOUNTER[3-31]

MRW

The mhpmcounter is a 64-bit counter. Returns lower 32 bits in RV32I mode.

0xb80

MCYCLEH

MRW

upper 32 bits of mcycle

0xb82

MINSTRETH

MRW

Upper 32 bits of minstret.

0xb83-0xb9f

MHPMCOUNTER[3-31]H

MRW

The mhpmcounterh returns the upper half word in RV32I systems.

0xf11

MVENDORID

MRO

32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core.

0xf12

MARCHID

MRO

MXLEN-bit read-only register encoding the base microarchitecture of the hart.

0xf13

MIMPID

MRO

Provides a unique encoding of the version of the processor implementation.

0xf14

MHARTID

MRO

MXLEN-bit read-only register containing the integer ID of the hardware thread running the code.

0xf15

MCONFIGPTR

MRO

MXLEN-bit read-only register that holds the physical address of a configuration data structure.

3.4.3. Register Description

3.4.3.1. MSTATUS

Address: 0x300
Reset Value: 0x00001800
Privilege: MRW
Description: The mstatus register keeps track of and controls the hart’s current operating state.

Bits

Field Name

Reset Value

Type

Legal Values

Description

0

UIE

0x0

ROCST

0x0

Stores the state of the user mode interrupts.

1

SIE

0x0

ROCST

0x0

Stores the state of the supervisor mode interrupts.

2

RESERVED_2

0x0

WPRI

Reserved

3

MIE

0x0

WLRL

0x0 - 0x1

Stores the state of the machine mode interrupts.

4

UPIE

0x0

ROCST

0x0

Stores the state of the user mode interrupts prior to the trap.

5

SPIE

0x0

ROCST

0x0

Stores the state of the supervisor mode interrupts prior to the trap.

6

UBE

0x0

ROCST

0x0

control the endianness of memory accesses other than instruction fetches for user mode

7

MPIE

0x0

WLRL

0x0 - 0x1

Stores the state of the machine mode interrupts prior to the trap.

8

SPP

0x0

ROCST

0x0

Stores the previous priority mode for supervisor.

[10:9]

RESERVED_9

0x0

WPRI

Reserved

[12:11]

MPP

0x3

WARL

0x3

Stores the previous priority mode for machine.

[14:13]

FS

0x0

ROCST

0x0

Encodes the status of the floating-point unit, including the CSR fcsr and floating-point data registers.

[16:15]

XS

0x0

ROCST

0x0

Encodes the status of additional user-mode extensions and associated state.

17

MPRV

0x0

ROCST

0x0

Modifies the privilege level at which loads and stores execute in all privilege modes.

18

SUM

0x0

ROCST

0x0

Modifies the privilege with which S-mode loads and stores access virtual memory.

19

MXR

0x0

ROCST

0x0

Modifies the privilege with which loads access virtual memory.

20

TVM

0x0

ROCST

0x0

Supports intercepting supervisor virtual-memory management operations.

21

TW

0x0

ROCST

0x0

Supports intercepting the WFI instruction.

22

TSR

0x0

ROCST

0x0

Supports intercepting the supervisor exception return instruction.

23

SPELP

0x0

ROCST

0x0

Supervisor mode previous expected-landing-pad (ELP) state.

[30:24]

RESERVED_24

0x0

WPRI

Reserved

31

SD

0x0

ROCST

0x0

Read-only bit that summarizes whether either the FS field or XS field signals the presence of some dirty state.

3.4.3.2. MISA

Address: 0x301
Reset Value: 0x40001106
Privilege: MRW
Description: misa is a read-write register reporting the ISA supported by the hart.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[25:0]

EXTENSIONS

0x1106

ROCST

0x1106

Encodes the presence of the standard extensions, with a single bit per letter of the alphabet.

[29:26]

RESERVED_26

0x0

WPRI

Reserved

[31:30]

MXL

0x1

WARL

0x1

Encodes the native base integer ISA width.

3.4.3.3. MIE

Address: 0x304
Reset Value: 0x00000000
Privilege: MRW
Description: The mie register is an MXLEN-bit read/write register containing interrupt enable bits.

Bits

Field Name

Reset Value

Type

Legal Values

Description

0

USIE

0x0

ROCST

0x0

User Software Interrupt enable.

1

SSIE

0x0

ROCST

0x0

Supervisor Software Interrupt enable.

2

VSSIE

0x0

ROCST

0x0

VS-level Software Interrupt enable.

3

MSIE

0x0

ROCST

0x0

Machine Software Interrupt enable.

4

UTIE

0x0

ROCST

0x0

User Timer Interrupt enable.

5

STIE

0x0

ROCST

0x0

Supervisor Timer Interrupt enable.

6

VSTIE

0x0

ROCST

0x0

VS-level Timer Interrupt enable.

7

MTIE

0x0

WLRL

0x0 - 0x1

Machine Timer Interrupt enable.

8

UEIE

0x0

ROCST

0x0

User External Interrupt enable.

9

SEIE

0x0

ROCST

0x0

Supervisor External Interrupt enable.

10

VSEIE

0x0

ROCST

0x0

VS-level External Interrupt enable.

11

MEIE

0x0

WLRL

0x0 - 0x1

Machine External Interrupt enable.

12

SGEIE

0x0

ROCST

0x0

HS-level External Interrupt enable.

[31:13]

RESERVED_13

0x0

WPRI

Reserved

3.4.3.4. MTVEC

Address: 0x305
Reset Value: 0x80010000
Privilege: MRW
Description: MXLEN-bit read/write register that holds trap vector configuration.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[1:0]

MODE

0x0

WARL

0x0

Vector mode.

[31:2]

BASE

0x20004000

WARL

0x00000000 - 0x3FFFFFFF

Vector base address.

3.4.3.5. MSTATUSH

Address: 0x310
Reset Value: 0x00000000
Privilege: MRW
Description: The mstatush register keeps track of and controls the hart’s current operating state.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[3:0]

RESERVED_0

0x0

WPRI

Reserved

4

SBE

0x0

ROCST

0x0

control the endianness of memory accesses other than instruction fetches for supervisor mode

5

MBE

0x0

ROCST

0x0

control the endianness of memory accesses other than instruction fetches for machine mode

6

GVA

0x0

ROCST

0x0

Stores the state of the supervisor mode interrupts.

7

MPV

0x0

ROCST

0x0

Stores the state of the user mode interrupts.

8

RESERVED_8

0x0

WPRI

Reserved

9

MPELP

0x0

ROCST

0x0

Machine mode previous expected-landing-pad (ELP) state.

[31:10]

RESERVED_10

0x0

WPRI

Reserved

3.4.3.6. MHPMEVENT[3-31]

Address: 0x323-0x33f
Reset Value: 0x00000000
Privilege: MRW
Description: The mhpmevent is a MXLEN-bit event register which controls mhpmcounter3.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MHPMEVENT[I]

0x00000000

ROCST

0x0

The mhpmevent is a MXLEN-bit event register which controls mhpmcounter3.

3.4.3.7. MSCRATCH

Address: 0x340
Reset Value: 0x00000000
Privilege: MRW
Description: The mscratch register is an MXLEN-bit read/write register dedicated for use by machine mode.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MSCRATCH

0x00000000

WARL

0x00000000 - 0xFFFFFFFF

The mscratch register is an MXLEN-bit read/write register dedicated for use by machine mode.

3.4.3.8. MEPC

Address: 0x341
Reset Value: 0x00000000
Privilege: MRW
Description: The mepc is a warl register that must be able to hold all valid physical and virtual addresses.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MEPC

0x00000000

WARL

0x00000000 - 0xFFFFFFFF

The mepc is a warl register that must be able to hold all valid physical and virtual addresses.

3.4.3.9. MCAUSE

Address: 0x342
Reset Value: 0x00000000
Privilege: MRW
Description: The mcause register stores the information regarding the trap.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[30:0]

EXCEPTION_CODE

0x0

WLRL

0x0 - 0x8, 0xb

Encodes the exception code.

31

INTERRUPT

0x0

WLRL

0x0 - 0x1

Indicates whether the trap was due to an interrupt.

3.4.3.10. MTVAL

Address: 0x343
Reset Value: 0x00000000
Privilege: MRW
Description: The mtval is a warl register that holds the address of the instruction which caused the exception.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MTVAL

0x00000000

ROCST

0x0

The mtval is a warl register that holds the address of the instruction which caused the exception.

3.4.3.11. MIP

Address: 0x344
Reset Value: 0x00000000
Privilege: MRW
Description: The mip register is an MXLEN-bit read/write register containing information on pending interrupts.

Bits

Field Name

Reset Value

Type

Legal Values

Description

0

USIP

0x0

ROCST

0x0

User Software Interrupt Pending.

1

SSIP

0x0

ROCST

0x0

Supervisor Software Interrupt Pending.

2

VSSIP

0x0

ROCST

0x0

VS-level Software Interrupt Pending.

3

MSIP

0x0

ROCST

0x0

Machine Software Interrupt Pending.

4

UTIP

0x0

ROCST

0x0

User Timer Interrupt Pending.

5

STIP

0x0

ROCST

0x0

Supervisor Timer Interrupt Pending.

6

VSTIP

0x0

ROCST

0x0

VS-level Timer Interrupt Pending.

7

MTIP

0x0

ROVAR

0x0 - 0x1

Machine Timer Interrupt Pending.

8

UEIP

0x0

ROCST

0x0

User External Interrupt Pending.

9

SEIP

0x0

ROCST

0x0

Supervisor External Interrupt Pending.

10

VSEIP

0x0

ROCST

0x0

VS-level External Interrupt Pending.

11

MEIP

0x0

ROVAR

0x0 - 0x1

Machine External Interrupt Pending.

12

SGEIP

0x0

ROCST

0x0

HS-level External Interrupt Pending.

[31:13]

RESERVED_13

0x0

WPRI

Reserved

3.4.3.12. PMPCFG[0-15]

Address: 0x3a0-0x3af
Reset Value: 0x00000000
Privilege: MRW
Description: PMP configuration register

Bits

Field Name

Reset Value

Type

Legal Values

Description

[7:0]

PMP[I*4 +0]CFG

0x0

ROCST

0x0

pmp configuration bits

[15:8]

PMP[I*4 +1]CFG

0x0

ROCST

0x0

pmp configuration bits

[23:16]

PMP[I*4 +2]CFG

0x0

ROCST

0x0

pmp configuration bits

[31:24]

PMP[I*4 +3]CFG

0x0

ROCST

0x0

pmp configuration bits

3.4.3.13. PMPADDR[0-63]

Address: 0x3b0-0x3ef
Reset Value: 0x00000000
Privilege: MRW
Description: Physical memory protection address register

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

PMPADDR[I]

0x00000000

ROCST

0x0

Physical memory protection address register

3.4.3.14. ICACHE

Address: 0x7c0
Reset Value: 0x00000001
Privilege: MRW
Description: the register controls the operation of the i-cache unit.

Bits

Field Name

Reset Value

Type

Legal Values

Description

0

ICACHE

0x1

RW

0x1

bit for cache-enable of instruction cache

[31:1]

RESERVED_1

0x0

WPRI

Reserved

3.4.3.15. DCACHE

Address: 0x7c1
Reset Value: 0x00000001
Privilege: MRW
Description: the register controls the operation of the d-cache unit.

Bits

Field Name

Reset Value

Type

Legal Values

Description

0

DCACHE

0x1

RW

0x1

bit for cache-enable of data cache

[31:1]

RESERVED_1

0x0

WPRI

Reserved

3.4.3.16. MCYCLE

Address: 0xb00
Reset Value: 0x00000000
Privilege: MRW
Description: Counts the number of clock cycles executed from an arbitrary point in time.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MCYCLE

0x00000000

WARL

0x00000000 - 0xFFFFFFFF

Counts the number of clock cycles executed from an arbitrary point in time.

3.4.3.17. MINSTRET

Address: 0xb02
Reset Value: 0x00000000
Privilege: MRW
Description: Counts the number of instructions completed from an arbitrary point in time.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MINSTRET

0x00000000

WARL

0x00000000 - 0xFFFFFFFF

Counts the number of instructions completed from an arbitrary point in time.

3.4.3.18. MHPMCOUNTER[3-31]

Address: 0xb03-0xb1f
Reset Value: 0x00000000
Privilege: MRW
Description: The mhpmcounter is a 64-bit counter. Returns lower 32 bits in RV32I mode.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MHPMCOUNTER[I]

0x00000000

ROCST

0x0

The mhpmcounter is a 64-bit counter. Returns lower 32 bits in RV32I mode.

3.4.3.19. MCYCLEH

Address: 0xb80
Reset Value: 0x00000000
Privilege: MRW
Description: upper 32 bits of mcycle

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MCYCLEH

0x00000000

WARL

0x00000000 - 0xFFFFFFFF

upper 32 bits of mcycle

3.4.3.20. MINSTRETH

Address: 0xb82
Reset Value: 0x00000000
Privilege: MRW
Description: Upper 32 bits of minstret.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MINSTRETH

0x00000000

WARL

0x00000000 - 0xFFFFFFFF

Upper 32 bits of minstret.

3.4.3.21. MHPMCOUNTER[3-31]H

Address: 0xb83-0xb9f
Reset Value: 0x00000000
Privilege: MRW
Description: The mhpmcounterh returns the upper half word in RV32I systems.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MHPMCOUNTER[I]H

0x00000000

ROCST

0x0

The mhpmcounterh returns the upper half word in RV32I systems.

3.4.3.22. MVENDORID

Address: 0xf11
Reset Value: 0x00000602
Privilege: MRO
Description: 32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MVENDORID

0x00000602

ROCST

0x602

32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core.

3.4.3.23. MARCHID

Address: 0xf12
Reset Value: 0x00000003
Privilege: MRO
Description: MXLEN-bit read-only register encoding the base microarchitecture of the hart.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MARCHID

0x00000003

ROCST

0x3

MXLEN-bit read-only register encoding the base microarchitecture of the hart.

3.4.3.24. MIMPID

Address: 0xf13
Reset Value: 0x00000000
Privilege: MRO
Description: Provides a unique encoding of the version of the processor implementation.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MIMPID

0x00000000

ROCST

0x0

Provides a unique encoding of the version of the processor implementation.

3.4.3.25. MHARTID

Address: 0xf14
Reset Value: 0x00000000
Privilege: MRO
Description: MXLEN-bit read-only register containing the integer ID of the hardware thread running the code.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MHARTID

0x00000000

ROCST

0x0

MXLEN-bit read-only register containing the integer ID of the hardware thread running the code.

3.4.3.26. MCONFIGPTR

Address: 0xf15
Reset Value: 0x00000000
Privilege: MRO
Description: MXLEN-bit read-only register that holds the physical address of a configuration data structure.

Bits

Field Name

Reset Value

Type

Legal Values

Description

[31:0]

MCONFIGPTR

0x00000000

ROCST

0x0

MXLEN-bit read-only register that holds the physical address of a configuration data structure.

3.5. OBI Bus interface

OBI Bus aims to connected CVA6 to the memory.

CVA6 implements 3 interfaces, one for fetch instruction and two interfaces for data handling, one dedicated to loading operations and the other to storing operations.

Each connection uses a bus interface that complies with the OBI 1.6 standard. A detailed explanation of the optional fields described by this standard is provided in the table below.

In order to understand how the OBI memory interface behaves in CVA6, it is necessary to read the OBI Protocol Specification (https://github.com/openhwgroup/obi/) and this chapter.

3.5.1. Protocol Description

OBI is a point-to-point protocol (between manager and subordinate)

An OBI link consists of two channels:

The address channel, called the A channel.
The response channel, called the R channel

OBI transaction consists of two transfers:

An address phase transfer over the A channel.
A response phase transfer over the R channel.

The address phase transfer is as follows:

The manager indicates the validity of its address phase signals (i.e. addr, wdata, we, be, auser, aid, mid, wuser, prot, memtype, dbg, atop, achk) by settings its request (req) high.
The subordinate indicates its readiness to accept the address phase signals by setting grant (gnt) high.
The address phase of a transaction starts in the cycle in which req goes high and completes on the rising clk edge when both req and gnt are high.

The response phase transfer is as follows:

After a granted request (in the A channel), the subordinate indicates the validity of its response phase signals (i.e. rdata, err, ruser, rid, exokay, rchk) by setting rvalid high.
The manager indicates its readiness to accept the response phase signals by setting rready high.
The response phase of a transaction starts in the cycle in which rvalid goes high and completes on the rising clk edge when both rvalid and rready are high.

3.5.2. CVA6 implementation

All OBI buses for FETCH, LOAD, and STORE operations have a data size of 32 bits.

The Fetch interface performs only full BE accesses.

The Fetch and Load interfaces are only capable of read accesses.

The Store interface is exclusively for write accesses.

CVA6 is always ready to accept a response on channel R, so the rrready signal is always high.

The Fetch and Load interfaces do not generate outstanding accesses.

The Store interface does not use channel R. Therefore, the depth of outstanding accesses depends on the subordinate.

The error flag is not used at all.

Parity signal support is partial. Parity signals reqpar and rreadypar are generated as specified by the standard, but gntpar and rvalidpar are currently not checked by CVA6.

Checksum signals achk and rchk are not supported.

All user data signals (auser, wuser, ruser) are not used.

CVA6 does not support atomic operations, so exokay and atom are not used.

CVA6 does not have a Physical Memory Attribution (PMA), so cacheable or bufferable regions are not defined. As a consequence, memtype always 0.

prot[0] is driven to indicate the type of access: FETCH for the Fetch interface and DATA for the LOAD/STORE interfaces.

As only Machine mode is available in CVA6, prot[2:1] is statically at b'11' (Machine Mode).

The master identifier "Mid" is set to 0.

The transaction identifiers aid and rid signals are only 1-bit size. Even though there are no outstanding reads available, rid must be driven by the subordinate in compliance with the standard (rid = aid).

3.5.3. Port list channel A

Fetch

Load

Store

req

OUTPUT

used

Address transfer request. req=1 signals the availability of valid address phase signals.

gnt

INPUT

used

Grant. Ready to accept address transfer. Address transfer is accepted on rising clk with req=1 and gnt=1.

addr[]

OUTPUT

used

Address

we

OUTPUT

used

Write Enable, high for writes, low for reads.

be[3:0]

OUTPUT

used (always Full)

used

Byte Enable. Is set for the bytes to write/read.

wdata[31:0]

OUTPUT

0 (default)

used

Write data. Only valid for write transactions. Undefined for read transactions.

auser[0:0]

OUTPUT

0 (default)

Address Phase User signals. Valid for both read and write transactions.

wuser[0:0]

OUTPUT

0 (default)

Additional Address Phase User signals. Only valid for write transactions. Undefined for read transactions.

aid[0:0]

OUTPUT

0 (default)

Address Phase transaction identifier.

atop[5:0]

OUTPUT

0 (default)

Atomic Operation.

memtype[0]

OUTPUT

0 (default)

Memory type attributes

memtype[1]

OUTPUT

0, no PMA

Memory type attributes, cacheable flag.

mid[0:0]

OUTPUT

0 (default)

Manager ID.

prot[0]

OUTPUT

FETCH only

DATA only

Protection attributes. Kind of access : 0 is FETCH, 1 is DATA

prot[2:1]

OUTPUT

2’b11

Protection attributes, Privilege Mode: 2’b11 is machine mode.

dbg

OUTPUT

0 (default)

Debug access.

reqpar

OUTPUT

used

Parity bit for req signal (odd parity).

gntpar

INPUT

not used

Parity bit for gnt signal (odd parity).

achk[0:0]

OUTPUT

tied at zero

Checksum for address phase signals (except achk itself).

3.5.4. Port list channel R

Fetch

Load

Store

rvalid

INPUT

used

Response transfer request. rvalid=1 signals the availability of valid response phase signals. Used for both reads and writes.

rready

OUTPUT

used

Ready to accept response transfer. Response transfer is accepted on rising clk with rvalid=1 and rready=1.

rdata[31:0]

INPUT

used

not used

Read data. Only valid for read transactions. Undefined for write transactions.

err

INPUT

not used

Error.

ruser[0:0]

INPUT

not used

Response phase User signals. Only valid for read transactions. Undefined for write transactions.

rid[0:0]

INPUT

used

Response Phase transaction identifier.

exokay

INPUT

not used

Exclusive transaction okay.

rvalidpar

INPUT

not used

Parity bit for rvalid signal (odd parity).

rreadypar

OUTPUT

used

Parity bit for rready signal (odd parity).

rchk[0:0]

INPUT

not used

Checksum for address phase signals (except rchk itself).

3.6. CV-X-IF Interface and Coprocessor

The CV-X-IF interface of CVA6 allows to extend its supported instruction set with external coprocessors.

Applicability of this chapter to configurations:

Configuration

Implementation

CV32A60AX

CV-X-IF included

CV32A60X

CV-X-IF included

CV64A6_MMU

CV-X-IF included

3.6.1. CV-X-IF interface specification

3.6.1.1. Description

This design specification presents global functionalities of Core-V-eXtension-Interface (XIF, CVXIF, CV-X-IF, X-interface) in the CVA6 core. CVA6 implements version v1.0.0: Ratified Release.

The CORE-V X-Interface is a RISC-V eXtension interface that provides a
generalized framework suitable to implement custom coprocessors and ISA
extensions for existing RISC-V processors.

Refer to https://github.com/openhwgroup/core-v-xif GitHub repository

The specification of the CV-X-IF bus protocol can be found at [CV-X-IF].

CV-X-IF aims to:

Create interfaces to connect a coprocessor to the CVA6 to execute instructions.
Offload CVA6 illegal instrutions to the coprocessor to be executed.
Get the results of offloaded instructions from the coprocessor so they are written back into the CVA6 register file.
Add standard RISC-V instructions unsupported by CVA6 or custom instructions and implement them in a coprocessor.
Kill offloaded instructions to allow speculative execution in the coprocessor. (Unsupported in CVA6 yet)
Connect the coprocessor to memory via the CVA6 Load and Store Unit. (Unsupported in CVA6 yet)

The coprocessor operates like another functional unit so it is connected to the CVA6 in the execute stage.

Only the 5 mandatory interfaces from the CV-X-IF specification (compressed, issue, register, commit and result ) have been implemented. Memory Interface and Memory result interface are optional interfaces, they are not implemented in the CVA6.

3.6.1.2. Supported Parameters

The following table presents CVXIF parameters supported by CVA6.

Signal Value Description

X_NUM_RS

int: 2 or 3 (configurable)

CV32A65X: 2

Number of register file read ports that can
be used by the eXtension interface

X_ID_WIDTH

int: 1 to 32

CV32A65X: 3

Identification width for the eXtension
interface

X_MEM_WIDTH

n/a (feature not supported)

Memory access width for loads/stores via the
eXtension interface

X_RFR_WIDTH

int: XLEN (32 or 64)

CV32A65X: 32

Register file read access width for the
eXtension interface

X_RFW_WIDTH

int: XLEN (32 or 64)

CV32A65X: 32

Register file write access width for the
eXtension interface

X_MISA

logic[31:0]: 0x0000_0000

MISA extensions implemented on the eXtension
interface

3.6.1.3. CV-X-IF Enabling

CV-X-IF can be enabled or disabled via the CVA6ConfigCvxifEn parameter in the SystemVerilog source code.

Never let the CV-X-IF interface unconnected with the CVA6ConfigCvxifEn parameter enabled.

3.6.1.4. Illegal instruction decoding

The CVA6 decoder module detects illegal instructions for the CVA6, prepares exception field with relevant information (exception code "ILLEGAL INSTRUCTION", instruction value).

The exception valid flag is raised in CVA6 decoder when CV-X-IF is disabled. Otherwise it is not raised at this stage because the decision belongs to the coprocessor after the offload process.

3.6.1.5. RS3 support

The number of source registers used by the CV-X-IF coprocessor is configurable with 2 or 3 source registers.

If CV-X-IF is enabled and configured with 3 source registers, a third read port is added to the CVA6 general purpose register file.

3.6.1.6. Description of interface connections between CVA6 and Coprocessor

In CVA6 execute stage, there is a new functional unit dedicated to drive the CV-X-IF interfaces. Here is how and to what CV-X-IF interfaces are connected to the CVA6.

Compressed interface:
- Request:
  - Undecoded illegal instruction is connected to compressed_req.instruction
  - Illegal compressed signal calculated from all compressed decoders (RVC, ZCMP, ZCMT) is connected to compressed_req.valid.
- Response:
  - if compressed_resp.accept is set during a transaction (i.e. compressed_valid and compressed_ready are set), the offloaded compressed instruction is accepted by the coprocessor and compressed_resp.instruction contains its 32 bits equivalent for the coprocessor and sent to the 32 bits decoder.
  - if compressed_resp.accept is not set during a transaction, the instruction is an illegal compressed instruction and will follow its path in the decode stage. Typically, it will try to be decoded as a 32 bits instruction.
Issue and register interface (Coupled):
- Request:
  - Operands are connected to register.rs signals
  - Scoreboard transaction id is connected to issue_req.id signal. Therefore scoreboard ids and offloaded instruction ids are linked together (equal in this implementation). It allows the CVA6 to do out of order execution with the coprocessor in the same way as other functional units.
  - Undecoded instruction is connected to issue_req.instruction
  - Valid signal for CVXIF functional unit is connected to issue_req.valid
  - Each register.rs_valid signal is set when its associated stall signal is not set.
  - register_valid is connected to issue_valid because Issue and Register interface are coupled. (X_ISSUE_REGISTER_SPLIT == 0)
- Response:
  - If issue_resp.accept is set during a transaction (i.e. issue valid and ready are set), the offloaded instruction is accepted by the coprocessor and a result transaction will happen.
  - If issue_resp.accept is not set during a transaction, the offloaded instruction is illegal and an illegal instruction exception will be raised as soon as no result transaction are written on the writeback bus.
Commit interface:
- Valid signal of commit interface is connected to the valid signal of issue interface.
- Id signal of commit interface is connected to issue interface id signal (i.e. scoreboard id).
- Killing of offload instruction is never set. (Unsupported feature)
- Therefore all accepted offloaded instructions are commited to their execution and no killing of instruction is possible in this implementation.
Result interface:
- Request:
  - Ready signal of result interface is always set as CVA6 is always ready to take a result from coprocessor for an accepted offloaded instruction.
- Response:
  - Result response is directly connected to writeback bus of the CV-X-IF functionnal unit.
  - Valid signal of result interface is connected to valid signal of writeback bus.
  - Id signal of result interface is connected to scoreboard id of writeback bus.
  - Write enable signal of result interface is connected to a dedicated CV-X-IF WE signal in CVA6 which signals scoreboard if a writeback should happen or not to the CVA6 register file.
  - exccode and exc signal of result interface are connected to exception signals of writeback bus. Exception from coprocessor does not write the tval field in exception signal of writeback bus.
  - Three registers are added to hold illegal instruction information in case a result transaction and a non-accepted issue transaction happen in the same cycle. Result transactions will be written to the writeback bus in this case having priority over the non-accepted instruction due to being linked to an older offloaded instruction. Once the writeback bus is free, an illegal instruction exception will be raised thanks to information held in these three registers.

3.6.2. Coprocessor recommendations for use with CVA6’s CV-X-IF

CVA6 supports all coprocessors supporting the CV-X-IF specification with the exception of :

Coprocessor requiring the Memory interface and Memory result interface (not implemented in CVA6 yet).
- All memory transaction should happen via the Issue interface, i.e. Load into CVA6 register file then initialize an issue transaction.
Stateful coprocessors.
- CVA6 will commit on the Commit interface all its issue transactions. Speculation informations are only kept in the CVA6 and speculation process is only done in CVA6. The coprocessor shall be stateless otherwise it will not be able to revert its state if CVA6 kills an in-flight instruction (in case of mispredict or flush).

3.6.3. How to use CVA6 without CV-X-IF interface

Select a configuration with CVA6ConfigCvxifEn parameter disabled or change it for your configuration.

Never let the CV-X-IF interface unconnected with the CVA6ConfigCvxifEn parameter enabled.

3.6.4. How to design a coprocessor for the CV-X-IF interface

The team is looking for a contributor to write this section.

3.6.5. How to program a CV-X-IF coprocessor

The team is looking for a contributor to write this section.

4. Architecture and Modules

The CV32A65X is fully synthesizable. It has been designed mainly for ASIC designs, but FPGA synthesis is supported as well.

For ASIC synthesis, the whole design is completely synchronous and uses positive-edge triggered flip-flops. The core occupies an area of about 80 kGE. The clock frequency can be more than 1 GHz depending of technology.

The CV32A65X subsystem is composed of 8 modules.

CV32A65X modules

Connections between modules are illustrated in the following block diagram. FRONTEND, DECODE, ISSUE, EXECUTE, COMMIT and CONTROLLER are part of the pipeline. The CSRFILE contains the registers.

CV32A65X pipeline and modules

4.1. FRONTEND Module

4.1.1. Description

The FRONTEND module implements two first stages of the CVA6 pipeline, PC gen and Fetch stages.

PC gen stage is responsible for generating the next program counter. It hosts a Branch Target Buffer (BTB), a Branch History Table (BHT) and a Return Address Stack (RAS) to speculate on control flow instructions.

Fetch stage requests data to the CACHE module, realigns the data to store them in instruction queue and transmits the instructions to the DECODE module. FRONTEND can fetch up to 4 instructions per cycle, but DECODE module decodes up to 2 instruction(s) per cycle.

The module is connected to:

CACHES module provides fethed instructions to FRONTEND.
DECODE module receives instructions from FRONTEND.
CONTROLLER module can order to flush and to halt FRONTEND PC gen stage
EXECUTE, CONTROLLER, CSR and COMMIT modules trigger PC jumping due to a branch misprediction, an exception, a return from an exception, a debug entry or a pipeline flush. They provides the PC next value.
CSR module states about debug mode.

Table 3. **frontend module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`boot_addr_i`	in	Next PC when reset	SUBSYSTEM	logic[CVA6Cfg.VLEN-1:0]
`flush_i`	in	Flush requested by FENCE, mis-predict and exception	CONTROLLER	logic
`halt_i`	in	Halt requested by WFI and Accelerate port	CONTROLLER	logic
`set_pc_commit_i`	in	Set COMMIT PC as next PC requested by FENCE, CSR side-effect and Accelerate port	CONTROLLER	logic
`pc_commit_i`	in	COMMIT PC	COMMIT	logic[CVA6Cfg.VLEN-1:0]
`ex_valid_i`	in	Exception event	COMMIT	logic
`resolved_branch_i`	in	Mispredict event and next PC	EXECUTE	bp_resolve_t
`eret_i`	in	Return from exception event	CSR	logic
`epc_i`	in	Next PC when returning from exception	CSR	logic[CVA6Cfg.VLEN-1:0]
`trap_vector_base_i`	in	Next PC when jumping into exception	CSR	logic[CVA6Cfg.VLEN-1:0]
`arsp_i`	in	address translation request chanel	EXECUTE	fetch_arsp_t
`areq_o`	out	address translation response chanel	EXECUTE	fetch_areq_t
`obi_fetch_req_o`	out	OBI Fetch Request channel	CACHES	obi_fetch_req_t
`obi_fetch_rsp_i`	in	OBI Fetch Response channel	CACHES	obi_fetch_rsp_t
`fetch_entry_o`	out	Handshake’s data between fetch and decode	ID_STAGE	fetch_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`fetch_entry_valid_o`	out	Handshake’s valid between fetch and decode	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`fetch_entry_ready_i`	in	Handshake’s ready between fetch and decode	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

For any HW configuration,

flush_bp_i input is tied to 0

As DebugEn = False,

set_debug_pc_i input is tied to 0
debug_mode_i input is tied to 0

As PipelineOnly = True,

fetch_req_o output is tied to 0
fetch_rsp_i input is tied to 0

4.1.2. Functionality

4.1.3. PC Generation stage

PC gen generates the next program counter. The next PC can originate from the following sources (listed in order of precedence):

Reset state: At reset, the PC is assigned to the boot address.
Branch Prediction: The fetched instruction is predecoded by the instr_scan submodule. When the instruction is a control flow, three cases are considered:
1. When the instruction is a JALR which corresponds to a return (rs1 = x1 or rs1 = x5). RAS provides next PC as a prediction.
2. When the instruction is a JALR which does not correspond to areturn. If BTB (Branch Target Buffer) returns a valid address, then BTBpredicts next PC. Else JALR is not considered as a control flow instruction, which will generate a mispredict.
3. When the instruction is a conditional branch. If BHT (Branch History table) returns a valid address, then BHT predicts next PC. Else the prediction depends on the PC relative jump offset sign: if sign is negative the prediction is taken, otherwise the prediction is not taken.
Then the PC gen informs the Fetch stage that it performed a prediction on the PC.
Default: The next 64-bit block is fetched. PC Gen fetches word boundary 64-bits block from CACHES module. And the fetch stage identifies the instructions from the 64-bits blocks.
Mispredict: Misprediction are feedbacked by EX_STAGE module. In any case we need to correct our action and start fetching from the correct address.
Replay instruction fetch: When the instruction queue is full, the instr_queue submodule asks the fetch replay and provides the address to be replayed.
Return from environment call: When CSR requests a return from an environment call, next PC takes the value of the PC of the instruction after the one pointed to by the mepc CSR.
Exception/Interrupt: If an exception is triggered by CSR_REGISTER, next PC takes the value of the trap vector base address CSR.
Pipeline starting fetching from COMMIT PC: When the commit stage is halted by a WFI instruction or when the pipeline has been flushed due to CSR change, next PC takes the value of the PC coming from the COMMIT submodule. As CSR instructions do not exist in a compressed form, PC is unconditionally incremented by 4.

All program counters are logical addressed.

4.1.4. Fetch Stage

Fetch stage controls the CACHE module by a handshaking protocol. Fetched data is a 64-bit block with a word-aligned address. A granted fetch is processed by the instr_realign submodule to produce instructions. Then instructions are pushed into an internal instruction FIFO called instruction queue (instr_queue submodule). This submodule stores the instructions with its associated address and sends them to the DECODE module.

Before sending the instructions to the DECODE stage, the frontend calculates a prediction address in case of a JUMP or Branch. This predicted address is sent to the DECODE stage along with the instruction and its fetch address. The prediction address is not valid if there is no prediction. Instructions following a predicted taken control flow instruction are dropped.

Memory can feedback potential exceptions which can be bus errors, invalid accesses or instruction page faults. The FRONTEND transmits the exception from CACHES to DECODE.

4.1.5. Submodules

FRONTEND submodules

FRONTEND submodule interconnections

4.1.5.1. Instr_realign submodule

The 64-bit aligned block coming from the CACHE module enters the instr_realign submodule. This submodule extracts the instructions from the 64-bit blocks. Based on the fetch address and the fetched data, the instr_realign module extracts the valid instructions to be sent to the queue. It is possible to fetch up to 4 instructions per cycle when C extension is used. A not-compressed instruction can be misaligned on the block size, interleaved with two cache blocks. In that case, two cache accesses are needed to get the whole instruction. The instr_realign submodule provides up to 4 instructions per cycle when compressed extension is enabled, else one instruction per cycle. Incomplete instruction is stored in instr_realign submodule until its second half is fetched.

Below is a table that explains how the instr_realign works:

C: compressed instruction
I: not compressed instruction
UI: Incomplete instruction stored in the instr_realign

The Instr_realign can be flushed when the frontend requests the cache to kill the incoming instruction, in this case the incomplete instruction is deleted.

Table 4. **instr_realign module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Fetch flush request	CONTROLLER	logic
`valid_i`	in	32-bit block is valid	CACHE	logic
`serving_unaligned_o`	out	Instruction is unaligned	FRONTEND	logic
`address_i`	in	32-bit block address	CACHE	logic[CVA6Cfg.VLEN-1:0]
`data_i`	in	32-bit block	CACHE	logic[CVA6Cfg.FETCH_WIDTH-1:0]
`valid_o`	out	instruction is valid	FRONTEND	logic[CVA6Cfg.INSTR_PER_FETCH-1:0]
`addr_o`	out	Instruction address	FRONTEND	logic[CVA6Cfg.INSTR_PER_FETCH-1:0][CVA6Cfg.VLEN-1:0]
`instr_o`	out	Instruction	instr_scan&instr_queue	logic[CVA6Cfg.INSTR_PER_FETCH-1:0][31:0]

4.1.5.2. Instr_queue submodule

The instr_queue receives mutliple instructions from instr_realign submodule to create a valid stream to be executed. Frontend pushes instructions and all related information into the FIFO for storage, including details needed in case of a misprediction or exception: the instructions themselves, instruction control flow type, exception, exception address, and predicted address. DECODE pops them when decode stage is ready and indicates to the FRONTEND the instruction has been consummed.

The instruction queue contains two FIFOs: one for instructions and one for addresses, which stores addresses in case of a prediction. The instruction FIFO can hold up to 4×4 instructions, while the address FIFO can hold up to 2 addresses. If the instruction FIFO is full, a replay request is sent to inform the fetch mechanism to replay the fetch. If the address FIFO is full and there is a prediction, a replay request is sent to inform the fetch mechanism to replay the fetch, even if the instruction FIFO is not full.

The instruction queue can be flushed by the flush signal coming from the CONTROLLER.

Table 5. **instr_queue module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Fetch flush request	CONTROLLER	logic
`instr_i`	in	Instruction	instr_realign	logic[CVA6Cfg.INSTR_PER_FETCH-1:0][31:0]
`addr_i`	in	Instruction address	instr_realign	logic[CVA6Cfg.INSTR_PER_FETCH-1:0][CVA6Cfg.VLEN-1:0]
`valid_i`	in	Instruction is valid	instr_realign	logic[CVA6Cfg.INSTR_PER_FETCH-1:0]
`ready_o`	out	Handshake’s ready with CACHE	CACHE	logic
`consumed_o`	out	Indicates instructions consummed, or popped by ID_STAGE	FRONTEND	logic[CVA6Cfg.INSTR_PER_FETCH-1:0]
`exception_i`	in	Exception (which is page-table fault)	CACHE	ariane_pkg::frontend_exception_t
`exception_addr_i`	in	Exception address	CACHE	logic[CVA6Cfg.VLEN-1:0]
`predict_address_i`	in	Branch predict	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`cf_type_i`	in	Instruction predict address	FRONTEND	ariane_pkg::cf_t[CVA6Cfg.INSTR_PER_FETCH-1:0]
`replay_o`	out	Replay instruction because one of the FIFO was full	FRONTEND	logic
`replay_addr_o`	out	Address at which to replay the fetch	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`fetch_entry_o`	out	Handshake’s data with ID_STAGE	ID_STAGE	fetch_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`fetch_entry_valid_o`	out	Handshake’s valid with ID_STAGE	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`fetch_entry_ready_i`	in	Handshake’s ready with ID_STAGE	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As RVH = False,

exception_gpaddr_i input is tied to 0
exception_tinst_i input is tied to 0
exception_gva_i input is tied to 0

4.1.5.3. Instr_scan submodule

As compressed extension is enabled, 4 instr_scan are instantiated to handle up to 4 instructions per cycle.

Each instr_scan submodule pre-decodes the fetched instructions coming from the instr_realign module, also calculate the immediate, instructions could be compressed or not. The instr_scan submodule is a flow controller which provides the intruction type: branch, jump, return, jalr, imm, call or others. These outputs are used by the branch prediction feature.

Table 6. **instr_scan module** IO ports
Signal	IO	Description	connexion	Type
`instr_i`	in	Instruction to be predecoded	instr_realign	logic[31:0]
`rvi_return_o`	out	Return instruction	FRONTEND	logic
`rvi_call_o`	out	JAL instruction	FRONTEND	logic
`rvi_branch_o`	out	Branch instruction	FRONTEND	logic
`rvi_jalr_o`	out	JALR instruction	FRONTEND	logic
`rvi_jump_o`	out	Unconditional jump instruction	FRONTEND	logic
`rvi_imm_o`	out	Instruction immediate	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`rvc_branch_o`	out	Branch compressed instruction	FRONTEND	logic
`rvc_jump_o`	out	Unconditional jump compressed instruction	FRONTEND	logic
`rvc_jr_o`	out	JR compressed instruction	FRONTEND	logic
`rvc_return_o`	out	Return compressed instruction	FRONTEND	logic
`rvc_jalr_o`	out	JALR compressed instruction	FRONTEND	logic
`rvc_call_o`	out	JAL compressed instruction	FRONTEND	logic
`rvc_imm_o`	out	Instruction compressed immediate	FRONTEND	logic[CVA6Cfg.VLEN-1:0]

4.1.5.4. BHT (Branch History Table) submodule

The BHT is implemented as a memory which is composed of 32 entries. The BHT is a two-dimensional table:

The first dimension represents the access address, with a length equal to 32 / 4.
The second dimension represents the row index, with a length equal to 4.

In the case of branch prediction, the BHT uses only part of the virtual address to get the value of the saturation counter. In the case of a valid misprediction, the BHT uses only part of the misprediction address to access the BHT table and update the saturation counter.

UPPER_ADDRESS_INDEX = $clog2(BHTDepth) + ((RVC == 1) ? 1 : 2)

LOWER_ADDRESS_INDEX = (RVC == 1) ? 1 : 2 + $clog2(INSTR_PER_FETCH)

ACCESS_ADDRESS = PC/MISPREDICT_ADDRESS [ UPPER_ADDRESS_INDEX : LOWER_ADDRESS_INDEX ]

The lower address bits of the virtual address point to the memory entry.

UPPER_ADDRESS_INDEX = (RVC == 1) ? 1 : 2 + $clog2(INSTR_PER_FETCH)

LOWER_ADDRESS_INDEX = (RVC == 1) ? 1 : 2 + $clog2(INSTR_PER_FETCH)

ACCESS_INDEX = PC/MISPREDICT_ADDRESS [ UPPER_ADDRESS_INDEX : LOWER_ADDRESS_INDEX]

Two distinct branches with different addresses can share the same BHT entry if they have the same ACCESS_ADDRESS.

Each BHT entry contains a two-bit saturating counter and a valid bit. On reset, the counters are set to 0 (strongly not taken) and the valid bits are cleared. When a branch instruction is resolved by the EX_STAGE module, the valid bit is set and the counter is updated. The two bit counter is updated by the successive execution of the instructions as shown in the following figure.

BHT saturation

When a branch instruction is pre-decoded by instr_scan submodule, the BHT valids whether the PC address is inside the BHT and provides the taken or not prediction. The prediction is the most significant bit from the counter, where 1 means "taken".

When the Execute stage processes a branch instruction, it sends the branch status (whether it’s taken or not) to the Frontend to update the BHT table

The BHT is never flushed.

Table 7. **bht module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`vpc_i`	in	Virtual PC	CACHE	logic[CVA6Cfg.VLEN-1:0]
`bht_update_i`	in	Update bht with resolved address	EXECUTE	bht_update_t
`bht_prediction_o`	out	Prediction from bht	FRONTEND	ariane_pkg::bht_prediction_t[CVA6Cfg.INSTR_PER_FETCH-1:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

For any HW configuration,

flush_bp_i input is tied to 0

As DebugEn = False,

debug_mode_i input is tied to 0

4.1.5.5. BTB (Branch Target Buffer) submodule

There is no BTB in CV32A65X. As a consequence, no valid address is returned from BTB.

4.1.5.6. RAS (Return Address Stack) submodule

RAS is implemented as a LIFO which is composed of 2 entries.

When a "call" JAL instruction (rd = x1 or x5) is added to the instruction queue, the PC of the instruction following the JAL instruction is pushed into the stack.

When a "ret" JALR instruction (rs1 = x1 or x5, and rd != rs1) is added to the instruction queue, the predicted return address is popped from the stack. If the predicted return address is wrong due for instance to speculation or RAS depth limitation, a mispredict will be generated.

The RAS is never flushed.

Table 8. **ras module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`push_i`	in	Push address in RAS	FRONTEND	logic
`pop_i`	in	Pop address from RAS	FRONTEND	logic
`data_i`	in	Data to be pushed	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`data_o`	out	Popped data	FRONTEND	ras_t

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

For any HW configuration,

flush_bp_i input is tied to 0

4.2. ID_STAGE Module

4.2.1. Description

The ID_STAGE module implements the decode stage of the pipeline. Its main purpose is to decode RISC-V instructions coming from FRONTEND module (fetch stage) and send them to the ISSUE_STAGE module (issue stage).

The compressed_decoder module checks whether the incoming instruction is compressed and output the corresponding uncompressed instruction. Then the decoder module decodes the instruction and send it to the issue stage.

The module is connected to:

CONTROLLER module can request to flush the buffer at the end of ID_STAGE
FRONTEND module sends instructions to ID_STAGE module
ISSUE module receives the decoded instructions from ID_STAGE module
CSR_REGFILE module sends status information about privilege mode, traps, extension support.

Table 9. **id_stage module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Fetch flush request	CONTROLLER	logic
`fetch_entry_i`	in	Handshake’s data between fetch and decode	FRONTEND	fetch_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`fetch_entry_valid_i`	in	Handshake’s valid between fetch and decode	FRONTEND	logic[CVA6Cfg.NrIssuePorts-1:0]
`fetch_entry_ready_o`	out	Handshake’s ready between fetch and decode	FRONTEND	logic[CVA6Cfg.NrIssuePorts-1:0]
`issue_entry_o`	out	Handshake’s data between decode and issue	ISSUE	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`issue_entry_o_prev`	out	none	none	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`orig_instr_o`	out	Instruction value	ISSUE	logic[CVA6Cfg.NrIssuePorts-1:0][31:0]
`issue_entry_valid_o`	out	Handshake’s valid between decode and issue	ISSUE	logic[CVA6Cfg.NrIssuePorts-1:0]
`is_ctrl_flow_o`	out	Report if instruction is a control flow instruction	ISSUE	logic[CVA6Cfg.NrIssuePorts-1:0]
`issue_instr_ack_i`	in	Handshake’s acknowlege between decode and issue	ISSUE	logic[CVA6Cfg.NrIssuePorts-1:0]
`irq_i`	in	Level sensitive (async) interrupts	SUBSYSTEM	logic[1:0]
`irq_ctrl_i`	in	Interrupt control status	CSR_REGFILE	irq_ctrl_t
`hart_id_i`	in	none	none	logic[CVA6Cfg.XLEN-1:0]
`compressed_ready_i`	in	none	none	logic
`jvt_i`	in	none	none	jvt_t
`compressed_resp_i`	in	none	none	x_compressed_resp_t
`compressed_valid_o`	out	none	none	logic
`compressed_req_o`	out	none	none	x_compressed_req_t

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As DebugEn = False,

debug_req_i input is tied to 0
debug_mode_i input is tied to 0

As IsRVFI = 0,

was_compressed_o output is tied to 0

As PRIV = MachineOnly,

priv_lvl_i input is tied to MachineMode
tvm_i input is tied to 0
tw_i input is tied to 0
tsr_i input is tied to 0

As RVH = False,

v_i input is tied to 0
vfs_i input is tied to 0
vtw_i input is tied to 0
hu_i input is tied to 0

As RVF = 0,

fs_i input is tied to 0
frm_i input is tied to 0

As RVV = False,

vs_i input is tied to 0

As RVZCMT = False,

obi_zcmt_req_o output is tied to 0
obi_zcmt_rsp_i input is tied to 0

4.2.2. Functionality

ID_STAGE transforms each instruction into a scoreboard entry whose fields indicate what the instruction does. It receives external interrupts and, according to the interrupts configuration from CSR_REGFILE, it inserts exceptions in the pipeline. It outputs scoreboard entries via a FIFO of depth 2, which can push and pull up to 2 instructions each cycle.

4.2.3. Submodules

ID_STAGE submodules

4.2.3.1. Compressed_decoder

The compressed_decoder module decompresses all the compressed instructions taking a 16-bit compressed instruction and expanding it to its 32-bit equivalent. All compressed instructions have a 32-bit equivalent.

Non-compressed instructions on the input are transmitted as-is.

Table 10. **compressed_decoder module** IO ports
Signal	IO	Description	connexion	Type
`instr_i`	in	Input instruction coming from fetch stage	FRONTEND	logic[31:0]
`instr_o`	out	Output instruction in uncompressed format	decoder	logic[31:0]
`illegal_instr_o`	out	Input instruction is illegal	decoder	logic
`is_macro_instr_o`	out	Output instruction is macro	decoder	logic
`is_compressed_o`	out	Output instruction is compressed	decoder	logic
`is_zcmt_instr_o`	out	Output instruction is macro	decoder	logic

4.2.3.2. Decoder

The decoder module takes the output of compressed_decoder module and decodes it. It transforms the instruction to the most fundamental control structure in pipeline, a scoreboard entry.

The scoreboard entry contains an exception entry which is composed of a valid field, a cause and a value called TVAL. As TVALEn configuration parameter is zero, the TVAL field is not implemented.

A potential illegal instruction exception can be detected during decoding. If no exception has happened previously in fetch stage, the decoder will valid the exception and add the cause and tval value to the scoreboard entry.

A potential interrupt can be sent to the decoder. If no exception has happened previously in fetch stage, the exception is inserted to into the scoreboard entry.

Table 11. **decoder module** IO ports
Signal	IO	Description	connexion	Type
`pc_i`	in	PC from fetch stage	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`is_compressed_i`	in	none	none	logic
`compressed_instr_i`	in	Compressed form of instruction	FRONTEND	logic[15:0]
`is_illegal_i`	in	Illegal compressed instruction	compressed_decoder	logic
`instruction_i`	in	Instruction from fetch stage	FRONTEND	logic[31:0]
`is_macro_instr_i`	in	Is a macro instruction	macro_decoder	logic
`is_last_macro_instr_i`	in	Is a last macro instruction	macro_decoder	logic
`is_double_rd_macro_instr_i`	in	Is mvsa01/mva01s macro instruction	macro_decoder	logic
`is_zcmt_i`	in	Zcmt instruction	FRONTEND	logic
`jump_address_i`	in	Jump address	zcmt_decoder	logic[CVA6Cfg.XLEN-1:0]
`branch_predict_i`	in	Is a branch predict instruction	FRONTEND	branchpredict_sbe_t
`ex_i`	in	If an exception occured in fetch stage	FRONTEND	exception_t
`irq_i`	in	Level sensitive (async) interrupts	SUBSYSTEM	logic[1:0]
`irq_ctrl_i`	in	Interrupt control status	CSR_REGFILE	irq_ctrl_t
`instruction_o`	out	Instruction to be added to scoreboard entry	ISSUE_STAGE	scoreboard_entry_t
`orig_instr_o`	out	Instruction	ISSUE_STAGE	logic[31:0]
`is_control_flow_instr_o`	out	Is a control flow instruction	ISSUE_STAGE	logic

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As DebugEn = False,

debug_req_i input is tied to 0
debug_mode_i input is tied to 0

As PRIV = MachineOnly,

priv_lvl_i input is tied to MachineMode
tvm_i input is tied to 0
tw_i input is tied to 0
tsr_i input is tied to 0

As RVH = False,

v_i input is tied to 0
vfs_i input is tied to 0
vtw_i input is tied to 0
hu_i input is tied to 0

As RVF = 0,

fs_i input is tied to 0
frm_i input is tied to 0

As RVV = False,

vs_i input is tied to 0

4.3. ISSUE_STAGE Module

4.3.1. Description

ISSUE_STAGE issues instructions (1), reorders their results (2) and sends completed instructions in-order to COMMIT_STAGE (3).

(1) ISSUE_STAGE issues instructions in-order. It makes sure that instructions from ID_STAGE have everything they need to run. It waits until all requirements are met. Once an instruction is ready to run, ISSUE_STAGE sends it to EX_STAGE with its operands.

(2) ISSUE_STAGE reorders instructions results. It gets results of instruction executions out-of-order from EX_STAGE. ISSUE_STAGE stores these results reordered.

(3) ISSUE_STAGE sends completed instructions in-order to COMMIT_STAGE. This is where architectural state is modified.

Scoreboard module keeps track of instructions and their results. Issue_read_operands module contains all the issue logic and the register file.

The module is connected to:

CONTROLLER module can request to flush the pipeline buffer at the end of ISSUE_STAGE. CONTROLLER module can also request to flush the whole Scoreboard.
ID_STAGE module delivers decoded instructions to ISSUE_STAGE.
EX_STAGE module gets instructions issued by ISSUE_STAGE to execute them. EX_STAGE module also returns results to ISSUE_STAGE.
COMMIT_STAGE module delivers ISSUE_STAGE clearance to remove the oldest instruction from Scoreboard.

Table 12. **issue_stage module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_unissued_instr_i`	in	Prevent from issuing	CONTROLLER	logic
`flush_i`	in	Flush whole scoreboard	CONTROLLER	logic
`decoded_instr_i`	in	Handshake’s data with decode stage	ID_STAGE	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`decoded_instr_i_prev`	in	none	none	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`orig_instr_i`	in	instruction value	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0][31:0]
`decoded_instr_valid_i`	in	Handshake’s valid with decode stage	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`is_ctrl_flow_i`	in	Is instruction a control flow instruction	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`decoded_instr_ack_o`	out	Handshake’s acknowlege with decode stage	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`rs1_forwarding_o`	out	rs1 forwarding	EX_STAGE	[CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0]
`rs2_forwarding_o`	out	rs2 forwarding	EX_STAGE	[CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0]
`fu_data_o`	out	FU data useful to execute instruction	EX_STAGE	fu_data_t[CVA6Cfg.NrIssuePorts-1:0]
`pc_o`	out	Program Counter	EX_STAGE	logic[CVA6Cfg.VLEN-1:0]
`is_zcmt_o`	out	Is zcmt instruction	EX_STAGE	logic
`is_compressed_instr_o`	out	Is compressed instruction	EX_STAGE	logic
`flu_ready_i`	in	Fixed Latency Unit is ready	EX_STAGE	logic
`alu_valid_o`	out	ALU output is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`branch_valid_o`	out	Branch unit is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`branch_predict_o`	out	Information of branch prediction	EX_STAGE	branchpredict_sbe_t
`resolve_branch_i`	in	Signaling that we resolved the branch	EX_STAGE	logic
`lsu_ready_i`	in	Load store unit FU is ready	EX_STAGE	logic
`lsu_valid_o`	out	Load store unit FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`mult_valid_o`	out	Mult FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`alu2_valid_o`	out	ALU2 FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`csr_valid_o`	out	CSR is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`xfu_valid_o`	out	CVXIF FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`xfu_ready_i`	in	CVXIF is FU ready	EX_STAGE	logic
`x_off_instr_o`	out	CVXIF offloader instruction value	EX_STAGE	logic[31:0]
`hart_id_i`	in	CVA6 Hart ID	SUBSYSTEM	logic[CVA6Cfg.XLEN-1:0]
`x_issue_ready_i`	in	CVXIF Issue interface	EX_STAGE	logic
`x_issue_resp_i`	in	CVXIF Issue interface response	EX_STAGE	x_issue_resp_t
`x_issue_valid_o`	out	CVXIF Issue interface valid	EX_STAGE	logic
`x_issue_req_o`	out	CVXIF Issue interface request	EX_STAGE	x_issue_req_t
`x_register_ready_i`	in	CVXIF Register interface	EX_STAGE	logic
`x_register_valid_o`	out	CVXIF register interface valid	EX_STAGE	logic
`x_register_o`	out	CVXIF register interface	EX_STAGE	x_register_t
`x_commit_valid_o`	out	CVXIF Commit interface	EX_STAGE	logic
`x_commit_o`	out	CVXIF Commit interface	EX_STAGE	x_commit_t
`x_transaction_rejected_o`	out	CVXIF Transaction rejected → instruction is illegal	EX_STAGE	logic
`trans_id_i`	in	Transaction ID	EX_STAGE	logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.TRANS_ID_BITS-1:0]
`resolved_branch_i`	in	Result from branch unit	EX_STAGE	bp_resolve_t
`wbdata_i`	in	Results to write back	EX_STAGE	logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.XLEN-1:0]
`ex_ex_i`	in	exception from execute stage or CVXIF	EX_STAGE	exception_t[CVA6Cfg.NrWbPorts-1:0]
`wt_valid_i`	in	Indicates valid results	EX_STAGE	logic[CVA6Cfg.NrWbPorts-1:0]
`x_we_i`	in	CVXIF write enable	EX_STAGE	logic
`x_rd_i`	in	CVXIF destination register	EX_STAGE	logic[4:0]
`waddr_i`	in	Destination register in register file	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0][4:0]
`wdata_i`	in	Value to write to register file	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.XLEN-1:0]
`we_gpr_i`	in	GPR write enable	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`commit_instr_o`	out	Instructions to commit	COMMIT_STAGE	scoreboard_entry_t[CVA6Cfg.NrCommitPorts-1:0]
`commit_drop_o`	out	Instruction is cancelled	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`commit_ack_i`	in	Commit acknowledge	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As PerfCounterEn = 0,

sb_full_o output is tied to 0
stall_issue_o output is tied to 0

As EnableAccelerator = 0,

stall_i input is tied to 0
issue_instr_o output is tied to 0
issue_instr_hs_o output is tied to 0

As RVH = False,

tinst_o output is tied to 0

As RVF = 0,

fpu_ready_i input is tied to 0
fpu_valid_o output is tied to 0
fpu_fmt_o output is tied to 0
fpu_rm_o output is tied to 0
we_fpr_i input is tied to 0

As IsRVFI = 0,

rvfi_issue_pointer_o output is tied to 0
rvfi_commit_pointer_o output is tied to 0
rvfi_rs1_o output is tied to 0
rvfi_rs2_o output is tied to 0

4.3.2. Functionality

ISSUE_STAGE has three functionalities.

(1) ISSUE_STAGE issues instructions. Instructions from ID_STAGE are sent to Scoreboard module, which forwards them to Issue_read_operands module. Issue_read_operands queries Scoreboard module for data dependences (Scoreboard is also able to return forwarded values) and gets the list of busy functional units from EX_STAGE. Issue_read_operands sends to EX_STAGE the instructions to execute and acknowledges to Scoreboard so that it stores them: the instruction is issued. Issued instructions are acknowledged to ID_STAGE. Each of these steps can block its successors. The flush signal from CONTROLLER module is also sent to Scoreboard module to prevent from issuing. Instructions are issued in-order: an instruction cannot be issued unless all its predecessors are issued.

(2) ISSUE_STAGE reorders instructions results. Results from EX_STAGE are sent to Scoreboard module so that they are stored.

(3) ISSUE_STAGE sends completed instructions in-order to COMMIT_STAGE. The oldest instructions from Scoreboard are exposed to COMMIT_STAGE. When COMMIT_STAGE acknowledges a commit, the committed instruction is removed from Scoreboard and the register file in Issue_read_operands is updated with the instruction result.

4.3.3. Submodules

ISSUE_STAGE submodules

4.3.3.1. Scoreboard

Scoreboard contains a FIFO which contains an entry for each issued instruction. Each entry is removed once the instruction is committed. Instruction results are inserted into Scoreboard when they are ready. The FIFO is flushed when requested by CONTROLLER.

Scoreboard is used in all three functionalities of ISSUE_STAGE.

(1) ISSUE_STAGE issues instructions. Up to 2 instruction(s) can be received from ID_STAGE each cycle. They are transmitted to Issue_read_operands with incremental transaction IDs which wrap at 8. The result buses and Scoreboard entries are also transmitted to Issue_read_operands for it to detect data dependences and perform operand forwarding. When Issue_read_operands acknowledges an instruction, it is inserted into the FIFO.

Scoreboard has a capacity of 8 entries. Instructions which would make Scoreboard overflow are not transmitted to Issue_read_operands (ISSUE_STAGE stalls).

The flush signal from CONTROLLER module removes all entries from the Scoreboard and prevents from issuing. The transaction ID of the next issued instruction is 0.

(2) ISSUE_STAGE reorders instructions results. Results are returned from functional units in the EX_STAGE to Scoreboard via result buses, with their transaction IDs. Scoreboard stores this result into the entry associated with this transaction ID. If an exception is returned, it is stored too. If the result is a branch miss, the speculative instructions following it are cancelled.

FIXME Document behavior related to CV-X-IF

(3) ISSUE_STAGE sends completed instructions in-order to COMMIT_STAGE. Each of the 2 oldest entry(ies) in Scoreboard are exposed to COMMIT_STAGE, one per commit port. This makes commit happen in-order. When COMMIT_STAGE acknowledges on a commit port, the entry is removed from Scoreboard.

Table 13. **scoreboard module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_unissued_instr_i`	in	Prevent from issuing	CONTROLLER	logic
`flush_i`	in	Flush whole scoreboard	CONTROLLER	logic
`x_transaction_accepted_i`	in	CVXIF trasaction accepted	ISSUE_READ_OPERANDS	logic
`x_issue_writeback_i`	in	CVXIF Issue writeback	ISSUE_READ_OPERANDS	logic
`x_id_i`	in	CVXIF ID	ISSUE_READ_OPERANDS	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`commit_instr_o`	out	Instructions to commit	COMMIT_STAGE	scoreboard_entry_t[CVA6Cfg.NrCommitPorts-1:0]
`commit_drop_o`	out	Instruction is cancelled	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`commit_ack_i`	in	Commit acknowledge	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`decoded_instr_i`	in	Handshake’s data with decode stage	ID_STAGE	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`orig_instr_i`	in	instruction value	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0][31:0]
`decoded_instr_valid_i`	in	Handshake’s valid with decode stage	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`decoded_instr_ack_o`	out	Handshake’s acknowlege with decode stage	ID_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`issue_instr_o`	out	Entry about the instruction to issue	ISSUE_READ_OPERANDS	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`orig_instr_o`	out	Instruction to issue	ISSUE_READ_OPERANDS	logic[CVA6Cfg.NrIssuePorts-1:0][31:0]
`issue_instr_valid_o`	out	Is there an instruction to issue	ISSUE_READ_OPERANDS	logic[CVA6Cfg.NrIssuePorts-1:0]
`issue_ack_i`	in	Issue stage acknowledge	ISSUE_READ_OPERANDS	logic[CVA6Cfg.NrIssuePorts-1:0]
`fwd_o`	out	Forwarding	ISSUE_READ_OPERANDS	forwarding_t
`resolved_branch_i`	in	Result from branch unit	EX_STAGE	bp_resolve_t
`trans_id_i`	in	Transaction ID at which to write the result back	EX_STAGE	logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.TRANS_ID_BITS-1:0]
`wbdata_i`	in	Results to write back	EX_STAGE	logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.XLEN-1:0]
`ex_i`	in	Exception from a functional unit (e.g.: ld/st exception)	EX_STAGE	exception_t[CVA6Cfg.NrWbPorts-1:0]
`wt_valid_i`	in	Indicates valid results	EX_STAGE	logic[CVA6Cfg.NrWbPorts-1:0]
`x_we_i`	in	Cvxif we for writeback	EX_STAGE	logic
`x_rd_i`	in	CVXIF destination register	ISSUE_STAGE	logic[4:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As PerfCounterEn = 0,

sb_full_o output is tied to 0

As IsRVFI = 0,

rvfi_issue_pointer_o output is tied to 0
rvfi_commit_pointer_o output is tied to 0

4.3.3.2. Issue_read_operands

Issue_read_operands tracks hazards and gets the input operands for the instructions to execute. The following hazards can prevent instructions from being issued.

Data hazards: ISSUE_STAGE checks that the instruction operands are available.
- Read-After-Write (RAW): if one of the source registers of the instruction to issue is the destination register of one of the instructions in the Scoreboard, issue is blocked. However, CVA6 implements operand forwarding: instead of blocking the instruction, the operand is taken from either
  a) a functional unit which returns a result which is not an exception, with a transaction ID which points to a Scoreboard entry whose destination register is the requested source register;
  b) a Scoreboard entry whose destination register is the requested source register has a result which is not an exception.
  Forwarding is not possible from CSR instructions.
  
  The instruction in the second issue port is not issued when one of its source registers is the destination register of the instruction in the first issue port. No operand forwarding is possible between the two issue ports.
- Write-After-Write (WAW): if the instruction to issue has the same destination register as one of the instructions in the Scoreboard, issue is blocked. Instructions being committed are ignored because the will not be in the scoreboard anymore since the next cycle.
  
  The instruction in the second issue port is not issued when it has the same destination register as the instruction in the first issue port.
- Special case: there are no data hazards on x0.
- Cancelled instructions are ignored by the hazard detection mechanism.
- FIXME hazards related to CV-X-IF
Structural hazards: ISSUE_STAGE checks that a functional unit (FU) and its result bus (RB) are ready to execute the instruction.
- Integer division instructions and some^{[FIXME which?]} CSR instructions have an unknown latency. When EX_STAGE reports that such an instruction is running, instructions using ALU, BRANCH, CSR or MULT are blocked. This is to avoid conflicts on the RB shared by these four FUs. Note that ALU2 might still be available.
- Multiplications have a fixed latency of 2 cycles. Instructions using ALU, BRANCH or CSR are blocked if an instruction using MULT was issued one cycle earlier. This is to avoid conflicts on the RB shared by these four FUs. Note that ALU2 might still be available. Instructions using MULT are not blocked because the multiplier is pipelined and can accept one instruction each cycle.
- Instructions using LSU are blocked if LSU is not ready.
- The second issue port cannot issue instructions using CSR.
- The second issue port cannot issue instructions using CV-X-IF.
- The second issue port cannot issue an instruction using a FU when the first port issues an instruction using the same FU. This is because a FU can only accept one instruction each cycle.
- When both ALU and ALU2 are available, ALU instructions use ALU2.
- The second issue port cannot issue an instruction using ALU, BRANCH or CSR when the first issue port is issuing an instruction using ALU, BRANCH or CSR. This is to avoid conflicts on the RB shared by these four FUs. Note that ALU2 might still be available.
Control hazards: ISSUE_STAGE checks that no instruction is executed speculatively while it must not.
- The second issue port is completely blocked when the first port issues an instruction using CSR.
- The second issue port cannot issue an instruction using LSU when the first issue port issues an instruction using BRANCH, unless it is a JAL.

Data hazards are ignored when an exception occurred earlier in the pipeline. As no FU is involved, there are no structural hazards either.

Instructions are issued in-order, which means that when an instruction makes ISSUE_STAGE stall, next instructions are blocked. As a consequence, it is not possible to issue the instruction in the second issue port when the instruction in the first issue port is blocked.

However, it is possible that the instruction in the first issue port is issued while the instruction in the second issue port is not. In such case, ID_STAGE moves the instruction in the second issue port to the first issue port so that it is issued next.

The input operands provided to EX_STAGE come from the register file by default. However, when one of the source registers has a RAW dependence, the corresponding input operand is replaced by the forwarded value (see Data hazards/RAW hazards above). The register file is an instance of ariane_regfile where ach register stores 32 bits and the register at index 0 is wired to zero.

FIXME Document behavior related to CV-X-IF

Instructions are sent to EX_STAGE via a register so they are visible in EX_STAGE one cycle after being issued.

Table 14. **issue_read_operands module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Prevent from issuing	CONTROLLER	logic
`issue_instr_i`	in	Entry about the instruction to issue	SCOREBOARD	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`issue_instr_i_prev`	in	none	none	scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0]
`orig_instr_i`	in	Instruction to issue	SCOREBOARD	logic[CVA6Cfg.NrIssuePorts-1:0][31:0]
`issue_instr_valid_i`	in	Is there an instruction to issue	SCOREBOARD	logic[CVA6Cfg.NrIssuePorts-1:0]
`issue_ack_o`	out	Issue stage acknowledge	SCOREBOARD	logic[CVA6Cfg.NrIssuePorts-1:0]
`fwd_i`	in	Forwarding	SCOREBOARD	forwarding_t
`fu_data_o`	out	FU data useful to execute instruction	EX_STAGE	fu_data_t[CVA6Cfg.NrIssuePorts-1:0]
`rs1_forwarding_o`	out	Unregistered version of fu_data_o.operanda	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0]
`rs2_forwarding_o`	out	Unregistered version of fu_data_o.operandb	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0]
`pc_o`	out	Program Counter	EX_STAGE	logic[CVA6Cfg.VLEN-1:0]
`is_zcmt_o`	out	Is zcmt	EX_STAGE	logic
`is_compressed_instr_o`	out	Is compressed instruction	EX_STAGE	logic
`flu_ready_i`	in	Fixed Latency Unit is ready	EX_STAGE	logic
`alu_valid_o`	out	ALU output is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`branch_valid_o`	out	Branch unit is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`branch_predict_o`	out	Information of branch prediction	EX_STAGE	branchpredict_sbe_t
`lsu_ready_i`	in	Load store unit FU is ready	EX_STAGE	logic
`lsu_valid_o`	out	Load store unit FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`mult_valid_o`	out	Mult FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`alu2_valid_o`	out	ALU2 FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`csr_valid_o`	out	CSR is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`cvxif_valid_o`	out	CVXIF FU is valid	EX_STAGE	logic[CVA6Cfg.NrIssuePorts-1:0]
`cvxif_ready_i`	in	CVXIF is FU ready	EX_STAGE	logic
`cvxif_off_instr_o`	out	CVXIF offloader instruction value	EX_STAGE	logic[31:0]
`hart_id_i`	in	CVA6 Hart ID	SUBSYSTEM	logic[CVA6Cfg.XLEN-1:0]
`x_issue_ready_i`	in	none	none	logic
`x_issue_resp_i`	in	none	none	x_issue_resp_t
`x_issue_valid_o`	out	none	none	logic
`x_issue_req_o`	out	none	none	x_issue_req_t
`x_register_ready_i`	in	none	none	logic
`x_register_valid_o`	out	none	none	logic
`x_register_o`	out	none	none	x_register_t
`x_commit_valid_o`	out	none	none	logic
`x_commit_o`	out	none	none	x_commit_t
`x_transaction_accepted_o`	out	none	none	logic
`x_transaction_rejected_o`	out	none	none	logic
`x_issue_writeback_o`	out	none	none	logic
`x_id_o`	out	none	none	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`waddr_i`	in	Destination register in the register file	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0][4:0]
`wdata_i`	in	Value to write to register file	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.XLEN-1:0]
`we_gpr_i`	in	GPR write enable	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As EnableAccelerator = 0,

stall_i input is tied to 0

As RVH = False,

tinst_o output is tied to 0

As RVF = 0,

fpu_ready_i input is tied to 0
fpu_valid_o output is tied to 0
fpu_fmt_o output is tied to 0
fpu_rm_o output is tied to 0
we_fpr_i input is tied to 0

As PerfCounterEn = 0,

stall_issue_o output is tied to 0

As IsRVFI = 0,

rvfi_rs1_o output is tied to 0
rvfi_rs2_o output is tied to 0

4.4. EX_STAGE Module

4.4.1. Description

The EX_STAGE module is a logical stage which implements the execute stage. It encapsulates the following functional units: ALU, Branch Unit, CSR buffer, Mult, load and store and CVXIF.

The module is connected to:

ID_STAGE module provides scoreboard entry. *

4.4.2. Functionality

4.4.3. Submodules

EX_STAGE submodules

4.4.3.1. alu

The arithmetic logic unit (ALU) is a small piece of hardware which performs 32 and 64-bit arithmetic and bitwise operations: subtraction, addition, shifts, comparisons… It always completes its operation in a single cycle.

Table 15. **alu module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`fu_data_i`	in	FU data needed to execute instruction	ISSUE_STAGE	fu_data_t
`result_o`	out	ALU result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`alu_branch_res_o`	out	ALU branch compare result	branch_unit	logic

4.4.3.2. branch_unit

The branch unit module manages all kinds of control flow changes i.e.: conditional and unconditional jumps. It calculates the target address and decides whether to take the branch or not. It also decides if a branch was mis-predicted or not and reports corrective actions to the pipeline stages.

Table 16. **branch_unit module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`fu_data_i`	in	FU data needed to execute instruction	ISSUE_STAGE	fu_data_t
`pc_i`	in	Instruction PC	ISSUE_STAGE	logic[CVA6Cfg.VLEN-1:0]
`is_zcmt_i`	in	Is zcmt instruction	ISSUE_STAGE	logic
`is_compressed_instr_i`	in	Instruction is compressed	ISSUE_STAGE	logic
`branch_valid_i`	in	Branch unit instruction is valid	ISSUE_STAGE	logic
`branch_comp_res_i`	in	ALU branch compare result	ALU	logic
`branch_result_o`	out	Brach unit result	ISSUE_STAGE	logic[CVA6Cfg.VLEN-1:0]
`branch_predict_i`	in	Information of branch prediction	ISSUE_STAGE	branchpredict_sbe_t
`resolved_branch_o`	out	Signaling that we resolved the branch	ISSUE_STAGE	bp_resolve_t
`resolve_branch_o`	out	Branch is resolved, new entries can be accepted by scoreboard	ID_STAGE	logic

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As RVH = False,

v_i input is tied to 0

As DebugEn = False,

debug_mode_i input is tied to 0

As PerfCounterEn = 0,

branch_exception_o output is tied to 0

4.4.3.3. CSR_buffer

The CSR buffer module stores the CSR address at which the instruction is going to read/write. As the CSR instruction alters the processor architectural state, this instruction has to be buffered until the commit stage decides to execute the instruction.

Table 17. **csr_buffer module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Flush CSR	CONTROLLER	logic
`fu_data_i`	in	FU data needed to execute instruction	ISSUE_STAGE	fu_data_t
`csr_ready_o`	out	CSR FU is ready	ISSUE_STAGE	logic
`csr_valid_i`	in	CSR instruction is valid	ISSUE_STAGE	logic
`csr_result_o`	out	CSR buffer result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`csr_commit_i`	in	commit the pending CSR OP	COMMIT_STAGE	logic
`csr_addr_o`	out	CSR address to write	COMMIT_STAGE	logic[11:0]

4.4.3.4. mult

The multiplier module supports the division and multiplication operations.

mult submodules

Table 18. **mult module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Flush	CONTROLLER	logic
`fu_data_i`	in	FU data needed to execute instruction	ISSUE_STAGE	fu_data_t
`mult_valid_i`	in	Mult instruction is valid	ISSUE_STAGE	logic
`result_o`	out	Mult result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`mult_valid_o`	out	Mult result is valid	ISSUE_STAGE	logic
`mult_ready_o`	out	Mutl is ready	ISSUE_STAGE	logic
`mult_trans_id_o`	out	Mult transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]

4.4.3.4.1. multiplier

Multiplication is performed in two cycles and is fully pipelined.

Table 19. **multiplier module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`trans_id_i`	in	Multiplier transaction ID	Mult	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`mult_valid_i`	in	Multiplier instruction is valid	Mult	logic
`operation_i`	in	Multiplier operation	Mult	fu_op
`operand_a_i`	in	A operand	Mult	logic[CVA6Cfg.XLEN-1:0]
`operand_b_i`	in	B operand	Mult	logic[CVA6Cfg.XLEN-1:0]
`result_o`	out	Multiplier result	Mult	logic[CVA6Cfg.XLEN-1:0]
`mult_valid_o`	out	Mutliplier result is valid	Mult	logic
`mult_trans_id_o`	out	Multiplier transaction ID	Mult	logic[CVA6Cfg.TRANS_ID_BITS-1:0]

4.4.3.4.2. serdiv

The division is a simple serial divider which needs 64 cycles in the worst case.

Table 20. **serdiv module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`id_i`	in	Serdiv translation ID	Mult	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`op_a_i`	in	A operand	Mult	logic[WIDTH-1:0]
`op_b_i`	in	B operand	Mult	logic[WIDTH-1:0]
`rem`	in	Serdiv operation	Mult	logic[1:0]opcode_i,//0:udiv,2:urem,1:div,3:
`in_vld_i`	in	Serdiv instruction is valid	Mult	logic
`in_rdy_o`	out	Serdiv FU is ready	Mult	logic
`flush_i`	in	Flush	CONTROLLER	logic
`out_vld_o`	out	Serdiv result is valid	Mult	logic
`out_rdy_i`	in	Serdiv is ready	Mult	logic
`id_o`	out	Serdiv transaction ID	Mult	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`res_o`	out	Serdiv result	Mult	logic[WIDTH-1:0]

4.4.3.5. load_store_unit (LSU)

The load store module interfaces with the data cache (D$) to manage the load and store operations.

The LSU does not handle misaligned accesses. Misaligned accesses are double word accesses which are not aligned to a 64-bit boundary, word accesses which are not aligned to a 32-bit boundary and half word accesses which are not aligned on 16-bit boundary. If the LSU encounters a misaligned load or store, it throws a misaligned exception.

load_store_unit submodules

Table 21. **load_store_unit module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Flush	CONTROLLER	logic
`no_st_pending_o`	out	No store pending	COMMIT_STAGE	logic
`fu_data_i`	in	FU data needed to execute instruction	ISSUE_STAGE	fu_data_t
`lsu_ready_o`	out	Load Store Unit is ready	ISSUE_STAGE	logic
`lsu_valid_i`	in	Load Store Unit instruction is valid	ISSUE_STAGE	logic
`load_trans_id_o`	out	Load transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`load_result_o`	out	Load result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`load_valid_o`	out	Load result is valid	ISSUE_STAGE	logic
`load_exception_o`	out	Load exception	ISSUE_STAGE	exception_t
`store_trans_id_o`	out	Store transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`store_result_o`	out	Store result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`store_valid_o`	out	Store result is valid	ISSUE_STAGE	logic
`store_exception_o`	out	Store exception	ISSUE_STAGE	exception_t
`commit_i`	in	Commit the first pending store	COMMIT_STAGE	logic
`commit_ready_o`	out	Commit queue is ready to accept another commit request	COMMIT_STAGE	logic
`commit_tran_id_i`	in	Commit transaction ID	COMMIT_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`fetch_areq_i`	in	Instruction cache input request	FETCH	fetch_areq_t
`fetch_arsp_o`	out	Instruction cache output response	FETCH	fetch_arsp_t
`obi_store_req_o`	out	Store cache response	DCACHE	obi_store_req_t
`obi_store_rsp_i`	in	Store cache request	DCACHE	obi_store_rsp_t
`obi_load_req_o`	out	Load cache response	DCACHE	obi_load_req_t
`obi_load_rsp_i`	in	Load cache request	DCACHE	obi_load_rsp_t
`pmpcfg_i`	in	PMP configuration	CSR_REGFILE	riscv::pmpcfg_t[(CVA6Cfg.NrPMPEntries>0?CVA6Cfg.NrPMPEntries-1:0):0]
`pmpaddr_i`	in	PMP address	CSR_REGFILE	logic[(CVA6Cfg.NrPMPEntries>0?CVA6Cfg.NrPMPEntries-1:0):0][CVA6Cfg.PLEN-3:0]

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As EnableAccelerator = 0,

stall_st_pending_i input is tied to 0

As RVA = False,

amo_valid_commit_i input is tied to 0
obi_amo_req_o output is tied to 0
obi_amo_rsp_i input is tied to 0

As RVH = False,

tinst_i input is tied to 0
enable_g_translation_i input is tied to 0
en_ld_st_g_translation_i input is tied to 0
v_i input is tied to 0
ld_st_v_i input is tied to 0
csr_hs_ld_st_inst_o output is tied to 0
vs_sum_i input is tied to 0
vmxr_i input is tied to 0
vsatp_ppn_i input is tied to 0
vs_asid_i input is tied to 0
hgatp_ppn_i input is tied to 0
vmid_i input is tied to 0
vmid_to_be_flushed_i input is tied to 0
gpaddr_to_be_flushed_i input is tied to 0
flush_tlb_vvma_i input is tied to 0
flush_tlb_gvma_i input is tied to 0

As RVS = False,

enable_translation_i input is tied to 0
en_ld_st_translation_i input is tied to 0
sum_i input is tied to 0
mxr_i input is tied to 0
satp_ppn_i input is tied to 0
asid_i input is tied to 0
asid_to_be_flushed_i input is tied to 0
vaddr_to_be_flushed_i input is tied to 0

As PRIV = MachineOnly,

priv_lvl_i input is tied to MachineMode
ld_st_priv_lvl_i input is tied to MAchineMode

As MMUPresent = 0,

flush_tlb_i input is tied to 0
obi_mmu_ptw_req_o output is tied to 0
obi_mmu_ptw_rsp_i input is tied to 0

As PerfCounterEn = 0,

itlb_miss_o output is tied to 0
dtlb_miss_o output is tied to 0

As PipelineOnly = True,

load_req_o output is tied to 0
load_rsp_i input is tied to 0
dcache_wbuffer_empty_i input is tied to 0
dcache_wbuffer_not_ni_i input is tied to 0

As IsRVFI = 0,

rvfi_lsu_ctrl_o output is tied to 0
rvfi_mem_paddr_o output is tied to 0

4.4.3.5.1. store_unit

The store_unit module manages the data store operations.

As stores can be speculative, the store instructions need to be committed by ISSUE_STAGE module before possibily altering the processor state. Store buffer keeps track of store requests. Outstanding store instructions (which are speculative) are differentiated from committed stores. When ISSUE_STAGE module commits a store instruction, outstanding stores become committed.

When commit buffer is not empty, the buffer automatically tries to write the oldest store to the data cache.

Furthermore, the store_unit module provides information to the load_unit to know if an outstanding store matches addresses with a load.

Table 22. **store_unit module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Flush	CONTROLLER	logic
`no_st_pending_o`	out	No store pending	COMMIT_STAGE	logic
`store_buffer_empty_o`	out	Store buffer is empty	LOAD_UNIT	logic
`valid_i`	in	Store instruction is valid	ISSUE_STAGE	logic
`lsu_ctrl_i`	in	Data input	ISSUE_STAGE	lsu_ctrl_t
`pop_st_o`	out	Pop store	LSU_BYPASS	logic
`commit_i`	in	Instruction commit	COMMIT_STAGE	logic
`commit_ready_o`	out	Commit queue is ready to accept another commit request	COMMIT_STAGE	logic
`valid_o`	out	Store result is valid	ISSUE_STAGE	logic
`trans_id_o`	out	Transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`result_o`	out	Store result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`ex_o`	out	Store exception output	ISSUE_STAGE	exception_t
`translation_req_o`	out	Address translation request	MMU/PMP	logic
`vaddr_o`	out	Virtual address	MMU/PMP	logic[CVA6Cfg.VLEN-1:0]
`paddr_i`	in	Physical address	MMU/PMP	logic[CVA6Cfg.PLEN-1:0]
`ex_i`	in	Exception raised before store	MMU/PMP	exception_t
`page_offset_i`	in	Address to be checked	load_unit	logic[11:0]
`page_offset_matches_o`	out	Address check result	load_unit	logic
`obi_store_req_o`	out	Store cache response	DCACHE	obi_store_req_t
`obi_store_rsp_i`	in	Store cache request	DCACHE	obi_store_rsp_t

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As EnableAccelerator = 0,

stall_st_pending_i input is tied to 0

As RVA = False,

amo_valid_commit_i input is tied to 0
obi_amo_req_o output is tied to 0
obi_amo_rsp_i input is tied to 0

As IsRVFI = 0,

rvfi_mem_paddr_o output is tied to 0

As RVH = False,

tinst_o output is tied to 0
hs_ld_st_inst_o output is tied to 0
hlvx_inst_o output is tied to 0

For any HW configuration,

dtlb_hit_i input is tied to 1

4.4.3.5.2. load_unit

The load unit module manages the data load operations.

Before issuing a load, the load unit needs to check the store buffer for potential aliasing. It stalls until it can satisfy the current request. This means:

Two loads to the same address are allowed.
Two stores to the same address are allowed.
A store after a load to the same address is allowed.
A load after a store to the same address can only be processed if the store has already been sent to the cache i.e there is no fowarding.

After the check of the store buffer, a read request is sent to the D$ with the index field of the address (1). The load unit stalls until the D$ acknowledges this request (2). In the next cycle, the tag field of the address is sent to the D$ (3). If the load request address is non-idempotent, it stalls until the write buffer of the D$ is empty of non-idempotent requests and the store buffer is empty. It also stalls until the incoming load instruction is the next instruction to be committed. When the D$ allows the read of the data, the data is sent to the load unit and the load instruction can be committed (4).

Load unit’s interactions

Table 23. **load_unit module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Flush signal	CONTROLLER	logic
`valid_i`	in	Load request is valid	LSU_BYPASS	logic
`lsu_ctrl_i`	in	Load request input	LSU_BYPASS	lsu_ctrl_t
`pop_ld_o`	out	Pop the load request from the LSU bypass FIFO	LSU_BYPASS	logic
`valid_o`	out	Load unit result is valid	ISSUE_STAGE	logic
`trans_id_o`	out	Load transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`result_o`	out	Load result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`ex_o`	out	Load exception	ISSUE_STAGE	exception_t
`translation_req_o`	out	Request address translation	MMU/PMP	logic
`vaddr_o`	out	Virtual address	MMU/PMP	logic[CVA6Cfg.VLEN-1:0]
`paddr_i`	in	Physical address	MMU/PMP	logic[CVA6Cfg.PLEN-1:0]
`ex_i`	in	Excepted which appears before load	MMU/PMP	exception_t
`page_offset_o`	out	Page offset for address checking	STORE_UNIT	logic[11:0]
`page_offset_matches_i`	in	Indicates if the page offset matches a store unit entry	STORE_UNIT	logic
`store_buffer_empty_i`	in	Store buffer is empty	STORE_UNIT	logic
`commit_tran_id_i`	in	Transaction ID of the committing instruction	COMMIT_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`obi_load_req_o`	out	Load cache response	DCACHE	obi_load_req_t
`obi_load_rsp_i`	in	Load cache request	DCACHE	obi_load_rsp_t

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As RVH = False,

tinst_o output is tied to 0
hs_ld_st_inst_o output is tied to 0
hlvx_inst_o output is tied to 0

For any HW configuration,

dtlb_hit_i input is tied to 1

As MMUPresent = 0,

dtlb_ppn_i input is tied to 0

As PipelineOnly = True,

load_req_o output is tied to 0
load_rsp_i input is tied to 0
dcache_wbuffer_not_ni_i input is tied to 0

4.4.3.5.3. lsu_bypass

The LSU bypass is a FIFO which keeps instructions from the issue stage when the store unit or the load unit are not available immediately.

Table 24. **lsu_bypass module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`flush_i`	in	Flush	CONTROLLER	logic
`lsu_req_i`	in	Load store unit request	LSU_UNIT	lsu_ctrl_t
`lsu_req_valid_i`	in	load store unit valid	LSU_UNIT	logic
`pop_ld_i`	in	Pop load	LOAD_UNIT	logic
`pop_st_i`	in	Pop store	STORE_UNIT	logic
`ready_o`	out	Load Store Unit is ready	ISSUE_STAGE	logic

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As IsRVFI = 0,

lsu_ctrl_o output is tied to 0

4.4.3.6. CVXIF_fu

Table 25. **cvxif_fu module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`x_valid_i`	in	CVXIF instruction is valid	ISSUE_STAGE	logic
`x_trans_id_i`	in	Transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`x_illegal_i`	in	Instruction is illegal, determined during CVXIF issue transaction	ISSUE_STAGE	logic
`x_off_instr_i`	in	Offloaded instruction	ISSUE_STAGE	logic[31:0]
`x_ready_o`	out	CVXIF is ready	ISSUE_STAGE	logic
`x_trans_id_o`	out	CVXIF result transaction ID	ISSUE_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`x_exception_o`	out	CVXIF exception	ISSUE_STAGE	exception_t
`x_result_o`	out	CVXIF FU result	ISSUE_STAGE	logic[CVA6Cfg.XLEN-1:0]
`x_valid_o`	out	CVXIF result valid	ISSUE_STAGE	logic
`x_we_o`	out	CVXIF write enable	ISSUE_STAGE	logic
`x_rd_o`	out	CVXIF destination register	ISSUE_STAGE	logic[4:0]
`result_valid_i`	in	none	none	logic
`result_i`	in	none	none	x_result_t
`result_ready_o`	out	none	none	logic

4.5. COMMIT_STAGE Module

4.5.1. Description

The COMMIT_STAGE module implements the commit stage, which is the last stage in the processor’s pipeline. For the instructions for which the execution is completed, it updates the architectural state: writing CSR registers, committing stores and writing back data to the register file. The commit stage controls the stalling and the flushing of the processor.

The commit stage also manages the exceptions. An exception can occur during the first four pipeline stages (PCgen cannot generate an exception) or happen in commit stage, coming from the CSR_REGFILE or from an interrupt. Exceptions are precise: they are considered during the commit only and associated with the related instruction.

Table 26. **commit_stage module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`halt_i`	in	Request to halt the core	CONTROLLER	logic
`flush_dcache_i`	in	request to flush dcache, also flush the pipeline	CACHE	logic
`exception_o`	out	Exception raised by all sources	EX_STAGE	exception_t
`commit_instr_i`	in	The instruction we want to commit	ISSUE_STAGE	scoreboard_entry_t[CVA6Cfg.NrCommitPorts-1:0]
`commit_drop_i`	in	The instruction is cancelled	ISSUE_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`commit_ack_o`	out	Acknowledge that we are indeed committing	ISSUE_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`commit_macro_ack_o`	out	Acknowledge that we are indeed committing	CSR_REGFILE	logic[CVA6Cfg.NrCommitPorts-1:0]
`waddr_o`	out	Register file write address	ISSUE_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0][4:0]
`wdata_o`	out	Register file write data	ISSUE_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.XLEN-1:0]
`we_gpr_o`	out	Register file write enable	ISSUE_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`we_fpr_o`	out	Floating point register enable	ISSUE_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`pc_o`	out	Program counter	FRONTEND_CSR_REGFILE	logic[CVA6Cfg.VLEN-1:0]
`csr_op_o`	out	Decoded CSR operation	CSR_REGFILE	fu_op
`csr_wdata_o`	out	Data to write to CSR	CSR_REGFILE	logic[CVA6Cfg.XLEN-1:0]
`csr_rdata_i`	in	Data to read from CSR	CSR_REGFILE	logic[CVA6Cfg.XLEN-1:0]
`csr_exception_i`	in	Exception or interrupt occurred in CSR stage (the same as commit)	CSR_REGFILE	exception_t
`commit_lsu_o`	out	Commit the pending store	EX_STAGE	logic
`commit_lsu_ready_i`	in	Commit buffer of LSU is ready	EX_STAGE	logic
`commit_tran_id_o`	out	Transaction id of first commit port	ID_STAGE	logic[CVA6Cfg.TRANS_ID_BITS-1:0]
`no_st_pending_i`	in	no store is pending	EX_STAGE	logic
`commit_csr_o`	out	Commit the pending CSR instruction	EX_STAGE	logic
`flush_commit_o`	out	Request a pipeline flush	CONTROLLER	logic

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As RVF = 0,

dirty_fp_state_o output is tied to 0
csr_write_fflags_o output is tied to 0

As DebugEn = False,

single_step_i input is tied to 0

As RVA = False,

obi_amo_rsp_i input is tied to 0
amo_valid_commit_o output is tied to 0

As FenceEn = 0,

fence_i_o output is tied to 0
fence_o output is tied to 0

As RVS = False,

sfence_vma_o output is tied to 0

As RVH = False,

hfence_vvma_o output is tied to 0
hfence_gvma_o output is tied to 0

4.5.2. Functionality

4.6. CONTROLLER Module

4.6.1. Description

Table 27. **controller module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`set_pc_commit_o`	out	Set PC om PC Gen	FRONTEND	logic
`flush_if_o`	out	Flush the IF stage	FRONTEND	logic
`flush_unissued_instr_o`	out	Flush un-issued instructions of the scoreboard	FRONTEND	logic
`flush_id_o`	out	Flush ID stage	ID_STAGE	logic
`flush_ex_o`	out	Flush EX stage	EX_STAGE	logic
`flush_bp_o`	out	Flush branch predictors	FRONTEND	logic
`flush_icache_o`	out	Flush ICache	CACHE	logic
`flush_dcache_o`	out	Flush DCache	CACHE	logic
`flush_dcache_ack_i`	in	Acknowledge the whole DCache Flush	CACHE	logic
`halt_csr_i`	in	Halt request from CSR (WFI instruction)	CSR_REGFILE	logic
`halt_o`	out	Halt signal to commit stage	COMMIT_STAGE	logic
`eret_i`	in	Return from exception	CSR_REGFILE	logic
`ex_valid_i`	in	We got an exception, flush the pipeline	FRONTEND	logic
`resolved_branch_i`	in	We got a resolved branch, check if we need to flush the front-end	EX_STAGE	bp_resolve_t
`flush_csr_i`	in	We got an instruction which altered the CSR, flush the pipeline	CSR_REGFILE	logic
`flush_commit_i`	in	Flush request from commit stage	COMMIT_STAGE	logic

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As RVH = False,

v_i input is tied to 0
flush_tlb_vvma_o output is tied to 0
flush_tlb_gvma_o output is tied to 0
hfence_vvma_i input is tied to 0
hfence_gvma_i input is tied to 0

As MMUPresent = 0,

flush_tlb_o output is tied to 0

As EnableAccelerator = 0,

halt_acc_i input is tied to 0
flush_acc_i input is tied to 0

As DebugEn = False,

set_debug_pc_i input is tied to 0

As FenceEn = 0,

fence_i_i input is tied to 0
fence_i input is tied to 0

As RVS = False,

sfence_vma_i input is tied to 0

4.6.2. Functionality

4.7. CSR_REGFILE Module

4.7.1. Description

Table 28. **csr_regfile module** IO ports
Signal	IO	Description	connexion	Type
`clk_i`	in	Subsystem Clock	SUBSYSTEM	logic
`rst_ni`	in	Asynchronous reset active low	SUBSYSTEM	logic
`time_irq_i`	in	Timer threw a interrupt	SUBSYSTEM	logic
`flush_o`	out	send a flush request out when a CSR with a side effect changes	CONTROLLER	logic
`halt_csr_o`	out	halt requested	CONTROLLER	logic
`commit_instr_i`	in	Instruction to be committed	ID_STAGE	scoreboard_entry_t
`commit_ack_i`	in	Commit acknowledged a instruction → increase instret CSR	COMMIT_STAGE	logic[CVA6Cfg.NrCommitPorts-1:0]
`boot_addr_i`	in	Address from which to start booting, mtvec is set to the same address	SUBSYSTEM	logic[CVA6Cfg.VLEN-1:0]
`hart_id_i`	in	Hart id in a multicore environment (reflected in a CSR)	SUBSYSTEM	logic[CVA6Cfg.XLEN-1:0]
`ex_i`	in	We’ve got an exception from the commit stage, take it	COMMIT_STAGE	exception_t
`csr_op_i`	in	Operation to perform on the CSR file	COMMIT_STAGE	fu_op
`csr_addr_i`	in	Address of the register to read/write	EX_STAGE	logic[11:0]
`csr_wdata_i`	in	Write data in	COMMIT_STAGE	logic[CVA6Cfg.XLEN-1:0]
`csr_rdata_o`	out	Read data out	COMMIT_STAGE	logic[CVA6Cfg.XLEN-1:0]
`pc_i`	in	PC of instruction accessing the CSR	COMMIT_STAGE	logic[CVA6Cfg.VLEN-1:0]
`csr_exception_o`	out	attempts to access a CSR without appropriate privilege	COMMIT_STAGE	exception_t
`epc_o`	out	Output the exception PC to PC Gen, the correct CSR (mepc, sepc) is set accordingly	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`eret_o`	out	Return from exception, set the PC of epc_o	FRONTEND	logic
`trap_vector_base_o`	out	Output base of exception vector, correct CSR is output (mtvec, stvec)	FRONTEND	logic[CVA6Cfg.VLEN-1:0]
`irq_ctrl_o`	out	interrupt management to id stage	ID_STAGE	irq_ctrl_t
`irq_i`	in	external interrupt in	SUBSYSTEM	logic[1:0]
`ipi_i`	in	inter processor interrupt → connected to machine mode sw	SUBSYSTEM	logic
`icache_en_o`	out	L1 ICache Enable	CACHE	logic
`dcache_en_o`	out	L1 DCache Enable	CACHE	logic
`rvfi_csr_o`	out	none	none	rvfi_probes_csr_t
`jvt_o`	out	none	none	jvt_t

Due to CV32A65X configuration, some ports are tied to a static value. These ports do not appear in the above table, they are listed below

As RVF = 0,

dirty_fp_state_i input is tied to 0
csr_write_fflags_i input is tied to 0
fs_o output is tied to 0
fflags_o output is tied to 0
frm_o output is tied to 0
fprec_o output is tied to 0

As EnableAccelerator = 0,

dirty_v_state_i input is tied to 0
acc_fflags_ex_i input is tied to 0
acc_fflags_ex_valid_i input is tied to 0
acc_cons_en_o output is tied to 0
pmpcfg_o output is tied to 0
pmpaddr_o output is tied to 0

As PRIV = MachineOnly,

priv_lvl_o output is tied to MachineMode
ld_st_priv_lvl_o output is tied to MAchineMode
tvm_o output is tied to 0
tw_o output is tied to 0
tsr_o output is tied to 0

As RVH = False,

v_o output is tied to 0
vfs_o output is tied to 0
en_g_translation_o output is tied to 0
en_ld_st_g_translation_o output is tied to 0
ld_st_v_o output is tied to 0
csr_hs_ld_st_inst_i input is tied to 0
vs_sum_o output is tied to 0
vmxr_o output is tied to 0
vsatp_ppn_o output is tied to 0
vs_asid_o output is tied to 0
hgatp_ppn_o output is tied to 0
vmid_o output is tied to 0
vtw_o output is tied to 0
hu_o output is tied to 0

As RVV = False,

vs_o output is tied to 0

As RVS = False,

en_translation_o output is tied to 0
en_ld_st_translation_o output is tied to 0
sum_o output is tied to 0
mxr_o output is tied to 0
satp_ppn_o output is tied to 0
asid_o output is tied to 0

As DebugEn = False,

debug_req_i input is tied to 0
set_debug_pc_o output is tied to 0
debug_mode_o output is tied to 0
single_step_o output is tied to 0

As PerfCounterEn = 0,

perf_addr_o output is tied to 0
perf_data_o output is tied to 0
perf_data_i input is tied to 0
perf_we_o output is tied to 0
mcountinhibit_o output is tied to 0

4.7.2. Functionality

5. Glossary

ALU: Arithmetic/Logic Unit
APU: Application Processing Unit
ASIC: Application-Specific Integrated Circuit
AXI: Advanced eXtensible Interface
BHT: Branch History Table
BTB: Branch Target Buffer
Byte: 8-bit data item
CPU: Central Processing Unit, processor
CSR: Control and Status Register
Custom extension: Non-Standard extension to the RISC-V base instruction set (RISC-V Instruction Set Manual, Volume I: User-Level ISA)
CVA6: Core-V Application class processor with a 6 stage pipeline
D$: Data Cache
DPI: Direct Programming Interface
EX or EXE: Instruction Execute
FPGA: Field Programmable Gate Array
FPU: Floating Point Unit
Halfword: 16-bit data item
Halfword aligned address: An address is halfword aligned if it is divisible by 2
I$: Instruction Cache
ID: Instruction Decode
IF: Instruction Fetch
ISA: Instruction Set Architecture
KGE: Kilo Gate Equivalents (NAND2)
LSU: Load Store Unit
M-Mode: Machine Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
MMU: Memory Management Unit
NC: Not Cacheable
OBI: Open Bus Interface
OoO: Out Of Order
PC: Program Counter
PMP: Physical memory protection (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
PTW: Page Table Walker
PULP platform: Parallel Ultra Low Power Platform (https://pulp-platform.org)
RAS: Return Address Stack
RV32C: RISC-V Compressed (C extension)
RV32F: RISC-V Floating Point (F extension)
S-Mode: Supervisor Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
SIMD: Single Instruction/Multiple Data
Standard extension: Standard extension to the RISC-V base instruction set (RISC-V Instruction Set Manual, Volume I: User-Level ISA)
TLB: Translation Lookaside Buffer
U-Mode: User Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
VLEN: Virtual address length
WARL: Write Any Values, Reads Legal Values
WB: Write Back of instruction results
WLRL: Write/Read Only Legal Values
Word: 32-bit data item
Word aligned address: An address is word aligned if it is divisible by 4
WPRI: Reserved Writes Preserve Values, Reads Ignore Values
XLEN: RISC-V processor data length