CV32A60X DESIGN DOCUMENT
Design Documentation for CV32A60X architecture
- 1. Introduction
- 2. Subsystem
- 3. Functionality
- 3.1. Instructions
- 3.2. ISA
- 3.2.1. Instructions
- 3.2.2. RV32I Base Integer Instructions
- 3.2.3. RV32M Multiplication and Division Instructions
- 3.2.4. RV32C Compressed Instructions
- 3.2.5. RV32Zicsr Control and Status Register Instructions
- 3.2.6. RV32Zcb Code Size Reduction Instructions
- 3.2.7. RVZba Address generation instructions
- 3.2.8. RVZbb Basic bit-manipulation
- 3.2.9. RVZbc Carry-less multiplication
- 3.2.10. RVZbs Single bit Instructions
 
- 3.3. Traps, Interrupts, Exceptions
- 3.4. CSR
- 3.4.1. Conventions
- 3.4.2. Register Summary
- 3.4.3. Register Description
- 3.4.3.1. MSTATUS
- 3.4.3.2. MISA
- 3.4.3.3. MIE
- 3.4.3.4. MTVEC
- 3.4.3.5. MSTATUSH
- 3.4.3.6. MHPMEVENT[3-31]
- 3.4.3.7. MSCRATCH
- 3.4.3.8. MEPC
- 3.4.3.9. MCAUSE
- 3.4.3.10. MTVAL
- 3.4.3.11. MIP
- 3.4.3.12. PMPCFG[0-15]
- 3.4.3.13. PMPADDR[0-63]
- 3.4.3.14. ICACHE
- 3.4.3.15. DCACHE
- 3.4.3.16. MCYCLE
- 3.4.3.17. MINSTRET
- 3.4.3.18. MHPMCOUNTER[3-31]
- 3.4.3.19. MCYCLEH
- 3.4.3.20. MINSTRETH
- 3.4.3.21. MHPMCOUNTER[3-31]H
- 3.4.3.22. MVENDORID
- 3.4.3.23. MARCHID
- 3.4.3.24. MIMPID
- 3.4.3.25. MHARTID
- 3.4.3.26. MCONFIGPTR
 
 
- 3.5. OBI Bus interface
- 3.6. CV-X-IF Interface and Coprocessor
 
- 4. Architecture and Modules
- 5. Glossary
Editor: Jean Roch Coulon
1. Introduction
The OpenHW Group uses semantic versioning to describe the release status of its IP. This document describes the CV32A60X configuration version of CVA6. It is intended to be the first formal release of CVA6.
CVA6 is a 6-stage, in-order, single-issue processor core which implements the RISC-V instruction set. CVA6 can be configured as a 32- or 64-bit core (RV32 or RV64), called CV32A6 or CV64A6.
The objective of this document is to provide enough information to allow RTL modification (by designers) and RTL verification (by verification engineers). This document is not intended for CVA6 users looking for software development information such as instructions or registers.
The CVA6 architecture is illustrated in the following figure.
1.1. License
Copyright 2022 Thales
Copyright 2018 ETH Zürich and University of Bologna
SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
Licensed under the Solderpad Hardware License v 2.1 (the “License”); you may not use this file except in compliance with the License, or, at your option, the Apache License version 2.0. You may obtain a copy of the License at https://solderpad.org/licenses/SHL-2.1/. Unless required by applicable law or agreed to in writing, any work distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
1.2. Standards Compliance
For ease of reading, references to the following specifications may be implicit in the requirements below. For the sake of precision, the requirements identify the versions of the RISC-V extensions taken from these specifications.
- [CVA6req] “CVA6 requirement specification”, https://github.com/openhwgroup/cva6/blob/master/docs/specifications/cva6_requirement_specification.rst, HASH#767c465.
- [RVunpriv] “The RISC-V Instruction Set Manual for CV32A60X, Volume I: User-Level ISA”, https://docs.openhwgroup.org/projects/cva6-user-manual/07_cv32a60x/riscv/unpriv.html.
- [RVpriv] “The RISC-V Instruction Set Manual for CV32A60X, Volume II: Privileged Architecture”, https://docs.openhwgroup.org/projects/cva6-user-manual/07_cv32a60x/riscv/priv.html.
- [RVcompat] “RISC-V Architectural Compatibility Test Framework”, https://github.com/riscv-non-isa/riscv-arch-test.
- [OBI] “OBI Specification v1.6.0”, https://github.com/openhwgroup/obi/blob/072d9173c1f2d79471d6f2a10eae59ee387d4c6f/OBI-v1.6.0.pdf
- [CV-X-IF] “Core-V eXtension interface (CV-X-IF) v1.0.0”, https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/en/v1.0.0/.
CV32A60X is a 32-bit processor that complies with the above specifications. However, the conformance of CV32A60X to [RVcompat] has not yet been tested.
1.3. Documentation framework
The framework of this document is inspired by the Common Criteria. The Common Criteria for Information Technology Security Evaluation (referred to as Common Criteria or CC) is an international standard (ISO/IEC 15408) for computer security certification.
Description of the framework:
- The processor is split into modules corresponding to the main modules of the design
- Modules can contain several submodules
- Each module is described in a chapter, which contains the following subchapters: Description, Functionality, Architecture and Modules, and Registers (if any)
- The subchapter Description describes the main features of the module, the interconnections between the current module and the others, and the input/output interface.
- The subchapter Functionality lists the module functionalities in detail. Please avoid using the RTL signal names to explain the functionalities.
- The subchapter Architecture and Modules provides a drawing to present the module hierarchy, then the functionalities covered by the module
- The subchapter Registers specifies the module registers, if any
1.4. Contributors
Jean-Roch Coulon - Thales
André Sintzoff - Thales
Yannick Casamatta - Thales
Guillaume Chauvon - Thales
Ayoub Jalali
Alae Eddine Ezzejjari
2. Subsystem
2.1. Global functionality
The CV32A60X is a subsystem composed of the modules and protocol interfaces as illustrated. The processor has a modern Harvard architecture. Instructions are issued in order through the DECODE stage, executed out of order, and committed in order. The processor is single-issue, which means that at most one instruction per cycle can be issued to the EXECUTE stage.
The CVA6 implements a 6-stage pipeline composed of PC Generation, Instruction Fetch, Instruction Decode, Issue stage, Execute stage and Commit stage. At least 6 cycles are needed to execute one instruction.
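The six-cycle minimum latency and the single-issue limit can be illustrated with a small timing model. This is only a sketch under strong assumptions that are not taken from the design: every stage takes exactly one cycle, and there are no stalls, flushes, or multi-cycle functional units. The function names `commit_cycle` and `stage_at` are hypothetical.

```python
# Idealized model of a 6-stage, single-issue pipeline.
# Assumption (not from the RTL): one cycle per stage, no stalls or flushes.
STAGES = [
    "PC Generation",
    "Instruction Fetch",
    "Instruction Decode",
    "Issue",
    "Execute",
    "Commit",
]


def commit_cycle(instr_index: int) -> int:
    """1-based cycle in which instruction `instr_index` (0-based) is in Commit."""
    return instr_index + len(STAGES)


def stage_at(instr_index: int, cycle: int):
    """Stage occupied by an instruction in a given 1-based cycle, or None."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None
```

In this idealized model, the first instruction reaches Commit in cycle 6, and each following instruction commits one cycle later, which matches the one-instruction-per-cycle issue limit stated above.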
2.2. Connection with other sub-systems
The subsystem is connected to:
- OBI interconnect, which gives access to memory content
- COPROCESSOR, which connects through the CV-X-IF coprocessor interface protocol
- TRAP, which provides trap inputs
2.3. Parameter configuration
| Name | Description | Value | 
|---|---|---|
| XLEN | General Purpose Register Size (in bits) | 32 | 
| VLEN | Virtual address Size (in bits) | 32 | 
| RVA | Atomic RISC-V extension | False | 
| RVB | Bit manipulation RISC-V extension | True | 
| ZKN | Scalar Cryptography RISC-V extension | False | 
| RVV | Vector RISC-V extension | False | 
| RVC | Compressed RISC-V extension | True | 
| RVH | Hypervisor RISC-V extension | False | 
| RVZCB | Zcb RISC-V extension | True | 
| RVZCMP | Zcmp RISC-V extension | False | 
| RVZCMT | Zcmt RISC-V extension | False | 
| RVZiCond | Zicond RISC-V extension | False | 
| RVZicntr | Zicntr RISC-V extension | False | 
| RVZifencei | Zifencei RISC-V extension | False | 
| RVZihpm | Zihpm RISC-V extension | False | 
| RVF | Single-precision floating-point RISC-V extension | False | 
| RVD | Double-precision floating-point RISC-V extension | False | 
| XF16 | Non-standard 16-bit floating-point extension | False | 
| XF16ALT | Non-standard 16-bit floating-point Alt extension | False | 
| XF8 | Non-standard 8-bit floating-point extension | False | 
| XFVec | Non-standard vector floating-point extension | False | 
| PerfCounterEn | Perf counters | False | 
| MmuPresent | MMU | False | 
| RVS | Supervisor mode | False | 
| RVU | User mode | False | 
| SoftwareInterruptEn | Software interrupts are enabled | False | 
| DebugEn | Debug support | False | 
| DmBaseAddress | Base address of the debug module | 0x0 | 
| HaltAddress | Address to jump when halt request | 0x800 | 
| ExceptionAddress | Address to jump when exception | 0x808 | 
| TvalEn | Tval Support Enable | False | 
| DirectVecOnly | MTVEC CSR supports only direct mode | True | 
| NrPMPEntries | PMP entries number | 0 | 
| PMPCfgRstVal | PMP CSR configuration reset values | [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0] | 
| PMPAddrRstVal | PMP CSR address reset values | [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0] | 
| PMPEntryReadOnly | PMP CSR read-only bits | 0 | 
| PMPNapotEn | PMP NA4 and NAPOT mode enable | False | 
| NrNonIdempotentRules | PMA non idempotent rules number | 0 | 
| NonIdempotentAddrBase | PMA NonIdempotent region base address | [0b0, 0b0] | 
| NonIdempotentLength | PMA NonIdempotent region length | [0b0, 0b0] | 
| NrExecuteRegionRules | PMA regions with execute rules number | 0 | 
| ExecuteRegionAddrBase | PMA Execute region base address | [0x80000000, 0x10000, 0x0] | 
| ExecuteRegionLength | PMA Execute region length | [0x40000000, 0x10000, 0x1000] | 
| NrCachedRegionRules | PMA regions with cache rules number | 1 | 
| CachedRegionAddrBase | PMA cache region base address | [0x80000000] | 
| CachedRegionLength | PMA cache region length | [0x40000000] | 
| CvxifEn | CV-X-IF coprocessor interface enable | True | 
| CoproType | Coprocessor type | config_pkg::COPRO_EXAMPLE | 
| NOCType | NOC bus type | config_pkg::NOC_TYPE_AXI4_ATOP | 
| AxiAddrWidth | AXI address width | 64 | 
| AxiDataWidth | AXI data width | 64 | 
| AxiIdWidth | AXI ID width | 4 | 
| AxiUserWidth | AXI User width | 32 | 
| AxiBurstWriteEn | AXI burst in write | False | 
| MemTidWidth | Transaction ID | 4 | 
| IcacheByteSize | Instruction cache size (in bytes) | 2048 | 
| IcacheSetAssoc | Instruction cache associativity (number of ways) | 2 | 
| IcacheLineWidth | Instruction cache line width | 128 | 
| DCacheType | Cache Type | config_pkg::HPDCACHE_WT | 
| DcacheIdWidth | Data cache ID | 1 | 
| DcacheByteSize | Data cache size (in bytes) | 2028 | 
| DcacheSetAssoc | Data cache associativity (number of ways) | 2 | 
| DcacheLineWidth | Data cache line width | 128 | 
| DcacheFlushOnFence | Data cache flush on fence | False | 
| DcacheInvalidateOnFlush | Data cache invalidate on flush | False | 
| DataUserEn | User field on data bus enable | 1 | 
| WtDcacheWbufDepth | Write-through data cache write buffer depth | 8 | 
| FetchUserEn | User field on fetch bus enable | 1 | 
| FetchUserWidth | Width of fetch user field | 32 | 
| FpgaEn | Is FPGA optimization of CV32A6 for Xilinx and Altera | False | 
| FpgaAlteraEn | Is FPGA optimization for Altera FPGA | False | 
| TechnoCut | Is Techno Cut instantiated | True | 
| SuperscalarEn | Enable superscalar* with 2 issue ports and 2 commit ports. | False | 
| NrCommitPorts | Number of commit ports. Forced to 2 if SuperscalarEn. | 1 | 
| NrLoadPipeRegs | Load cycle latency number | 0 | 
| NrStorePipeRegs | Store cycle latency number | 0 | 
| NrScoreboardEntries | Scoreboard length | 4 | 
| NrLoadBufEntries | Number of load buffer entries | 2 | 
| MaxOutstandingStores | Maximum number of outstanding stores | 7 | 
| RASDepth | Return address stack depth | 2 | 
| BTBEntries | Branch target buffer entries | 0 | 
| BHTEntries | Branch history entries | 32 | 
| InstrTlbEntries | MMU instruction TLB entries | 2 | 
| DataTlbEntries | MMU data TLB entries | 2 | 
| UseSharedTlb | MMU option to use shared TLB | True | 
| SharedTlbDepth | MMU depth of shared TLB | 64 | 
| ObiVersion | OBI version compliance, 0 means not compliant → best performance | config_pkg::OBI_V1_6 | 
| PipelineOnly | Configuration defines cva6_pipeline module as top instead of cva6 (no cache and OBI instead of AXI) | True | 
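As an illustration of how the PMA region parameters in the table can be read, the sketch below checks whether an address falls inside a cached region. The values are copied from the table. The half-open interpretation `[base, base + length)` and the helper names `in_range` and `is_cacheable` are assumptions made for this example, not a description of the RTL.

```python
# Values copied from the parameter table above.
NR_CACHED_REGION_RULES = 1
CACHED_REGION_ADDR_BASE = [0x80000000]
CACHED_REGION_LENGTH = [0x40000000]


def in_range(base: int, length: int, addr: int) -> bool:
    """Assumed rule: a region covers addresses base <= addr < base + length."""
    return base <= addr < base + length


def is_cacheable(addr: int) -> bool:
    """True if `addr` matches any of the enabled cached-region rules."""
    return any(
        in_range(CACHED_REGION_ADDR_BASE[i], CACHED_REGION_LENGTH[i], addr)
        for i in range(NR_CACHED_REGION_RULES)
    )
```

Under this interpretation, the single cached region spans 0x80000000 up to and including 0xBFFFFFFF. Only the first `NrCachedRegionRules` entries of the base and length arrays are considered, which is also an assumption about how the rule count relates to the arrays.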
2.4. IO ports
| Signal | IO | Description | Connection | Type | 
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic | 
|  | in | Asynchronous reset active low | SUBSYSTEM | logic | 
|  | in | Reset boot address | SUBSYSTEM | logic[CVA6Cfg.VLEN-1:0] | 
|  | in | Hart ID reflected as CSR | SUBSYSTEM | logic[CVA6Cfg.XLEN-1:0] | 
|  | in | Level sensitive (async) interrupts | SUBSYSTEM | logic[1:0] | 
|  | in | Inter-processor (async) interrupt | SUBSYSTEM | logic | 
|  | in | Timer (async) interrupt | SUBSYSTEM | logic | 
|  | out | CVXIF request | SUBSYSTEM | cvxif_req_t | 
|  | in | CVXIF response | SUBSYSTEM | cvxif_resp_t | 
|  | out | Fetch OBI request ports | FRONTEND | obi_fetch_req_t | 
|  | in | Fetch OBI response ports | FRONTEND | obi_fetch_rsp_t | 
|  | out | Store OBI request ports | EX_STAGE | obi_store_req_t | 
|  | in | Store OBI response ports | EX_STAGE | obi_store_rsp_t | 
|  | out | Load OBI request ports | EX_STAGE | obi_load_req_t | 
|  | in | Load OBI response ports | EX_STAGE | obi_load_rsp_t | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below.
- As DebugEn = False,
  - debug_req_i input is tied to 0
- As IsRVFI = 0,
  - rvfi_probes_o output is tied to 0
- As PipelineOnly = True,
  - icache_enable_o output is tied to 0
  - icache_flush_o output is tied to 0
  - dcache_enable_o output is tied to 0
  - dcache_flush_o output is tied to 0
  - dcache_flush_ack_i input is tied to 0
  - fetch_req_o output is tied to 0
  - fetch_rsp_i input is tied to 0
  - load_req_o output is tied to 0
  - load_rsp_i input is tied to 0
  - dcache_wbuffer_empty_i input is tied to 0
  - dcache_wbuffer_not_ni_i input is tied to 0
- As PerfCounterEn = 0,
  - icache_miss_i input is tied to 0
  - dcache_miss_i input is tied to 0
- As RVA = False,
  - obi_amo_req_o output is tied to 0
  - obi_amo_rsp_i input is tied to 0
- As MMUPresent = 0,
  - obi_mmu_ptw_req_o output is tied to 0
  - obi_mmu_ptw_rsp_i input is tied to 0
- As RVZCMT = False,
  - obi_zcmt_req_o output is tied to 0
  - obi_zcmt_rsp_i input is tied to 0
3. Functionality
3.1. Instructions
The following subchapters list the extensions implemented in CV32A60X. Extensions can be enabled or disabled by configuration.
3.2. ISA
3.2.1. Instructions
| Subset Name | Name | Description | 
|---|---|---|
| I | RV32I Base Integer Instructions | The base integer instruction set, also known as the 'RV32I' or 'RV64I' instruction set, depending on the address space size, provides the core functionality required for general-purpose computing. It includes instructions for arithmetic, logical, and control operations, as well as memory access and manipulation | 
| M | RV32M Multiplication and Division Instructions | The standard integer multiplication and division instruction extension, which is named “M” and contains instructions that multiply or divide values held in two integer registers. | 
| C | RV32C Compressed Instructions | RVC uses a simple compression scheme that offers shorter 16-bit versions of common 32-bit RISC-V instructions when: the immediate or address offset is small; one of the registers is the zero register (x0), the ABI link register (x1), or the ABI stack pointer (x2); the destination register and the first source register are identical; the registers used are the 8 most popular ones. The C extension is compatible with all other standard instruction extensions. The C extension allows 16-bit instructions to be freely intermixed with 32-bit instructions, with the latter now able to start on any 16-bit boundary. With the addition of the C extension, JAL and JALR instructions will no longer raise an instruction misaligned exception | 
| Zicsr | RV32Zicsr Control and Status Register Instructions | All CSR instructions atomically read-modify-write a single CSR, whose CSR specifier is encoded in the 12-bit csr field of the instruction held in bits 31–20. The immediate forms use a 5-bit zero-extended immediate encoded in the rs1 field. | 
| Zcb | RV32Zcb Code Size Reduction Instructions | Zcb belongs to the group of extensions called RISC-V Code Size Reduction Extension (Zc*). Zc* has become the superset of the Standard C extension adding more 16-bit instructions to the ISA. Zcb includes the 16-bit version of additional Integer (I), Multiply (M), and Bit-Manipulation (Zbb) Instructions. All the Zcb instructions require at least standard C extension support as a prerequisite, along with M and Zbb extensions for the 16-bit version of the respective instructions. | 
| Zba | RVZba Address generation instructions | The Zba instructions can be used to accelerate the generation of addresses that index into arrays of basic types (halfword, word, doubleword) using both unsigned word-sized and XLEN-sized indices: a shifted index is added to a base address. The shift and add instructions do a left shift of 1, 2, or 3 because these are commonly found in real-world code and because they can be implemented with a minimal amount of additional hardware beyond that of the simple adder. This avoids lengthening the critical path in implementations. While the shift and add instructions are limited to a maximum left shift of 3, the slli instruction (from the base ISA) can be used to perform similar shifts for indexing into arrays of wider elements. The slli.uw added in this extension can be used when the index is to be interpreted as an unsigned word. | 
| Zbb | RVZbb Basic bit-manipulation | The bit-manipulation (bitmanip) extension collection is comprised of several component extensions to the base RISC-V architecture that are intended to provide some combination of code size reduction, performance improvement, and energy reduction. While the instructions are intended to have general use, some instructions are more useful in some domains than others. Hence, several smaller bitmanip extensions are provided. Each of these smaller extensions is grouped by common function and use case, and each has its own Zb*-extension name. | 
| Zbc | RVZbc Carry-less multiplication | Carry-less multiplication is the multiplication in the polynomial ring over GF(2). clmul produces the lower half of the carry-less product and clmulh produces the upper half of the 2✕XLEN carry-less product. clmulr produces bits 2✕XLEN−2:XLEN-1 of the 2✕XLEN carry-less product. | 
| Zbs | RVZbs Single bit Instructions | The single-bit instructions provide a mechanism to set, clear, invert, or extract a single bit in a register. The bit is specified by its index. | 
| Xcvxif | Xcvxif | No info found yet for extension Xcvxif | 
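The carry-less multiplication described in the Zbc row can be modelled in a few lines. The sketch below assumes XLEN = 32 and builds the full carry-less product first, then slices it according to the bit ranges given in the table. The helper `clmul_full` is a hypothetical name introduced for this example and is not an instruction.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def clmul_full(a: int, b: int) -> int:
    """Full carry-less product of two XLEN-bit operands (XOR instead of add)."""
    a &= MASK
    b &= MASK
    acc = 0
    for i in range(XLEN):
        if (b >> i) & 1:
            acc ^= a << i  # partial products are combined without carries
    return acc


def clmul(a: int, b: int) -> int:
    """Lower XLEN bits of the carry-less product."""
    return clmul_full(a, b) & MASK


def clmulh(a: int, b: int) -> int:
    """Upper XLEN bits of the carry-less product."""
    return (clmul_full(a, b) >> XLEN) & MASK


def clmulr(a: int, b: int) -> int:
    """Bits 2*XLEN-2 down to XLEN-1 of the carry-less product."""
    return (clmul_full(a, b) >> (XLEN - 1)) & MASK
```

For example, `clmul(3, 3)` evaluates to 5, because (x + 1) squared over GF(2) is x² + 1: the middle term cancels under XOR.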
3.2.2. RV32I Base Integer Instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| ADDI | addi rd, rs1, imm[11:0] | x[rd] = x[rs1] + sext(imm[11:0]) | NONE | NONE | add sign-extended 12-bit immediate to register rs1, and store the result in register rd. | Integer_Register_Immediate_Operations | 
| ANDI | andi rd, rs1, imm[11:0] | x[rd] = x[rs1] & sext(imm[11:0]) | NONE | NONE | perform bitwise AND on register rs1 and the sign-extended 12-bit immediate and place the result in rd. | Integer_Register_Immediate_Operations | 
| ORI | ori rd, rs1, imm[11:0] | x[rd] = x[rs1] \| sext(imm[11:0]) | NONE | NONE | perform bitwise OR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. | Integer_Register_Immediate_Operations | 
| XORI | xori rd, rs1, imm[11:0] | x[rd] = x[rs1] ^ sext(imm[11:0]) | NONE | NONE | perform bitwise XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. | Integer_Register_Immediate_Operations | 
| SLTI | slti rd, rs1, imm[11:0] | if (x[rs1] < sext(imm[11:0])) x[rd] = 1 else x[rd] = 0 | NONE | NONE | set register rd to 1 if register rs1 is less than the sign extended immediate when both are treated as signed numbers, else 0 is written to rd. | Integer_Register_Immediate_Operations | 
| SLTIU | sltiu rd, rs1, imm[11:0] | if (x[rs1] <u sext(imm[11:0])) x[rd] = 1 else x[rd] = 0 | NONE | NONE | set register rd to 1 if register rs1 is less than the sign extended immediate when both are treated as unsigned numbers, else 0 is written to rd. | Integer_Register_Immediate_Operations | 
| SLLI | slli rd, rs1, imm[4:0] | x[rd] = x[rs1] << imm[4:0] | NONE | NONE | logical left shift (zeros are shifted into the lower bits). | Integer_Register_Immediate_Operations | 
| SRLI | srli rd, rs1, imm[4:0] | x[rd] = x[rs1] >> imm[4:0] | NONE | NONE | logical right shift (zeros are shifted into the upper bits). | Integer_Register_Immediate_Operations | 
| SRAI | srai rd, rs1, imm[4:0] | x[rd] = x[rs1] >>s imm[4:0] | NONE | NONE | arithmetic right shift (the original sign bit is copied into the vacated upper bits). | Integer_Register_Immediate_Operations | 
| LUI | lui rd, imm[19:0] | x[rd] = sext(imm[31:12] << 12) | NONE | NONE | place the immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros. | Integer_Register_Immediate_Operations | 
| AUIPC | auipc rd, imm[19:0] | x[rd] = pc + sext(imm[31:12] << 12) | NONE | NONE | form a 32-bit offset from the 20-bit immediate, filling in the lowest 12 bits with zeros, add this offset to the pc, then place the result in register rd. | Integer_Register_Immediate_Operations | 
| ADD | add rd, rs1, rs2 | x[rd] = x[rs1] + x[rs2] | NONE | NONE | add rs2 to register rs1, and store the result in register rd. | Integer_Register_Register_Operations | 
| SUB | sub rd, rs1, rs2 | x[rd] = x[rs1] - x[rs2] | NONE | NONE | subtract rs2 from register rs1, and store the result in register rd. | Integer_Register_Register_Operations | 
| AND | and rd, rs1, rs2 | x[rd] = x[rs1] & x[rs2] | NONE | NONE | perform bitwise AND on register rs1 and rs2 and place the result in rd. | Integer_Register_Register_Operations | 
| OR | or rd, rs1, rs2 | x[rd] = x[rs1] \| x[rs2] | NONE | NONE | perform bitwise OR on register rs1 and rs2 and place the result in rd. | Integer_Register_Register_Operations | 
| XOR | xor rd, rs1, rs2 | x[rd] = x[rs1] ^ x[rs2] | NONE | NONE | perform bitwise XOR on register rs1 and rs2 and place the result in rd. | Integer_Register_Register_Operations | 
| SLT | slt rd, rs1, rs2 | if (x[rs1] < x[rs2]) x[rd] = 1 else x[rd] = 0 | NONE | NONE | set register rd to 1 if register rs1 is less than rs2 when both are treated as signed numbers, else 0 is written to rd. | Integer_Register_Register_Operations | 
| SLTU | sltu rd, rs1, rs2 | if (x[rs1] <u x[rs2]) x[rd] = 1 else x[rd] = 0 | NONE | NONE | set register rd to 1 if register rs1 is less than rs2 when both are treated as unsigned numbers, else 0 is written to rd. | Integer_Register_Register_Operations | 
| SLL | sll rd, rs1, rs2 | x[rd] = x[rs1] << x[rs2] | NONE | NONE | logical left shift (zeros are shifted into the lower bits). | Integer_Register_Register_Operations | 
| SRL | srl rd, rs1, rs2 | x[rd] = x[rs1] >> x[rs2] | NONE | NONE | logical right shift (zeros are shifted into the upper bits). | Integer_Register_Register_Operations | 
| SRA | sra rd, rs1, rs2 | x[rd] = x[rs1] >>s x[rs2] | NONE | NONE | arithmetic right shift (the original sign bit is copied into the vacated upper bits). | Integer_Register_Register_Operations | 
| JAL | jal rd, imm[20:1] | x[rd] = pc+4; pc += sext({imm[20:1], 1’b0}) | NONE | jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception. | offset is sign-extended and added to the pc to form the jump target address (pc is calculated using signed arithmetic), then setting the least-significant bit of the result to zero, and store the address of instruction following the jump (pc+4) into register rd. | Control_Transfer_Operations-Unconditional_Jumps | 
| JALR | jalr rd, rs1, imm[11:0] | t = pc+4; pc = (x[rs1]+sext(imm[11:0]))&∼1 ; x[rd] = t | NONE | jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception. | target address is obtained by adding the 12-bit signed immediate to the register rs1 (pc is calculated using signed arithmetic), then setting the least-significant bit of the result to zero, and store the address of instruction following the jump (pc+4) into register rd. | Control_Transfer_Operations-Unconditional_Jumps | 
| BEQ | beq rs1, rs2, imm[12:1] | if (x[rs1] == x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4 | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | takes the branch (pc is calculated using signed arithmetic) if registers rs1 and rs2 are equal. | Control_Transfer_Operations-Conditional_Branches | 
| BNE | bne rs1, rs2, imm[12:1] | if (x[rs1] != x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4 | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | takes the branch (pc is calculated using signed arithmetic) if registers rs1 and rs2 are not equal. | Control_Transfer_Operations-Conditional_Branches | 
| BLT | blt rs1, rs2, imm[12:1] | if (x[rs1] < x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4 | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | takes the branch (pc is calculated using signed arithmetic) if register rs1 is less than rs2 (using signed comparison). | Control_Transfer_Operations-Conditional_Branches | 
| BLTU | bltu rs1, rs2, imm[12:1] | if (x[rs1] <u x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4 | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | takes the branch (pc is calculated using signed arithmetic) if register rs1 is less than rs2 (using unsigned comparison). | Control_Transfer_Operations-Conditional_Branches | 
| BGE | bge rs1, rs2, imm[12:1] | if (x[rs1] >= x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4 | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | takes the branch (pc is calculated using signed arithmetic) if register rs1 is greater than or equal to rs2 (using signed comparison). | Control_Transfer_Operations-Conditional_Branches | 
| BGEU | bgeu rs1, rs2, imm[12:1] | if (x[rs1] >=u x[rs2]) pc += sext({imm[12:1], 1’b0}) else pc += 4 | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | takes the branch (pc is calculated using signed arithmetic) if register rs1 is greater than or equal to rs2 (using unsigned comparison). | Control_Transfer_Operations-Conditional_Branches | 
| LB | lb rd, imm(rs1) | x[rd] = sext(M[x[rs1] + sext(imm[11:0])][7:0]) | NONE | loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded. | loads an 8-bit value from memory, then sign-extends to 32-bit before storing in rd (rd is calculated using signed arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| LH | lh rd, imm(rs1) | x[rd] = sext(M[x[rs1] + sext(imm[11:0])][15:0]) | NONE | loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded, also an exception is raised if the memory address isn’t aligned (2-byte boundary). | loads a 16-bit value from memory, then sign-extends to 32-bit before storing in rd (rd is calculated using signed arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| LW | lw rd, imm(rs1) | x[rd] = sext(M[x[rs1] + sext(imm[11:0])][31:0]) | NONE | loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded, also an exception is raised if the memory address isn’t aligned (4-byte boundary). | loads a 32-bit value from memory, then stores it in rd (rd is calculated using signed arithmetic). The effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| LBU | lbu rd, imm(rs1) | x[rd] = zext(M[x[rs1] + sext(imm[11:0])][7:0]) | NONE | loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded. | loads an 8-bit value from memory, then zero-extends to 32-bit before storing in rd (rd is calculated using unsigned arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| LHU | lhu rd, imm(rs1) | x[rd] = zext(M[x[rs1] + sext(imm[11:0])][15:0]) | NONE | loads with a destination of x0 must still raise any exceptions and action any other side effects even though the load value is discarded, also an exception is raised if the memory address isn’t aligned (2-byte boundary). | loads a 16-bit value from memory, then zero-extends to 32-bit before storing in rd (rd is calculated using unsigned arithmetic), the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| SB | sb rs2, imm(rs1) | M[x[rs1] + sext(imm[11:0])][7:0] = x[rs2][7:0] | NONE | NONE | stores an 8-bit value from the low bits of register rs2 to memory, the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| SH | sh rs2, imm(rs1) | M[x[rs1] + sext(imm[11:0])][15:0] = x[rs2][15:0] | NONE | an exception is raised if the memory address isn’t aligned (2-byte boundary). | stores a 16-bit value from the low bits of register rs2 to memory, the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| SW | sw rs2, imm(rs1) | M[x[rs1] + sext(imm[11:0])][31:0] = x[rs2][31:0] | NONE | an exception is raised if the memory address isn’t aligned (4-byte boundary). | stores a 32-bit value from register rs2 to memory, the effective address is obtained by adding register rs1 to the sign-extended 12-bit offset. | Load_and_Store_Instructions | 
| FENCE | fence pre, succ | No operation (nop) | NONE | NONE | order device I/O and memory accesses as viewed by other RISC-V harts and external devices or coprocessors. Any combination of device input (I), device output (O), memory reads (R), and memory writes (W) may be ordered with respect to any combination of the same. Informally, no other RISC-V hart or external device can observe any operation in the successor set following a FENCE before any operation in the predecessor set preceding the FENCE. Because the core supports a single hart, the FENCE instruction has no effect and can be considered a nop instruction. | Memory_Ordering | 
| ECALL | ecall | RaiseException(EnvironmentCall) | NONE | Raise an Environment Call exception. | make a request to the supporting execution environment, which is usually an operating system. The ABI for the system will define how parameters for the environment request are passed, but usually these will be in defined locations in the integer register file. | Environment_Call_and_Breakpoints | 
| EBREAK | ebreak | RaiseException(Breakpoint) | NONE | Raise a Breakpoint exception. | cause control to be transferred back to a debugging environment. | Environment_Call_and_Breakpoints | 
3.2.3. RV32M Multiplication and Division Instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| MUL | mul rd, rs1, rs2 | x[rd] = x[rs1] * x[rs2] | NONE | NONE | performs a 32-bit × 32-bit multiplication and places the lower 32 bits in the destination register (Both rs1 and rs2 treated as signed numbers). | Multiplication Operations | 
| MULH | mulh rd, rs1, rs2 | x[rd] = (x[rs1] s*s x[rs2]) >>s 32 | NONE | NONE | performs a 32-bit × 32-bit multiplication and places the upper 32 bits in the destination register of the 64-bit product (Both rs1 and rs2 treated as signed numbers). | Multiplication Operations | 
| MULHU | mulhu rd, rs1, rs2 | x[rd] = (x[rs1] u*u x[rs2]) >>u 32 | NONE | NONE | performs a 32-bit × 32-bit multiplication and places the upper 32 bits in the destination register of the 64-bit product (Both rs1 and rs2 treated as unsigned numbers). | Multiplication Operations | 
| MULHSU | mulhsu rd, rs1, rs2 | x[rd] = (x[rs1] s*u x[rs2]) >>s 32 | NONE | NONE | performs a 32-bit × 32-bit multiplication and places the upper 32 bits in the destination register of the 64-bit product (rs1 treated as signed number, rs2 treated as unsigned number). | Multiplication Operations | 
| DIV | div rd, rs1, rs2 | x[rd] = x[rs1] /s x[rs2] | NONE | NONE | perform signed integer division of 32 bits by 32 bits (rounding towards zero). | Division Operations | 
| DIVU | divu rd, rs1, rs2 | x[rd] = x[rs1] /u x[rs2] | NONE | NONE | perform unsigned integer division of 32 bits by 32 bits (rounding towards zero). | Division Operations | 
| REM | rem rd, rs1, rs2 | x[rd] = x[rs1] %s x[rs2] | NONE | NONE | provide the remainder of the corresponding division operation DIV (the sign of rd equals the sign of rs1). | Division Operations | 
| REMU | remu rd, rs1, rs2 | x[rd] = x[rs1] %u x[rs2] | NONE | NONE | provide the remainder of the corresponding division operation DIVU. | Division Operations | 
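The signedness rules in the table above can be checked with a small reference model. The following is a minimal Python sketch, not the CVA6 implementation: the helper names (`to_signed`, `mulh`, ...) are illustrative, registers are modelled as unsigned 32-bit integers, and the division helpers assume a non-zero divisor since that case is not described in this table.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(a: int, b: int) -> int:
    # Lower 32 bits of the product (identical for signed and unsigned operands).
    return (a * b) & MASK32

def mulh(a: int, b: int) -> int:
    # Upper 32 bits of the 64-bit signed x signed product.
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhu(a: int, b: int) -> int:
    # Upper 32 bits of the 64-bit unsigned x unsigned product.
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

def mulhsu(a: int, b: int) -> int:
    # Upper 32 bits of the 64-bit signed (rs1) x unsigned (rs2) product.
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32

def div(a: int, b: int) -> int:
    # Signed division rounding towards zero; assumes b != 0.
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return q & MASK32

def rem(a: int, b: int) -> int:
    # Signed remainder; the sign follows the dividend (rs1). Assumes b != 0.
    sa, sb = to_signed(a), to_signed(b)
    r = abs(sa) % abs(sb)
    if sa < 0:
        r = -r
    return r & MASK32
```

For example, `mulhu(0xFFFFFFFF, 0xFFFFFFFF)` and `mulh(0xFFFFFFFF, 0xFFFFFFFF)` differ because the same bit pattern is read as 2^32-1 in the first case and as -1 in the second.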
3.2.4. RV32C Compressed Instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| C.LI | c.li rd, imm[5:0] | x[rd] = sext(imm[5:0]) | rd = x0 | NONE | loads the sign-extended 6-bit immediate, imm, into register rd. | Integer Computational Instructions | 
| C.LUI | c.lui rd, nzimm[17:12] | x[rd] = sext(nzimm[17:12] << 12) | rd = x0 & rd = x2 & nzimm = 0 | NONE | loads the non-zero 6-bit immediate field into bits 17–12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. | Integer Computational Instructions | 
| C.ADDI | c.addi rd, nzimm[5:0] | x[rd] = x[rd] + sext(nzimm[5:0]) | rd = x0 & nzimm = 0 | NONE | adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd. | Integer Computational Instructions | 
| C.ADDI16SP | c.addi16sp nzimm[9:4] | x[2] = x[2] + sext(nzimm[9:4]) | rd != x2 & nzimm = 0 | NONE | adds the non-zero sign-extended 6-bit immediate to the value in the stack pointer (sp=x2), where the immediate is scaled to represent multiples of 16 in the range (-512,496). C.ADDI16SP is used to adjust the stack pointer in procedure prologues and epilogues. C.ADDI16SP shares the opcode with C.LUI, but has a destination field of x2. | Integer Computational Instructions | 
| C.ADDI4SPN | c.addi4spn rd', nzimm[9:2] | x[8 + rd'] = x[2] + zext(nzimm[9:2]) | nzimm = 0 | NONE | adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2, and writes the result to rd'. This instruction is used to generate pointers to stack-allocated variables. | Integer Computational Instructions | 
| C.SLLI | c.slli rd, uimm[5:0] | x[rd] = x[rd] << uimm[5:0] | rd = x0 & uimm[5] = 0 | NONE | performs a logical left shift (zeros are shifted into the lower bits). | Integer Computational Instructions | 
| C.SRLI | c.srli rd', uimm[5:0] | x[8 + rd'] = x[8 + rd'] >> uimm[5:0] | uimm[5] = 0 | NONE | performs a logical right shift (zeros are shifted into the upper bits). | Integer Computational Instructions | 
| C.SRAI | c.srai rd', uimm[5:0] | x[8 + rd'] = x[8 + rd'] >>s uimm[5:0] | uimm[5] = 0 | NONE | performs an arithmetic right shift (sign bits are shifted into the upper bits). | Integer Computational Instructions | 
| C.ANDI | c.andi rd', imm[5:0] | x[8 + rd'] = x[8 + rd'] & sext(imm[5:0]) | NONE | NONE | computes the bitwise AND of the value in register rd', and the sign-extended 6-bit immediate, then writes the result to rd'. | Integer Computational Instructions | 
| C.ADD | c.add rd, rs2 | x[rd] = x[rd] + x[rs2] | rd = x0 & rs2 = x0 | NONE | adds the values in registers rd and rs2 and writes the result to register rd. | Integer Computational Instructions | 
| C.MV | c.mv rd, rs2 | x[rd] = x[rs2] | rd = x0 & rs2 = x0 | NONE | copies the value in register rs2 into register rd. | Integer Computational Instructions | 
| C.AND | c.and rd', rs2' | x[8 + rd'] = x[8 + rd'] & x[8 + rs2'] | NONE | NONE | computes the bitwise AND of the value in register rd' and register rs2', then writes the result to rd'. | Integer Computational Instructions | 
| C.OR | c.or rd', rs2' | x[8 + rd'] = x[8 + rd'] \| x[8 + rs2'] | NONE | NONE | computes the bitwise OR of the value in register rd' and register rs2', then writes the result to rd'. | Integer Computational Instructions | 
| C.XOR | c.xor rd', rs2' | x[8 + rd'] = x[8 + rd'] ^ x[8 + rs2'] | NONE | NONE | computes the bitwise XOR of the value in register rd' and register rs2', then writes the result to rd'. | Integer Computational Instructions | 
| C.SUB | c.sub rd', rs2' | x[8 + rd'] = x[8 + rd'] - x[8 + rs2'] | NONE | NONE | subtracts the value in register rs2' from the value in register rd' and writes the result to register rd'. | Integer Computational Instructions | 
| C.EBREAK | c.ebreak | RaiseException(Breakpoint) | NONE | Raise a Breakpoint exception. | cause control to be transferred back to the debugging environment. | Integer Computational Instructions | 
| C.J | c.j imm[11:1] | pc += sext(imm[11:1]) | NONE | jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception. | performs an unconditional control transfer. The offset is sign-extended and added to the pc to form the jump target address. | Control Transfer Instructions | 
| C.JAL | c.jal imm[11:1] | x[1] = pc+2; pc += sext(imm[11:1]) | NONE | jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception. | performs the same operation as C.J, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. | Control Transfer Instructions | 
| C.JR | c.jr rs1 | pc = x[rs1] | rs1 = x0 | jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception. | performs an unconditional control transfer to the address in register rs1. | Control Transfer Instructions | 
| C.JALR | c.jalr rs1 | t = pc+2; pc = x[rs1]; x[1] = t | rs1 = x0 | jumps to an unaligned address (4-byte or 2-byte boundary) will usually raise an exception. | performs the same operation as C.JR, but additionally writes the address of the instruction following the jump (pc+2) to the link register, x1. | Control Transfer Instructions | 
| C.BEQZ | c.beqz rs1', imm[8:1] | if (x[8+rs1'] == 0) pc += sext(imm[8:1]) | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. C.BEQZ takes the branch if the value in register rs1' is zero. | Control Transfer Instructions | 
| C.BNEZ | c.bnez rs1', imm[8:1] | if (x[8+rs1'] != 0) pc += sext(imm[8:1]) | NONE | no instruction fetch misaligned exception is generated for a conditional branch that is not taken. An Instruction address misaligned exception is raised if the target address is not aligned on 4-byte or 2-byte boundary, because the core supports compressed instructions. | performs conditional control transfers. The offset is sign-extended and added to the pc to form the branch target address. C.BNEZ takes the branch if the value in register rs1' isn’t zero. | Control Transfer Instructions | 
| C.LWSP | c.lwsp rd, uimm(x2) | x[rd] = M[x[2] + zext(uimm[7:2])][31:0] | rd = x0 | loads with a destination of x0 must still raise any exceptions, also an exception is raised if the memory address isn’t aligned (4-byte boundary). | loads a 32-bit value from memory into register rd. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. | Load and Store Instructions | 
| C.SWSP | c.swsp rs2, uimm(x2) | M[x[2] + zext(uimm[7:2])][31:0] = x[rs2] | NONE | an exception is raised if the memory address isn’t aligned (4-byte boundary). | stores a 32-bit value in register rs2 to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the stack pointer, x2. | Load and Store Instructions | 
| C.LW | c.lw rd', uimm(rs1') | x[8+rd'] = M[x[8+rs1'] + zext(uimm[6:2])][31:0] | NONE | an exception is raised if the memory address isn’t aligned (4-byte boundary). | loads a 32-bit value from memory into register rd'. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. | Load and Store Instructions | 
| C.SW | c.sw rs2', uimm(rs1') | M[x[8+rs1'] + zext(uimm[6:2])][31:0] = x[8+rs2'] | NONE | an exception is raised if the memory address isn’t aligned (4-byte boundary). | stores a 32-bit value in register rs2' to memory. It computes an effective address by adding the zero-extended offset, scaled by 4, to the base address in register rs1'. | Load and Store Instructions | 
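The scaled immediates of the compressed instructions can be illustrated with a short Python sketch. It assumes the immediate field has already been reassembled from the instruction encoding into a plain 6-bit value; the function names are illustrative and do not come from the RTL.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a Python integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def c_addi16sp_offset(field6: int) -> int:
    # nzimm[9:4] is a signed 6-bit field scaled by 16.
    return sext(field6, 6) * 16

def c_lwsp_address(sp: int, field6: int) -> int:
    # uimm[7:2] is an unsigned 6-bit field scaled by 4 and added to x2 (sp).
    return (sp + ((field6 & 0x3F) << 2)) & 0xFFFFFFFF
```

With this model, the extreme C.ADDI16SP field values `0b100000` and `0b011111` give offsets of -512 and 496, the bounds quoted in the C.ADDI16SP description.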
3.2.5. RV32Zicsr Control and Status Register Instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| CSRRW | csrrw rd, csr, rs1 | t = CSRs[csr]; CSRs[csr] = x[rs1]; x[rd] = t | NONE | Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions. | Reads the old value of the CSR, zero-extends the value to 32 bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side-effects that might occur on a CSR read. | Control and Status Register Operations | 
| CSRRS | csrrs rd, csr, rs1 | t = CSRs[csr]; CSRs[csr] = t \| x[rs1]; x[rd] = t | NONE | Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions. | Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs. | Control and Status Register Operations | 
| CSRRC | csrrc rd, csr, rs1 | t = CSRs[csr]; CSRs[csr] = t & ∼x[rs1]; x[rd] = t | NONE | Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions. | Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If rs1=x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs. | Control and Status Register Operations | 
| CSRRWI | csrrwi rd, csr, uimm[4:0] | x[rd] = CSRs[csr]; CSRs[csr] = zext(uimm[4:0]) | NONE | Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions. | Reads the old value of the CSR, zero-extends the value to 32 bits, then writes it to integer register rd. The zero-extended immediate is written to the CSR. If rd=x0, then the instruction shall not read the CSR and shall not cause any of the side-effects that might occur on a CSR read. | Control and Status Register Operations | 
| CSRRSI | csrrsi rd, csr, uimm[4:0] | t = CSRs[csr]; CSRs[csr] = t \| zext(uimm[4:0]); x[rd] = t | NONE | Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions. | Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The zero-extended immediate value is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in the zero-extended immediate will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If the uimm[4:0] field is zero, then the instruction will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write. | Control and Status Register Operations | 
| CSRRCI | csrrci rd, csr, uimm[4:0] | t = CSRs[csr]; CSRs[csr] = t & ∼zext(uimm[4:0]); x[rd] = t | NONE | Attempts to access a non-existent CSR raise an illegal instruction exception. Attempts to access a CSR without appropriate privilege level or to write a read-only register also raise illegal instruction exceptions. | Reads the value of the CSR, zero-extends the value to 32 bits, and writes it to integer register rd. The zero-extended immediate value is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in the zero-extended immediate will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written). If the uimm[4:0] field is zero, then the instruction will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write. | Control and Status Register Operations | 
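The read-modify-write behaviour of the Zicsr instructions can be sketched as follows. This is a simplified Python model under stated assumptions: the CSR file is a plain dictionary, every bit is treated as writable, and privilege checks, read-only registers and read side effects are ignored. The function names and the `src_is_zero` flag are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def csrrw(csrs: dict, addr: int, value: int) -> int:
    """Atomic swap: return the old CSR value and write the new one."""
    old = csrs[addr]
    csrs[addr] = value & MASK32
    return old

def csrrs(csrs: dict, addr: int, mask: int, src_is_zero: bool = False) -> int:
    """Read, then set the bits selected by mask (no write if rs1=x0 / uimm=0)."""
    old = csrs[addr]
    if not src_is_zero:
        csrs[addr] = (old | mask) & MASK32
    return old

def csrrc(csrs: dict, addr: int, mask: int, src_is_zero: bool = False) -> int:
    """Read, then clear the bits selected by mask (no write if rs1=x0 / uimm=0)."""
    old = csrs[addr]
    if not src_is_zero:
        csrs[addr] = old & ~mask & MASK32
    return old
```

The immediate variants (CSRRWI, CSRRSI, CSRRCI) can be modelled with the same helpers by passing `uimm & 0x1F` as the operand and `src_is_zero=(uimm == 0)`.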
3.2.6. RV32Zcb Code Size Reduction Instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| C.ZEXT.B | c.zext.b rd' | x[8 + rd'] = zext(x[8 + rd'][7:0]) | NONE | NONE | This instruction takes a single source/destination operand. It zero-extends the least-significant byte of the operand by inserting zeros into all of the bits more significant than 7. | Code Size Reduction Operations | 
| C.SEXT.B | c.sext.b rd' | x[8 + rd'] = sext(x[8 + rd'][7:0]) | NONE | NONE | This instruction takes a single source/destination operand. It sign-extends the least-significant byte in the operand by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits. It also requires Bit-Manipulation (Zbb) extension support. | Code Size Reduction Operations | 
| C.ZEXT.H | c.zext.h rd' | x[8 + rd'] = zext(x[8 + rd'][15:0]) | NONE | NONE | This instruction takes a single source/destination operand. It zero-extends the least-significant halfword of the operand by inserting zeros into all of the bits more significant than 15. It also requires Bit-Manipulation (Zbb) extension support. | Code Size Reduction Operations | 
| C.SEXT.H | c.sext.h rd' | x[8 + rd'] = sext(x[8 + rd'][15:0]) | NONE | NONE | This instruction takes a single source/destination operand. It sign-extends the least-significant halfword in the operand by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits. It also requires Bit-Manipulation (Zbb) extension support. | Code Size Reduction Operations | 
| C.NOT | c.not rd' | x[8 + rd'] = x[8 + rd'] ^ -1 | NONE | NONE | This instruction takes the one’s complement of rd'/rs1' and writes the result to the same register. | Code Size Reduction Operations | 
| C.MUL | c.mul rd', rs2' | x[8 + rd'] = (x[8 + rd'] * x[8 + rs2'])[31:0] | NONE | NONE | performs a 32-bit × 32-bit multiplication and places the lower 32 bits in the destination register (Both rd' and rs2' treated as signed numbers). It also requires M extension support. | Code Size Reduction Operations | 
| C.LHU | c.lhu rd', uimm(rs1') | x[8+rd'] = zext(M[x[8+rs1'] + zext(uimm[1])][15:0]) | NONE | an exception is raised if the memory address isn’t aligned (2-byte boundary). | This instruction loads a halfword from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting halfword is zero extended and is written to rd'. | Code Size Reduction Operations | 
| C.LH | c.lh rd', uimm(rs1') | x[8+rd'] = sext(M[x[8+rs1'] + zext(uimm[1])][15:0]) | NONE | an exception is raised if the memory address isn’t aligned (2-byte boundary). | This instruction loads a halfword from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting halfword is sign extended and is written to rd'. | Code Size Reduction Operations | 
| C.LBU | c.lbu rd', uimm(rs1') | x[8+rd'] = zext(M[x[8+rs1'] + zext(uimm[1:0])][7:0]) | NONE | NONE | This instruction loads a byte from the memory address formed by adding rs1' to the zero extended immediate uimm. The resulting byte is zero extended and is written to rd'. | Code Size Reduction Operations | 
| C.SH | c.sh rs2', uimm(rs1') | M[x[8+rs1'] + zext(uimm[1])][15:0] = x[8+rs2'] | NONE | an exception is raised if the memory address isn’t aligned (2-byte boundary). | This instruction stores the least significant halfword of rs2' to the memory address formed by adding rs1' to the zero extended immediate uimm. | Code Size Reduction Operations | 
| C.SB | c.sb rs2', uimm(rs1') | M[x[8+rs1'] + zext(uimm[1:0])][7:0] = x[8+rs2'] | NONE | NONE | This instruction stores the least significant byte of rs2' to the memory address formed by adding rs1' to the zero extended immediate uimm. | Code Size Reduction Operations | 
3.2.7. RVZba Address generation instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| ADD.UW | add.uw rd, rs1, rs2 | X(rd) = rs2 + EXTZ(X(rs1)[31..0]) | NONE | NONE | This instruction performs an XLEN-wide addition between rs2 and the zero-extended least-significant word of rs1. | Address generation instructions | 
| SH1ADD | sh1add rd, rs1, rs2 | X(rd) = X(rs2) + (X(rs1) << 1) | NONE | NONE | This instruction shifts rs1 to the left by 1 bit and adds it to rs2. | Address generation instructions | 
| SH1ADD.UW | sh1add.uw rd, rs1, rs2 | X(rd) = rs2 + (EXTZ(X(rs1)[31..0]) << 1) | NONE | NONE | This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 1 place. | Address generation instructions | 
| SH2ADD | sh2add rd, rs1, rs2 | X(rd) = X(rs2) + (X(rs1) << 2) | NONE | NONE | This instruction shifts rs1 to the left by 2 bits and adds it to rs2. | Address generation instructions | 
| SH2ADD.UW | sh2add.uw rd, rs1, rs2 | X(rd) = rs2 + (EXTZ(X(rs1)[31..0]) << 2) | NONE | NONE | This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 2 places. | Address generation instructions | 
| SH3ADD | sh3add rd, rs1, rs2 | X(rd) = X(rs2) + (X(rs1) << 3) | NONE | NONE | This instruction shifts rs1 to the left by 3 bits and adds it to rs2. | Address generation instructions | 
| SH3ADD.UW | sh3add.uw rd, rs1, rs2 | X(rd) = rs2 + (EXTZ(X(rs1)[31..0]) << 3) | NONE | NONE | This instruction performs an XLEN-wide addition of two addends. The first addend is rs2. The second addend is the unsigned value formed by extracting the least-significant word of rs1 and shifting it left by 3 places. | Address generation instructions | 
| SLLI.UW | slli.uw rd, rs1, imm | X(rd) = (EXTZ(X(rs1)[31..0]) << imm) | NONE | NONE | This instruction takes the least-significant word of rs1, zero-extends it, and shifts it left by the immediate. | Address generation instructions | 
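As an illustration of the shift-and-add instructions, the sketch below models SH1ADD, SH2ADD and SH3ADD for XLEN=32 in Python. The single helper `shadd` and its parameter names are illustrative; the typical use is computing the address of an array element from an index (rs1) and a base address (rs2).

```python
MASK32 = 0xFFFFFFFF

def shadd(n: int, rs1: int, rs2: int) -> int:
    """Model of shNadd for XLEN=32: rd = rs2 + (rs1 << n), with n in {1, 2, 3}."""
    if n not in (1, 2, 3):
        raise ValueError("shift amount must be 1, 2 or 3")
    return (rs2 + (rs1 << n)) & MASK32

# Address of element 3 in an array of 32-bit words based at 0x1000 (sh2add).
element_address = shadd(2, 3, 0x1000)
```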
3.2.8. RVZbb Basic bit-manipulation
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| ANDN | andn rd, rs1, rs2 | X(rd) = X(rs1) & ~X(rs2) | NONE | NONE | Performs bitwise AND operation between rs1 and bitwise inversion of rs2. | Logical_with_negate | 
| ORN | orn rd, rs1, rs2 | X(rd) = X(rs1) \| ~X(rs2) | NONE | NONE | Performs bitwise OR operation between rs1 and bitwise inversion of rs2. | Logical_with_negate | 
| XNOR | xnor rd, rs1, rs2 | X(rd) = ~(X(rs1) ^ X(rs2)) | NONE | NONE | Performs bitwise XOR operation between rs1 and rs2, then complements the result. | Logical_with_negate | 
| CLZ | clz rd, rs | if [x[i]] == 1 then return(i) else return -1 | NONE | NONE | Counts leading zero bits in rs. | Count_leading_trailing_zero_bits | 
| CTZ | ctz rd, rs | if [x[i]] == 1 then return(i) else return xlen; | NONE | NONE | Counts trailing zero bits in rs. | Count_leading_trailing_zero_bits | 
| CLZW | clzw rd, rs | if [x[i]] == 1 then return(i) else return -1 | NONE | NONE | Counts leading zero bits in the least-significant word of rs. | Count_leading_trailing_zero_bits | 
| CTZW | ctzw rd, rs | if [x[i]] == 1 then return(i) else return 32; | NONE | NONE | Counts trailing zero bits in the least-significant word of rs. | Count_leading_trailing_zero_bits | 
| CPOP | cpop rd, rs | if rs[i] == 1 then bitcount = bitcount + 1 else () | NONE | NONE | Counts set bits in rs. | Count_population | 
| CPOPW | cpopw rd, rs | if rs[i] == 0b1 then bitcount = bitcount + 1 else () | NONE | NONE | Counts set bits in the least-significant word of rs. | Count_population | 
| MAX | max rd, rs1, rs2 | if rs1_val <_s rs2_val then rs2_val else rs1_val | NONE | NONE | Returns the larger of two signed integers. | Integer_minimum_maximum | 
| MAXU | maxu rd, rs1, rs2 | if rs1_val <_u rs2_val then rs2_val else rs1_val | NONE | NONE | Returns the larger of two unsigned integers. | Integer_minimum_maximum | 
| MIN | min rd, rs1, rs2 | if rs1_val <_s rs2_val then rs1_val else rs2_val | NONE | NONE | Returns the smaller of two signed integers. | Integer_minimum_maximum | 
| MINU | minu rd, rs1, rs2 | if rs1_val <_u rs2_val then rs1_val else rs2_val | NONE | NONE | Returns the smaller of two unsigned integers. | Integer_minimum_maximum | 
| SEXT.B | sext.b rd, rs | X(rd) = EXTS(X(rs)[7..0]) | NONE | NONE | Sign-extends the least-significant byte in the source to XLEN. | Sign_and_zero_extension | 
| SEXT.H | sext.h rd, rs | X(rd) = EXTS(X(rs)[15..0]) | NONE | NONE | Sign-extends the least-significant halfword in rs to XLEN. | Sign_and_zero_extension | 
| ZEXT.H | zext.h rd, rs | X(rd) = EXTZ(X(rs)[15..0]) | NONE | NONE | Zero-extends the least-significant halfword of the source to XLEN. | Sign_and_zero_extension | 
| ROL | rol rd, rs1, rs2 | shamt = X(rs2)[log2(XLEN)-1..0]; X(rd) = (X(rs1) << shamt) \| (X(rs1) >> (XLEN - shamt)) | NONE | NONE | Performs a rotate left of rs1 by the amount in least-significant log2(XLEN) bits of rs2. | Bitwise_rotation | 
| ROR | ror rd, rs1, rs2 | shamt = X(rs2)[log2(XLEN)-1..0]; X(rd) = (X(rs1) >> shamt) \| (X(rs1) << (XLEN - shamt)) | NONE | NONE | Performs a rotate right of rs1 by the amount in least-significant log2(XLEN) bits of rs2. | Bitwise_rotation | 
| RORI | rori rd, rs1, shamt | X(rd) = (X(rs1) >> shamt) \| (X(rs1) << (XLEN - shamt)) | NONE | NONE | Performs a rotate right of rs1 by the amount in least-significant log2(XLEN) bits of shamt. | Bitwise_rotation | 
| ROLW | rolw rd, rs1, rs2 | rs1 = EXTZ(X(rs1)[31..0]); shamt = X(rs2)[4..0]; X(rd) = EXTS(((rs1 << shamt) \| (rs1 >> (32 - shamt)))[31..0]) | NONE | NONE | Performs a rotate left on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2. | Bitwise_rotation | 
| RORIW | roriw rd, rs1, shamt | (rs1_data >> shamt[4..0]) \| (rs1_data << (32 - shamt[4..0])) | NONE | NONE | Performs a rotate right on the least-significant word of rs1 by the amount in least-significant log2(XLEN) bits of shamt. | Bitwise_rotation | 
| RORW | rorw rd, rs1, rs2 | (rs1 >> X(rs2)[4..0]) \| (rs1 << (32 - X(rs2)[4..0])) | NONE | NONE | Performs a rotate right on the least-significant word of rs1 by the amount in least-significant 5 bits of rs2. | Bitwise_rotation | 
| ORC.b | orc.b rd, rs | for each byte position i: output[(i + 7)..i] = if input[(i + 7)..i] == 0 then 0b00000000 else 0b11111111 | NONE | NONE | Sets the bits of each byte in rd to all zeros if no bit within the respective byte of rs is set, or to all ones if any bit within the respective byte of rs is set. | OR_Combine | 
| REV8 | rev8 rd, rs | output[i..(i + 7)] = input[(j - 7)..j] | NONE | NONE | Reverses the order of the bytes in rs. | Byte_reverse | 
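Because the pseudocode column above is heavily abbreviated, a compact reference model may help when writing checkers. The following Python sketch models a subset of the Zbb operations for XLEN=32. It is an illustration rather than the CVA6 implementation, the function names are arbitrary, and the results for a zero input of `clz` and `ctz` (both XLEN) follow the RISC-V Bitmanip specification.

```python
XLEN = 32
MASK = (1 << XLEN) - 1

def clz(x: int) -> int:
    # Number of zero bits above the most-significant set bit (XLEN if x == 0).
    return XLEN - (x & MASK).bit_length()

def ctz(x: int) -> int:
    # Number of zero bits below the least-significant set bit (XLEN if x == 0).
    x &= MASK
    return XLEN if x == 0 else (x & -x).bit_length() - 1

def cpop(x: int) -> int:
    # Number of set bits.
    return bin(x & MASK).count("1")

def rol(x: int, shamt: int) -> int:
    s = shamt & (XLEN - 1)          # only log2(XLEN) bits are used
    x &= MASK
    return ((x << s) | (x >> (XLEN - s))) & MASK

def ror(x: int, shamt: int) -> int:
    s = shamt & (XLEN - 1)
    x &= MASK
    return ((x >> s) | (x << (XLEN - s))) & MASK

def orc_b(x: int) -> int:
    # Each output byte is 0xFF if the matching input byte is non-zero, else 0x00.
    out = 0
    for i in range(0, XLEN, 8):
        if (x >> i) & 0xFF:
            out |= 0xFF << i
    return out

def rev8(x: int) -> int:
    # Reverse the byte order of the register.
    return int.from_bytes((x & MASK).to_bytes(XLEN // 8, "little"), "big")
```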
3.2.9. RVZbc Carry-less multiplication
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| CLMUL | clmul rd, rs1, rs2 | foreach (i from 0 to (xlen - 1) by 1) { output = if ((rs2 >> i) & 1) then output ^ (rs1 << i); else output; } | NONE | NONE | clmul produces the lower half of the 2·XLEN carry-less product. | Carry-less multiplication Operations | 
| CLMULH | clmulh rd, rs1, rs2 | foreach (i from 1 to xlen by 1) { output = if ((rs2 >> i) & 1) then output ^ (rs1 >> (xlen - i)); else output; } | NONE | NONE | clmulh produces the upper half of the 2·XLEN carry-less product. | Carry-less multiplication Operations | 
| CLMULR | clmulr rd, rs1, rs2 | foreach (i from 0 to (xlen - 1) by 1) { output = if ((rs2 >> i) & 1) then output ^ (rs1 >> (xlen - i - 1)); else output; } | NONE | NONE | clmulr produces bits 2·XLEN-2:XLEN-1 of the 2·XLEN carry-less product. | Carry-less multiplication Operations | 
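The relation between the three carry-less multiplication results can be made concrete with a Python sketch for XLEN=32. The loops follow the RISC-V Bitmanip specification; `clmul_wide` is a hypothetical helper, added here only to cross-check the three instructions against the full 2·XLEN product.

```python
XLEN = 32
MASK = (1 << XLEN) - 1

def clmul_wide(a: int, b: int) -> int:
    """Full 2*XLEN-bit carry-less product (reference helper, not an instruction)."""
    a &= MASK
    b &= MASK
    out = 0
    for i in range(XLEN):
        if (b >> i) & 1:
            out ^= a << i
    return out

def clmul(a: int, b: int) -> int:
    # Lower XLEN bits of the carry-less product.
    return clmul_wide(a, b) & MASK

def clmulh(a: int, b: int) -> int:
    # Upper XLEN bits, accumulated with right shifts as in the pseudocode.
    a &= MASK
    b &= MASK
    out = 0
    for i in range(1, XLEN + 1):
        if (b >> i) & 1:
            out ^= a >> (XLEN - i)
    return out & MASK

def clmulr(a: int, b: int) -> int:
    # Bits 2*XLEN-2 down to XLEN-1 of the carry-less product.
    a &= MASK
    b &= MASK
    out = 0
    for i in range(XLEN):
        if (b >> i) & 1:
            out ^= a >> (XLEN - i - 1)
    return out & MASK
```

For any operands, `clmulh(a, b)` equals `clmul_wide(a, b) >> 32` and `clmulr(a, b)` equals `(clmul_wide(a, b) >> 31) & MASK`, which is a convenient property to assert in a testbench model.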
3.2.10. RVZbs Single bit Instructions
| Name | Format | Pseudocode | Invalid_values | Instruction_exception | Instruction_description | Op Name | 
|---|---|---|---|---|---|---|
| BCLR | bclr rd, rs1, rs2 | X(rd) = X(rs1) & ~(1 << (X(rs2) & (XLEN - 1))) | NONE | NONE | This instruction returns rs1 with a single bit cleared at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2. | Single_bit_Operations | 
| BCLRI | bclri rd, rs1, shamt | X(rd) = X(rs1) & ~(1 << (shamt & (XLEN - 1))) | NONE | NONE | This instruction returns rs1 with a single bit cleared at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved. | Single_bit_Operations | 
| BEXT | bext rd, rs1, rs2 | X(rd) = (X(rs1) >> (X(rs2) & (XLEN - 1))) & 1 | NONE | NONE | This instruction returns a single bit extracted from rs1 at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2. | Single_bit_Operations | 
| BEXTI | bexti rd, rs1, shamt | X(rd) = (X(rs1) >> (shamt & (XLEN - 1))) & 1 | NONE | NONE | This instruction returns a single bit extracted from rs1 at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved. | Single_bit_Operations | 
| BINV | binv rd, rs1, rs2 | X(rd) = X(rs1) ^ (1 << (X(rs2) & (XLEN - 1))) | NONE | NONE | This instruction returns rs1 with a single bit inverted at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2. | Single_bit_Operations | 
| BINVI | binvi rd, rs1, shamt | X(rd) = X(rs1) ^ (1 << (shamt & (XLEN - 1))) | NONE | NONE | This instruction returns rs1 with a single bit inverted at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved. | Single_bit_Operations | 
| BSET | bset rd, rs1, rs2 | X(rd) = X(rs1) \| (1 << (X(rs2) & (XLEN - 1))) | NONE | NONE | This instruction returns rs1 with a single bit set at the index specified in rs2. The index is read from the lower log2(XLEN) bits of rs2. | Single_bit_Operations | 
| BSETI | bseti rd, rs1, shamt | X(rd) = X(rs1) \| (1 << (shamt & (XLEN - 1))) | NONE | NONE | This instruction returns rs1 with a single bit set at the index specified in shamt. The index is read from the lower log2(XLEN) bits of shamt. For RV32, the encodings corresponding to shamt[5]=1 are reserved. | Single_bit_Operations | 
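The single-bit operations translate almost literally from the pseudocode column. The Python sketch below models them for XLEN=32; the function names mirror the mnemonics but are otherwise illustrative. Note how the index is masked to log2(XLEN) bits, so an index of 35 selects bit 3.

```python
XLEN = 32
MASK = (1 << XLEN) - 1

def _bit(index: int) -> int:
    # Only the lower log2(XLEN) bits of the index are used.
    return 1 << (index & (XLEN - 1))

def bclr(x: int, index: int) -> int:
    return x & ~_bit(index) & MASK

def bext(x: int, index: int) -> int:
    return (x >> (index & (XLEN - 1))) & 1

def binv(x: int, index: int) -> int:
    return (x ^ _bit(index)) & MASK

def bset(x: int, index: int) -> int:
    return (x | _bit(index)) & MASK
```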
3.3. Traps, Interrupts, Exceptions
Traps are composed of interrupts and exceptions. Interrupts are asynchronous events, whereas exceptions are synchronous. Interrupts occur independently of the instruction stream (they are mainly raised by peripherals), while exceptions are raised synchronously by the instruction being executed.
3.3.1. Raising Traps
When a trap is raised, the behaviour of the CVA6 core depends on several CSRs and some CSRs are modified.
3.3.1.1. Configuration CSRs
CSRs having an effect on the core behaviour when a trap occurs are:
- mstatus: several fields control the core behaviour, such as interrupt enable (MIE).
- mtvec: specifies the address of the trap handler.
3.3.1.2. Modified CSRs
CSRs (or fields) updated by the core when a trap occurs are:
- mstatus: several fields are updated, such as previous privilege mode (MPP) and previous interrupt enable (MPIE).
- mepc: updated with the address of the interrupted instruction or the instruction raising the exception.
- mcause: updated with a code indicating the event causing the trap.
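The CSR updates listed above can be summarised as a behavioural sketch. The Python model below is illustrative and makes several assumptions that are not stated in this section: the mstatus bit positions (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11) and the interrupt flag in mcause bit 31 are taken from the RISC-V privileged specification, mtvec is assumed to be in direct mode, and the function name `take_trap` and the dictionary-based CSR file are hypothetical.

```python
MSTATUS_MIE = 1 << 3
MSTATUS_MPIE = 1 << 7
MSTATUS_MPP_MASK = 0b11 << 11
PRIV_M = 0b11

def take_trap(csrs: dict, pc: int, cause: int, is_interrupt: bool,
              priv: int = PRIV_M) -> int:
    """Update mstatus, mepc and mcause on trap entry and return the handler PC."""
    mstatus = csrs["mstatus"]
    mie = (mstatus >> 3) & 1
    mstatus &= ~(MSTATUS_MIE | MSTATUS_MPIE | MSTATUS_MPP_MASK)
    mstatus |= mie << 7              # MPIE <- MIE, and MIE is now 0
    mstatus |= priv << 11            # MPP <- privilege mode before the trap
    csrs["mstatus"] = mstatus
    csrs["mepc"] = pc                # interrupted or faulting instruction
    csrs["mcause"] = (int(is_interrupt) << 31) | cause
    return csrs["mtvec"] & ~0b11     # direct mode assumed: jump to the base address
```

For instance, an illegal instruction (exception code 2) at PC 0x80000004 with interrupts enabled leaves MIE cleared, MPIE set, and redirects the fetch to the mtvec base address.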
3.3.1.3. Supported exceptions
The following exceptions are supported by the CV32A60X:
- instruction address misaligned
  - control flow instruction with misaligned target
- illegal instruction:
  - unimplemented CSRs
  - unsupported extensions
- breakpoint (EBREAK)
- load address misaligned:
  - LH at 2n+1 address
  - LW at 4n+1, 4n+2, 4n+3 address
- store/AMO address misaligned:
  - SH at 2n+1 address
  - SW at 4n+1, 4n+2, 4n+3 address
- environment call (ECALL) from M-mode
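The misalignment conditions in the list above reduce to a check of the address against the access size. The following Python sketch illustrates this; the exception codes are those of the RISC-V privileged specification, while the function and dictionary names are illustrative and not taken from the RTL.

```python
# mcause exception codes from the RISC-V privileged specification.
EXCEPTION_CODES = {
    "instruction address misaligned": 0,
    "illegal instruction": 2,
    "breakpoint": 3,
    "load address misaligned": 4,
    "store/AMO address misaligned": 6,
    "environment call from M-mode": 11,
}

def access_exception(kind: str, addr: int, size: int):
    """Return the exception code for a misaligned load/store, or None if aligned.

    kind is "load" or "store"; size is the access width in bytes (1, 2 or 4).
    """
    if addr % size == 0:
        return None
    if kind == "load":
        return EXCEPTION_CODES["load address misaligned"]
    return EXCEPTION_CODES["store/AMO address misaligned"]
```

Byte accesses can never be misaligned, so `access_exception("store", 0x1003, 1)` returns `None`, whereas a halfword load at the same address reports code 4.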
3.3.2. Trap return
A trap handler ends with the trap return instruction (MRET). The behaviour of the CV32A60X core depends on several CSRs.
3.3.2.1. Configuration CSRs
CSRs having an effect on the core behaviour when returning from a trap are:
- mstatus: several fields control the core behaviour, such as previous privilege mode (MPP) and previous interrupt enable (MPIE).
3.3.2.2. Modified CSRs
CSRs (or fields) updated by the core when returning from a trap are:
- mstatus: several fields are updated, such as interrupt enable (MIE) and modify privilege (MPRV).
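The effect of MRET on mstatus can be sketched as follows. This is an illustrative model under the assumption that only machine mode exists (so MPP keeps its single legal value); the `mret` function and the `csrs` dictionary are hypothetical, and the MIE/MPIE update rule is the one given by the RISC-V privileged specification.

```python
MSTATUS_MIE = 1 << 3
MSTATUS_MPIE = 1 << 7


def mret(csrs):
    """Return (new_csrs, next_pc) after executing MRET in machine mode."""
    new = dict(csrs)
    mstatus = csrs["mstatus"]
    mpie = 1 if mstatus & MSTATUS_MPIE else 0
    mstatus &= ~MSTATUS_MIE
    mstatus |= mpie << 3        # MIE <- MPIE
    mstatus |= MSTATUS_MPIE     # MPIE <- 1
    new["mstatus"] = mstatus
    return new, csrs["mepc"]    # execution resumes at mepc
```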
3.3.3. Interrupts
- external interrupt: irq_i signal
- timer interrupt: time_irq_i signal
These signals are level sensitive: an interrupt remains raised until its source is cleared.
The exception code field (mcause CSR) depends on the interrupt source.
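As an illustration of how the interrupt source maps to mcause, the sketch below selects the interrupt to take from mstatus, mie and mip. It assumes the standard RISC-V codes (7 for the machine timer interrupt, 11 for the machine external interrupt) and the standard priority of external over timer; `select_interrupt` is a hypothetical helper, not a CVA6 signal or module.

```python
MSTATUS_MIE = 1 << 3
MTI = 7    # bit index of MTIE/MTIP, also the machine timer interrupt code
MEI = 11   # bit index of MEIE/MEIP, also the machine external interrupt code


def select_interrupt(mstatus, mie, mip):
    """Return the mcause value of the interrupt to take, or None."""
    if not mstatus & MSTATUS_MIE:
        return None                 # interrupts globally disabled
    active = mie & mip              # pending and individually enabled
    for code in (MEI, MTI):         # external has priority over timer
        if (active >> code) & 1:
            return (1 << 31) | code  # INTERRUPT bit set, code in the low bits
    return None
```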
3.3.4. Wait for Interrupt
- CVA6 implementation: WFI stalls the core until an interrupt occurs.
3.4. csr
3.4.1. Conventions
In the subsequent sections, register fields are labeled with one of the following abbreviations:
- WPRI (Writes Preserve Values, Reads Ignore Values): read/write field reserved for future use. For forward compatibility, implementations that do not furnish these fields must make them read-only zero.
- WLRL (Write/Read Only Legal Values): read/write CSR field that specifies behavior for only a subset of possible bit encodings, with other bit encodings reserved.
- WARL (Write Any Values, Reads Legal Values): read/write CSR field which is only defined for a subset of bit encodings, but allows any value to be written while guaranteeing to return a legal value whenever read.
- ROCST (Read-Only Constant): a special case of WARL field which admits only one legal value and therefore behaves as a constant field that silently ignores writes.
- ROVAR (Read-Only Variable): a special case of WARL field which can take multiple legal values but cannot be modified by software and depends only on the architectural state of the hart.
In particular, a register that is not internally divided into multiple fields can be considered as containing a single field of XLEN bits. This makes it possible to clearly represent read-write registers holding a single legal value (typically zero).
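The write behaviour of these field types can be illustrated with a small model. This is a sketch under one assumption that goes beyond the definitions above: an illegal write to a WARL field leaves the current value unchanged, which is one permitted behaviour among several. The function name `write_field` is hypothetical.

```python
def write_field(kind, current, written, legal):
    """Return the value held by a field after software writes `written`.

    kind:  "WARL", "ROCST" or "ROVAR"
    legal: container of legal values for the field
    """
    if kind in ("ROCST", "ROVAR"):
        return current  # software writes are silently ignored
    if kind == "WARL":
        # Assumption: an illegal write keeps the current (legal) value.
        return written if written in legal else current
    raise ValueError("unsupported field type: " + kind)
```

With this model, writing 0 to mstatus.MPP (WARL, legal value 0x3 only) reads back 0x3, while any 32-bit value written to mscratch is retained.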
3.4.2. Register Summary
| Address | Register Name | Privilege | Description | 
|---|---|---|---|
| 0x300 | MSTATUS | MRW | The mstatus register keeps track of and controls the hart’s current operating state. |
| 0x301 | MISA | MRW | misa is a read-write register reporting the ISA supported by the hart. |
| 0x304 | MIE | MRW | The mie register is an MXLEN-bit read/write register containing interrupt enable bits. |
| 0x305 | MTVEC | MRW | MXLEN-bit read/write register that holds trap vector configuration. |
| 0x310 | MSTATUSH | MRW | The mstatush register keeps track of and controls the hart’s current operating state. |
| 0x323-0x33f | MHPMEVENT[3-31] | MRW | mhpmevent[i] is an MXLEN-bit event register which controls the corresponding mhpmcounter[i]. |
| 0x340 | MSCRATCH | MRW | The mscratch register is an MXLEN-bit read/write register dedicated for use by machine mode. |
| 0x341 | MEPC | MRW | The mepc is a WARL register that must be able to hold all valid physical and virtual addresses. |
| 0x342 | MCAUSE | MRW | The mcause register stores the information regarding the trap. |
| 0x343 | MTVAL | MRW | The mtval is a WARL register that holds the address of the instruction which caused the exception. |
| 0x344 | MIP | MRW | The mip register is an MXLEN-bit read/write register containing information on pending interrupts. |
| 0x3a0-0x3af | PMPCFG[0-15] | MRW | PMP configuration register. |
| 0x3b0-0x3ef | PMPADDR[0-63] | MRW | Physical memory protection address register. |
| 0x7c0 | ICACHE | MRW | The register controls the operation of the i-cache unit. |
| 0x7c1 | DCACHE | MRW | The register controls the operation of the d-cache unit. |
| 0xb00 | MCYCLE | MRW | Counts the number of clock cycles executed from an arbitrary point in time. |
| 0xb02 | MINSTRET | MRW | Counts the number of instructions completed from an arbitrary point in time. |
| 0xb03-0xb1f | MHPMCOUNTER[3-31] | MRW | The mhpmcounter is a 64-bit counter. Returns lower 32 bits in RV32I mode. |
| 0xb80 | MCYCLEH | MRW | Upper 32 bits of mcycle. |
| 0xb82 | MINSTRETH | MRW | Upper 32 bits of minstret. |
| 0xb83-0xb9f | MHPMCOUNTER[3-31]H | MRW | The mhpmcounterh returns the upper half word in RV32I systems. |
| 0xf11 | MVENDORID | MRO | 32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core. |
| 0xf12 | MARCHID | MRO | MXLEN-bit read-only register encoding the base microarchitecture of the hart. |
| 0xf13 | MIMPID | MRO | Provides a unique encoding of the version of the processor implementation. |
| 0xf14 | MHARTID | MRO | MXLEN-bit read-only register containing the integer ID of the hardware thread running the code. |
| 0xf15 | MCONFIGPTR | MRO | MXLEN-bit read-only register that holds the physical address of a configuration data structure. |
3.4.3. Register Description
3.4.3.1. MSTATUS
- Address
- 
0x300 
- Reset Value
- 
0x00001800 
- Privilege
- 
MRW 
- Description
- 
The mstatus register keeps track of and controls the hart’s current operating state. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| 0 | UIE | 0x0 | ROCST | 0x0 | Stores the state of the user mode interrupts. | 
| 1 | SIE | 0x0 | ROCST | 0x0 | Stores the state of the supervisor mode interrupts. | 
| 2 | RESERVED_2 | 0x0 | WPRI | Reserved | |
| 3 | MIE | 0x0 | WLRL | 0x0 - 0x1 | Stores the state of the machine mode interrupts. | 
| 4 | UPIE | 0x0 | ROCST | 0x0 | Stores the state of the user mode interrupts prior to the trap. | 
| 5 | SPIE | 0x0 | ROCST | 0x0 | Stores the state of the supervisor mode interrupts prior to the trap. | 
| 6 | UBE | 0x0 | ROCST | 0x0 | Controls the endianness of memory accesses other than instruction fetches for user mode. | 
| 7 | MPIE | 0x0 | WLRL | 0x0 - 0x1 | Stores the state of the machine mode interrupts prior to the trap. | 
| 8 | SPP | 0x0 | ROCST | 0x0 | Stores the previous privilege mode for supervisor. | 
| [10:9] | RESERVED_9 | 0x0 | WPRI | Reserved | |
| [12:11] | MPP | 0x3 | WARL | 0x3 | Stores the previous privilege mode for machine. | 
| [14:13] | FS | 0x0 | ROCST | 0x0 | Encodes the status of the floating-point unit, including the CSR fcsr and floating-point data registers. | 
| [16:15] | XS | 0x0 | ROCST | 0x0 | Encodes the status of additional user-mode extensions and associated state. | 
| 17 | MPRV | 0x0 | ROCST | 0x0 | Modifies the privilege level at which loads and stores execute in all privilege modes. | 
| 18 | SUM | 0x0 | ROCST | 0x0 | Modifies the privilege with which S-mode loads and stores access virtual memory. | 
| 19 | MXR | 0x0 | ROCST | 0x0 | Modifies the privilege with which loads access virtual memory. | 
| 20 | TVM | 0x0 | ROCST | 0x0 | Supports intercepting supervisor virtual-memory management operations. | 
| 21 | TW | 0x0 | ROCST | 0x0 | Supports intercepting the WFI instruction. | 
| 22 | TSR | 0x0 | ROCST | 0x0 | Supports intercepting the supervisor exception return instruction. | 
| 23 | SPELP | 0x0 | ROCST | 0x0 | Supervisor mode previous expected-landing-pad (ELP) state. | 
| [30:24] | RESERVED_24 | 0x0 | WPRI | Reserved | |
| 31 | SD | 0x0 | ROCST | 0x0 | Read-only bit that summarizes whether either the FS field or XS field signals the presence of some dirty state. | 
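To make the bit layout concrete, the following sketch extracts the three writable fields of mstatus from a raw 32-bit value, using the bit ranges of the table above. The helper names are illustrative only.

```python
# (msb, lsb) of the non-constant fields, taken from the table above.
MSTATUS_FIELDS = {"MIE": (3, 3), "MPIE": (7, 7), "MPP": (12, 11)}


def extract(value, msb, lsb):
    """Return bits [msb:lsb] of `value`."""
    return (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def decode_mstatus(value):
    """Return a dict mapping field name to field value."""
    return {name: extract(value, msb, lsb)
            for name, (msb, lsb) in MSTATUS_FIELDS.items()}
```

Applied to the reset value 0x00001800, this gives MIE = 0, MPIE = 0 and MPP = 0x3, matching the per-field reset values.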
3.4.3.2. MISA
- Address
- 
0x301 
- Reset Value
- 
0x40001106 
- Privilege
- 
MRW 
- Description
- 
misa is a read-write register reporting the ISA supported by the hart. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [25:0] | EXTENSIONS | 0x1106 | ROCST | 0x1106 | Encodes the presence of the standard extensions, with a single bit per letter of the alphabet. | 
| [29:26] | RESERVED_26 | 0x0 | WPRI | Reserved | |
| [31:30] | MXL | 0x1 | WARL | 0x1 | Encodes the native base integer ISA width. | 
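The reset value can be checked against the field layout with a short decoder. This sketch assumes the 32-bit misa layout shown above and the standard encoding in which bit 0 stands for extension A and bit 25 for extension Z; `decode_misa` is a hypothetical helper.

```python
# MXL encoding from the RISC-V privileged specification.
MXL_TO_XLEN = {1: 32, 2: 64, 3: 128}


def decode_misa(value):
    """Return (xlen, extension_letters) for a 32-bit misa value."""
    xlen = MXL_TO_XLEN[(value >> 30) & 0x3]
    letters = "".join(chr(ord("A") + i) for i in range(26) if (value >> i) & 1)
    return xlen, letters
```

Decoding 0x40001106 yields an XLEN of 32 and the extension letters B, C, I and M.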
3.4.3.3. MIE
- Address
- 
0x304 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mie register is an MXLEN-bit read/write register containing interrupt enable bits. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| 0 | USIE | 0x0 | ROCST | 0x0 | User Software Interrupt enable. | 
| 1 | SSIE | 0x0 | ROCST | 0x0 | Supervisor Software Interrupt enable. | 
| 2 | VSSIE | 0x0 | ROCST | 0x0 | VS-level Software Interrupt enable. | 
| 3 | MSIE | 0x0 | ROCST | 0x0 | Machine Software Interrupt enable. | 
| 4 | UTIE | 0x0 | ROCST | 0x0 | User Timer Interrupt enable. | 
| 5 | STIE | 0x0 | ROCST | 0x0 | Supervisor Timer Interrupt enable. | 
| 6 | VSTIE | 0x0 | ROCST | 0x0 | VS-level Timer Interrupt enable. | 
| 7 | MTIE | 0x0 | WLRL | 0x0 - 0x1 | Machine Timer Interrupt enable. | 
| 8 | UEIE | 0x0 | ROCST | 0x0 | User External Interrupt enable. | 
| 9 | SEIE | 0x0 | ROCST | 0x0 | Supervisor External Interrupt enable. | 
| 10 | VSEIE | 0x0 | ROCST | 0x0 | VS-level External Interrupt enable. | 
| 11 | MEIE | 0x0 | WLRL | 0x0 - 0x1 | Machine External Interrupt enable. | 
| 12 | SGEIE | 0x0 | ROCST | 0x0 | HS-level External Interrupt enable. | 
| [31:13] | RESERVED_13 | 0x0 | WPRI | Reserved | 
3.4.3.4. MTVEC
- Address
- 
0x305 
- Reset Value
- 
0x80010000 
- Privilege
- 
MRW 
- Description
- 
MXLEN-bit read/write register that holds trap vector configuration. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [1:0] | MODE | 0x0 | WARL | 0x0 | Vector mode. | 
| [31:2] | BASE | 0x20004000 | WARL | 0x00000000 - 0x3FFFFFFF | Vector base address. | 
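The relation between the register reset value and the BASE field reset value can be verified with a short sketch: BASE holds the handler address shifted right by two bits. The function name is hypothetical.

```python
def decode_mtvec(value):
    """Return (mode, base_field, handler_address) for a 32-bit mtvec value."""
    mode = value & 0x3
    base_field = value >> 2          # content of bits [31:2]
    handler_address = base_field << 2
    return mode, base_field, handler_address
```

For the reset value 0x80010000, MODE is 0 (the only legal value), the BASE field is 0x20004000 and the trap handler address is 0x80010000.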
3.4.3.5. MSTATUSH
- Address
- 
0x310 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mstatush register keeps track of and controls the hart’s current operating state. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [3:0] | RESERVED_0 | 0x0 | WPRI | Reserved | |
| 4 | SBE | 0x0 | ROCST | 0x0 | Controls the endianness of memory accesses other than instruction fetches for supervisor mode. | 
| 5 | MBE | 0x0 | ROCST | 0x0 | Controls the endianness of memory accesses other than instruction fetches for machine mode. | 
| 6 | GVA | 0x0 | ROCST | 0x0 | Guest Virtual Address: indicates whether a trap into machine mode wrote a guest virtual address to mtval (hypervisor extension). | 
| 7 | MPV | 0x0 | ROCST | 0x0 | Machine Previous Virtualization mode (hypervisor extension). | 
| 8 | RESERVED_8 | 0x0 | WPRI | Reserved | |
| 9 | MPELP | 0x0 | ROCST | 0x0 | Machine mode previous expected-landing-pad (ELP) state. | 
| [31:10] | RESERVED_10 | 0x0 | WPRI | Reserved | 
3.4.3.6. MHPMEVENT[3-31]
- Address
- 
0x323-0x33f 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
mhpmevent[i] is an MXLEN-bit event register which controls the corresponding mhpmcounter[i]. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MHPMEVENT[I] | 0x00000000 | ROCST | 0x0 | mhpmevent[i] is an MXLEN-bit event register which controls the corresponding mhpmcounter[i]. | 
3.4.3.7. MSCRATCH
- Address
- 
0x340 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mscratch register is an MXLEN-bit read/write register dedicated for use by machine mode. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MSCRATCH | 0x00000000 | WARL | 0x00000000 - 0xFFFFFFFF | The mscratch register is an MXLEN-bit read/write register dedicated for use by machine mode. | 
3.4.3.8. MEPC
- Address
- 
0x341 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mepc is a WARL register that must be able to hold all valid physical and virtual addresses. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MEPC | 0x00000000 | WARL | 0x00000000 - 0xFFFFFFFF | The mepc is a WARL register that must be able to hold all valid physical and virtual addresses. | 
3.4.3.9. MCAUSE
- Address
- 
0x342 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mcause register stores the information regarding the trap. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [30:0] | EXCEPTION_CODE | 0x0 | WLRL | 0x0 - 0x8, 0xb | Encodes the exception code. | 
| 31 | INTERRUPT | 0x0 | WLRL | 0x0 - 0x1 | Indicates whether the trap was due to an interrupt. | 
3.4.3.10. MTVAL
- Address
- 
0x343 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mtval is a WARL register that holds the address of the instruction which caused the exception. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MTVAL | 0x00000000 | ROCST | 0x0 | The mtval is a WARL register that holds the address of the instruction which caused the exception. | 
3.4.3.11. MIP
- Address
- 
0x344 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mip register is an MXLEN-bit read/write register containing information on pending interrupts. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| 0 | USIP | 0x0 | ROCST | 0x0 | User Software Interrupt Pending. | 
| 1 | SSIP | 0x0 | ROCST | 0x0 | Supervisor Software Interrupt Pending. | 
| 2 | VSSIP | 0x0 | ROCST | 0x0 | VS-level Software Interrupt Pending. | 
| 3 | MSIP | 0x0 | ROCST | 0x0 | Machine Software Interrupt Pending. | 
| 4 | UTIP | 0x0 | ROCST | 0x0 | User Timer Interrupt Pending. | 
| 5 | STIP | 0x0 | ROCST | 0x0 | Supervisor Timer Interrupt Pending. | 
| 6 | VSTIP | 0x0 | ROCST | 0x0 | VS-level Timer Interrupt Pending. | 
| 7 | MTIP | 0x0 | ROVAR | 0x0 - 0x1 | Machine Timer Interrupt Pending. | 
| 8 | UEIP | 0x0 | ROCST | 0x0 | User External Interrupt Pending. | 
| 9 | SEIP | 0x0 | ROCST | 0x0 | Supervisor External Interrupt Pending. | 
| 10 | VSEIP | 0x0 | ROCST | 0x0 | VS-level External Interrupt Pending. | 
| 11 | MEIP | 0x0 | ROVAR | 0x0 - 0x1 | Machine External Interrupt Pending. | 
| 12 | SGEIP | 0x0 | ROCST | 0x0 | HS-level External Interrupt Pending. | 
| [31:13] | RESERVED_13 | 0x0 | WPRI | Reserved | 
3.4.3.12. PMPCFG[0-15]
- Address
- 
0x3a0-0x3af 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
PMP configuration register 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [7:0] | PMP[I*4 +0]CFG | 0x0 | ROCST | 0x0 | pmp configuration bits | 
| [15:8] | PMP[I*4 +1]CFG | 0x0 | ROCST | 0x0 | pmp configuration bits | 
| [23:16] | PMP[I*4 +2]CFG | 0x0 | ROCST | 0x0 | pmp configuration bits | 
| [31:24] | PMP[I*4 +3]CFG | 0x0 | ROCST | 0x0 | pmp configuration bits | 
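The indexing used in the table (PMP[I*4 + k]CFG in register PMPCFG[I]) can be expressed as a small lookup. This is a sketch; `pmpcfg_location` is a hypothetical helper that returns the CSR address and the bit range holding the configuration byte of PMP entry n.

```python
PMPCFG0_ADDR = 0x3A0  # address of PMPCFG[0]


def pmpcfg_location(n):
    """Return (csr_address, msb, lsb) of the configuration byte of PMP entry n."""
    if not 0 <= n < 64:
        raise ValueError("PMP entry index out of range")
    reg, slot = divmod(n, 4)   # four 8-bit entries per 32-bit register
    return PMPCFG0_ADDR + reg, 8 * slot + 7, 8 * slot
```

For instance, PMP entry 5 is found in bits [15:8] of the CSR at address 0x3a1.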
3.4.3.13. PMPADDR[0-63]
- Address
- 
0x3b0-0x3ef 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
Physical memory protection address register 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | PMPADDR[I] | 0x00000000 | ROCST | 0x0 | Physical memory protection address register | 
3.4.3.14. ICACHE
- Address
- 
0x7c0 
- Reset Value
- 
0x00000001 
- Privilege
- 
MRW 
- Description
- 
the register controls the operation of the i-cache unit. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| 0 | ICACHE | 0x1 | RW | 0x1 | bit for cache-enable of instruction cache | 
| [31:1] | RESERVED_1 | 0x0 | WPRI | Reserved | 
3.4.3.15. DCACHE
- Address
- 
0x7c1 
- Reset Value
- 
0x00000001 
- Privilege
- 
MRW 
- Description
- 
the register controls the operation of the d-cache unit. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| 0 | DCACHE | 0x1 | RW | 0x1 | bit for cache-enable of data cache | 
| [31:1] | RESERVED_1 | 0x0 | WPRI | Reserved | 
3.4.3.16. MCYCLE
- Address
- 
0xb00 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
Counts the number of clock cycles executed from an arbitrary point in time. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MCYCLE | 0x00000000 | WARL | 0x00000000 - 0xFFFFFFFF | Counts the number of clock cycles executed from an arbitrary point in time. | 
3.4.3.17. MINSTRET
- Address
- 
0xb02 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
Counts the number of instructions completed from an arbitrary point in time. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MINSTRET | 0x00000000 | WARL | 0x00000000 - 0xFFFFFFFF | Counts the number of instructions completed from an arbitrary point in time. | 
3.4.3.18. MHPMCOUNTER[3-31]
- Address
- 
0xb03-0xb1f 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mhpmcounter is a 64-bit counter. Returns lower 32 bits in RV32I mode. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MHPMCOUNTER[I] | 0x00000000 | ROCST | 0x0 | The mhpmcounter is a 64-bit counter. Returns lower 32 bits in RV32I mode. | 
3.4.3.19. MCYCLEH
- Address
- 
0xb80 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
upper 32 bits of mcycle 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MCYCLEH | 0x00000000 | WARL | 0x00000000 - 0xFFFFFFFF | upper 32 bits of mcycle | 
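Because the 64-bit cycle count is split across mcycle and mcycleh, a carry from the low half can occur between the two reads. A common way to obtain a consistent value, recommended by the RISC-V specification for split counters, is to read the high half before and after the low half and retry on mismatch. The sketch below models this; `read_low` and `read_high` are hypothetical callables standing in for the CSR reads, and `FakeCounter` exists only to exercise the loop.

```python
def read_counter64(read_low, read_high):
    """Return a consistent 64-bit value from two 32-bit reads."""
    while True:
        high = read_high()
        low = read_low()
        if read_high() == high:   # no carry into the high half in between
            return (high << 32) | low


class FakeCounter:
    """Stand-in for mcycle/mcycleh: advances by one on every read."""

    def __init__(self, value):
        self.value = value

    def _tick(self):
        current = self.value
        self.value = (self.value + 1) & (2**64 - 1)
        return current

    def low(self):
        return self._tick() & 0xFFFFFFFF

    def high(self):
        return self._tick() >> 32
```

With a counter starting at 0x00000000FFFFFFFF, the first attempt observes a change of the high half and is discarded, so a torn value made of a stale high half and a wrapped low half is never returned.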
3.4.3.20. MINSTRETH
- Address
- 
0xb82 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
Upper 32 bits of minstret. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MINSTRETH | 0x00000000 | WARL | 0x00000000 - 0xFFFFFFFF | Upper 32 bits of minstret. | 
3.4.3.21. MHPMCOUNTER[3-31]H
- Address
- 
0xb83-0xb9f 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRW 
- Description
- 
The mhpmcounterh returns the upper half word in RV32I systems. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MHPMCOUNTER[I]H | 0x00000000 | ROCST | 0x0 | The mhpmcounterh returns the upper half word in RV32I systems. | 
3.4.3.22. MVENDORID
- Address
- 
0xf11 
- Reset Value
- 
0x00000602 
- Privilege
- 
MRO 
- Description
- 
32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MVENDORID | 0x00000602 | ROCST | 0x602 | 32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core. | 
3.4.3.23. MARCHID
- Address
- 
0xf12 
- Reset Value
- 
0x00000003 
- Privilege
- 
MRO 
- Description
- 
MXLEN-bit read-only register encoding the base microarchitecture of the hart. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MARCHID | 0x00000003 | ROCST | 0x3 | MXLEN-bit read-only register encoding the base microarchitecture of the hart. | 
3.4.3.24. MIMPID
- Address
- 
0xf13 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRO 
- Description
- 
Provides a unique encoding of the version of the processor implementation. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MIMPID | 0x00000000 | ROCST | 0x0 | Provides a unique encoding of the version of the processor implementation. | 
3.4.3.25. MHARTID
- Address
- 
0xf14 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRO 
- Description
- 
MXLEN-bit read-only register containing the integer ID of the hardware thread running the code. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MHARTID | 0x00000000 | ROCST | 0x0 | MXLEN-bit read-only register containing the integer ID of the hardware thread running the code. | 
3.4.3.26. MCONFIGPTR
- Address
- 
0xf15 
- Reset Value
- 
0x00000000 
- Privilege
- 
MRO 
- Description
- 
MXLEN-bit read-only register that holds the physical address of a configuration data structure. 
| Bits | Field Name | Reset Value | Type | Legal Values | Description | 
|---|---|---|---|---|---|
| [31:0] | MCONFIGPTR | 0x00000000 | ROCST | 0x0 | MXLEN-bit read-only register that holds the physical address of a configuration data structure. | 
3.5. OBI Bus interface
The OBI bus connects CVA6 to the memory.
CVA6 implements three interfaces: one for instruction fetch and two for data handling, one dedicated to load operations and the other to store operations.
Each connection uses a bus interface that complies with the OBI 1.6 standard. The way CVA6 uses the optional signals described by this standard is detailed in the tables below.
To understand how the OBI memory interface behaves in CVA6, it is necessary to read the OBI Protocol Specification (https://github.com/openhwgroup/obi/) together with this chapter.
3.5.1. Protocol Description
OBI is a point-to-point protocol (between a manager and a subordinate).
An OBI link consists of two channels:
- The address channel, called the A channel.
- The response channel, called the R channel.
An OBI transaction consists of two transfers:
- An address phase transfer over the A channel.
- A response phase transfer over the R channel.
The address phase transfer is as follows:
- The manager indicates the validity of its address phase signals (i.e. addr, wdata, we, be, auser, aid, mid, wuser, prot, memtype, dbg, atop, achk) by setting its request (req) high.
- The subordinate indicates its readiness to accept the address phase signals by setting grant (gnt) high.
- The address phase of a transaction starts in the cycle in which req goes high and completes on the rising clk edge when both req and gnt are high.
The response phase transfer is as follows:
- After a granted request (in the A channel), the subordinate indicates the validity of its response phase signals (i.e. rdata, err, ruser, rid, exokay, rchk) by setting rvalid high.
- The manager indicates its readiness to accept the response phase signals by setting rready high.
- The response phase of a transaction starts in the cycle in which rvalid goes high and completes on the rising clk edge when both rvalid and rready are high.
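Both phases follow the same valid/ready rule, which can be sketched on sampled waveforms. In this illustration each list element is the value of a signal at one rising clk edge; `completed_transfers` is a hypothetical helper, used with (req, gnt) for the A channel and (rvalid, rready) for the R channel.

```python
def completed_transfers(valid, ready):
    """Return the cycle indices at which a transfer completes.

    A transfer completes on a rising clk edge where both the valid-side
    signal (req or rvalid) and the ready-side signal (gnt or rready) are high.
    """
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]
```

For example, with req high in cycles 1 to 3 and gnt high only in cycle 2, the address phase starts in cycle 1 and completes in cycle 2.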
3.5.2. CVA6 implementation
All OBI buses for FETCH, LOAD, and STORE operations have a data size of 32 bits.
The Fetch interface performs only full BE accesses.
The Fetch and Load interfaces are only capable of read accesses.
The Store interface is exclusively for write accesses.
CVA6 is always ready to accept a response on channel R, so the rready signal is always high.
The Fetch and Load interfaces do not generate outstanding accesses.
The Store interface does not use channel R. Therefore, the depth of outstanding accesses depends on the subordinate.
The error flag (err) is not used at all.
Parity signal support is partial. Parity signals reqpar and rreadypar are generated as specified by the standard, but gntpar and rvalidpar are currently not checked by CVA6.
Checksum signals achk and rchk are not supported.
None of the user data signals (auser, wuser, ruser) are used.
CVA6 does not support atomic operations, so exokay and atop are not used.
CVA6 does not have Physical Memory Attributes (PMA), so cacheable or bufferable regions are not defined. As a consequence, memtype is always 0.
prot[0] is driven to indicate the type of access: FETCH for the Fetch interface and DATA for the LOAD/STORE interfaces.
As only Machine mode is available in CVA6, prot[2:1] is statically set to 2'b11 (Machine mode).
The manager identifier mid is set to 0.
The transaction identifier signals aid and rid are only 1 bit wide. Even though no outstanding reads are generated, rid must be driven by the subordinate in compliance with the standard (rid = aid).
3.5.3. Port list channel A
| Signal | Direction | Fetch | Load | Store | Description | 
|---|---|---|---|---|---|
| req | OUTPUT | used | used | used | Address transfer request. req=1 signals the availability of valid address phase signals. | 
| gnt | INPUT | used | used | used | Grant. Ready to accept address transfer. Address transfer is accepted on rising clk with req=1 and gnt=1. | 
| addr[] | OUTPUT | used | used | used | Address | 
| we | OUTPUT | used | used | used | Write Enable, high for writes, low for reads. | 
| be[3:0] | OUTPUT | used (always Full) | used | used | Byte Enable. Is set for the bytes to write/read. | 
| wdata[31:0] | OUTPUT | 0 (default) | 0 (default) | used | Write data. Only valid for write transactions. Undefined for read transactions. | 
| auser[0:0] | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Address Phase User signals. Valid for both read and write transactions. | 
| wuser[0:0] | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Additional Address Phase User signals. Only valid for write transactions. Undefined for read transactions. | 
| aid[0:0] | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Address Phase transaction identifier. | 
| atop[5:0] | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Atomic Operation. | 
| memtype[0] | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Memory type attributes | 
| memtype[1] | OUTPUT | 0, no PMA | 0, no PMA | 0, no PMA | Memory type attributes, cacheable flag. | 
| mid[0:0] | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Manager ID. | 
| prot[0] | OUTPUT | FETCH only | DATA only | DATA only | Protection attributes. Kind of access : 0 is FETCH, 1 is DATA | 
| prot[2:1] | OUTPUT | 2’b11 | 2’b11 | 2’b11 | Protection attributes, Privilege Mode: 2’b11 is machine mode. | 
| dbg | OUTPUT | 0 (default) | 0 (default) | 0 (default) | Debug access. | 
| reqpar | OUTPUT | used | used | used | Parity bit for req signal (odd parity). | 
| gntpar | INPUT | not used | not used | not used | Parity bit for gnt signal (odd parity). | 
| achk[0:0] | OUTPUT | tied at zero | tied at zero | tied at zero | Checksum for address phase signals (except achk itself). | 
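The constant address-phase values in the table can be summarised per interface. The sketch below is only a reading aid derived from the table and from the statements that Fetch and Load perform reads while Store performs writes; `a_channel_constants` is a hypothetical helper and the dictionary keys simply reuse the OBI signal names.

```python
def a_channel_constants(interface):
    """Return the statically driven A channel signals of one interface.

    interface: "fetch", "load" or "store"
    """
    if interface not in ("fetch", "load", "store"):
        raise ValueError("unknown interface: " + interface)
    is_fetch = interface == "fetch"
    signals = {
        "we": 1 if interface == "store" else 0,  # Store writes, others read
        "auser": 0, "wuser": 0, "aid": 0, "atop": 0,
        "memtype": 0, "mid": 0, "dbg": 0, "achk": 0,
        # prot[2:1] = 2'b11 (machine mode), prot[0] = 0 for FETCH, 1 for DATA
        "prot": (0b11 << 1) | (0 if is_fetch else 1),
    }
    if is_fetch:
        signals["be"] = 0b1111   # Fetch performs only full BE accesses
    if interface != "store":
        signals["wdata"] = 0     # default value on read-only interfaces
    return signals
```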
3.5.4. Port list channel R
| Signal | Direction | Fetch | Load | Store | Description | 
|---|---|---|---|---|---|
| rvalid | INPUT | used | used | used | Response transfer request. rvalid=1 signals the availability of valid response phase signals. Used for both reads and writes. | 
| rready | OUTPUT | used | used | used | Ready to accept response transfer. Response transfer is accepted on rising clk with rvalid=1 and rready=1. | 
| rdata[31:0] | INPUT | used | used | not used | Read data. Only valid for read transactions. Undefined for write transactions. | 
| err | INPUT | not used | not used | not used | Error. | 
| ruser[0:0] | INPUT | not used | not used | not used | Response phase User signals. Only valid for read transactions. Undefined for write transactions. | 
| rid[0:0] | INPUT | used | used | used | Response Phase transaction identifier. | 
| exokay | INPUT | not used | not used | not used | Exclusive transaction okay. | 
| rvalidpar | INPUT | not used | not used | not used | Parity bit for rvalid signal (odd parity). | 
| rreadypar | OUTPUT | used | used | used | Parity bit for rready signal (odd parity). | 
| rchk[0:0] | INPUT | not used | not used | not used | Checksum for response phase signals (except rchk itself). | 
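For the single-bit parity signals listed in the port tables, odd parity means that the signal and its parity bit together contain an odd number of ones, so the parity bit is the inverse of the signal. A minimal sketch under that reading; the function names are hypothetical:

```python
def odd_parity(signal):
    """Return the odd-parity bit of a single-bit signal (its inverse)."""
    return signal ^ 1


def parity_ok(signal, parity):
    """Return True when a signal and its parity bit satisfy odd parity."""
    return (signal ^ parity) == 1
```

CVA6 generates reqpar and rreadypar in this way, while the equivalent check on gntpar and rvalidpar is currently not performed.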
3.6. CV-X-IF Interface and Coprocessor
The CV-X-IF interface of CVA6 allows its supported instruction set to be extended with external coprocessors.
3.6.1. CV-X-IF interface specification
3.6.1.1. Description
This design specification presents global functionalities of Core-V-eXtension-Interface (XIF, CVXIF, CV-X-IF, X-interface) in the CVA6 core. CVA6 implements version v1.0.0: Ratified Release.
The CORE-V X-Interface is a RISC-V eXtension interface that provides a generalized framework suitable to implement custom coprocessors and ISA extensions for existing RISC-V processors.
The specification of the CV-X-IF bus protocol can be found at [CV-X-IF]. Refer to the https://github.com/openhwgroup/core-v-xif GitHub repository.
CV-X-IF aims to:
- Create interfaces to connect a coprocessor to the CVA6 to execute instructions.
- Offload CVA6 illegal instructions to the coprocessor to be executed.
- Get the results of offloaded instructions from the coprocessor so they are written back into the CVA6 register file.
- Add standard RISC-V instructions unsupported by CVA6 or custom instructions and implement them in a coprocessor.
- Kill offloaded instructions to allow speculative execution in the coprocessor. (Not yet supported in CVA6)
- Connect the coprocessor to memory via the CVA6 Load and Store Unit. (Not yet supported in CVA6)
The coprocessor operates like another functional unit, so it is connected to the CVA6 in the execute stage.
Only the 5 mandatory interfaces from the CV-X-IF specification (compressed, issue, register, commit and result) have been implemented. The Memory interface and Memory result interface are optional and are not implemented in the CVA6.
3.6.1.2. Supported Parameters
The following table presents CVXIF parameters supported by CVA6.
| Parameter | Value | Description | 
|---|---|---|
| X_NUM_RS | int: 2 or 3 (configurable) | Number of register file read ports that can be used by the eXtension interface | 
| X_ID_WIDTH | int: 1 to 32 | Identification width for the eXtension interface | 
| X_MEM_WIDTH | n/a (feature not supported) | Memory access width for loads/stores via the eXtension interface | 
| X_RFR_WIDTH | int: | Register file read access width for the eXtension interface | 
| X_RFW_WIDTH | int: | Register file write access width for the eXtension interface | 
| X_MISA | logic[31:0]: 0x0000_0000 | MISA extensions implemented on the eXtension interface | 
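The constraints of the table can be captured in a small configuration check, for instance in a script run before elaboration. This is a sketch only: `check_xif_params` is a hypothetical helper, and X_RFR_WIDTH and X_RFW_WIDTH are left out because the table does not give their supported values.

```python
def check_xif_params(x_num_rs, x_id_width, x_misa=0):
    """Return a list of messages for values outside the supported set."""
    errors = []
    if x_num_rs not in (2, 3):
        errors.append("X_NUM_RS must be 2 or 3")
    if not 1 <= x_id_width <= 32:
        errors.append("X_ID_WIDTH must be between 1 and 32")
    if x_misa != 0:
        errors.append("X_MISA must be 0x0000_0000")
    return errors
```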
3.6.1.3. CV-X-IF Enabling
CV-X-IF can be enabled or disabled via the CVA6ConfigCvxifEn parameter in the SystemVerilog source code.
Never leave the CV-X-IF interface unconnected when the CVA6ConfigCvxifEn parameter is enabled.
3.6.1.4. Illegal instruction decoding
The CVA6 decoder module detects instructions that are illegal for the CVA6 and prepares the exception field with the relevant information (exception code "ILLEGAL INSTRUCTION", instruction value).
The exception valid flag is raised in the CVA6 decoder when CV-X-IF is disabled. Otherwise, it is not raised at this stage because the decision belongs to the coprocessor after the offload process.
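The decision described above can be sketched as follows. The function and field names are hypothetical and do not correspond to RTL signal names; the exception code 2 is the illegal instruction code of the RISC-V privileged specification.

```python
ILLEGAL_INSTRUCTION = 2  # exception code from the RISC-V privileged specification


def decode_illegal(instruction, cvxif_enabled):
    """Return (exception, offload) for an instruction unknown to the decoder."""
    exception = {
        "cause": ILLEGAL_INSTRUCTION,
        "tval": instruction,           # instruction value kept for later use
        "valid": not cvxif_enabled,    # raised at decode only without CV-X-IF
    }
    offload = cvxif_enabled            # otherwise the coprocessor decides
    return exception, offload
```

In both cases the exception field is prepared at decode; only its valid flag differs, so the information is still available if the coprocessor later rejects the instruction.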
3.6.1.5. RS3 support
The number of source registers used by the CV-X-IF coprocessor is configurable with 2 or 3 source registers.
If CV-X-IF is enabled and configured with 3 source registers, a third read port is added to the CVA6 general purpose register file.
3.6.1.6. Description of interface connections between CVA6 and Coprocessor
In CVA6 execute stage, there is a new functional unit dedicated to drive the CV-X-IF interfaces. Here is how and to what CV-X-IF interfaces are connected to the CVA6.
- 
Compressed interface: - 
Request: - 
Undecoded illegal instruction is connected to compressed_req.instruction
- 
Illegal compressed signal calculated from all compressed decoders (RVC, ZCMP, ZCMT) is connected to compressed_req.valid.
 
- 
- 
Response: - 
if compressed_resp.acceptis set during a transaction (i.e.compressed_validandcompressed_readyare set), the offloaded compressed instruction is accepted by the coprocessor andcompressed_resp.instructioncontains its 32 bits equivalent for the coprocessor and sent to the 32 bits decoder.
- 
if compressed_resp.acceptis not set during a transaction, the instruction is an illegal compressed instruction and will follow its path in the decode stage. Typically, it will try to be decoded as a 32 bits instruction.
 
- 
 
- 
- Issue and register interface (coupled):
  - Request:
    - Operands are connected to the register.rs signals.
    - The scoreboard transaction id is connected to the issue_req.id signal. Scoreboard ids and offloaded instruction ids are therefore linked together (equal in this implementation). This allows CVA6 to execute out of order with the coprocessor in the same way as with other functional units.
    - The undecoded instruction is connected to issue_req.instruction.
    - The valid signal for the CVXIF functional unit is connected to issue_req.valid.
    - Each register.rs_valid signal is set when its associated stall signal is not set.
    - register_valid is connected to issue_valid because the Issue and Register interfaces are coupled (X_ISSUE_REGISTER_SPLIT == 0).
  - Response:
    - If issue_resp.accept is set during a transaction (i.e. issue valid and ready are set), the offloaded instruction is accepted by the coprocessor and a result transaction will happen.
    - If issue_resp.accept is not set during a transaction, the offloaded instruction is illegal and an illegal instruction exception is raised as soon as no result transaction is being written on the writeback bus.
- Commit interface:
  - The valid signal of the commit interface is connected to the valid signal of the issue interface.
  - The id signal of the commit interface is connected to the issue interface id signal (i.e. the scoreboard id).
  - Killing of an offloaded instruction is never set (unsupported feature).
  - Therefore all accepted offloaded instructions are committed to their execution, and no killing of instructions is possible in this implementation.
- Result interface:
  - Request:
    - The ready signal of the result interface is always set, as CVA6 is always ready to take a result from the coprocessor for an accepted offloaded instruction.
  - Response:
    - The result response is directly connected to the writeback bus of the CV-X-IF functional unit.
    - The valid signal of the result interface is connected to the valid signal of the writeback bus.
    - The id signal of the result interface is connected to the scoreboard id of the writeback bus.
    - The write enable signal of the result interface is connected to a dedicated CV-X-IF WE signal in CVA6, which tells the scoreboard whether a writeback to the CVA6 register file should happen.
    - The exccode and exc signals of the result interface are connected to the exception signals of the writeback bus. An exception from the coprocessor does not write the tval field in the exception signal of the writeback bus.
    - Three registers are added to hold illegal instruction information in case a result transaction and a non-accepted issue transaction happen in the same cycle. In this case the result transaction is written to the writeback bus first: it has priority over the non-accepted instruction because it is linked to an older offloaded instruction. Once the writeback bus is free, an illegal instruction exception is raised using the information held in these three registers.
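The priority rule between a result transaction and a non-accepted issue transaction can be illustrated with a small behavioural model. This is only a sketch, not the RTL: the class name `CvxifWriteback`, the method `cycle` and the single `pending_illegal` slot (standing in for the three holding registers) are hypothetical, and the model assumes that at most one rejected instruction is pending at a time.

```python
class CvxifWriteback:
    """Behavioural sketch of the writeback arbitration described above (hypothetical names)."""

    def __init__(self):
        # Stands in for the three registers holding illegal-instruction information.
        self.pending_illegal = None

    def cycle(self, result_id=None, rejected_id=None):
        """Model one clock cycle; return what is driven on the writeback bus, or None."""
        if rejected_id is not None:
            # Assumption: no other rejected instruction is already waiting.
            self.pending_illegal = rejected_id
        if result_id is not None:
            # The result belongs to an older offloaded instruction, so it goes first.
            return ("result", result_id)
        if self.pending_illegal is not None:
            # Writeback bus is free: raise the held illegal instruction exception.
            held, self.pending_illegal = self.pending_illegal, None
            return ("illegal", held)
        return None
```

When both events occur in the same cycle, the model drives the result first and the illegal instruction exception one cycle later.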
3.6.2. Coprocessor recommendations for use with CVA6’s CV-X-IF
CVA6 supports all coprocessors that comply with the CV-X-IF specification, with the exception of:
- Coprocessors requiring the Memory interface and Memory result interface (not yet implemented in CVA6).
  - All memory transactions should happen via the Issue interface, i.e. load into the CVA6 register file, then initiate an issue transaction.
- Stateful coprocessors.
  - CVA6 commits all its issue transactions on the Commit interface. Speculation information is kept only in CVA6, and speculation is handled only in CVA6. The coprocessor shall be stateless; otherwise it would not be able to revert its state if CVA6 kills an in-flight instruction (in case of a misprediction or flush).
4. Architecture and Modules
The CV32A60X is fully synthesizable. It has been designed mainly for ASIC designs, but FPGA synthesis is supported as well.
For ASIC synthesis, the whole design is completely synchronous and uses positive-edge triggered flip-flops. The core occupies an area of about 80 kGE. The clock frequency can exceed 1 GHz, depending on the technology.
The CV32A60X subsystem is composed of 8 modules.
Connections between modules are illustrated in the following block diagram. FRONTEND, DECODE, ISSUE, EXECUTE, COMMIT and CONTROLLER are part of the pipeline. The CSRFILE contains the registers.
4.1. FRONTEND Module
4.1.1. Description
The FRONTEND module implements the first two stages of the CVA6 pipeline: the PC gen and Fetch stages.
PC gen stage is responsible for generating the next program counter. It hosts a Branch Target Buffer (BTB), a Branch History Table (BHT) and a Return Address Stack (RAS) to speculate on control flow instructions.
Fetch stage requests data from the CACHE module, realigns the data to store them in the instruction queue and transmits the instructions to the DECODE module. FRONTEND can fetch up to 2 instructions per cycle, but the DECODE module decodes at most 1 instruction per cycle.
The module is connected to:
- CACHES module provides fetched instructions to FRONTEND.
- DECODE module receives instructions from FRONTEND.
- CONTROLLER module can request a flush and a halt of the FRONTEND PC gen stage.
- EXECUTE, CONTROLLER, CSR and COMMIT modules trigger a PC jump due to a branch misprediction, an exception, a return from an exception, a debug entry or a pipeline flush. They provide the next PC value.
- CSR module indicates whether the core is in debug mode.
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Next PC when reset | SUBSYSTEM | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Flush requested by FENCE, mis-predict and exception | CONTROLLER | logic |
|  | in | Halt requested by WFI and Accelerate port | CONTROLLER | logic |
|  | in | Set COMMIT PC as next PC requested by FENCE, CSR side-effect and Accelerate port | CONTROLLER | logic |
|  | in | COMMIT PC | COMMIT | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Exception event | COMMIT | logic |
|  | in | Mispredict event and next PC | EXECUTE | bp_resolve_t |
|  | in | Return from exception event | CSR | logic |
|  | in | Next PC when returning from exception | CSR | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Next PC when jumping into exception | CSR | logic[CVA6Cfg.VLEN-1:0] |
|  | in | address translation request channel | EXECUTE | fetch_arsp_t |
|  | out | address translation response channel | EXECUTE | fetch_areq_t |
|  | out | OBI Fetch Request channel | CACHES | obi_fetch_req_t |
|  | in | OBI Fetch Response channel | CACHES | obi_fetch_rsp_t |
|  | out | Handshake’s data between fetch and decode | ID_STAGE | fetch_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Handshake’s valid between fetch and decode | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Handshake’s ready between fetch and decode | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
Due to CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below:
- For any HW configuration,
  - flush_bp_i input is tied to 0
- As DebugEn = False,
  - set_debug_pc_i input is tied to 0
  - debug_mode_i input is tied to 0
- As PipelineOnly = True,
  - fetch_req_o output is tied to 0
  - fetch_rsp_i input is tied to 0
4.1.2. Functionality
4.1.3. PC Generation stage
PC gen generates the next program counter. The next PC can originate from the following sources (listed in order of precedence):
- Reset state: At reset, the PC is assigned to the boot address.
- Branch Prediction: The fetched instruction is predecoded by the instr_scan submodule. When the instruction is a control flow instruction, three cases are considered:
  - The instruction is a JALR which corresponds to a return (rs1 = x1 or rs1 = x5). RAS provides the next PC as a prediction.
  - The instruction is a JALR which does not correspond to a return. If BTB (Branch Target Buffer) returns a valid address, then BTB predicts the next PC. Otherwise the JALR is not considered as a control flow instruction, which will generate a mispredict.
  - The instruction is a conditional branch. If BHT (Branch History Table) returns a valid address, then BHT predicts the next PC. Otherwise the prediction depends on the sign of the PC-relative jump offset: if the sign is negative the prediction is taken, otherwise the prediction is not taken.

  Then the PC gen informs the Fetch stage that it performed a prediction on the PC.
- Default: The next 32-bit block is fetched. PC gen fetches a word-aligned 32-bit block from the CACHES module, and the fetch stage identifies the instructions within the 32-bit blocks.
- Mispredict: Mispredictions are reported by the EX_STAGE module. In any case, the frontend must correct its action and start fetching from the correct address.
- Replay instruction fetch: When the instruction queue is full, the instr_queue submodule requests a fetch replay and provides the address to be replayed.
- Return from environment call: When CSR requests a return from an environment call, next PC takes the value of the PC of the instruction after the one pointed to by the mepc CSR.
- Exception/Interrupt: If an exception is triggered by CSR_REGISTER, next PC takes the value of the trap vector base address CSR.
- Pipeline starts fetching from COMMIT PC: When the commit stage is halted by a WFI instruction or when the pipeline has been flushed due to a CSR change, next PC takes the value of the PC coming from the COMMIT submodule. As CSR instructions do not exist in a compressed form, PC is unconditionally incremented by 4.
All program counters are logical addresses.
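The three branch prediction cases listed above can be sketched as a small decision function. This is an illustrative model only: the function name `predict_next_pc`, its parameters and the string labels for instruction kinds are hypothetical, a missing RAS, BTB or BHT answer is modelled as `None`, and the taken-branch target is assumed to be the PC plus the immediate computed during predecoding.

```python
def predict_next_pc(kind, pc, imm, ras_top=None, btb_target=None, bht_taken=None):
    """Return the predicted next PC, or None when no prediction is made.

    kind        -- "ret", "jalr" or "branch" (hypothetical labels for the predecoded type)
    ras_top     -- address on top of the RAS, or None if the RAS has no valid entry
    btb_target  -- target returned by the BTB, or None if the BTB has no valid entry
    bht_taken   -- True/False from a valid BHT entry, or None if the entry is not valid
    """
    if kind == "ret":
        # JALR identified as a return: the RAS provides the prediction.
        return ras_top
    if kind == "jalr":
        # Other JALR: predicted only if the BTB returns a valid address.
        return btb_target
    if kind == "branch":
        if bht_taken is None:
            # Static fallback: backward branches (negative offset) are predicted taken.
            taken = imm < 0
        else:
            taken = bht_taken
        return pc + imm if taken else None
    return None
```

Since CV32A60X has no BTB, `btb_target` is always `None` in this model, so a JALR that is not a return is never predicted.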
4.1.4. Fetch Stage
Fetch stage controls the CACHE module through a handshaking protocol. Fetched data is a 32-bit block with a word-aligned address. A granted fetch is processed by the instr_realign submodule to produce instructions. The instructions are then pushed into an internal instruction FIFO called the instruction queue (instr_queue submodule). This submodule stores each instruction with its associated address and sends them to the DECODE module.
Before sending the instructions to the DECODE stage, the frontend calculates a prediction address in case of a jump or branch. This predicted address is sent to the DECODE stage along with the instruction and its fetch address. The prediction address is not valid if there is no prediction. Instructions following a predicted taken control flow instruction are dropped.
Memory can report potential exceptions, which can be bus errors, invalid accesses or instruction page faults. The FRONTEND transmits the exception from CACHES to DECODE.
4.1.5. Submodules
4.1.5.1. Instr_realign submodule
The 32-bit aligned block coming from the CACHE module enters the instr_realign submodule. Based on the fetch address and the fetched data, this submodule extracts the valid instructions from the 32-bit blocks and sends them to the queue. A non-compressed instruction can be misaligned with respect to the block size and straddle two cache blocks. In that case, two cache accesses are needed to get the whole instruction, and the incomplete instruction is stored in the instr_realign submodule until its second half is fetched. The instr_realign submodule provides up to 2 instructions per cycle when the compressed extension is enabled, otherwise one instruction per cycle.
Below is a table that explains how the instr_realign works:
- C: compressed instruction
- I: not compressed instruction
- UI: Incomplete instruction stored in the instr_realign
The instr_realign can be flushed when the frontend requests the cache to kill the incoming instruction; in this case the incomplete instruction is deleted.
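The realignment behaviour described above can be approximated with a short behavioural model. This is a sketch under stated assumptions, not the RTL: the class name `InstrRealign`, its methods and the `start_at_upper_half` parameter are hypothetical, and the model only relies on the RISC-V rule that a 16-bit parcel whose two low bits are not 0b11 is a compressed instruction.

```python
class InstrRealign:
    """Behavioural sketch of instruction realignment on 32-bit fetch blocks (hypothetical names)."""

    def __init__(self):
        # Lower half of a 32-bit instruction waiting for its upper half (the "UI" case).
        self.pending_half = None

    @staticmethod
    def is_compressed(halfword):
        # RISC-V encoding rule: low bits != 0b11 means a 16-bit instruction.
        return (halfword & 0b11) != 0b11

    def flush(self):
        # A flush deletes the stored incomplete instruction.
        self.pending_half = None

    def push_block(self, block, start_at_upper_half=False):
        """Consume one word-aligned 32-bit block and return the extracted instructions."""
        lo, hi = block & 0xFFFF, (block >> 16) & 0xFFFF
        out = []
        if start_at_upper_half:
            # Fetch address points at the upper half-word: the lower half is skipped.
            self.pending_half = None
            upper_starts_instr = True
        elif self.pending_half is not None:
            # Complete the instruction that straddles the previous and the current block.
            out.append((lo << 16) | self.pending_half)
            self.pending_half = None
            upper_starts_instr = True
        elif self.is_compressed(lo):
            out.append(lo)
            upper_starts_instr = True
        else:
            # Aligned 32-bit instruction: it uses the whole block.
            out.append((hi << 16) | lo)
            upper_starts_instr = False
        if upper_starts_instr:
            if self.is_compressed(hi):
                out.append(hi)
            else:
                # First half of a straddling instruction: keep it until the next block.
                self.pending_half = hi
        return out
```

For instance, feeding a block made of a compressed instruction followed by the first half of a 32-bit instruction yields one instruction; the next block then yields the completed 32-bit instruction plus, possibly, a second compressed one.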
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Fetch flush request | CONTROLLER | logic |
|  | in | 32-bit block is valid | CACHE | logic |
|  | out | Instruction is unaligned | FRONTEND | logic |
|  | in | 32-bit block address | CACHE | logic[CVA6Cfg.VLEN-1:0] |
|  | in | 32-bit block | CACHE | logic[CVA6Cfg.FETCH_WIDTH-1:0] |
|  | out | Instruction is valid | FRONTEND | logic[CVA6Cfg.INSTR_PER_FETCH-1:0] |
|  | out | Instruction address | FRONTEND | logic[CVA6Cfg.INSTR_PER_FETCH-1:0][CVA6Cfg.VLEN-1:0] |
|  | out | Instruction | instr_scan&instr_queue | logic[CVA6Cfg.INSTR_PER_FETCH-1:0][31:0] |
4.1.5.2. Instr_queue submodule
The instr_queue receives multiple instructions from the instr_realign submodule to create a valid stream to be executed. The frontend pushes instructions and all related information into the FIFO for storage, including details needed in case of a misprediction or exception: the instructions themselves, instruction control flow type, exception, exception address, and predicted address. DECODE pops them when the decode stage is ready and indicates to the FRONTEND that the instruction has been consumed.
The instruction queue contains two FIFOs: one for instructions and one for addresses, which stores addresses in case of a prediction. The instruction FIFO can hold up to 4×2 instructions, while the address FIFO can hold up to 2 addresses. If the instruction FIFO is full, a replay request is sent to inform the fetch mechanism to replay the fetch. If the address FIFO is full and there is a prediction, a replay request is sent to inform the fetch mechanism to replay the fetch, even if the instruction FIFO is not full.
The instruction queue can be flushed by the flush signal coming from the CONTROLLER.
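The replay condition described above can be written as a single boolean expression. The following is a minimal sketch, assuming that "4×2 instructions" means a total capacity of 8 instructions; the function name `replay_required` and the constant names are hypothetical.

```python
INSTR_FIFO_CAPACITY = 4 * 2  # assumption: "4x2 instructions" read as 8 entries in total
ADDR_FIFO_CAPACITY = 2       # the address FIFO holds up to 2 predicted addresses


def replay_required(instr_count, addr_count, has_prediction):
    """Return True when the fetch must be replayed because a FIFO cannot accept data."""
    instr_fifo_full = instr_count >= INSTR_FIFO_CAPACITY
    addr_fifo_full = addr_count >= ADDR_FIFO_CAPACITY
    # A full address FIFO only matters when the fetched instruction carries a prediction.
    return instr_fifo_full or (addr_fifo_full and has_prediction)
```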
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Fetch flush request | CONTROLLER | logic |
|  | in | Instruction | instr_realign | logic[CVA6Cfg.INSTR_PER_FETCH-1:0][31:0] |
|  | in | Instruction address | instr_realign | logic[CVA6Cfg.INSTR_PER_FETCH-1:0][CVA6Cfg.VLEN-1:0] |
|  | in | Instruction is valid | instr_realign | logic[CVA6Cfg.INSTR_PER_FETCH-1:0] |
|  | out | Handshake’s ready with CACHE | CACHE | logic |
|  | out | Indicates instructions consumed, or popped by ID_STAGE | FRONTEND | logic[CVA6Cfg.INSTR_PER_FETCH-1:0] |
|  | in | Exception (which is page-table fault) | CACHE | ariane_pkg::frontend_exception_t |
|  | in | Exception address | CACHE | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Branch predict | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Instruction predict address | FRONTEND | ariane_pkg::cf_t[CVA6Cfg.INSTR_PER_FETCH-1:0] |
|  | out | Replay instruction because one of the FIFOs was full | FRONTEND | logic |
|  | out | Address at which to replay the fetch | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Handshake’s data with ID_STAGE | ID_STAGE | fetch_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Handshake’s valid with ID_STAGE | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Handshake’s ready with ID_STAGE | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
Due to CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below:
- As RVH = False,
  - exception_gpaddr_i input is tied to 0
  - exception_tinst_i input is tied to 0
  - exception_gva_i input is tied to 0
4.1.5.3. Instr_scan submodule
As the compressed extension is enabled, 2 instr_scan submodules are instantiated to handle up to 2 instructions per cycle.
Each instr_scan submodule pre-decodes the fetched instructions coming from the instr_realign module, whether compressed or not, and calculates the immediate. The instr_scan submodule is a flow controller which provides the instruction type: branch, jump, return, jalr, imm, call or others. These outputs are used by the branch prediction feature.
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Instruction to be predecoded | instr_realign | logic[31:0] |
|  | out | Return instruction | FRONTEND | logic |
|  | out | JAL instruction | FRONTEND | logic |
|  | out | Branch instruction | FRONTEND | logic |
|  | out | JALR instruction | FRONTEND | logic |
|  | out | Unconditional jump instruction | FRONTEND | logic |
|  | out | Instruction immediate | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Branch compressed instruction | FRONTEND | logic |
|  | out | Unconditional jump compressed instruction | FRONTEND | logic |
|  | out | JR compressed instruction | FRONTEND | logic |
|  | out | Return compressed instruction | FRONTEND | logic |
|  | out | JALR compressed instruction | FRONTEND | logic |
|  | out | JAL compressed instruction | FRONTEND | logic |
|  | out | Instruction compressed immediate | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
4.1.5.4. BHT (Branch History Table) submodule
The BHT is implemented as a memory which is composed of 32 entries. The BHT is a two-dimensional table:
- The first dimension represents the access address, with a length equal to 32 / 2.
- The second dimension represents the row index, with a length equal to 2.
In the case of branch prediction, the BHT uses only part of the virtual address to get the value of the saturation counter. In the case of a valid misprediction, the BHT uses only part of the misprediction address to access the BHT table and update the saturation counter.
UPPER_ADDRESS_INDEX = $clog2(BHTDepth) + ((RVC == 1) ? 1 : 2)
LOWER_ADDRESS_INDEX = (RVC == 1) ? 1 : 2 + $clog2(INSTR_PER_FETCH)
ACCESS_ADDRESS = PC/MISPREDICT_ADDRESS [ UPPER_ADDRESS_INDEX : LOWER_ADDRESS_INDEX ]
The lower address bits of the virtual address point to the memory entry.
UPPER_ADDRESS_INDEX = (RVC == 1) ? 1 : 2 + $clog2(INSTR_PER_FETCH)
LOWER_ADDRESS_INDEX = (RVC == 1) ? 1 : 2 + $clog2(INSTR_PER_FETCH)
ACCESS_INDEX = PC/MISPREDICT_ADDRESS [ UPPER_ADDRESS_INDEX : LOWER_ADDRESS_INDEX ]
Two distinct branches with different addresses can share the same BHT entry if they have the same ACCESS_ADDRESS.
Each BHT entry contains a two-bit saturating counter and a valid bit. On reset, the counters are set to 0 (strongly not taken) and the valid bits are cleared. When a branch instruction is resolved by the EX_STAGE module, the valid bit is set and the counter is updated. The two-bit counter is updated by the successive execution of the instructions as shown in the following figure.
When a branch instruction is pre-decoded by the instr_scan submodule, the BHT checks whether the entry for the PC address is valid and provides the taken or not-taken prediction. The prediction is the most significant bit of the counter, where 1 means "taken".
When the Execute stage processes a branch instruction, it sends the branch status (whether it’s taken or not) to the Frontend to update the BHT table.
The BHT is never flushed.
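The counter behaviour described above can be illustrated with a small behavioural model. This is only a sketch: the class name `Bht` and its methods are hypothetical, and the index split used here (PC bits [5:2] select one of 16 rows, PC bit [1] selects one of 2 entries in the row) is an assumed reading of the index formulas for RVC = 1 and INSTR_PER_FETCH = 2, not a statement about the RTL.

```python
class Bht:
    """Behavioural sketch of a 32-entry BHT with 2-bit saturating counters (hypothetical names)."""

    ROWS = 16     # 32 entries / 2 entries per row
    COLUMNS = 2   # one entry per possible instruction position in a 32-bit block

    def __init__(self):
        # Reset state: counters at 0 (strongly not taken), valid bits cleared.
        self.counter = [[0] * self.COLUMNS for _ in range(self.ROWS)]
        self.valid = [[False] * self.COLUMNS for _ in range(self.ROWS)]

    def _index(self, pc):
        # Assumed index split: only part of the address is used, so branches can alias.
        return (pc >> 2) % self.ROWS, (pc >> 1) % self.COLUMNS

    def predict(self, pc):
        """Return True (taken), False (not taken), or None if the entry is not valid."""
        row, col = self._index(pc)
        if not self.valid[row][col]:
            return None
        # The prediction is the most significant bit of the 2-bit counter.
        return self.counter[row][col] >= 2

    def update(self, pc, taken):
        """Update the entry when a branch is resolved."""
        row, col = self._index(pc)
        value = self.counter[row][col]
        self.counter[row][col] = min(value + 1, 3) if taken else max(value - 1, 0)
        self.valid[row][col] = True
```

Starting from reset, two taken outcomes are needed before the model predicts "taken", and two branches whose addresses map to the same entry share that prediction.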
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Virtual PC | CACHE | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Update bht with resolved address | EXECUTE | bht_update_t |
|  | out | Prediction from bht | FRONTEND | ariane_pkg::bht_prediction_t[CVA6Cfg.INSTR_PER_FETCH-1:0] |
Due to CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below:
- For any HW configuration,
  - flush_bp_i input is tied to 0
- As DebugEn = False,
  - debug_mode_i input is tied to 0
4.1.5.5. BTB (Branch Target Buffer) submodule
There is no BTB in CV32A60X. As a consequence, no valid address is returned from BTB.
4.1.5.6. RAS (Return Address Stack) submodule
RAS is implemented as a LIFO which is composed of 2 entries.
When a "call" JAL instruction (rd = x1 or x5) is added to the instruction queue, the PC of the instruction following the JAL instruction is pushed into the stack.
When a "ret" JALR instruction (rs1 = x1 or x5, and rd != rs1) is added to the instruction queue, the predicted return address is popped from the stack. If the predicted return address is wrong, for instance because of speculation or the limited RAS depth, a mispredict will be generated.
The RAS is never flushed.
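The push and pop behaviour of the 2-entry RAS can be illustrated with a short model. This is a sketch under an assumption that the text does not state explicitly: when a push occurs on a full stack, the oldest entry is dropped. The class name `Ras` and its methods are hypothetical.

```python
from collections import deque


class Ras:
    """Behavioural sketch of a 2-entry return address stack (hypothetical names)."""

    def __init__(self, depth=2):
        # Assumption: pushing onto a full stack silently drops the oldest entry.
        self.stack = deque(maxlen=depth)

    def push(self, return_address):
        # Called for a "call": the address of the instruction following the JAL is stored.
        self.stack.append(return_address)

    def pop(self):
        # Called for a "ret": returns the predicted address, or None if nothing is stored.
        return self.stack.pop() if self.stack else None
```

With three nested calls, the model predicts the two innermost returns correctly and has no prediction for the outermost one, which illustrates the depth limitation mentioned above.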
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Push address in RAS | FRONTEND | logic |
|  | in | Pop address from RAS | FRONTEND | logic |
|  | in | Data to be pushed | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Popped data | FRONTEND | ras_t |
Due to CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below:
- For any HW configuration,
  - flush_bp_i input is tied to 0
4.2. ID_STAGE Module
4.2.1. Description
The ID_STAGE module implements the decode stage of the pipeline. Its main purpose is to decode RISC-V instructions coming from the FRONTEND module (fetch stage) and send them to the ISSUE_STAGE module (issue stage).
The compressed_decoder module checks whether the incoming instruction is compressed and outputs the corresponding uncompressed instruction. Then the decoder module decodes the instruction and sends it to the issue stage.
The module is connected to:
- CONTROLLER module can request to flush the buffer at the end of ID_STAGE.
- FRONTEND module sends instructions to the ID_STAGE module.
- ISSUE module receives the decoded instructions from the ID_STAGE module.
- CSR_REGFILE module sends status information about privilege mode, traps and extension support.
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Fetch flush request | CONTROLLER | logic |
|  | in | Handshake’s data between fetch and decode | FRONTEND | fetch_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Handshake’s valid between fetch and decode | FRONTEND | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Handshake’s ready between fetch and decode | FRONTEND | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Handshake’s data between decode and issue | ISSUE | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | none | none | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Instruction value | ISSUE | logic[CVA6Cfg.NrIssuePorts-1:0][31:0] |
|  | out | Handshake’s valid between decode and issue | ISSUE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Report if instruction is a control flow instruction | ISSUE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Handshake’s acknowledge between decode and issue | ISSUE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Level sensitive (async) interrupts | SUBSYSTEM | logic[1:0] |
|  | in | Interrupt control status | CSR_REGFILE | irq_ctrl_t |
|  | in | none | none | logic[CVA6Cfg.XLEN-1:0] |
|  | in | none | none | logic |
|  | in | none | none | jvt_t |
|  | in | none | none | x_compressed_resp_t |
|  | out | none | none | logic |
|  | out | none | none | x_compressed_req_t |
Due to CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below:
- As DebugEn = False,
  - debug_req_i input is tied to 0
  - debug_mode_i input is tied to 0
- As IsRVFI = 0,
  - was_compressed_o output is tied to 0
- As PRIV = MachineOnly,
  - priv_lvl_i input is tied to MachineMode
  - tvm_i input is tied to 0
  - tw_i input is tied to 0
  - tsr_i input is tied to 0
- As RVH = False,
  - v_i input is tied to 0
  - vfs_i input is tied to 0
  - vtw_i input is tied to 0
  - hu_i input is tied to 0
- As RVF = 0,
  - fs_i input is tied to 0
  - frm_i input is tied to 0
- As RVV = False,
  - vs_i input is tied to 0
- As RVZCMT = False,
  - obi_zcmt_req_o output is tied to 0
  - obi_zcmt_rsp_i input is tied to 0
4.2.2. Functionality
ID_STAGE transforms each instruction into a scoreboard entry whose fields indicate what the instruction does. It receives external interrupts and, according to the interrupt configuration from CSR_REGFILE, inserts exceptions into the pipeline.
4.2.3. Submodules
4.2.3.1. Compressed_decoder
The compressed_decoder module decompresses all the compressed instructions taking a 16-bit compressed instruction and expanding it to its 32-bit equivalent. All compressed instructions have a 32-bit equivalent.
Non-compressed instructions on the input are transmitted as-is.
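As an illustration of the expansion performed by the compressed_decoder, the following sketch expands a single compressed instruction, C.ADDI, into its 32-bit ADDI equivalent and passes non-compressed instructions through unchanged. It is deliberately limited to that one instruction; the function name `expand_compressed` and its return convention are hypothetical, and the bit fields follow the standard RISC-V C extension encoding.

```python
def expand_compressed(instr):
    """Return (instr32, is_compressed) for a non-compressed instruction or a C.ADDI.

    Any other compressed instruction is outside the scope of this sketch.
    """
    if (instr & 0b11) == 0b11:
        # Two low bits set: 32-bit instruction, transmitted as-is.
        return instr & 0xFFFFFFFF, False

    half = instr & 0xFFFF
    quadrant = half & 0b11
    funct3 = (half >> 13) & 0b111
    if quadrant == 0b01 and funct3 == 0b000:
        # C.ADDI rd, imm  ->  ADDI rd, rd, sext(imm)
        rd = (half >> 7) & 0x1F
        imm6 = (((half >> 12) & 1) << 5) | ((half >> 2) & 0x1F)
        # Sign-extend the 6-bit immediate to the 12-bit I-type immediate.
        imm12 = (imm6 | 0xFC0) if (imm6 & 0x20) else imm6
        addi_opcode = 0x13
        return (imm12 << 20) | (rd << 15) | (rd << 7) | addi_opcode, True

    raise NotImplementedError("only C.ADDI is handled in this sketch")
```

For example, C.NOP (0x0001) expands to the canonical NOP encoding 0x00000013.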
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Input instruction coming from fetch stage | FRONTEND | logic[31:0] |
|  | out | Output instruction in uncompressed format | decoder | logic[31:0] |
|  | out | Input instruction is illegal | decoder | logic |
|  | out | Output instruction is macro | decoder | logic |
|  | out | Output instruction is compressed | decoder | logic |
|  | out | Output instruction is macro | decoder | logic |
4.2.3.2. Decoder
The decoder module takes the output of the compressed_decoder module and decodes it. It transforms the instruction into the most fundamental control structure in the pipeline, a scoreboard entry.
The scoreboard entry contains an exception entry which is composed of a valid field, a cause and a value called TVAL. As the TVALEn configuration parameter is zero, the TVAL field is not implemented.
A potential illegal instruction exception can be detected during decoding. If no exception has happened previously in the fetch stage, the decoder marks the exception as valid and adds the cause and tval value to the scoreboard entry.
A potential interrupt can be sent to the decoder. If no exception has happened previously in the fetch stage, the exception is inserted into the scoreboard entry.
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | PC from fetch stage | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | in | none | none | logic |
|  | in | Compressed form of instruction | FRONTEND | logic[15:0] |
|  | in | Illegal compressed instruction | compressed_decoder | logic |
|  | in | Instruction from fetch stage | FRONTEND | logic[31:0] |
|  | in | Is a macro instruction | macro_decoder | logic |
|  | in | Is a last macro instruction | macro_decoder | logic |
|  | in | Is mvsa01/mva01s macro instruction | macro_decoder | logic |
|  | in | Zcmt instruction | FRONTEND | logic |
|  | in | Jump address | zcmt_decoder | logic[CVA6Cfg.XLEN-1:0] |
|  | in | Is a branch predict instruction | FRONTEND | branchpredict_sbe_t |
|  | in | If an exception occurred in fetch stage | FRONTEND | exception_t |
|  | in | Level sensitive (async) interrupts | SUBSYSTEM | logic[1:0] |
|  | in | Interrupt control status | CSR_REGFILE | irq_ctrl_t |
|  | out | Instruction to be added to scoreboard entry | ISSUE_STAGE | scoreboard_entry_t |
|  | out | Instruction | ISSUE_STAGE | logic[31:0] |
|  | out | Is a control flow instruction | ISSUE_STAGE | logic |
Due to CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the above table; they are listed below:
- As DebugEn = False,
  - debug_req_i input is tied to 0
  - debug_mode_i input is tied to 0
- As PRIV = MachineOnly,
  - priv_lvl_i input is tied to MachineMode
  - tvm_i input is tied to 0
  - tw_i input is tied to 0
  - tsr_i input is tied to 0
- As RVH = False,
  - v_i input is tied to 0
  - vfs_i input is tied to 0
  - vtw_i input is tied to 0
  - hu_i input is tied to 0
- As RVF = 0,
  - fs_i input is tied to 0
  - frm_i input is tied to 0
- As RVV = False,
  - vs_i input is tied to 0
4.3. ISSUE_STAGE Module
4.3.1. Description
ISSUE_STAGE issues instructions (1), reorders their results (2) and sends completed instructions in-order to COMMIT_STAGE (3).
(1) ISSUE_STAGE issues instructions in-order. It makes sure that instructions from ID_STAGE have everything they need to run. It waits until all requirements are met. Once an instruction is ready to run, ISSUE_STAGE sends it to EX_STAGE with its operands.
(2) ISSUE_STAGE reorders instruction results. It receives the results of instruction executions out-of-order from EX_STAGE and stores them so that they can be retrieved in program order.
(3) ISSUE_STAGE sends completed instructions in-order to COMMIT_STAGE. This is where architectural state is modified.
Scoreboard module keeps track of instructions and their results. Issue_read_operands module contains all the issue logic and the register file.
The module is connected to:
- CONTROLLER module can request to flush the pipeline buffer at the end of ISSUE_STAGE. CONTROLLER module can also request to flush the whole Scoreboard.
- ID_STAGE module delivers decoded instructions to ISSUE_STAGE.
- EX_STAGE module gets instructions issued by ISSUE_STAGE to execute them. EX_STAGE module also returns results to ISSUE_STAGE.
- COMMIT_STAGE module gives ISSUE_STAGE clearance to remove the oldest instruction from Scoreboard.
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Prevent from issuing | CONTROLLER | logic |
|  | in | Flush whole scoreboard | CONTROLLER | logic |
|  | in | Handshake’s data with decode stage | ID_STAGE | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | none | none | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Instruction value | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0][31:0] |
|  | in | Handshake’s valid with decode stage | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | Is instruction a control flow instruction | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Handshake’s acknowledge with decode stage | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | rs1 forwarding | EX_STAGE | [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0] |
|  | out | rs2 forwarding | EX_STAGE | [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0] |
|  | out | FU data useful to execute instruction | EX_STAGE | fu_data_t[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Program Counter | EX_STAGE | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Is zcmt instruction | EX_STAGE | logic |
|  | out | Is compressed instruction | EX_STAGE | logic |
|  | in | Fixed Latency Unit is ready | EX_STAGE | logic |
|  | out | ALU output is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Branch unit is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Information of branch prediction | EX_STAGE | branchpredict_sbe_t |
|  | in | Signaling that we resolved the branch | EX_STAGE | logic |
|  | in | Load store unit FU is ready | EX_STAGE | logic |
|  | out | Load store unit FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | Mult FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | ALU2 FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | CSR is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | out | CVXIF FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] |
|  | in | CVXIF FU is ready | EX_STAGE | logic |
|  | out | CVXIF offloader instruction value | EX_STAGE | logic[31:0] |
|  | in | CVA6 Hart ID | SUBSYSTEM | logic[CVA6Cfg.XLEN-1:0] |
|  | in | CVXIF Issue interface | EX_STAGE | logic |
|  | in | CVXIF Issue interface response | EX_STAGE | x_issue_resp_t |
|  | out | CVXIF Issue interface valid | EX_STAGE | logic |
|  | out | CVXIF Issue interface request | EX_STAGE | x_issue_req_t |
|  | in | CVXIF Register interface | EX_STAGE | logic |
|  | out | CVXIF register interface valid | EX_STAGE | logic |
|  | out | CVXIF register interface | EX_STAGE | x_register_t |
|  | out | CVXIF Commit interface | EX_STAGE | logic |
|  | out | CVXIF Commit interface | EX_STAGE | x_commit_t |
|  | out | CVXIF Transaction rejected → instruction is illegal | EX_STAGE | logic |
|  | in | Transaction ID | EX_STAGE | logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.TRANS_ID_BITS-1:0] |
|  | in | Result from branch unit | EX_STAGE | bp_resolve_t |
|  | in | Results to write back | EX_STAGE | logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.XLEN-1:0] |
|  | in | Exception from execute stage or CVXIF | EX_STAGE | exception_t[CVA6Cfg.NrWbPorts-1:0] |
|  | in | Indicates valid results | EX_STAGE | logic[CVA6Cfg.NrWbPorts-1:0] |
|  | in | CVXIF write enable | EX_STAGE | logic |
|  | in | CVXIF destination register | EX_STAGE | logic[4:0] |
|  | in | Destination register in register file | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0][4:0] |
|  | in | Value to write to register file | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.XLEN-1:0] |
|  | in | GPR write enable | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Instructions to commit | COMMIT_STAGE | scoreboard_entry_t[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Instruction is cancelled | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | in | Commit acknowledge | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As PerfCounterEn = 0,
  - sb_full_o output is tied to 0
  - stall_issue_o output is tied to 0
- As EnableAccelerator = 0,
  - stall_i input is tied to 0
  - issue_instr_o output is tied to 0
  - issue_instr_hs_o output is tied to 0
- As RVH = False,
  - tinst_o output is tied to 0
- As RVF = 0,
  - fpu_ready_i input is tied to 0
  - fpu_valid_o output is tied to 0
  - fpu_fmt_o output is tied to 0
  - fpu_rm_o output is tied to 0
  - we_fpr_i input is tied to 0
- As IsRVFI = 0,
  - rvfi_issue_pointer_o output is tied to 0
  - rvfi_commit_pointer_o output is tied to 0
  - rvfi_rs1_o output is tied to 0
  - rvfi_rs2_o output is tied to 0
4.3.2. Functionality
ISSUE_STAGE has three functionalities.
(1) ISSUE_STAGE issues instructions. Instructions from ID_STAGE are sent to the Scoreboard module, which forwards them to the Issue_read_operands module. Issue_read_operands queries Scoreboard for data dependences (Scoreboard can also return forwarded values) and gets the list of busy functional units from EX_STAGE. Issue_read_operands sends the instructions to execute to EX_STAGE and acknowledges them to Scoreboard so that it stores them: the instruction is then issued. Issued instructions are acknowledged to ID_STAGE. Each of these steps can block its successors. The flush signal from the CONTROLLER module is also sent to Scoreboard to prevent issuing. Instructions are issued in-order: an instruction cannot be issued unless all its predecessors have been issued.
(2) ISSUE_STAGE reorders instruction results. Results from EX_STAGE are sent to the Scoreboard module so that they are stored.
(3) ISSUE_STAGE sends completed instructions in-order to COMMIT_STAGE. The oldest instructions from Scoreboard are exposed to COMMIT_STAGE. When COMMIT_STAGE acknowledges a commit, the committed instruction is removed from Scoreboard and the register file in Issue_read_operands is updated with the instruction result.
4.3.3. Submodules
4.3.3.1. Scoreboard
Scoreboard contains a FIFO which contains an entry for each issued instruction. Each entry is removed once the instruction is committed. Instruction results are inserted into Scoreboard when they are ready. The FIFO is flushed when requested by CONTROLLER.
Scoreboard is used in all three functionalities of ISSUE_STAGE.
(1) ISSUE_STAGE issues instructions. Up to 1 instruction can be received from ID_STAGE each cycle. Instructions are transmitted to Issue_read_operands with incrementing transaction IDs which wrap at 4. The result buses and Scoreboard entries are also transmitted to Issue_read_operands so that it can detect data dependences and perform operand forwarding. When Issue_read_operands acknowledges an instruction, it is inserted into the FIFO.
Scoreboard has a capacity of 4 entries. Instructions which would make Scoreboard overflow are not transmitted to Issue_read_operands (ISSUE_STAGE stalls).
The flush signal from the CONTROLLER module removes all entries from Scoreboard and prevents issuing. After a flush, the transaction ID of the next issued instruction is 0.
(2) ISSUE_STAGE reorders instruction results. Results are returned from the functional units in EX_STAGE to Scoreboard via result buses, together with their transaction IDs. Scoreboard stores each result into the entry associated with its transaction ID. If an exception is returned, it is stored too.
FIXME Document behavior related to CV-X-IF
(3) ISSUE_STAGE sends completed instructions in-order to COMMIT_STAGE. The oldest entry in Scoreboard is exposed to COMMIT_STAGE (one entry per commit port, and there is 1 commit port). This makes commit happen in-order. When COMMIT_STAGE acknowledges on a commit port, the entry is removed from Scoreboard.
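The three Scoreboard roles described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the RTL: the class and method names (`Scoreboard`, `issue`, `write_back`, `commit`, `flush`) and the `Entry` fields are hypothetical, and CV-X-IF behaviour is left out. Only the capacity of 4 entries, the transaction IDs wrapping at 4, in-order commit and the reset of the transaction ID on flush come from the text.

```python
from dataclasses import dataclass
from typing import Optional

NR_SB_ENTRIES = 4  # capacity stated above; transaction IDs wrap at this value


@dataclass
class Entry:
    rd: int                       # destination register index
    result: Optional[int] = None  # filled in at write-back
    exception: bool = False       # set when the functional unit returns an exception
    valid: bool = False           # True once a result (or exception) is stored


class Scoreboard:
    """Hypothetical behavioural model of the Scoreboard FIFO."""

    def __init__(self):
        self.flush()

    def flush(self):
        """Drop every entry; the next issued instruction gets transaction ID 0."""
        self.entries = [None] * NR_SB_ENTRIES
        self.issue_ptr = 0
        self.commit_ptr = 0
        self.count = 0

    def full(self):
        return self.count == NR_SB_ENTRIES

    def issue(self, rd):
        """Insert an acknowledged instruction; return its transaction ID, or None when full."""
        if self.full():
            return None  # ISSUE_STAGE stalls
        trans_id = self.issue_ptr
        self.entries[trans_id] = Entry(rd=rd)
        self.issue_ptr = (self.issue_ptr + 1) % NR_SB_ENTRIES
        self.count += 1
        return trans_id

    def write_back(self, trans_id, result, exception=False):
        """Store a result (and exception flag) into the entry named by trans_id."""
        entry = self.entries[trans_id]
        entry.result, entry.exception, entry.valid = result, exception, True

    def commit(self):
        """Remove and return the oldest entry if it has completed, else None."""
        if self.count == 0:
            return None
        entry = self.entries[self.commit_ptr]
        if not entry.valid:
            return None  # results may arrive out of order, commit stays in order
        self.entries[self.commit_ptr] = None
        self.commit_ptr = (self.commit_ptr + 1) % NR_SB_ENTRIES
        self.count -= 1
        return entry
```

In this model a result written for transaction ID 1 cannot be committed before the entry with transaction ID 0 has received its own result, which is the reordering property described in (2) and (3).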
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Prevent from issuing | CONTROLLER | logic | 
| 
 | in | Flush whole scoreboard | CONTROLLER | logic | 
| 
 | in | CVXIF transaction accepted | ISSUE_READ_OPERANDS | logic | 
| 
 | in | CVXIF Issue writeback | ISSUE_READ_OPERANDS | logic | 
| 
 | in | CVXIF ID | ISSUE_READ_OPERANDS | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Instructions to commit | COMMIT_STAGE | scoreboard_entry_t[CVA6Cfg.NrCommitPorts-1:0] | 
| 
 | out | Instruction is cancelled | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] | 
| 
 | in | Commit acknowledge | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] | 
| 
 | in | Handshake’s data with decode stage | ID_STAGE | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | in | instruction value | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0][31:0] | 
| 
 | in | Handshake’s valid with decode stage | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Handshake’s acknowledge with decode stage | ID_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Entry about the instruction to issue | ISSUE_READ_OPERANDS | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Instruction to issue | ISSUE_READ_OPERANDS | logic[CVA6Cfg.NrIssuePorts-1:0][31:0] | 
| 
 | out | Is there an instruction to issue | ISSUE_READ_OPERANDS | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | in | Issue stage acknowledge | ISSUE_READ_OPERANDS | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Forwarding | ISSUE_READ_OPERANDS | forwarding_t | 
| 
 | in | Result from branch unit | EX_STAGE | bp_resolve_t | 
| 
 | in | Transaction ID at which to write the result back | EX_STAGE | logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | in | Results to write back | EX_STAGE | logic[CVA6Cfg.NrWbPorts-1:0][CVA6Cfg.XLEN-1:0] | 
| 
 | in | Exception from a functional unit (e.g.: ld/st exception) | EX_STAGE | exception_t[CVA6Cfg.NrWbPorts-1:0] | 
| 
 | in | Indicates valid results | EX_STAGE | logic[CVA6Cfg.NrWbPorts-1:0] | 
| 
 | in | CVXIF write enable for writeback | EX_STAGE | logic | 
| 
 | in | CVXIF destination register | ISSUE_STAGE | logic[4:0] | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As PerfCounterEn = 0,
  - sb_full_o output is tied to 0
- As IsRVFI = 0,
  - rvfi_issue_pointer_o output is tied to 0
  - rvfi_commit_pointer_o output is tied to 0
4.3.3.2. Issue_read_operands
Issue_read_operands tracks hazards and gets the input operands for the instructions to execute. The following hazards can prevent instructions from being issued.
- Data hazards: ISSUE_STAGE checks that the instruction operands are available.
  - Read-After-Write (RAW): if one of the source registers of the instruction to issue is the destination register of one of the instructions in the Scoreboard, issue is blocked. However, CVA6 implements operand forwarding: instead of blocking the instruction, the operand is taken from either
    a) a functional unit which returns a result which is not an exception, with a transaction ID which points to a Scoreboard entry whose destination register is the requested source register; or
    b) a Scoreboard entry whose destination register is the requested source register and which holds a result which is not an exception.
    Forwarding is not possible from CSR instructions.
  - Write-After-Write (WAW): if the instruction to issue has the same destination register as one of the instructions in the Scoreboard, issue is blocked. Instructions being committed are ignored because they will no longer be in the Scoreboard in the next cycle.
  - Special case: there are no data hazards on x0.
  - FIXME hazards related to CV-X-IF
- Structural hazards: ISSUE_STAGE checks that a functional unit (FU) and its result bus (RB) are ready to execute the instruction.
  - Integer division instructions and some[FIXME which?] CSR instructions have an unknown latency. When EX_STAGE reports that such an instruction is running, instructions using ALU, BRANCH, CSR or MULT are blocked. This is to avoid conflicts on the RB shared by these four FUs.
  - Multiplications have a fixed latency of 2 cycles. Instructions using ALU, BRANCH or CSR are blocked if an instruction using MULT was issued one cycle earlier. This is to avoid conflicts on the RB shared by these four FUs. Instructions using MULT are not blocked because the multiplier is pipelined and can accept one instruction each cycle.
  - Instructions using LSU are blocked if LSU is not ready.
Data hazards are ignored when an exception occurred earlier in the pipeline. As no FU is involved, there are no structural hazards either.
Instructions are issued in-order, which means that when an instruction makes ISSUE_STAGE stall, next instructions are blocked.
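The structural hazard rules listed above can be condensed into a single predicate. The following is a minimal sketch, assuming the FU names are passed as strings; the function name `fu_blocked` and its parameters are hypothetical and do not correspond to RTL signals.

```python
# FUs that share one result bus (RB), as stated in the structural hazard rules.
SHARED_RB_FUS = ("ALU", "BRANCH", "CSR", "MULT")


def fu_blocked(fu, mult_issued_last_cycle, variable_latency_busy, lsu_ready):
    """Return True when the instruction targeting `fu` cannot be issued this cycle.

    fu                     -- "ALU", "BRANCH", "CSR", "MULT" or "LSU"
    mult_issued_last_cycle -- a MULT instruction was issued one cycle earlier
    variable_latency_busy  -- EX_STAGE reports a running unknown-latency instruction
    lsu_ready              -- the LSU can accept an instruction
    """
    # Unknown-latency instruction running: every FU sharing the RB is blocked.
    if variable_latency_busy and fu in SHARED_RB_FUS:
        return True
    # A multiplication issued one cycle earlier will use the RB next:
    # ALU, BRANCH and CSR wait, while MULT itself is pipelined and may proceed.
    if mult_issued_last_cycle and fu in ("ALU", "BRANCH", "CSR"):
        return True
    # LSU instructions only depend on the LSU being ready.
    if fu == "LSU" and not lsu_ready:
        return True
    return False
```

For example, `fu_blocked("ALU", True, False, True)` is true whereas `fu_blocked("MULT", True, False, True)` is false, matching the back-to-back multiplication case.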
The input operands provided to EX_STAGE come from the register file by default.
However, when one of the source registers has a RAW dependence, the corresponding input operand is replaced by the forwarded value (see Data hazards/RAW hazards above).
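As an illustration of the data hazard rules and of operand selection, the sketch below models the Scoreboard as a dictionary from transaction ID to entry and the result buses as a list of tuples. These data structures, and the names `SbEntry`, `read_operand` and `waw_hazard`, are assumptions made for the example; the RTL is organised differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SbEntry:
    rd: int                       # destination register of the in-flight instruction
    result: Optional[int] = None  # None until the result has been written back
    exception: bool = False
    is_csr: bool = False          # forwarding is not possible from CSR instructions


def read_operand(rs, regfile, scoreboard, result_buses):
    """Return (value, stall) for source register rs.

    scoreboard   -- dict: transaction ID -> SbEntry (insertion order = issue order)
    result_buses -- list of (transaction ID, value, exception) returned this cycle
    """
    if rs == 0:
        return 0, False                      # no data hazards on x0
    producers = [(tid, e) for tid, e in scoreboard.items() if e.rd == rs]
    if not producers:
        return regfile[rs], False            # default: value from the register file
    tid, entry = producers[-1]               # youngest in-flight producer
    if entry.is_csr:
        return None, True
    for bus_tid, value, exception in result_buses:
        if bus_tid == tid and not exception:
            return value, False              # case a): forwarded from a functional unit
    if entry.result is not None and not entry.exception:
        return entry.result, False           # case b): forwarded from the Scoreboard
    return None, True                        # RAW hazard: issue is blocked


def waw_hazard(rd, scoreboard, committing=()):
    """True when an in-flight instruction (not being committed) writes rd."""
    if rd == 0:
        return False
    return any(e.rd == rd for tid, e in scoreboard.items() if tid not in committing)
```

The second element of the tuple returned by `read_operand` tells whether issue must stall; when it is false, the first element is the operand value that would be handed to EX_STAGE.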
The register file is an instance of ariane_regfile where each register stores 32 bits and the register at index 0 is wired to zero.
FIXME Document behavior related to CV-X-IF
Instructions are sent to EX_STAGE via a register so they are visible in EX_STAGE one cycle after being issued.
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Prevent from issuing | CONTROLLER | logic | 
| 
 | in | Entry about the instruction to issue | SCOREBOARD | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | in | none | none | scoreboard_entry_t[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | in | Instruction to issue | SCOREBOARD | logic[CVA6Cfg.NrIssuePorts-1:0][31:0] | 
| 
 | in | Is there an instruction to issue | SCOREBOARD | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Issue stage acknowledge | SCOREBOARD | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | in | Forwarding | SCOREBOARD | forwarding_t | 
| 
 | out | FU data useful to execute instruction | EX_STAGE | fu_data_t[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Unregistered version of fu_data_o.operanda | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0] | 
| 
 | out | Unregistered version of fu_data_o.operandb | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.VLEN-1:0] | 
| 
 | out | Program Counter | EX_STAGE | logic[CVA6Cfg.VLEN-1:0] | 
| 
 | out | Is zcmt | EX_STAGE | logic | 
| 
 | out | Is compressed instruction | EX_STAGE | logic | 
| 
 | in | Fixed Latency Unit is ready | EX_STAGE | logic | 
| 
 | out | ALU output is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Branch unit is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Information of branch prediction | EX_STAGE | branchpredict_sbe_t | 
| 
 | in | Load store unit FU is ready | EX_STAGE | logic | 
| 
 | out | Load store unit FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | Mult FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | ALU2 FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | CSR is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | out | CVXIF FU is valid | EX_STAGE | logic[CVA6Cfg.NrIssuePorts-1:0] | 
| 
 | in | CVXIF is FU ready | EX_STAGE | logic | 
| 
 | out | CVXIF offloader instruction value | EX_STAGE | logic[31:0] | 
| 
 | in | CVA6 Hart ID | SUBSYSTEM | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | in | none | none | logic | 
| 
 | in | none | none | x_issue_resp_t | 
| 
 | out | none | none | logic | 
| 
 | out | none | none | x_issue_req_t | 
| 
 | in | none | none | logic | 
| 
 | out | none | none | logic | 
| 
 | out | none | none | x_register_t | 
| 
 | out | none | none | logic | 
| 
 | out | none | none | x_commit_t | 
| 
 | out | none | none | logic | 
| 
 | out | none | none | logic | 
| 
 | out | none | none | logic | 
| 
 | out | none | none | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | in | Destination register in the register file | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0][4:0] | 
| 
 | in | Value to write to register file | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.XLEN-1:0] | 
| 
 | in | GPR write enable | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As EnableAccelerator = 0,
  - stall_i input is tied to 0
- As RVH = False,
  - tinst_o output is tied to 0
- As RVF = 0,
  - fpu_ready_i input is tied to 0
  - fpu_valid_o output is tied to 0
  - fpu_fmt_o output is tied to 0
  - fpu_rm_o output is tied to 0
  - we_fpr_i input is tied to 0
- As PerfCounterEn = 0,
  - stall_issue_o output is tied to 0
- As IsRVFI = 0,
  - rvfi_rs1_o output is tied to 0
  - rvfi_rs2_o output is tied to 0
4.4. EX_STAGE Module
4.4.1. Description
The EX_STAGE module is a logical stage which implements the execute stage. It encapsulates the following functional units: ALU, Branch Unit, CSR buffer, Mult, Load Store Unit and CVXIF.
The module is connected to:
- ID_STAGE module provides scoreboard entry.
4.4.2. Functionality
4.4.3. Submodules
4.4.3.1. alu
The arithmetic logic unit (ALU) is a small piece of hardware which performs 32 and 64-bit arithmetic and bitwise operations: subtraction, addition, shifts, comparisons… It always completes its operation in a single cycle.
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | FU data needed to execute instruction | ISSUE_STAGE | fu_data_t | 
| 
 | out | ALU result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | ALU branch compare result | branch_unit | logic | 
4.4.3.2. branch_unit
The branch unit module manages all kinds of control flow changes i.e.: conditional and unconditional jumps. It calculates the target address and decides whether to take the branch or not. It also decides if a branch was mis-predicted or not and reports corrective actions to the pipeline stages.
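The decision described above can be sketched for a conditional branch as follows. This is an assumption-laden illustration rather than the RTL behaviour: the names `Resolve` and `resolve_branch`, the argument list and the exact misprediction criterion are hypothetical, and jumps are not modelled.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # assumed 32-bit address width for the sketch


@dataclass
class Resolve:
    target: int       # address of the next instruction to execute
    taken: bool       # actual branch outcome
    mispredict: bool  # True when the front end must be redirected


def resolve_branch(pc, imm, compare_result, is_compressed,
                   predicted_taken, predicted_target):
    """Resolve one conditional branch.

    compare_result stands for the ALU branch compare result; imm is the signed
    branch offset; is_compressed selects a 2-byte or 4-byte fall-through.
    """
    fall_through = (pc + (2 if is_compressed else 4)) & MASK32
    taken = bool(compare_result)
    target = ((pc + imm) & MASK32) if taken else fall_through
    # Assumed criterion: wrong direction, or right direction with a wrong target.
    mispredict = (taken != predicted_taken) or (taken and predicted_target != target)
    return Resolve(target=target, taken=taken, mispredict=mispredict)
```

A taken branch that was predicted not taken yields `mispredict=True` with `target` set to the branch destination, which is the corrective information reported to the rest of the pipeline.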
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | FU data needed to execute instruction | ISSUE_STAGE | fu_data_t | 
| 
 | in | Instruction PC | ISSUE_STAGE | logic[CVA6Cfg.VLEN-1:0] | 
| 
 | in | Is zcmt instruction | ISSUE_STAGE | logic | 
| 
 | in | Instruction is compressed | ISSUE_STAGE | logic | 
| 
 | in | Branch unit instruction is valid | ISSUE_STAGE | logic | 
| 
 | in | ALU branch compare result | ALU | logic | 
| 
 | out | Branch unit result | ISSUE_STAGE | logic[CVA6Cfg.VLEN-1:0] | 
| 
 | in | Information of branch prediction | ISSUE_STAGE | branchpredict_sbe_t | 
| 
 | out | Signaling that we resolved the branch | ISSUE_STAGE | bp_resolve_t | 
| 
 | out | Branch is resolved, new entries can be accepted by scoreboard | ID_STAGE | logic | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As RVH = False,
  - v_i input is tied to 0
- As DebugEn = False,
  - debug_mode_i input is tied to 0
- As PerfCounterEn = 0,
  - branch_exception_o output is tied to 0
4.4.3.3. CSR_buffer
The CSR buffer module stores the CSR address at which the instruction is going to read/write. As the CSR instruction alters the processor architectural state, this instruction has to be buffered until the commit stage decides to execute the instruction.
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Flush CSR | CONTROLLER | logic | 
| 
 | in | FU data needed to execute instruction | ISSUE_STAGE | fu_data_t | 
| 
 | out | CSR FU is ready | ISSUE_STAGE | logic | 
| 
 | in | CSR instruction is valid | ISSUE_STAGE | logic | 
| 
 | out | CSR buffer result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | in | commit the pending CSR OP | COMMIT_STAGE | logic | 
| 
 | out | CSR address to write | COMMIT_STAGE | logic[11:0] | 
4.4.3.4. mult
The multiplier module supports the division and multiplication operations.
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Flush | CONTROLLER | logic | 
| 
 | in | FU data needed to execute instruction | ISSUE_STAGE | fu_data_t | 
| 
 | in | Mult instruction is valid | ISSUE_STAGE | logic | 
| 
 | out | Mult result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Mult result is valid | ISSUE_STAGE | logic | 
| 
 | out | Mult is ready | ISSUE_STAGE | logic | 
| 
 | out | Mult transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
4.4.3.4.1. multiplier
Multiplication is performed in two cycles and is fully pipelined.
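To make "two cycles, fully pipelined" concrete, here is a minimal cycle-level sketch with a single pipeline register: a request accepted in one cycle produces its result in the following cycle, and a new request can be accepted every cycle. The class name `Multiplier` and its `cycle` method are hypothetical, and only the low XLEN bits of the product are modelled.

```python
class Multiplier:
    """Hypothetical two-cycle pipelined multiplier model (low product bits only)."""

    def __init__(self, xlen=32):
        self.mask = (1 << xlen) - 1
        self.stage = None  # pipeline register holding (trans_id, product)

    def cycle(self, request=None):
        """Advance one clock cycle.

        request -- (trans_id, a, b) accepted this cycle, or None
        Returns the (trans_id, result) accepted in the previous cycle, or None.
        """
        done = self.stage
        if request is not None:
            trans_id, a, b = request
            self.stage = (trans_id, (a * b) & self.mask)
        else:
            self.stage = None
        return done
```

Feeding two requests in consecutive cycles returns their results in consecutive cycles as well, each tagged with its own transaction ID.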
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Multiplier transaction ID | Mult | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | in | Multiplier instruction is valid | Mult | logic | 
| 
 | in | Multiplier operation | Mult | fu_op | 
| 
 | in | A operand | Mult | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | in | B operand | Mult | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Multiplier result | Mult | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Multiplier result is valid | Mult | logic | 
| 
 | out | Multiplier transaction ID | Mult | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
4.4.3.4.2. serdiv
The divider is a simple serial divider which needs 64 cycles in the worst case.
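The principle of a serial divider, producing one quotient bit per step, can be sketched with a textbook shift-and-subtract loop. This is not the serdiv RTL: the function name `serial_udiv` is hypothetical, only the unsigned case is shown, and the step count of the sketch is not meant to reproduce the cycle count quoted above. The division-by-zero result follows the RISC-V M extension convention (quotient all ones, remainder equal to the dividend).

```python
def serial_udiv(a, b, width=32):
    """Unsigned shift-and-subtract division.

    Returns (quotient, remainder, steps), where steps is the number of
    one-bit iterations performed by this sketch.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if b == 0:
        # RISC-V convention for division by zero.
        return mask, a, 0
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((a >> i) & 1)
        quotient <<= 1
        # Subtract the divisor when it fits and record a 1 in the quotient.
        if remainder >= b:
            remainder -= b
            quotient |= 1
    return quotient, remainder, width
```

Signed division and remainder can be built on top of such a loop by operating on magnitudes and fixing up the signs afterwards.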
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Serdiv transaction ID | Mult | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | in | A operand | Mult | logic[WIDTH-1:0] | 
| 
 | in | B operand | Mult | logic[WIDTH-1:0] | 
| 
 | in | Serdiv operation (0: udiv, 2: urem, 1: div, 3: …) | Mult | logic[1:0] | 
| 
 | in | Serdiv instruction is valid | Mult | logic | 
| 
 | out | Serdiv FU is ready | Mult | logic | 
| 
 | in | Flush | CONTROLLER | logic | 
| 
 | out | Serdiv result is valid | Mult | logic | 
| 
 | in | Serdiv is ready | Mult | logic | 
| 
 | out | Serdiv transaction ID | Mult | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Serdiv result | Mult | logic[WIDTH-1:0] | 
4.4.3.5. load_store_unit (LSU)
The load store module interfaces with the data cache (D$) to manage the load and store operations.
The LSU does not handle misaligned accesses. Misaligned accesses are double word accesses which are not aligned to a 64-bit boundary, word accesses which are not aligned to a 32-bit boundary and half word accesses which are not aligned to a 16-bit boundary. If the LSU encounters a misaligned load or store, it raises a misaligned exception.
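The alignment rule above amounts to checking the low address bits against the access size. A minimal sketch, with the hypothetical helper name `is_misaligned` and the access size given in bytes:

```python
def is_misaligned(addr, size_bytes):
    """Return True when an access of size_bytes at addr is misaligned.

    size_bytes -- 1 (byte), 2 (half word), 4 (word) or 8 (double word)
    """
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("unsupported access size")
    # An access is aligned when its address is a multiple of its size.
    return (addr & (size_bytes - 1)) != 0
```

Byte accesses can never be misaligned, so the check only matters for half word, word and double word accesses.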
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Flush | CONTROLLER | logic | 
| 
 | out | No store pending | COMMIT_STAGE | logic | 
| 
 | in | FU data needed to execute instruction | ISSUE_STAGE | fu_data_t | 
| 
 | out | Load Store Unit is ready | ISSUE_STAGE | logic | 
| 
 | in | Load Store Unit instruction is valid | ISSUE_STAGE | logic | 
| 
 | out | Load transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Load result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Load result is valid | ISSUE_STAGE | logic | 
| 
 | out | Load exception | ISSUE_STAGE | exception_t | 
| 
 | out | Store transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Store result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Store result is valid | ISSUE_STAGE | logic | 
| 
 | out | Store exception | ISSUE_STAGE | exception_t | 
| 
 | in | Commit the first pending store | COMMIT_STAGE | logic | 
| 
 | out | Commit queue is ready to accept another commit request | COMMIT_STAGE | logic | 
| 
 | in | Commit transaction ID | COMMIT_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | in | Instruction cache input request | FETCH | fetch_areq_t | 
| 
 | out | Instruction cache output response | FETCH | fetch_arsp_t | 
| 
 | out | Store cache request | DCACHE | obi_store_req_t | 
| 
 | in | Store cache response | DCACHE | obi_store_rsp_t | 
| 
 | out | Load cache request | DCACHE | obi_load_req_t | 
| 
 | in | Load cache response | DCACHE | obi_load_rsp_t | 
| 
 | in | PMP configuration | CSR_REGFILE | riscv::pmpcfg_t[(CVA6Cfg.NrPMPEntries>0?CVA6Cfg.NrPMPEntries-1:0):0] | 
| 
 | in | PMP address | CSR_REGFILE | logic[(CVA6Cfg.NrPMPEntries>0?CVA6Cfg.NrPMPEntries-1:0):0][CVA6Cfg.PLEN-3:0] | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As EnableAccelerator = 0,
  - stall_st_pending_i input is tied to 0
- As RVA = False,
  - amo_valid_commit_i input is tied to 0
  - obi_amo_req_o output is tied to 0
  - obi_amo_rsp_i input is tied to 0
- As RVH = False,
  - tinst_i input is tied to 0
  - enable_g_translation_i input is tied to 0
  - en_ld_st_g_translation_i input is tied to 0
  - v_i input is tied to 0
  - ld_st_v_i input is tied to 0
  - csr_hs_ld_st_inst_o output is tied to 0
  - vs_sum_i input is tied to 0
  - vmxr_i input is tied to 0
  - vsatp_ppn_i input is tied to 0
  - vs_asid_i input is tied to 0
  - hgatp_ppn_i input is tied to 0
  - vmid_i input is tied to 0
  - vmid_to_be_flushed_i input is tied to 0
  - gpaddr_to_be_flushed_i input is tied to 0
  - flush_tlb_vvma_i input is tied to 0
  - flush_tlb_gvma_i input is tied to 0
- As RVS = False,
  - enable_translation_i input is tied to 0
  - en_ld_st_translation_i input is tied to 0
  - sum_i input is tied to 0
  - mxr_i input is tied to 0
  - satp_ppn_i input is tied to 0
  - asid_i input is tied to 0
  - asid_to_be_flushed_i input is tied to 0
  - vaddr_to_be_flushed_i input is tied to 0
- As PRIV = MachineOnly,
  - priv_lvl_i input is tied to MachineMode
  - ld_st_priv_lvl_i input is tied to MachineMode
- As MMUPresent = 0,
  - flush_tlb_i input is tied to 0
  - obi_mmu_ptw_req_o output is tied to 0
  - obi_mmu_ptw_rsp_i input is tied to 0
- As PerfCounterEn = 0,
  - itlb_miss_o output is tied to 0
  - dtlb_miss_o output is tied to 0
- As PipelineOnly = True,
  - load_req_o output is tied to 0
  - load_rsp_i input is tied to 0
  - dcache_wbuffer_empty_i input is tied to 0
  - dcache_wbuffer_not_ni_i input is tied to 0
- As IsRVFI = 0,
  - rvfi_lsu_ctrl_o output is tied to 0
  - rvfi_mem_paddr_o output is tied to 0
4.4.3.5.1. store_unit
The store_unit module manages the data store operations.
As stores can be speculative, the store instructions need to be committed by ISSUE_STAGE module before possibly altering the processor state. The store buffer keeps track of store requests. Outstanding store instructions (which are speculative) are differentiated from committed stores. When ISSUE_STAGE module commits a store instruction, outstanding stores become committed.
When commit buffer is not empty, the buffer automatically tries to write the oldest store to the data cache.
Furthermore, the store_unit module provides information to the load_unit to know if an outstanding store matches addresses with a load.
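The split between speculative and committed stores can be pictured with two queues. The sketch below is a simplified model under assumptions: the class `StoreBuffer` and its methods are hypothetical, queue depths are unbounded, memory is a plain dictionary, a flush is assumed to discard only the speculative stores, and the address match is assumed to compare the 12-bit page offset, as suggested by the `logic[11:0]` address check port.

```python
from collections import deque

PAGE_OFFSET_MASK = 0xFFF  # 12-bit page offset used for the address check


class StoreBuffer:
    """Hypothetical model of the store buffer with two queues."""

    def __init__(self):
        self.speculative = deque()  # issued, not yet committed
        self.committed = deque()    # committed, waiting to be written to the cache

    def push(self, paddr, data):
        """Record a new (speculative) store request."""
        self.speculative.append((paddr, data))

    def commit(self):
        """The oldest speculative store becomes a committed store."""
        self.committed.append(self.speculative.popleft())

    def flush(self):
        """Assumption: a flush drops speculative stores and keeps committed ones."""
        self.speculative.clear()

    def drain_one(self, memory):
        """Write the oldest committed store to memory, if there is one."""
        if self.committed:
            paddr, data = self.committed.popleft()
            memory[paddr] = data

    def page_offset_matches(self, paddr):
        """True when any buffered store shares the page offset of paddr."""
        offset = paddr & PAGE_OFFSET_MASK
        pending = (*self.speculative, *self.committed)
        return any((addr & PAGE_OFFSET_MASK) == offset for addr, _ in pending)
```

Comparing only the page offset is conservative: two different addresses with the same low 12 bits are reported as a match, which can only cause an unnecessary stall of the load, never a missed dependency.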
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Flush | CONTROLLER | logic | 
| 
 | out | No store pending | COMMIT_STAGE | logic | 
| 
 | out | Store buffer is empty | LOAD_UNIT | logic | 
| 
 | in | Store instruction is valid | ISSUE_STAGE | logic | 
| 
 | in | Data input | ISSUE_STAGE | lsu_ctrl_t | 
| 
 | out | Pop store | LSU_BYPASS | logic | 
| 
 | in | Instruction commit | COMMIT_STAGE | logic | 
| 
 | out | Commit queue is ready to accept another commit request | COMMIT_STAGE | logic | 
| 
 | out | Store result is valid | ISSUE_STAGE | logic | 
| 
 | out | Transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Store result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Store exception output | ISSUE_STAGE | exception_t | 
| 
 | out | Address translation request | MMU/PMP | logic | 
| 
 | out | Virtual address | MMU/PMP | logic[CVA6Cfg.VLEN-1:0] | 
| 
 | in | Physical address | MMU/PMP | logic[CVA6Cfg.PLEN-1:0] | 
| 
 | in | Exception raised before store | MMU/PMP | exception_t | 
| 
 | in | Address to be checked | load_unit | logic[11:0] | 
| 
 | out | Address check result | load_unit | logic | 
| 
 | out | Store cache request | DCACHE | obi_store_req_t | 
| 
 | in | Store cache response | DCACHE | obi_store_rsp_t | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As EnableAccelerator = 0,
  - stall_st_pending_i input is tied to 0
- As RVA = False,
  - amo_valid_commit_i input is tied to 0
  - obi_amo_req_o output is tied to 0
  - obi_amo_rsp_i input is tied to 0
- As IsRVFI = 0,
  - rvfi_mem_paddr_o output is tied to 0
- As RVH = False,
  - tinst_o output is tied to 0
  - hs_ld_st_inst_o output is tied to 0
  - hlvx_inst_o output is tied to 0
- For any HW configuration,
  - dtlb_hit_i input is tied to 1
4.4.3.5.2. load_unit
The load unit module manages the data load operations.
Before issuing a load, the load unit needs to check the store buffer for potential aliasing. It stalls until it can satisfy the current request. This means:
- Two loads to the same address are allowed.
- Two stores to the same address are allowed.
- A store after a load to the same address is allowed.
- A load after a store to the same address can only be processed if the store has already been sent to the cache, i.e., there is no forwarding.
After the check of the store buffer, a read request is sent to the D$ with the index field of the address (1). The load unit stalls until the D$ acknowledges this request (2). In the next cycle, the tag field of the address is sent to the D$ (3). If the load request address is non-idempotent, it stalls until the write buffer of the D$ is empty of non-idempotent requests and the store buffer is empty. It also stalls until the incoming load instruction is the next instruction to be committed. When the D$ allows the read of the data, the data is sent to the load unit and the load instruction can be committed (4).
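The stall conditions described above can be gathered into one predicate. This is a sketch of one possible reading of the paragraph, in which the "next instruction to be committed" condition applies to non-idempotent loads only; that reading, the function name `load_may_proceed` and its parameters are assumptions and not RTL signal names.

```python
def load_may_proceed(page_offset_match, non_idempotent, store_buffer_empty,
                     wbuffer_has_non_idempotent, load_trans_id, commit_trans_id):
    """Return True when the load request can be sent on to the D$.

    page_offset_match          -- a buffered store matches the load page offset
    non_idempotent             -- the load address lies in a non-idempotent region
    store_buffer_empty         -- no store is pending in the store buffer
    wbuffer_has_non_idempotent -- the D$ write buffer holds non-idempotent requests
    load_trans_id              -- transaction ID of the load
    commit_trans_id            -- transaction ID of the next instruction to commit
    """
    # A load after a store to a matching address waits: there is no forwarding.
    if page_offset_match:
        return False
    if non_idempotent:
        # Wait for older stores to leave the store buffer and the write buffer.
        if not store_buffer_empty or wbuffer_has_non_idempotent:
            return False
        # Wait until the load is the next instruction to be committed.
        if load_trans_id != commit_trans_id:
            return False
    return True
```

An idempotent load with no matching store proceeds immediately, whereas a non-idempotent load is effectively serialised with respect to earlier stores and to the commit order.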
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Flush signal | CONTROLLER | logic | 
| 
 | in | Load request is valid | LSU_BYPASS | logic | 
| 
 | in | Load request input | LSU_BYPASS | lsu_ctrl_t | 
| 
 | out | Pop the load request from the LSU bypass FIFO | LSU_BYPASS | logic | 
| 
 | out | Load unit result is valid | ISSUE_STAGE | logic | 
| 
 | out | Load transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Load result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | Load exception | ISSUE_STAGE | exception_t | 
| 
 | out | Request address translation | MMU/PMP | logic | 
| 
 | out | Virtual address | MMU/PMP | logic[CVA6Cfg.VLEN-1:0] | 
| 
 | in | Physical address | MMU/PMP | logic[CVA6Cfg.PLEN-1:0] | 
| 
 | in | Exception raised before load | MMU/PMP | exception_t | 
| 
 | out | Page offset for address checking | STORE_UNIT | logic[11:0] | 
| 
 | in | Indicates if the page offset matches a store unit entry | STORE_UNIT | logic | 
| 
 | in | Store buffer is empty | STORE_UNIT | logic | 
| 
 | in | Transaction ID of the committing instruction | COMMIT_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | Load cache request | DCACHE | obi_load_req_t | 
| 
 | in | Load cache response | DCACHE | obi_load_rsp_t | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As RVH = False,
  - tinst_o output is tied to 0
  - hs_ld_st_inst_o output is tied to 0
  - hlvx_inst_o output is tied to 0
- For any HW configuration,
  - dtlb_hit_i input is tied to 1
- As MMUPresent = 0,
  - dtlb_ppn_i input is tied to 0
- As PipelineOnly = True,
  - load_req_o output is tied to 0
  - load_rsp_i input is tied to 0
  - dcache_wbuffer_not_ni_i input is tied to 0
4.4.3.5.3. lsu_bypass
The LSU bypass is a FIFO which holds instructions from the issue stage when the store unit or the load unit is not immediately available.
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | Flush | CONTROLLER | logic | 
| 
 | in | Load store unit request | LSU_UNIT | lsu_ctrl_t | 
| 
 | in | load store unit valid | LSU_UNIT | logic | 
| 
 | in | Pop load | LOAD_UNIT | logic | 
| 
 | in | Pop store | STORE_UNIT | logic | 
| 
 | out | Load Store Unit is ready | ISSUE_STAGE | logic | 
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below.
- As IsRVFI = 0,
  - lsu_ctrl_o output is tied to 0
4.4.3.6. CVXIF_fu
| Signal | IO | Description | connexion | Type | 
|---|---|---|---|---|
| 
 | in | Subsystem Clock | SUBSYSTEM | logic | 
| 
 | in | Asynchronous reset active low | SUBSYSTEM | logic | 
| 
 | in | CVXIF instruction is valid | ISSUE_STAGE | logic | 
| 
 | in | Transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | in | Instruction is illegal, determined during CVXIF issue transaction | ISSUE_STAGE | logic | 
| 
 | in | Offloaded instruction | ISSUE_STAGE | logic[31:0] | 
| 
 | out | CVXIF is ready | ISSUE_STAGE | logic | 
| 
 | out | CVXIF result transaction ID | ISSUE_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] | 
| 
 | out | CVXIF exception | ISSUE_STAGE | exception_t | 
| 
 | out | CVXIF FU result | ISSUE_STAGE | logic[CVA6Cfg.XLEN-1:0] | 
| 
 | out | CVXIF result valid | ISSUE_STAGE | logic | 
| 
 | out | CVXIF write enable | ISSUE_STAGE | logic | 
| 
 | out | CVXIF destination register | ISSUE_STAGE | logic[4:0] | 
| 
 | in | none | none | logic | 
| 
 | in | none | none | x_result_t | 
| 
 | out | none | none | logic | 
4.5. COMMIT_STAGE Module
4.5.1. Description
The COMMIT_STAGE module implements the commit stage, which is the last stage in the processor’s pipeline. For the instructions for which the execution is completed, it updates the architectural state: writing CSR registers, committing stores and writing back data to the register file. The commit stage controls the stalling and the flushing of the processor.
The commit stage also manages the exceptions. An exception can occur during the first four pipeline stages (PCgen cannot generate an exception) or happen in commit stage, coming from the CSR_REGFILE or from an interrupt. Exceptions are precise: they are considered during the commit only and associated with the related instruction.
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Request to halt the core | CONTROLLER | logic |
|  | in | Request to flush dcache, also flush the pipeline | CACHE | logic |
|  | out | Exception raised by all sources | EX_STAGE | exception_t |
|  | in | The instruction we want to commit | ISSUE_STAGE | scoreboard_entry_t[CVA6Cfg.NrCommitPorts-1:0] |
|  | in | The instruction is cancelled | ISSUE_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Acknowledge that we are indeed committing | ISSUE_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Acknowledge that we are indeed committing | CSR_REGFILE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Register file write address | ISSUE_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0][4:0] |
|  | out | Register file write data | ISSUE_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.XLEN-1:0] |
|  | out | Register file write enable | ISSUE_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Floating point register enable | ISSUE_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | out | Program counter | FRONTEND_CSR_REGFILE | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Decoded CSR operation | CSR_REGFILE | fu_op |
|  | out | Data to write to CSR | CSR_REGFILE | logic[CVA6Cfg.XLEN-1:0] |
|  | in | Data to read from CSR | CSR_REGFILE | logic[CVA6Cfg.XLEN-1:0] |
|  | in | Exception or interrupt occurred in CSR stage (the same as commit) | CSR_REGFILE | exception_t |
|  | out | Commit the pending store | EX_STAGE | logic |
|  | in | Commit buffer of LSU is ready | EX_STAGE | logic |
|  | out | Transaction id of first commit port | ID_STAGE | logic[CVA6Cfg.TRANS_ID_BITS-1:0] |
|  | in | No store is pending | EX_STAGE | logic |
|  | out | Commit the pending CSR instruction | EX_STAGE | logic |
|  | out | Request a pipeline flush | CONTROLLER | logic |
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below:
- As RVF = 0,
  - dirty_fp_state_o output is tied to 0
  - csr_write_fflags_o output is tied to 0
- As DebugEn = False,
  - single_step_i input is tied to 0
- As RVA = False,
  - obi_amo_rsp_i input is tied to 0
  - amo_valid_commit_o output is tied to 0
- As FenceEn = 0,
  - fence_i_o output is tied to 0
  - fence_o output is tied to 0
- As RVS = False,
  - sfence_vma_o output is tied to 0
- As RVH = False,
  - hfence_vvma_o output is tied to 0
  - hfence_gvma_o output is tied to 0
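The relation between configuration parameters and tied-off COMMIT_STAGE ports can be written down as data, which is convenient when a verification environment has to check that disabled ports stay constant. The following is a minimal sketch: the parameter and port names are copied from the list above, `CV32A60X_CFG`, `GATED_PORTS` and `tied_ports` are hypothetical names, and parameters listed as `0` are represented as `False`.

```python
# Parameter values quoted in the list above for CV32A60X
# (values given as 0 in the list are represented as False here).
CV32A60X_CFG = {
    "RVF": False, "DebugEn": False, "RVA": False,
    "FenceEn": False, "RVS": False, "RVH": False,
}

# COMMIT_STAGE ports gated by each parameter, copied from the list above.
GATED_PORTS = {
    "RVF": ["dirty_fp_state_o", "csr_write_fflags_o"],
    "DebugEn": ["single_step_i"],
    "RVA": ["obi_amo_rsp_i", "amo_valid_commit_o"],
    "FenceEn": ["fence_i_o", "fence_o"],
    "RVS": ["sfence_vma_o"],
    "RVH": ["hfence_vvma_o", "hfence_gvma_o"],
}


def tied_ports(cfg: dict) -> dict:
    """Return {port: 0} for every port whose gating parameter is disabled."""
    return {
        port: 0
        for param, ports in GATED_PORTS.items()
        if not cfg[param]
        for port in ports
    }
```

Calling `tied_ports(CV32A60X_CFG)` yields all ten ports of the list; enabling a parameter in a copy of the configuration removes its ports from the result.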
4.5.2. Functionality
4.6. CONTROLLER Module
4.6.1. Description
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | out | Set PC on PC Gen | FRONTEND | logic |
|  | out | Flush the IF stage | FRONTEND | logic |
|  | out | Flush un-issued instructions of the scoreboard | FRONTEND | logic |
|  | out | Flush ID stage | ID_STAGE | logic |
|  | out | Flush EX stage | EX_STAGE | logic |
|  | out | Flush branch predictors | FRONTEND | logic |
|  | out | Flush ICache | CACHE | logic |
|  | out | Flush DCache | CACHE | logic |
|  | in | Acknowledge the whole DCache Flush | CACHE | logic |
|  | in | Halt request from CSR (WFI instruction) | CSR_REGFILE | logic |
|  | out | Halt signal to commit stage | COMMIT_STAGE | logic |
|  | in | Return from exception | CSR_REGFILE | logic |
|  | in | We got an exception, flush the pipeline | FRONTEND | logic |
|  | in | We got a resolved branch, check if we need to flush the front-end | EX_STAGE | bp_resolve_t |
|  | in | We got an instruction which altered the CSR, flush the pipeline | CSR_REGFILE | logic |
|  | in | Flush request from commit stage | COMMIT_STAGE | logic |
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below:
- As RVH = False,
  - v_i input is tied to 0
  - flush_tlb_vvma_o output is tied to 0
  - flush_tlb_gvma_o output is tied to 0
  - hfence_vvma_i input is tied to 0
  - hfence_gvma_i input is tied to 0
- As MMUPresent = 0,
  - flush_tlb_o output is tied to 0
- As EnableAccelerator = 0,
  - halt_acc_i input is tied to 0
  - flush_acc_i input is tied to 0
- As DebugEn = False,
  - set_debug_pc_i input is tied to 0
- As FenceEn = 0,
  - fence_i_i input is tied to 0
  - fence_i input is tied to 0
- As RVS = False,
  - sfence_vma_i input is tied to 0
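The port descriptions in the CONTROLLER table suggest how flush requests map onto flush outputs. The sketch below is one plausible reading of those descriptions, not a transcription of the RTL: the field and argument names are hypothetical (the table does not give signal names), and the choice of which stages each event flushes is an assumption of this model.

```python
from dataclasses import dataclass


@dataclass
class FlushOutputs:
    """Hypothetical names for a subset of the controller outputs."""
    set_pc_commit: bool = False         # "Set PC on PC Gen"
    flush_if: bool = False              # "Flush the IF stage"
    flush_unissued_instr: bool = False  # "Flush un-issued instructions of the scoreboard"
    flush_id: bool = False              # "Flush ID stage"
    flush_ex: bool = False              # "Flush EX stage"


def controller(is_mispredict=False, flush_csr=False, flush_commit=False,
               ex_valid=False, eret=False) -> FlushOutputs:
    """Combine the flush requests into flush outputs (assumed mapping)."""
    out = FlushOutputs()
    if is_mispredict:
        # Assumption: only instructions younger than the branch are on the
        # wrong path, so the front-end and un-issued instructions are dropped.
        out.flush_if = True
        out.flush_unissued_instr = True
    if flush_csr or flush_commit:
        # Assumption: the whole pipeline is flushed and fetch restarts
        # from a PC supplied by the commit stage.
        out.set_pc_commit = True
        out.flush_if = True
        out.flush_unissued_instr = True
        out.flush_id = True
        out.flush_ex = True
    if ex_valid or eret:
        # Assumption: the new PC comes from the trap vector or the exception
        # PC, so the commit-stage PC is not used.
        out.set_pc_commit = False
        out.flush_if = True
        out.flush_unissued_instr = True
        out.flush_id = True
        out.flush_ex = True
    return out
```

The later conditions deliberately override the earlier ones, so that an exception or a return from exception takes precedence when several requests arrive together.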
4.6.2. Functionality
4.7. CSR_REGFILE Module
4.7.1. Description
| Signal | IO | Description | Connection | Type |
|---|---|---|---|---|
|  | in | Subsystem Clock | SUBSYSTEM | logic |
|  | in | Asynchronous reset active low | SUBSYSTEM | logic |
|  | in | Timer threw an interrupt | SUBSYSTEM | logic |
|  | out | Send a flush request out when a CSR with a side effect changes | CONTROLLER | logic |
|  | out | Halt requested | CONTROLLER | logic |
|  | in | Instruction to be committed | ID_STAGE | scoreboard_entry_t |
|  | in | Commit acknowledged an instruction → increase instret CSR | COMMIT_STAGE | logic[CVA6Cfg.NrCommitPorts-1:0] |
|  | in | Address from which to start booting, mtvec is set to the same address | SUBSYSTEM | logic[CVA6Cfg.VLEN-1:0] |
|  | in | Hart id in a multicore environment (reflected in a CSR) | SUBSYSTEM | logic[CVA6Cfg.XLEN-1:0] |
|  | in | We’ve got an exception from the commit stage, take it | COMMIT_STAGE | exception_t |
|  | in | Operation to perform on the CSR file | COMMIT_STAGE | fu_op |
|  | in | Address of the register to read/write | EX_STAGE | logic[11:0] |
|  | in | Write data in | COMMIT_STAGE | logic[CVA6Cfg.XLEN-1:0] |
|  | out | Read data out | COMMIT_STAGE | logic[CVA6Cfg.XLEN-1:0] |
|  | in | PC of instruction accessing the CSR | COMMIT_STAGE | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Attempts to access a CSR without appropriate privilege | COMMIT_STAGE | exception_t |
|  | out | Output the exception PC to PC Gen, the correct CSR (mepc, sepc) is set accordingly | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Return from exception, set the PC of epc_o | FRONTEND | logic |
|  | out | Output base of exception vector, correct CSR is output (mtvec, stvec) | FRONTEND | logic[CVA6Cfg.VLEN-1:0] |
|  | out | Interrupt management to ID stage | ID_STAGE | irq_ctrl_t |
|  | in | External interrupt in | SUBSYSTEM | logic[1:0] |
|  | in | Inter-processor interrupt → connected to machine mode sw | SUBSYSTEM | logic |
|  | out | L1 ICache Enable | CACHE | logic |
|  | out | L1 DCache Enable | CACHE | logic |
|  | out | none | none | rvfi_probes_csr_t |
|  | out | none | none | jvt_t |
Due to the CV32A60X configuration, some ports are tied to a static value. These ports do not appear in the table above; they are listed below:
- As RVF = 0,
  - dirty_fp_state_i input is tied to 0
  - csr_write_fflags_i input is tied to 0
  - fs_o output is tied to 0
  - fflags_o output is tied to 0
  - frm_o output is tied to 0
  - fprec_o output is tied to 0
- As EnableAccelerator = 0,
  - dirty_v_state_i input is tied to 0
  - acc_fflags_ex_i input is tied to 0
  - acc_fflags_ex_valid_i input is tied to 0
  - acc_cons_en_o output is tied to 0
  - pmpcfg_o output is tied to 0
  - pmpaddr_o output is tied to 0
- As PRIV = MachineOnly,
  - priv_lvl_o output is tied to MachineMode
  - ld_st_priv_lvl_o output is tied to MachineMode
  - tvm_o output is tied to 0
  - tw_o output is tied to 0
  - tsr_o output is tied to 0
- As RVH = False,
  - v_o output is tied to 0
  - vfs_o output is tied to 0
  - en_g_translation_o output is tied to 0
  - en_ld_st_g_translation_o output is tied to 0
  - ld_st_v_o output is tied to 0
  - csr_hs_ld_st_inst_i input is tied to 0
  - vs_sum_o output is tied to 0
  - vmxr_o output is tied to 0
  - vsatp_ppn_o output is tied to 0
  - vs_asid_o output is tied to 0
  - hgatp_ppn_o output is tied to 0
  - vmid_o output is tied to 0
  - vtw_o output is tied to 0
  - hu_o output is tied to 0
- As RVV = False,
  - vs_o output is tied to 0
- As RVS = False,
  - en_translation_o output is tied to 0
  - en_ld_st_translation_o output is tied to 0
  - sum_o output is tied to 0
  - mxr_o output is tied to 0
  - satp_ppn_o output is tied to 0
  - asid_o output is tied to 0
- As DebugEn = False,
  - debug_req_i input is tied to 0
  - set_debug_pc_o output is tied to 0
  - debug_mode_o output is tied to 0
  - single_step_o output is tied to 0
- As PerfCounterEn = 0,
  - perf_addr_o output is tied to 0
  - perf_data_o output is tied to 0
  - perf_data_i input is tied to 0
  - perf_we_o output is tied to 0
  - mcountinhibit_o output is tied to 0
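The table above mentions an exception for attempts to access a CSR without appropriate privilege. The RISC-V privileged specification encodes the access rules in the 12-bit CSR address: bits [9:8] give the lowest privilege level allowed to access the register, and bits [11:10] equal to `0b11` mark it as read-only. The following minimal sketch applies those two rules. `csr_access_fault` is a hypothetical helper, it does not model whether a CSR is implemented at all, and the document does not state how the RTL structures this check.

```python
MACHINE_MODE = 0b11  # RISC-V privilege level encoding for M-mode


def csr_access_fault(addr: int, is_write: bool, priv_lvl: int = MACHINE_MODE) -> bool:
    """Return True if the access violates the address-encoded CSR rules."""
    if not 0 <= addr < 4096:
        raise ValueError("CSR addresses are 12 bits wide")
    required_priv = (addr >> 8) & 0b11          # bits [9:8]: lowest privilege allowed
    read_only = ((addr >> 10) & 0b11) == 0b11   # bits [11:10] == 11: read-only CSR
    if priv_lvl < required_priv:
        return True                             # insufficient privilege
    return is_write and read_only               # write to a read-only CSR
```

With priv_lvl_o tied to MachineMode, as listed above, the privilege comparison can never fail, so in this model only writes to read-only CSRs such as MVENDORID (address 0xF11) are reported.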
4.7.2. Functionality
5. Glossary
- ALU: Arithmetic/Logic Unit
- APU: Application Processing Unit
- ASIC: Application-Specific Integrated Circuit
- AXI: Advanced eXtensible Interface
- BHT: Branch History Table
- BTB: Branch Target Buffer
- Byte: 8-bit data item
- CPU: Central Processing Unit, processor
- CSR: Control and Status Register
- Custom extension: Non-Standard extension to the RISC-V base instruction set (RISC-V Instruction Set Manual, Volume I: User-Level ISA)
- CVA6: Core-V Application class processor with a 6 stage pipeline
- D$: Data Cache
- DPI: Direct Programming Interface
- EX or EXE: Instruction Execute
- FPGA: Field Programmable Gate Array
- FPU: Floating Point Unit
- Halfword: 16-bit data item
- Halfword aligned address: An address is halfword aligned if it is divisible by 2
- I$: Instruction Cache
- ID: Instruction Decode
- IF: Instruction Fetch
- ISA: Instruction Set Architecture
- KGE: Kilo Gate Equivalents (NAND2)
- LSU: Load Store Unit
- M-Mode: Machine Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
- MMU: Memory Management Unit
- NC: Not Cacheable
- OBI: Open Bus Interface
- OoO: Out Of Order
- PC: Program Counter
- PMP: Physical memory protection (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
- PTW: Page Table Walker
- PULP platform: Parallel Ultra Low Power Platform (https://pulp-platform.org)
- RAS: Return Address Stack
- RV32C: RISC-V Compressed (C extension)
- RV32F: RISC-V Floating Point (F extension)
- S-Mode: Supervisor Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
- SIMD: Single Instruction/Multiple Data
- Standard extension: Standard extension to the RISC-V base instruction set (RISC-V Instruction Set Manual, Volume I: User-Level ISA)
- TLB: Translation Lookaside Buffer
- U-Mode: User Mode (RISC-V Instruction Set Manual, Volume II: Privileged Architecture)
- VLEN: Virtual address length
- WARL: Write Any Values, Reads Legal Values
- WB: Write Back of instruction results
- WLRL: Write/Read Only Legal Values
- Word: 32-bit data item
- Word aligned address: An address is word aligned if it is divisible by 4
- WPRI: Reserved Writes Preserve Values, Reads Ignore Values
- XLEN: RISC-V processor data length