Introduction

CV32E40P is a 4-stage in-order 32-bit RISC-V processor core. The ISA of CV32E40P has been extended to support multiple additional instructions including hardware loops, post-increment load and store instructions, additional ALU instructions and SIMD instructions that are not part of the standard RISC-V ISA. Figure 1 shows a block diagram of the top level with the core and the FPU.

Figure 1 Block Diagram of CV32E40P RISC-V Core

License

Copyright 2023 OpenHW Group.

Copyright 2018 ETH Zurich and University of Bologna.

Copyright and related rights are licensed under the Solderpad Hardware License, Version 0.51 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://solderpad.org/licenses/SHL-0.51. Unless required by applicable law or agreed to in writing, software, hardware and materials distributed under this License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Bus Interfaces

The Instruction Fetch and Load/Store data bus interfaces are compliant to the OBI (Open Bus Interface) protocol. See OBI-v1.2.pdf for details about the protocol. Additional information can be found in the Instruction Fetch and Load-Store-Unit (LSU) chapters of this document.

Standards Compliance

CV32E40P is a standards-compliant 32-bit RISC-V processor. It follows these specifications:

Many features in the RISC-V specification are optional, and CV32E40P can be parameterized to enable or disable some of them.

CV32E40P supports the following base integer instruction set.

  • The RV32I Base Integer Instruction Set, version 2.1

In addition, the following standard instruction set extensions are available.

Table 1 CV32E40P Standard Instruction Set Extensions

Standard Extension

Version

Configurability

C: Standard Extension for Compressed Instructions

2.0

always enabled

M: Standard Extension for Integer Multiplication and Division

2.0

always enabled

Zicntr: Performance Counters

2.0

always enabled

Zicsr: Control and Status Register Instructions

2.0

always enabled

Zifencei: Instruction-Fetch Fence

2.0

always enabled

F: Single-Precision Floating-Point using F registers

2.2

optionally enabled with the FPU parameter

Zfinx: Single-Precision Floating-Point using X registers

1.0

optionally enabled with the ZFINX parameter (also requires the FPU parameter)

The following custom instruction set extensions are available.

Table 2 CV32E40P Custom Instruction Set Extensions

Custom Extension

Version

Configurability

Xcv: CORE-V PULP ISA Extensions

1.0

optionally enabled with the COREV_PULP parameter

Xcvelw: CORE-V PULP Cluster ISA Extension

1.0

optionally enabled with the COREV_CLUSTER parameter

Most content of the RISC-V privileged specification is optional. CV32E40P currently supports the following features according to the RISC-V Privileged Specification, version 1.11.

Synthesis guidelines

The CV32E40P core is fully synthesizable. It has been designed mainly for ASIC designs, but FPGA synthesis is supported as well.

The top level module is called cv32e40p_top and includes both the core and the FPU. All the core files are in rtl and rtl/include folders (all synthesizable) while all the FPU files are in rtl/vendor/pulp_platform_common_cells, rtl/vendor/pulp_platform_fpnew and rtl/vendor/pulp_platform_fpu_div_sqrt. .. while all the FPU files are in rtl/vendor/pulp_platform_common_cells, rtl/vendor/pulp_platform_fpnew and rtl/vendor/opene906. cv32e40p_fpu_manifest.flist is listing all the required files.

The user must provide a clock-gating module that instantiates the functionally equivalent clock-gating cell of the target technology. This file must have the same interface and module name as the one provided for simulation-only purposes at bhv/cv32e40p_sim_clock_gate.sv (see Clock Gating Cell).

The constraints/cv32e40p_core.sdc file provides an example of synthesis constraints.

ASIC Synthesis

ASIC synthesis is supported for CV32E40P. The whole design is completely synchronous and uses positive-edge triggered flip-flops. The core occupies an area of about XX kGE. With the FPU, the area increases to about XX kGE (XX kGE FPU, XX kGE additional register file). A technology specific implementation of a clock gating cell as described in Clock Gating Cell needs to be provided.

FPGA Synthesis

FPGA synthesis is only supported for CV32E40P. The user needs to provide a technology specific implementation of a clock gating cell as described in Clock Gating Cell.

Synthesizing with the FPU

By default the pipeline of the FPU is purely combinatorial (FPU_*_LAT = 0). In this case FPU instructions latency is the same than simple ALU operations (except FP multicycle DIV/SQRT ones). But as FPU operations are much more complex than ALU ones, maximum achievable frequency is much lower than ALU one when FPU is enabled. If this can be fine for low frequency systems, it is possible to indicate how many pipeline registers are instantiated in the FPU to reach higher target frequency. This is done with FPU_*_LAT CV32E40P parameters setting to perfectly fit target frequency. It should be noted that any additional pipeline register is impacting FPU instructions latency and could cause performances degradation depending of applications using Floating-Point operations. Those pipeline registers are all added at the end of the FPU pipeline with all operators before them. Optimal frequency is only achievable using automatic retiming commands in implementation tools. This can be achieved with the following command for Synopsys Design Compiler: “set_optimize_registers true -designs [get_object_name [get_designs “*cv32e40p_fp_wrapper*”]]”.

Contents

History

CV32E40P started its life as a fork of the OR10N CPU core based on the OpenRISC ISA. Then, under the name of RI5CY, it became a RISC-V core (2016), and it has been maintained by the PULP platform <https://pulp-platform.org> team until February 2020, when it has been contributed to OpenHW Group https://www.openhwgroup.org.

As RI5CY has been used in several projects, a list of all the changes made by OpenHW Group since February 2020 follows:

Memory-Protocol

The Instruction and Data memory interfaces are now compliant with the OBI protocol (see OBI-v1.2.pdf). Such memory interface is slightly different from the one used by RI5CY as: the grant signal can now be kept high by the bus even without the core raising a request; and the request signal does not depend anymore on the rvalid signal (no combinatorial dependency). The OBI is easier to be interfaced to the AMBA AXI and AHB protocols and improves timing as it removes rvalid->req dependency. Also, the protocol forces the address stability. Thus, the core can not retract memory requests once issued, nor can it change the issued address (as was the case for the RI5CY instruction memory interface).

RV32F Extensions

Previously, RI5CY could select with a parameter whether the FPU was instantiated inside the EX stage or via the APU interface. Now in CV32E40P, the FPU is not instantiated in the core EX stage anymore. A new file called cv32e40p_top.sv is instantiating the core together with the FPU and APU interface is not visible on I/Os. This is this new top level which has been used for Verification and Implementation.

RV32A Extensions, Security and Memory Protection

CV32E40P core does not support the RV32A (atomic) extensions, the U-mode, and the PMP anymore. Most of the previous RTL descriptions of these features have been kept but not maintained. The RTL code has been partially kept to allow previous users of these features to develop their own by reusing previously developed RI5CY modules.

CSR Address Re-Mapping

RI5CY used to have custom performance counters 32b wide (not compliant with RISC-V) in the CSR address space {0x7A0, 0x7A1, 0x780-0x79F}. CV32E40P is now fully compliant with the RISC-V spec on performance counters side. And the custom PULP HWLoop CSRs have been moved from the 0x7C* to RISC-V user custom read-only 0xCC0-0xCFF address space.

Interrupts

RI5CY used to have a req plus a 5 bits ID interrupt interface, supporting up to 32 interrupt requests (only one active at a time), with the priority defined outside in an interrupt controller. CV32E40P is now compliant with the CLINT RISC-V spec, extended with 16 custom interrupts lines called fast, for a total of 19 interrupt lines. They can be all active simultaneously, and priority and per-request interrupt enable bit is controlled by the core CLINT definition.

PULP HWLoop Spec

RI5CY supported two nested HWLoops. Every loop had a minimum of two instructions. The start and end of the loop addresses could be misaligned, and the instructions in the loop body could be of any kind. CV32E40P has a more restricted spec for the HWLoop (see CORE-V Hardware Loop feature).

Compliancy, bug fixing, code clean-up, and documentation

The CV32E40P has been verified. It is fully compliant with RISC-V (RI5CY was partially compliant). Many bugs have been fixed, and the RTL code cleaned-up. The documentation has been formatted with reStructuredText and has been developed following at industrial quality level.

References

  1. Gautschi, Michael, et al. “Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices.” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct. 2017

  2. Schiavone, Pasquale Davide, et al. “Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications.” 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS 2017)

Contributors

Andreas Traber (atraber@iis.ee.ethz.ch)
Michael Gautschi (gautschi@iis.ee.ethz.ch)
Pasquale Davide Schiavone (pschiavo@iis.ee.ethz.ch)
Paul Zavalney (paul.zavalney@silabs.com)
Pascal Gouédo (pascal.gouedo@dolphin.fr)
Micrel Lab and Multitherman Lab
University of Bologna, Italy
Integrated Systems Lab
ETH Zürich, Switzerland