Core Integration
The main module is named cv32e40p_top and can be found in cv32e40p_top.sv.
Below, the instantiation template is given and the parameters and interfaces are described.
Note
cv32e40p_top instantiates the former cv32e40p_core and a wrapped fpnew_top.
It is highly recommended to use cv32e40p_top in place of cv32e40p_core, as it allows the FPU to be enabled or disabled through a parameter with no interface change.
As mentioned in Non-backward compatibility, the v2.0.0 cv32e40p_core has slight modifications that make it not backward compatible with the v1.0.0 one in some cases.
Note that if the v1 core was or is instantiated without setting any parameters, backward compatibility is preserved, as all parameter default values are set to the v1 values.
Instantiation Template
cv32e40p_top #(
.FPU ( 0 ),
.FPU_ADDMUL_LAT ( 0 ),
.FPU_OTHERS_LAT ( 0 ),
.ZFINX ( 0 ),
.COREV_PULP ( 0 ),
.COREV_CLUSTER ( 0 ),
.NUM_MHPMCOUNTERS ( 1 )
) u_core (
// Clock and reset
.rst_ni (),
.clk_i (),
.scan_cg_en_i (),
// Special control signals
.fetch_enable_i (),
.pulp_clock_en_i (),
.core_sleep_o (),
// Configuration
.boot_addr_i (),
.mtvec_addr_i (),
.dm_halt_addr_i (),
.dm_exception_addr_i (),
.hart_id_i (),
// Instruction memory interface
.instr_addr_o (),
.instr_req_o (),
.instr_gnt_i (),
.instr_rvalid_i (),
.instr_rdata_i (),
// Data memory interface
.data_addr_o (),
.data_req_o (),
.data_gnt_i (),
.data_we_o (),
.data_be_o (),
.data_wdata_o (),
.data_rvalid_i (),
.data_rdata_i (),
// Interrupt interface
.irq_i (),
.irq_ack_o (),
.irq_id_o (),
// Debug interface
.debug_req_i (),
.debug_havereset_o (),
.debug_running_o (),
.debug_halted_o ()
);
Parameters
| Name | Type/Range | Default | Description |
|---|---|---|---|
| FPU | bit | 0 | Enable Floating Point Unit (FPU) support, see Floating Point Unit (FPU) |
| FPU_ADDMUL_LAT | int | 0 | Number of pipeline registers for Floating-Point addition and multiplication instructions, see Floating Point Unit (FPU) |
| FPU_OTHERS_LAT | int | 0 | Number of pipeline registers for Floating-Point comparison, conversion and classify instructions, see Floating Point Unit (FPU) |
| ZFINX | bit | 0 | Enable Floating Point instructions to use the General Purpose register file instead of requiring a dedicated Floating Point register file, see Floating Point Unit (FPU). Only allowed to be set to 1 if FPU = 1. |
| COREV_PULP | bit | 0 | Enable all of the custom PULP ISA extensions (except cv.elw) (see CORE-V Instruction Set Custom Extensions) and all custom CSRs (see Control and Status Registers). Examples of PULP ISA extensions are post-incrementing load and stores (see Post-Increment Load & Store Instructions and Register-Register Load & Store Instructions) and hardware loops (see Hardware Loops). |
| COREV_CLUSTER | bit | 0 | Enable PULP Cluster support (cv.elw), see PULP Cluster Extension |
| NUM_MHPMCOUNTERS | int (0..29) | 1 | Number of MHPMCOUNTER performance counters, see Performance Counters |
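The constraints in the parameter table can be checked at elaboration time. The following is a minimal sketch only: the module name cv32e40p_param_check is hypothetical and not part of the CV32E40P RTL, and the rule that ZFINX requires FPU is an assumption inferred from the ZFINX description.

```systemverilog
// Hypothetical helper (not part of the CV32E40P RTL): elaboration-time checks
// for a chosen parameter set. Instantiate it next to cv32e40p_top with the
// same parameter values.
module cv32e40p_param_check #(
    parameter bit FPU              = 0,
    parameter int FPU_ADDMUL_LAT   = 0,
    parameter int FPU_OTHERS_LAT   = 0,
    parameter bit ZFINX            = 0,
    parameter int NUM_MHPMCOUNTERS = 1
);

  // NUM_MHPMCOUNTERS is documented with the range int (0..29).
  if (NUM_MHPMCOUNTERS < 0 || NUM_MHPMCOUNTERS > 29) begin : g_chk_mhpm
    $error("NUM_MHPMCOUNTERS must be in the range 0..29");
  end

  // Assumption: ZFINX may only be set to 1 when the FPU is enabled.
  if (ZFINX && !FPU) begin : g_chk_zfinx
    $error("ZFINX = 1 requires FPU = 1");
  end

  // The latency parameters count pipeline registers, so they cannot be negative.
  if (FPU_ADDMUL_LAT < 0 || FPU_OTHERS_LAT < 0) begin : g_chk_lat
    $error("FPU_ADDMUL_LAT and FPU_OTHERS_LAT must be >= 0");
  end

endmodule
```

Elaboration system tasks such as $error in a generate block require a tool that supports IEEE 1800-2009 or later.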
Interfaces
| Signal | Width | Dir | Description |
|---|---|---|---|
| rst_ni | 1 | in | Active-low asynchronous reset |
| clk_i | 1 | in | Clock signal |
| scan_cg_en_i | 1 | in | Scan clock gate enable. Design for test (DfT) related signal. Can be used during scan testing operation to force instantiated clock gate(s) to be enabled. This signal should be 0 during normal / functional operation. |
| fetch_enable_i | 1 | in | Enable the instruction fetch of CV32E40P. The first instruction fetch after reset de-assertion will not happen as long as this signal is 0. |
| core_sleep_o | 1 | out | Core is sleeping, see Sleep Unit. |
| pulp_clock_en_i | 1 | in | PULP clock enable (only used when COREV_CLUSTER = 1) |
| boot_addr_i | 32 | in | Boot address. First program counter after reset = boot_addr_i. |
| mtvec_addr_i | 32 | in | |
| dm_halt_addr_i | 32 | in | Address to jump to when entering Debug Mode, see Debug & Trigger. Must be word-aligned. Do not change after enabling core via fetch_enable_i. |
| dm_exception_addr_i | 32 | in | Address to jump to when an exception occurs when executing code during Debug Mode, see Debug & Trigger. Must be word-aligned. Do not change after enabling core via fetch_enable_i. |
| hart_id_i | 32 | in | Hart ID, usually static, can be read from Hardware Thread ID (mhartid) and User Hardware Thread ID (uhartid) CSRs |
| instr_* | | | Instruction fetch interface, see Instruction Fetch |
| data_* | | | Load-store unit interface, see Load-Store-Unit (LSU) |
| irq_* | | | Interrupt inputs, see Exceptions and Interrupts |
| debug_* | | | Debug interface, see Debug & Trigger |
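As an illustration of how the template and the interface table fit together, the sketch below wraps cv32e40p_top in a small module. It is not a reference integration. The wrapper name, the address constants and the widths of the instruction, data and interrupt buses (32-bit address and data, 4-bit byte enable, 32-bit irq_i) are assumptions made for this example.

```systemverilog
// Hypothetical integration wrapper. Only cv32e40p_top and its parameter and
// port names come from the template above; everything else is illustrative.
module my_soc_core_wrap #(
    parameter bit FPU = 0  // forwarded to cv32e40p_top
) (
    input  logic        clk_i,
    input  logic        rst_ni,
    input  logic        fetch_enable_i,
    // Instruction memory interface (bus widths are assumed)
    output logic        instr_req_o,
    input  logic        instr_gnt_i,
    input  logic        instr_rvalid_i,
    output logic [31:0] instr_addr_o,
    input  logic [31:0] instr_rdata_i,
    // Data memory interface (bus widths are assumed)
    output logic        data_req_o,
    input  logic        data_gnt_i,
    input  logic        data_rvalid_i,
    output logic        data_we_o,
    output logic [ 3:0] data_be_o,
    output logic [31:0] data_addr_o,
    output logic [31:0] data_wdata_o,
    input  logic [31:0] data_rdata_i,
    // Interrupt and debug requests (irq_i width is assumed)
    input  logic [31:0] irq_i,
    input  logic        debug_req_i,
    output logic        core_sleep_o
);

  // Placeholder addresses, to be replaced by the SoC memory map.
  // All values are word-aligned.
  localparam logic [31:0] BootAddr        = 32'h0000_0080;
  localparam logic [31:0] MtvecAddr       = 32'h0000_0000;
  localparam logic [31:0] DmHaltAddr      = 32'h1A11_0800;
  localparam logic [31:0] DmExceptionAddr = 32'h1A11_0808;

  cv32e40p_top #(
      .FPU             (FPU),
      .FPU_ADDMUL_LAT  (0),
      .FPU_OTHERS_LAT  (0),
      .ZFINX           (0),
      .COREV_PULP      (0),
      .COREV_CLUSTER   (0),
      .NUM_MHPMCOUNTERS(1)
  ) u_core (
      // Clock and reset
      .rst_ni             (rst_ni),
      .clk_i              (clk_i),
      .scan_cg_en_i       (1'b0),            // 0 during functional operation
      // Special control signals
      .fetch_enable_i     (fetch_enable_i),
      .pulp_clock_en_i    (1'b0),            // tied off, COREV_CLUSTER = 0 here
      .core_sleep_o       (core_sleep_o),
      // Configuration (static in this sketch)
      .boot_addr_i        (BootAddr),
      .mtvec_addr_i       (MtvecAddr),
      .dm_halt_addr_i     (DmHaltAddr),
      .dm_exception_addr_i(DmExceptionAddr),
      .hart_id_i          (32'h0000_0000),
      // Instruction memory interface
      .instr_addr_o       (instr_addr_o),
      .instr_req_o        (instr_req_o),
      .instr_gnt_i        (instr_gnt_i),
      .instr_rvalid_i     (instr_rvalid_i),
      .instr_rdata_i      (instr_rdata_i),
      // Data memory interface
      .data_addr_o        (data_addr_o),
      .data_req_o         (data_req_o),
      .data_gnt_i         (data_gnt_i),
      .data_we_o          (data_we_o),
      .data_be_o          (data_be_o),
      .data_wdata_o       (data_wdata_o),
      .data_rvalid_i      (data_rvalid_i),
      .data_rdata_i       (data_rdata_i),
      // Interrupt interface (acknowledge outputs unused in this sketch)
      .irq_i              (irq_i),
      .irq_ack_o          (),
      .irq_id_o           (),
      // Debug interface (status outputs unused in this sketch)
      .debug_req_i        (debug_req_i),
      .debug_havereset_o  (),
      .debug_running_o    (),
      .debug_halted_o     ()
  );

endmodule
```

Keeping the configuration inputs as constants is one simple way to honor the requirement that the debug addresses do not change after the core is enabled.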
Clock Gating Cell
CV32E40P requires clock gating cells.
These cells are usually specific to the selected target technology and thus not provided as part of the RTL design.
A simulation-only version of the clock gating cell is provided in cv32e40p_sim_clock_gate.sv. This file contains a module called cv32e40p_clock_gate that has the following ports:
- clk_i: Clock Input
- en_i: Clock Enable Input
- scan_cg_en_i: Scan Clock Gate Enable Input (activates the clock even though en_i is not set)
- clk_o: Gated Clock Output
Inside CV32E40P, clock gating cells are used in both cv32e40p_sleep_unit.sv and cv32e40p_top.sv.
The cv32e40p_sim_clock_gate.sv file is not intended for synthesis. For ASIC and FPGA synthesis, the manifest should be adapted to use a customer-specific file that implements the cv32e40p_clock_gate module using design primitives that are appropriate for the intended synthesis target technology.
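A customer-specific replacement could look like the sketch below. It keeps the module name and ports listed above and instantiates an integrated clock-gating cell. The cell name tech_icg_placeholder and its pin names (CK, E, TE, Q) are invented for this example. A behavioral stand-in is included so the snippet is self-contained, but a real flow would instantiate the library cell of the target technology instead.

```systemverilog
// Placeholder standing in for a vendor integrated clock-gating (ICG) cell.
// The name and pin names are hypothetical; use the real library cell.
module tech_icg_placeholder (
    input  logic CK,  // clock in
    input  logic E,   // functional enable
    input  logic TE,  // test (scan) enable
    output logic Q    // gated clock out
);
  logic en_latched;

  // The latch is transparent while the clock is low, so the enable can only
  // change the gate when it cannot create a glitch on Q.
  always_latch begin
    if (!CK) en_latched <= E | TE;
  end

  assign Q = CK & en_latched;
endmodule

// Same module name and interface as the simulation-only model, so the rest of
// the CV32E40P RTL is unchanged.
module cv32e40p_clock_gate (
    input  logic clk_i,
    input  logic en_i,
    input  logic scan_cg_en_i,
    output logic clk_o
);
  tech_icg_placeholder i_icg (
      .CK(clk_i),
      .E (en_i),
      .TE(scan_cg_en_i),
      .Q (clk_o)
  );
endmodule
```

With such a file in place, the manifest would list it instead of cv32e40p_sim_clock_gate.sv.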
Synthesis guidelines
The CV32E40P core is fully synthesizable. It has been designed mainly for ASIC designs, but FPGA synthesis is supported as well.
The top level module is called cv32e40p_top and includes both the core and the FPU.
All the core files are in the rtl and rtl/include folders (all synthesizable), while all the FPU files are in rtl/vendor/pulp_platform_common_cells, rtl/vendor/pulp_platform_fpnew and rtl/vendor/pulp_platform_fpu_div_sqrt.
cv32e40p_fpu_manifest.flist lists all the required files.
The user must provide a clock-gating module that instantiates the functionally equivalent clock-gating cell of the target technology.
This file must have the same interface and module name as the one provided for simulation-only purposes at bhv/cv32e40p_sim_clock_gate.sv (see Clock Gating Cell).
The constraints/cv32e40p_core.sdc file provides an example of synthesis constraints.
ASIC Synthesis
ASIC synthesis is supported for CV32E40P. The whole design is completely synchronous and uses positive-edge triggered flip-flops.
To give an indication of size, the design was synthesized at 100 MHz with a 32 KB memory connected to each of its OBI interfaces. DFT scan chains were implemented and the design was taken through full back-end implementation, including clock tree synthesis. No memory BIST was inserted and no scan compression was used for DFT.
A technology-specific implementation of the clock gating cell, as described in Clock Gating Cell, was provided.
The following table gives the CV32E40P size in kilo-gates (KG), using a 2-input NAND gate with X1 drive as the reference, for different top parameter settings (COREV_CLUSTER = 0 in all cases).
| Configuration | Top Parameters | KG |
|---|---|---|
| V1 | COREV_PULP = 0, FPU = 0, ZFINX = 0 | 40 |
| V2 PULP | COREV_PULP = 1, FPU = 0, ZFINX = 0 | 57 |
| V2 PULP & FPU | COREV_PULP = 1, FPU = 1, ZFINX = 0, FPU_ADDMUL_LAT = 0, FPU_OTHERS_LAT = 0 | 93 |
| V2 PULP & FPU & ZFINX | COREV_PULP = 1, FPU = 1, ZFINX = 1, FPU_ADDMUL_LAT = 0, FPU_OTHERS_LAT = 0 | 77 |
FPGA Synthesis
FPGA synthesis is supported for CV32E40P and it has been successfully implemented using both AMD® Vivado® and Intel® Quartus® Prime Pro Edition tools.
Because the CV32E40P RTL design uses some advanced SystemVerilog features, Intel® Quartus® Prime Standard Edition cannot parse some of the CV32E40P SystemVerilog files.
The user needs to provide a technology-specific implementation of a clock gating cell as described in Clock Gating Cell.
Synthesizing with the FPU
By default, the FPU pipeline is purely combinatorial (FPU_*_LAT = 0). In this case, FPU instructions have the same latency as simple ALU operations (except the multicycle FDIV/FSQRT ones). However, because FPU operations are much more complex than ALU ones, the maximum achievable frequency is much lower when the FPU is enabled.
While this can be acceptable for low-frequency systems, it is possible to specify how many pipeline registers are instantiated in the FPU in order to reach a higher target frequency. This is done by adjusting the FPU_*_LAT parameters of CV32E40P to fit the target frequency.
Note that each additional pipeline register increases FPU instruction latency and can degrade performance, depending on how the application uses floating-point operations.
These pipeline registers are all added at the end of the FPU pipeline, with all operators before them. The optimal frequency is therefore only achievable by using automatic retiming commands in the implementation tools. As an example, this can be done in Synopsys® Design Compiler with the following command:
set_optimize_registers true -designs [get_object_name [get_designs "*cv32e40p_fp_wrapper*"]]