Core Integration

The main module is named cv32e40p_top and can be found in cv32e40p_top.sv. Below, the instantiation template is given and the parameters and interfaces are described.

Note

cv32e40p_top instantiates former cv32e40p_core and a wrapped fpnew_top. It is highly suggested to use cv32e40p_top in place of cv32e40p_core as it allows to easily enable/disable FPU parameter with no interface change. As mentioned in Non-backward compatibility, v2.0.0 cv32e40p_core has slight modifications that makes it not backward compatible with v1.0.0 one in some cases. It is worth mentioning that if the core in its v1 version was/is instantiated without parameters setting, there is still backward compatibility as all parameters default value are set to v1 values.

Instantiation Template

cv32e40p_top #(
    .FPU                      ( 0 ),
    .FPU_ADDMUL_LAT           ( 0 ),
    .FPU_OTHERS_LAT           ( 0 ),
    .ZFINX                    ( 0 ),
    .COREV_PULP               ( 0 ),
    .COREV_CLUSTER            ( 0 ),
    .NUM_MHPMCOUNTERS         ( 1 )
) u_core (
    // Clock and reset
    .rst_ni                   (),
    .clk_i                    (),
    .scan_cg_en_i             (),

    // Special control signals
    .fetch_enable_i           (),
    .pulp_clock_en_i          (),
    .core_sleep_o             (),

    // Configuration
    .boot_addr_i              (),
    .mtvec_addr_i             (),
    .dm_halt_addr_i           (),
    .dm_exception_addr_i      (),
    .hart_id_i                (),

    // Instruction memory interface
    .instr_addr_o             (),
    .instr_req_o              (),
    .instr_gnt_i              (),
    .instr_rvalid_i           (),
    .instr_rdata_i            (),

    // Data memory interface
    .data_addr_o              (),
    .data_req_o               (),
    .data_gnt_i               (),
    .data_we_o                (),
    .data_be_o                (),
    .data_wdata_o             (),
    .data_rvalid_i            (),
    .data_rdata_i             (),

     // Interrupt interface
    .irq_i                    (),
    .irq_ack_o                (),
    .irq_id_o                 (),

    // Debug interface
    .debug_req_i              (),
    .debug_havereset_o        (),
    .debug_running_o          (),
    .debug_halted_o           ()
);

Parameters

Table 3 Parameters

Name

Type/Range

Default

Description

FPU

bit

0

Enable Floating Point Unit (FPU) support, see Floating Point Unit (FPU)

FPU_ADDMUL_LAT

int

0

Number of pipeline registers for Floating-Point addition and multiplication instructions, see Floating Point Unit (FPU)

FPU_OTHERS_LAT

int

0

Number of pipeline registers for Floating-Point comparison, conversion and classify instructions, see Floating Point Unit (FPU)

ZFINX

bit

0

Enable Floating Point instructions to use the General Purpose register file instead of requiring a dedicated Floating Point register file, see Floating Point Unit (FPU). Only allowed to be set to 1 if FPU = 1

COREV_PULP

bit

0

Enable all of the custom PULP ISA extensions (except cv.elw) (see CORE-V Instruction Set Custom Extensions) and all custom CSRs (see Control and Status Registers).

Examples of PULP ISA extensions are post-incrementing load and stores (see Post-Increment Load & Store Instructions and Register-Register Load & Store Instructions) and hardware loops (see Hardware Loops).

COREV_CLUSTER

bit

0

Enable PULP Cluster support (cv.elw), see PULP Cluster Extension

NUM_MHPMCOUNTERS

int (0..29)

1

Number of MHPMCOUNTER performance counters, see Performance Counters

Interfaces

Table 4 Interfaces

Signal

Width

Dir

Description

rst_ni

1

in

Active-low asynchronous reset

clk_i

1

in

Clock signal

scan_cg_en_i

1

in

Scan clock gate enable. Design for test (DfT) related signal. Can be used during scan testing operation to force instantiated clock gate(s) to be enabled. This signal should be 0 during normal / functional operation.

fetch_enable_i

1

in

Enable the instruction fetch of CV32E40P. The first instruction fetch after reset de-assertion will not happen as long as this signal is 0. fetch_enable_i needs to be set to 1 for at least one cycle while not in reset to enable fetching. Once fetching has been enabled the value fetch_enable_i is ignored.

core_sleep_o

1

out

Core is sleeping, see Sleep Unit.

pulp_clock_en_i

1

in

PULP clock enable (only used when COREV_CLUSTER = 1, tie to 0 otherwise), see Sleep Unit

boot_addr_i

32

in

Boot address. First program counter after reset = boot_addr_i. Must be half-word aligned. Do not change after enabling core via fetch_enable_i

mtvec_addr_i

32

in

mtvec address. Initial value for the address part of Machine Trap-Vector Base Address (mtvec). Do not change after enabling core via fetch_enable_i

dm_halt_addr_i

32

in

Address to jump to when entering Debug Mode, see Debug & Trigger. Must be word-aligned. Do not change after enabling core via fetch_enable_i

dm_exception_addr_i

32

in

Address to jump to when an exception occurs when executing code during Debug Mode, see Debug & Trigger. Must be word-aligned. Do not change after enabling core via fetch_enable_i

hart_id_i

32

in

Hart ID, usually static, can be read from Hardware Thread ID (mhartid) and User Hardware Thread ID (uhartid) CSRs

instr_*

Instruction fetch interface, see Instruction Fetch

data_*

Load-store unit interface, see Load-Store-Unit (LSU)

irq_*

Interrupt inputs, see Exceptions and Interrupts

debug_*

Debug interface, see Debug & Trigger

Clock Gating Cell

CV32E40P requires clock gating cells. These cells are usually specific to the selected target technology and thus not provided as part of the RTL design. A simulation-only version of the clock gating cell is provided in cv32e40p_sim_clock_gate.sv. This file contains a module called cv32e40p_clock_gate that has the following ports:

  • clk_i: Clock Input

  • en_i: Clock Enable Input

  • scan_cg_en_i: Scan Clock Gate Enable Input (activates the clock even though en_i is not set)

  • clk_o: Gated Clock Output

Inside CV32E40P, clock gating cells are used in both cv32e40p_sleep_unit.sv and cv32e40p_top.sv.

The cv32e40p_sim_clock_gate.sv file is not intended for synthesis. For ASIC synthesis and FPGA synthesis the manifest should be adapted to use a customer specific file that implements the cv32e40p_clock_gate module using design primitives that are appropriate for the intended synthesis target technology.

Synthesis guidelines

The CV32E40P core is fully synthesizable. It has been designed mainly for ASIC designs, but FPGA synthesis is supported as well.

The top level module is called cv32e40p_top and includes both the core and the FPU. All the core files are in rtl and rtl/include folders (all synthesizable) while all the FPU files are in rtl/vendor/pulp_platform_common_cells, rtl/vendor/pulp_platform_fpnew and rtl/vendor/pulp_platform_fpu_div_sqrt. .. while all the FPU files are in rtl/vendor/pulp_platform_common_cells, rtl/vendor/pulp_platform_fpnew and rtl/vendor/opene906. cv32e40p_fpu_manifest.flist is listing all the required files.

The user must provide a clock-gating module that instantiates the functionally equivalent clock-gating cell of the target technology. This file must have the same interface and module name as the one provided for simulation-only purposes at bhv/cv32e40p_sim_clock_gate.sv (see Clock Gating Cell).

The constraints/cv32e40p_core.sdc file provides an example of synthesis constraints.

ASIC Synthesis

ASIC synthesis is supported for CV32E40P. The whole design is completely synchronous and uses positive-edge triggered flip-flops. The core occupies an area of about XX kGE. With the FPU, the area increases to about XX kGE (XX kGE FPU, XX kGE additional register file). A technology specific implementation of a clock gating cell as described in Clock Gating Cell needs to be provided.

FPGA Synthesis

FPGA synthesis is supported for CV32E40P and it has been successfully implemented using both AMD® Vivado® and Intel® Quartus® Prime Pro Edition tools.

Due to some advanced System Verilog features used by CV32E40P RTL design, Intel® Quartus® Prime Standard Edition isn’t able to parse some CV32E40P System Verilog files.

The user needs to provide a technology specific implementation of a clock gating cell as described in Clock Gating Cell.

Synthesizing with the FPU

By default the pipeline of the FPU is purely combinatorial (FPU_*_LAT = 0). In this case FPU instructions latency is the same than simple ALU operations (except multicycle FDIV/FSQRT ones). But as FPU operations are much more complex than ALU ones, maximum achievable frequency is much lower than ALU one when FPU is enabled.

If this can be fine for low frequency systems, it is possible to indicate how many pipeline registers are instantiated in the FPU to reach higher target frequency. This is done by adjusting FPU_*_LAT CV32E40P parameters setting to perfectly fit target frequency.

It should be noted that any additional pipeline register is impacting FPU instructions latency and could cause performances degradation depending of applications using Floating-Point operations.

Those pipeline registers are all added at the end of the FPU pipeline with all operators before them. Optimal frequency is only achievable using automatic retiming commands in implementation tools. As an exemple, this can be done for Synopsys® Design Compiler with the following command:

“set_optimize_registers true -designs [get_object_name [get_designs “*cv32e40p_fp_wrapper*”]]”.