Floating Point Unit (FPU)

The RV32F ISA extension, which provides IEEE-754 single-precision floating-point support, can be enabled by setting the FPU parameter of the cv32e40p_top top-level module to 1. This extends the CV32E40P decoder accordingly and instantiates the FPU. The FPU repository used by the CV32E40P is available at https://github.com/openhwgroup/cvfpu, together with its documentation. The CVFPU v0.8.1 release has been copied into the CV32E40P repository under rtl/vendor (used for verification and implementation), so all core and FPU RTL files should be taken from the CV32E40P repository.

The cv32e40p_fpu_manifest file lists all the files needed for both the core and CVFPU.

CVFPU parameters

As CVFPU is a highly configurable IP, the following tables list its parameters and the values used when CVFPU is instantiated through a wrapper in the cv32e40p_top module.

Table 5 CVFPU Features parameter

| Name | Type/Range | Value | Description |
|---|---|---|---|
| Width | int | 32 | Datapath Width. Specifies the width of the input and output data ports and of the datapath. |
| EnableVectors | logic | 0 | Vectorial Hardware Generation. Controls the generation of packed-SIMD computation units. |
| EnableNanBox | logic | 0 | NaN-Boxing Check Control. Controls whether input value NaN-boxing is enforced. |
| FpFmtMask | fmt_logic_t | {1, 0, 0, 0, 0} | Enabled Floating-Point Formats. Enables respectively: IEEE Single-Precision format, IEEE Double-Precision format, IEEE Half-Precision format, Custom Byte-Precision format, Custom Alternate Half-Precision format. |
| IntFmtMask | ifmt_logic_t | {0, 0, 1, 0} | Enabled Integer Formats. Enables respectively: Byte format, Half-Word format, Word format, Double-Word format. |

Table 6 CVFPU Implementation parameter

| Name | Type/Range | Value | Description |
|---|---|---|---|
| PipeRegs | opgrp_fmt_unsigned_t | { {FPU_ADDMUL_LAT, 0, 0, 0, 0}, {default: 1}, {default: FPU_OTHERS_LAT}, {default: FPU_OTHERS_LAT} } | Number of Pipelining Stages. Sets the number of pipeline stages inserted into the computational units, per operation group and per FP format, so latencies for different operations and formats can be configured freely. Respectively: ADDition/MULtiplication operation group, DIVision/SQuare RooT operation group, NON COMPuting operation group, CONVersion operation group. FPU_ADDMUL_LAT and FPU_OTHERS_LAT are cv32e40p_top parameters. |
| UnitTypes | opgrp_fmt_unit_types_t | { {default: MERGED}, {default: MERGED}, {default: PARALLEL}, {default: MERGED} } | HW Unit Implementation. Controls resource usage by either removing operation units for certain formats and operations, or merging multiple formats into one unit. Respectively: ADDition/MULtiplication operation group, DIVision/SQuare RooT operation group, NON COMPuting operation group, CONVersion operation group. |
| PipeConfig | pipe_config_t | AFTER | Pipeline Register Placement. Controls where the pipelining registers (number defined by PipeRegs) are placed in each operational unit. AFTER means they are all placed at the output of each operational unit. See the advice in Synthesizing with the FPU for best synthesis results. |

Table 7 Other CVFPU parameters

| Name | Type/Range | Value | Description |
|---|---|---|---|
| TagType | logic | | The SystemVerilog data type of the operation tag input and output ports. |
| TrueSIMDClass | int | 0 | Vectorial mode classify operation RISC-V compliancy. |
| EnableSIMDMask | int | 0 | Inactive vectorial lanes floating-point status flags masking. |

FP Register File

By default a dedicated register file consisting of 32 floating-point registers, f0-f31, is instantiated. This default behavior can be overruled by setting the parameter ZFINX of the cv32e40p_top top level module to 1, in which case the dedicated register file is not included and the general purpose register file is used instead to host the floating-point operands.

The latency of the individual instructions is given in the Cycle counts per instruction type table.

To allow the FPU to be put in sleep mode at the same time as the core, a clock-gating cell is also instantiated in the cv32e40p_top top-level module; its enable signal is the inverted core_sleep_o core output.

FP CSR

When floating-point extensions are used, the standard specifies a floating-point control and status register (fcsr), which holds the rounding mode and the exceptions accrued since the flags were last reset. The floating-point accrued exceptions (fflags) and the floating-point dynamic rounding mode (frm) can be accessed directly, or via fcsr, which maps onto those two registers.
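The relationship between fcsr, frm and fflags can be sketched in C as below. The field positions (fflags in bits 4:0, frm in bits 7:5) follow the RISC-V F extension specification; the macro and function names are hypothetical helpers introduced only for illustration, and the code models the bit layout on plain integers rather than issuing CSR accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Layout per the RISC-V F extension: fcsr[4:0] = fflags, fcsr[7:5] = frm. */
#define FCSR_FFLAGS_MASK 0x1Fu
#define FCSR_FRM_SHIFT   5
#define FCSR_FRM_MASK    0x7u

/* Accrued exception flag bits inside fflags. */
enum {
    FFLAG_NX = 1 << 0, /* inexact           */
    FFLAG_UF = 1 << 1, /* underflow         */
    FFLAG_OF = 1 << 2, /* overflow          */
    FFLAG_DZ = 1 << 3, /* divide by zero    */
    FFLAG_NV = 1 << 4  /* invalid operation */
};

/* Extract the accrued exception flags from an fcsr value. */
static inline uint32_t fcsr_get_fflags(uint32_t fcsr)
{
    return fcsr & FCSR_FFLAGS_MASK;
}

/* Extract the dynamic rounding mode from an fcsr value. */
static inline uint32_t fcsr_get_frm(uint32_t fcsr)
{
    return (fcsr >> FCSR_FRM_SHIFT) & FCSR_FRM_MASK;
}

/* Build an fcsr value from separate frm and fflags values. */
static inline uint32_t fcsr_pack(uint32_t frm, uint32_t fflags)
{
    assert(frm <= FCSR_FRM_MASK);
    assert(fflags <= FCSR_FFLAGS_MASK);
    return (frm << FCSR_FRM_SHIFT) | fflags;
}
```

For example, fcsr_pack(1, FFLAG_NX) yields 0x21, which is the value fcsr would hold with rounding mode 1 (round towards zero) and only the inexact flag set.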

Reminder for programmers

As mentioned in the RISC-V Privileged Architecture specification, mstatus.FS must be set to Initial before FP instructions can be used. If mstatus.FS = Off (the reset value), any instruction that attempts to read or write the floating-point state (F registers or F CSRs) causes an illegal instruction exception.
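As a minimal sketch of the mstatus.FS handling described here, the helpers below compute the new mstatus value in plain C. The FS field position (bits 14:13) and its encodings (Off = 0, Initial = 1, Clean = 2, Dirty = 3) come from the RISC-V Privileged Architecture specification; the names are hypothetical, and on the target the value would be read from and written back to the mstatus CSR, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* mstatus.FS occupies bits 14:13 (RISC-V Privileged Architecture). */
#define MSTATUS_FS_SHIFT 13
#define MSTATUS_FS_MASK  (0x3u << MSTATUS_FS_SHIFT)

enum fs_state { FS_OFF = 0, FS_INITIAL = 1, FS_CLEAN = 2, FS_DIRTY = 3 };

/* Read the FS field out of an mstatus value. */
static inline enum fs_state mstatus_get_fs(uint32_t mstatus)
{
    return (enum fs_state)((mstatus & MSTATUS_FS_MASK) >> MSTATUS_FS_SHIFT);
}

/* Return mstatus with the FS field replaced; all other bits are preserved. */
static inline uint32_t mstatus_set_fs(uint32_t mstatus, enum fs_state fs)
{
    assert(fs <= FS_DIRTY);
    return (mstatus & ~MSTATUS_FS_MASK) | ((uint32_t)fs << MSTATUS_FS_SHIFT);
}
```

Start-up code would typically apply mstatus_set_fs(value, FS_INITIAL) to the value read from mstatus and write the result back before executing the first FP instruction.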

Upon an interrupt or context switch, mstatus.SD should be read to check whether the floating-point state has been altered. If the code about to execute (an interrupt routine or any other program) is going to use FP instructions, and only if mstatus.SD = 1 (meaning FS = Dirty), the whole FP state (F registers and F CSRs) should be saved to memory and the program should set mstatus.FS to Clean. When returning to the interrupted or main program, if mstatus.FS = Clean then the whole FP state should be restored from memory.
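The save and restore decisions above can be modelled as follows. This is only a sketch of the decision logic on an mstatus value: it assumes RV32, where SD is bit 31, uses hypothetical function names, and leaves out the actual stores and loads of f0-f31 and fcsr. In hardware SD is a read-only summary bit, so the model does not try to update it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSTATUS_SD       (1u << 31)  /* RV32: state-dirty summary bit */
#define MSTATUS_FS_SHIFT 13
#define MSTATUS_FS_MASK  (0x3u << MSTATUS_FS_SHIFT)
#define FS_CLEAN         2u

/* Save only if the incoming code uses FP and the FP state is dirty. */
static inline bool fp_state_needs_save(uint32_t mstatus, bool next_uses_fp)
{
    return next_uses_fp && (mstatus & MSTATUS_SD) != 0;
}

/* Return mstatus with FS = Clean; SD is left as read (hardware derives it). */
static inline uint32_t mstatus_mark_fs_clean(uint32_t mstatus)
{
    return (mstatus & ~MSTATUS_FS_MASK) | (FS_CLEAN << MSTATUS_FS_SHIFT);
}

/* On return, restore the saved FP state when FS reads Clean. */
static inline bool fp_state_needs_restore(uint32_t mstatus)
{
    return ((mstatus & MSTATUS_FS_MASK) >> MSTATUS_FS_SHIFT) == FS_CLEAN;
}

/* Entry path: reports through *saved whether a save is required and
 * returns the mstatus value to write back. */
static inline uint32_t fp_on_entry(uint32_t mstatus, bool next_uses_fp,
                                   bool *saved)
{
    assert(saved != 0);
    *saved = fp_state_needs_save(mstatus, next_uses_fp);
    /* The stores of f0-f31 and fcsr to memory would happen here. */
    return *saved ? mstatus_mark_fs_clean(mstatus) : mstatus;
}
```

An interrupt handler that never executes FP instructions would pass next_uses_fp = false and so skip the save entirely, which is the cost saving this scheme is meant to provide.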