Floating Point Unit (FPU)

The RV32F ISA extension, which provides IEEE-754 single-precision floating-point support, can be enabled by setting the FPU parameter of the cv32e40p_top top-level module to 1. This extends the CV32E40P decoder accordingly and instantiates the FPU. The FPU repository used by the CV32E40P is available at https://github.com/openhwgroup/cvfpu, and its documentation can be found in that repository. The CVFPU v1.0.0 release has been copied into the CV32E40P repository under rtl/vendor (the copy used for verification and implementation), so all core and FPU RTL files should be taken from the CV32E40P repository.

The cv32e40p_fpu_manifest file lists all the files needed for both the core and CVFPU.

CVFPU parameters

As CVFPU is a highly configurable IP, the tables below list its parameters and the values they take when CVFPU is instantiated through a wrapper in the cv32e40p_top module.

Table 5 CVFPU Features parameter
Name	Type/Range	Value	Description
`Width`	int	32	Datapath Width. Specifies the width of the input and output data ports and of the datapath.
`EnableVectors`	logic	0	Vectorial Hardware Generation. Controls the generation of packed-SIMD computation units.
`EnableNanBox`	logic	0	NaN-Boxing Check Control. Controls whether input value NaN-boxing is enforced.
`FpFmtMask`	fmt_logic_t	{1, 0, 0, 0, 0}	Enabled Floating-Point Formats. Enables, respectively: IEEE Single-Precision format, IEEE Double-Precision format, IEEE Half-Precision format, Custom Byte-Precision format, Custom Alternate Half-Precision format.
`IntFmtMask`	ifmt_logic_t	{0, 0, 1, 0}	Enabled Integer Formats. Enables, respectively: Byte format, Half-Word format, Word format, Double-Word format.

Table 6 CVFPU Implementation parameter
Name	Type/Range	Value	Description
`PipeRegs`	opgrp_fmt_unsigned_t	{ {`FPU_ADDMUL_LAT`, 0, 0, 0, 0}, {default: 1}, {default: `FPU_OTHERS_LAT`}, {default: `FPU_OTHERS_LAT`} }	Number of Pipelining Stages. Sets the number of pipeline stages inserted into the computational units, per operation group and per FP format, so that latencies for different operations and formats can be configured freely. The operation groups are, respectively: ADDition/MULtiplication, DIVision/SQuare RooT, NON COMPuting, CONVersion. `FPU_ADDMUL_LAT` and `FPU_OTHERS_LAT` are `cv32e40p_top` parameters.
`UnitTypes`	opgrp_fmt_unit_types_t	{ {default: MERGED}, {default: MERGED}, {default: PARALLEL}, {default: MERGED} }	HW Unit Implementation. Controls resource usage by either removing operation units for certain formats and operations or merging multiple formats into one unit. The operation groups are, respectively: ADDition/MULtiplication, DIVision/SQuare RooT, NON COMPuting, CONVersion.
`PipeConfig`	pipe_config_t	AFTER	Pipeline Register Placement. Controls where the pipeline registers (their number is defined by `PipeRegs`) are placed in each operational unit. AFTER means they are all placed at the output of each operational unit. See the advice in Synthesizing with the FPU to get the best synthesis results.

Table 7 Other CVFPU parameters
Name	Type/Range	Value	Description
`TagType`		logic	The SystemVerilog data type of the operation tag input and output ports.
`TrueSIMDClass`	int	0	Controls RISC-V compliance of the classify operation in vectorial mode.
`EnableSIMDMask`	int	0	Controls masking of the floating-point status flags of inactive vectorial lanes.

FP Register File

By default, a dedicated register file consisting of 32 floating-point registers, f0-f31, is instantiated. This default behavior can be overridden by setting the ZFINX parameter of the cv32e40p_top top-level module to 1, in which case the dedicated register file is not included and the general-purpose register file hosts the floating-point operands instead.

The latencies of the individual instructions are given in the Cycle counts per instruction type table.

To allow the FPU to be put in sleep mode at the same time as the core, a clock gating cell is also instantiated in the cv32e40p_top top-level module. Its enable signal is the inverted core_sleep_o core output.

FP CSR

When floating-point extensions are used, the standard specifies a floating-point control and status register (fcsr), which contains the rounding mode and the exceptions that have occurred since it was last reset. The floating-point accrued exceptions (fflags) and the floating-point dynamic rounding mode (frm) can be accessed either directly or through fcsr, which maps onto those two registers.
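The mapping between fcsr and its two sub-registers can be illustrated with a small C sketch. The field positions (fflags in bits 4:0, frm in bits 7:5) and the flag and rounding-mode encodings come from the RISC-V F extension specification, not from this page; the macro and function names are hypothetical and chosen only for illustration. The helpers work on plain integer values, so on a target they would be combined with the actual CSR read and write instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout per the RISC-V F extension (assumed here, not CV32E40P-specific):
 * fflags occupies fcsr[4:0] and frm occupies fcsr[7:5]. */
#define FCSR_FFLAGS_MASK 0x1Fu
#define FCSR_FRM_SHIFT   5
#define FCSR_FRM_MASK    0x7u

/* Accrued exception flags inside fflags. */
#define FFLAG_NX 0x01u /* inexact */
#define FFLAG_UF 0x02u /* underflow */
#define FFLAG_OF 0x04u /* overflow */
#define FFLAG_DZ 0x08u /* divide by zero */
#define FFLAG_NV 0x10u /* invalid operation */

/* Dynamic rounding modes held in frm. */
#define FRM_RNE 0u /* round to nearest, ties to even */
#define FRM_RTZ 1u /* round towards zero */
#define FRM_RDN 2u /* round down */
#define FRM_RUP 3u /* round up */
#define FRM_RMM 4u /* round to nearest, ties to max magnitude */

/* Build the fcsr value seen when frm and fflags hold the given values. */
static inline uint32_t fcsr_pack(uint32_t frm, uint32_t fflags)
{
    assert(frm <= FCSR_FRM_MASK);
    assert(fflags <= FCSR_FFLAGS_MASK);
    return (frm << FCSR_FRM_SHIFT) | fflags;
}

/* Extract the rounding mode from an fcsr value. */
static inline uint32_t fcsr_get_frm(uint32_t fcsr)
{
    return (fcsr >> FCSR_FRM_SHIFT) & FCSR_FRM_MASK;
}

/* Extract the accrued exception flags from an fcsr value. */
static inline uint32_t fcsr_get_fflags(uint32_t fcsr)
{
    return fcsr & FCSR_FFLAGS_MASK;
}
```

For instance, with frm set to FRM_RTZ and the inexact and invalid flags raised, fcsr_pack(FRM_RTZ, FFLAG_NX | FFLAG_NV) yields 0x31.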

Reminder for programmers

As stated in the RISC-V Privileged Architecture specification, mstatus.FS should be set to Initial before FP instructions are used. If mstatus.FS = Off (the reset value), any instruction that attempts to read or write the floating-point state (F registers or F CSRs) causes an illegal instruction exception.
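As a rough sketch of what setting mstatus.FS involves, the C helpers below compute the mstatus value to write back. The FS field position (bits 14:13) and its encodings (Off = 0, Initial = 1, Clean = 2, Dirty = 3) are taken from the RISC-V Privileged Architecture specification; the type and function names are hypothetical. The helpers operate on plain integers only, so the read and write of the mstatus CSR itself are left to the surrounding startup code.

```c
#include <assert.h>
#include <stdint.h>

/* mstatus.FS occupies bits 14:13 (RISC-V Privileged Architecture). */
#define MSTATUS_FS_SHIFT 13
#define MSTATUS_FS_MASK  (0x3u << MSTATUS_FS_SHIFT)

/* FS field encodings. */
enum fs_state {
    FS_OFF     = 0,
    FS_INITIAL = 1,
    FS_CLEAN   = 2,
    FS_DIRTY   = 3
};

/* Return a copy of mstatus with the FS field replaced by fs;
 * all other bits are left unchanged. */
static inline uint32_t mstatus_set_fs(uint32_t mstatus, enum fs_state fs)
{
    assert((unsigned)fs <= 3u);
    return (mstatus & ~MSTATUS_FS_MASK) | ((uint32_t)fs << MSTATUS_FS_SHIFT);
}

/* Read the FS field out of an mstatus value. */
static inline enum fs_state mstatus_get_fs(uint32_t mstatus)
{
    return (enum fs_state)((mstatus & MSTATUS_FS_MASK) >> MSTATUS_FS_SHIFT);
}
```

Starting from an mstatus value of 0 (FS = Off), mstatus_set_fs(0, FS_INITIAL) gives 0x2000, which is the value startup code would write back before executing its first FP instruction.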

Upon an interrupt or context switch, mstatus.SD should be read to check whether the floating-point state has been altered. If the code about to run (an interrupt routine or any other code) is going to use FP instructions, and only if mstatus.SD = 1 (meaning FS = Dirty), the whole FP state (F registers and F CSRs) should be saved to memory and the program should then set mstatus.FS to Clean. When returning to the interrupted or main program, if mstatus.FS = Clean, the whole FP state should be restored from memory.
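The save and restore decisions described above can be sketched as three small C predicates. This is only an illustration under stated assumptions: the bit positions (SD in bit 31 on RV32, FS in bits 14:13) come from the RISC-V Privileged Architecture specification, the function names are hypothetical, and the actual saving and restoring of the F registers and F CSRs is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the RISC-V Privileged Architecture (RV32). */
#define MSTATUS_SD       (1u << 31)
#define MSTATUS_FS_MASK  (0x3u << 13)
#define MSTATUS_FS_CLEAN (0x2u << 13)

/* On interrupt or context switch: save the FP state only if the code
 * about to run uses FP instructions and SD reports a dirty state. */
static inline bool fp_state_needs_save(uint32_t mstatus, bool next_uses_fp)
{
    return next_uses_fp && (mstatus & MSTATUS_SD) != 0u;
}

/* On return: restore the FP state if FS was left as Clean by the save path. */
static inline bool fp_state_needs_restore(uint32_t mstatus)
{
    return (mstatus & MSTATUS_FS_MASK) == MSTATUS_FS_CLEAN;
}

/* After saving: compute the mstatus value to write so that FS = Clean.
 * SD is left untouched here because it is a read-only summary bit that
 * the hardware derives from FS. */
static inline uint32_t mstatus_mark_fs_clean(uint32_t mstatus)
{
    uint32_t updated = (mstatus & ~MSTATUS_FS_MASK) | MSTATUS_FS_CLEAN;
    assert(fp_state_needs_restore(updated));
    return updated;
}
```

An interrupt handler following this scheme would call fp_state_needs_save() on the mstatus value read at entry, save the state and write back mstatus_mark_fs_clean() when it returns true, and check fp_state_needs_restore() just before returning.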