eXtension Interface

The eXtension interface enables extending CPU with (custom or standardized) instructions without the need to change the RTL of CPU itself. Extensions can be provided in separate modules external to CPU and are integrated at system level by connecting them to the eXtension interface.

The eXtension interface provides low latency (tightly integrated) read and write access to the CPU register file. All opcodes which are not used (i.e. considered to be invalid) by CPU can be used for extensions. It is recommended however that custom instructions do not use opcodes that are reserved/used by RISC-V International.

The eXtension interface enables extension of CPU with:

  • Custom ALU type instructions.

  • Custom CSRs and related instructions.

Control-Transfer type instructions (e.g. branches and jumps) are not supported via the eXtension interface.

CV-X-IF

The terms eXtension interface and CV-X-IF are used interchangeably.

Parameters

The CV-X-IF specification contains the following parameters:

Table 1 Interface parameters

| Name | Type/Range | Default | Description |
|---|---|---|---|
| X_NUM_RS | int unsigned (2..3) | 2 | Number of register file read ports that can be used by the eXtension interface. |
| X_ID_WIDTH | int unsigned (3..32) | 4 | Identification (id) width for the eXtension interface. |
| X_RFR_WIDTH | int unsigned (32, 64) | 32 | Register file read access width for the eXtension interface. Must be at least XLEN. If XLEN = 32, then the legal values are 32 and 64 (e.g. for RV32P). If XLEN = 64, then the legal value is (only) 64. |
| X_RFW_WIDTH | int unsigned (32, 64) | 32 | Register file write access width for the eXtension interface. Must be at least XLEN. If XLEN = 32, then the legal values are 32 and 64 (e.g. for RV32D). If XLEN = 64, then the legal value is (only) 64. |
| X_NUM_HARTS | int unsigned (1..2^MXLEN) | 1 | Number of harts (hardware threads) associated with the interface. The CPU determines the legal values for this parameter. |
| X_HARTID_WIDTH | int unsigned (1..MXLEN) | 1 | Width of hartid signals. Must be at least 1. Limited by the RISC-V privileged specification to MXLEN. The CPU determines the legal values for this parameter. |
| X_MISA | logic [31:0] | 32'b0 | MISA extensions implemented on the eXtension interface. The CPU determines the legal values for this parameter. |
| X_ECS_XS | logic [1:0] | 2'b0 | Initial value for mstatus.XS. |
| X_DUALREAD | int unsigned (0..3) | 0 | Is dual read supported? 0: No, 1: Yes, for rs1, 2: Yes, for rs1 - rs2, 3: Yes, for rs1 - rs3. Legal values are determined by the CPU. |
| X_DUALWRITE | int unsigned (0..1) | 0 | Is dual write supported? 0: No, 1: Yes. Legal values are determined by the CPU. |
| X_ISSUE_REGISTER_SPLIT | int unsigned (0..1) | 0 | Does the interface pipeline the register interface? 0: No, 1: Yes. Legal values are determined by the CPU. If 1, registers are provided after the issue of the instruction. If 0, registers are provided at the same time as issue. |

Note

A CPU shall clearly document which X_MISA values it can support and there is no requirement that a CPU can support all possible X_MISA values. For example, if a CPU only supports machine mode, then it is not reasonable to expect that the CPU will additionally support user mode by just setting the X_MISA[20] (U bit) to 1.
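The constraints listed in Table 1 can be checked mechanically. The following is a minimal Python sketch, not part of the specification: the function name check_xif_params and its keyword arguments are hypothetical, and it only covers the constraints stated in this document (parameters whose legal values are "determined by the CPU" are not checked).

```python
def check_xif_params(xlen, x_num_rs=2, x_id_width=4, x_rfr_width=32,
                     x_rfw_width=32, x_dualread=0, x_dualwrite=0):
    """Return a list of violated constraints (an empty list means no violation found)."""
    errors = []
    if x_num_rs not in (2, 3):
        errors.append("X_NUM_RS must be 2 or 3")
    if not 3 <= x_id_width <= 32:
        errors.append("X_ID_WIDTH must be in 3..32")
    # Read and write widths must be 32 or 64 and at least XLEN.
    for name, width in (("X_RFR_WIDTH", x_rfr_width), ("X_RFW_WIDTH", x_rfw_width)):
        if width not in (32, 64) or width < xlen:
            errors.append(f"{name} must be 32 or 64 and at least XLEN")
    if not 0 <= x_dualread <= 3:
        errors.append("X_DUALREAD must be in 0..3")
    if x_dualwrite not in (0, 1):
        errors.append("X_DUALWRITE must be 0 or 1")
    # Dual read and dual write are only supported for XLEN = 32.
    if xlen != 32 and (x_dualread or x_dualwrite):
        errors.append("dual read/write is only supported for XLEN = 32")
    return errors
```

For example, check_xif_params(32) reports no violations for the default values, while a 64-bit CPU with X_RFR_WIDTH = 32 is flagged.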

Additionally, the following type definitions are defined to improve readability of the specification and ensure consistency between the interfaces:

Table 3 Interface type definitions

| Name | Definition | Description |
|---|---|---|
| readregflags_t | logic [X_NUM_RS+X_DUALREAD-1:0] | Vector with a flag per possible source register. This depends upon the number of read ports and their ability to read register pairs. The bit positions map to registers as follows: Low indices correspond to low operand numbers, and the even part of the pair has the lower index than the odd one. |
| writeregflags_t | logic [X_DUALWRITE:0] | Bit vector indicating destination registers for write back. The width depends on the ability to perform dual write. If X_DUALWRITE = 0, this signal is a single bit. Bit 1 may only be set when bit 0 is also set. In this case, the vector indicates that a register pair is used. |
| mode_t | logic [1:0] | Privilege level (2'b00 = User, 2'b01 = Supervisor, 2'b10 = Reserved, 2'b11 = Machine). |
| id_t | logic [X_ID_WIDTH-1:0] | Identification of the offloaded instruction. See Identification for details on the identifiers. |
| hartid_t | logic [X_HARTID_WIDTH-1:0] | Identification of the hart offloading the instruction. Only relevant in multi-hart systems. Hart IDs are not required to be numbered continuously. The hart ID would usually correspond to mhartid, but it is not required to do so. |

Major features

The major features of CV-X-IF are:

  • Minimal requirements on extension instruction encoding.

    If an extension instruction relies on reading from or writing to the core’s general purpose register file, then the standard RISC-V bitfield locations for rs1, rs2, rs3, rd as used for non-compressed instructions ([RISC-V-UNPRIV]) must be used. Bitfields for unused read or write operands can be fully repurposed. Extension instructions can either use the compressed or uncompressed instruction format. For offloading compressed instructions the coprocessor must provide the core with the related non-compressed instructions.

  • Support for dual writeback instructions (optional, based on X_DUALWRITE).

    CV-X-IF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file writebacks. Dual writeback is supported for even-odd register pairs (Xn and Xn+1 with n being an even number extracted from instruction bits [11:7]).

    Dual register file writeback is only supported for XLEN = 32.

  • Support for dual read instructions (per source operand) (optional, based on X_DUALREAD).

    CV-X-IF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file reads. Dual read is supported for even-odd register pairs (Xn and Xn+1, with n being an even number extracted from instruction bits [19:15], [24:20] and [31:27], i.e. rs1, rs2 and rs3). Dual read can therefore provide up to six 32-bit operands per instruction.

    When a dual read is performed with n = 0, the entire operand is 0, i.e. x1 shall not need to be accessed by the CPU.

    Dual register file read is only supported for XLEN = 32.

  • Support for ternary operations.

    CV-X-IF optionally supports ISA extensions implementing instructions which use three source operands. Ternary instructions must be encoded in the R4-type instruction format defined by [RISC-V-UNPRIV].

  • Support for instruction speculation.

    CV-X-IF indicates whether offloaded instructions are allowed to be committed (or should be killed).
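The fixed operand bitfield locations mentioned in the list above can be illustrated with a short Python sketch. The helper names operand_fields and register_pair are hypothetical and chosen only for this example; the bit positions are the ones stated in the text ([11:7] for rd, [19:15] for rs1, [24:20] for rs2, [31:27] for rs3).

```python
def operand_fields(instr):
    """Extract the register indices from a 32-bit (non-compressed) instruction word."""
    return {
        "rd": (instr >> 7) & 0x1F,    # instruction bits [11:7]
        "rs1": (instr >> 15) & 0x1F,  # instruction bits [19:15]
        "rs2": (instr >> 20) & 0x1F,  # instruction bits [24:20]
        "rs3": (instr >> 27) & 0x1F,  # instruction bits [31:27]
    }


def register_pair(n):
    """Return the even-odd pair (Xn, Xn+1) used by dual read or dual writeback."""
    if n % 2 != 0:
        raise ValueError("register pairs must start at an even register index")
    return (n, n + 1)
```

Whether a given field is actually an operand depends on the instruction; unused fields can be repurposed by the extension, so a decoder would only interpret the fields that the instruction uses.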

CV-X-IF consists of the following interfaces:

  • Compressed interface. Signaling of compressed instruction to be offloaded.

  • Issue (request/response) interface. Signaling of the uncompressed instruction to be offloaded.

  • Register interface. Signaling of General Purpose Registers (GPRs) and CSRs.

  • Commit interface. Signaling of control signals related to whether instructions can be committed or should be killed.

  • Result interface. Signaling of the instruction result(s).

Operating principle

CPU will attempt to offload every (compressed or non-compressed) instruction that it does not recognize as a legal instruction itself. In case of a compressed instruction the coprocessor must first provide the core with a matching uncompressed (i.e. 32-bit) instruction using the compressed interface. This non-compressed instruction is then attempted for offload via the issue interface.

Offloading of the (non-compressed, 32-bit) instructions happens via the issue interface. The external coprocessor can decide to accept or reject the instruction offload. In case of acceptance the coprocessor will further handle the instruction. In case of rejection the core will raise an illegal instruction exception. The core provides the required register file operand(s) to the coprocessor via the register interface. If an offloaded instruction uses any of the register file sources rs1, rs2 or rs3, then these are always encoded in instruction bits [19:15], [24:20] and [31:27] respectively. The coprocessor only needs to wait for the register file operands that a specific instruction actually uses. The coprocessor informs the core to which register(s) in the register file it will write back. The CPU uses this information to track data dependencies between instructions.

Offloaded instructions are speculative; CPU has not necessarily committed to them yet and might decide to kill them (e.g. because they are in the shadow of a taken branch or because they are flushed due to an exception in an earlier instruction). Via the commit interface the core will inform the coprocessor about whether an offloaded instruction will either need to be killed or whether the core will guarantee that the instruction is no longer speculative and is allowed to be committed.

The final result of an accepted offloaded instruction can be written back into the coprocessor itself or into the CPU’s register file. Either way, the result interface is used to signal to the CPU that the instruction has completed. Apart from a possible writeback into the register file, the result interface transaction is for example used in the core to increment the minstret CSR, to implement the fence instructions and to judge if instructions before a WFI instruction have fully completed (so that sleep mode can be entered if needed).

In short: From a functional perspective it should not matter whether an instruction is handled inside the CPU or inside a coprocessor. In both cases the instructions need to obey the same instruction dependency rules, memory consistency rules, load/store address checks, fences, etc.

Interfaces

This section describes the interfaces of CV-X-IF. Port directions are described as seen from the perspective of the CPU. The coprocessor will have opposite pin directions. The stated signal names are not mandatory, but it is highly recommended to at least include the stated names as part of actual signal names. It is for example allowed to add prefixes and/or postfixes (e.g. x_ prefix or _i, _o postfixes) or to use different capitalization. A name mapping should be provided if non-obvious renaming is applied.

Identification

Most interfaces of CV-X-IF use a signal called id, which serves as a unique identification number for offloaded instructions. The same id value shall be used for all transaction packets on all interfaces that logically relate to the same instruction. An id value can be reused after an earlier instruction related to the same id value is no longer considered in-flight. The id values for in-flight offloaded instructions are required to be unique. The id values are required to be incremental from one issue transaction to the next. The increment may be greater than one. If the next id would be greater than the maximum value (2**X_ID_WIDTH - 1), the value of id wraps.

id values can only be introduced by the issue interface.

An id becomes in-flight in the first cycle that issue_valid is 1 for that id.

An id ends being in-flight when one of the following scenarios applies:

  • the corresponding issue request transaction is retracted.

  • the corresponding issue request transaction is not accepted and the corresponding commit handshake has been performed.

  • the corresponding result transaction has been performed.

For the purpose of relative identification, an instruction is considered to be preceding another instruction, if it was accepted in an issue transaction at an earlier time. The other instruction is thus succeeding the earlier one.
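The id rules above (incrementing ids, wrap-around at 2**X_ID_WIDTH, and uniqueness among in-flight instructions) can be modeled with a small Python sketch. The class IdTracker and its methods are hypothetical names for illustration; the sketch assumes that the caller invokes release when one of the three end-of-flight scenarios listed above occurs.

```python
class IdTracker:
    """Behavioral model of id allocation on the CPU side (illustrative only)."""

    def __init__(self, id_width=4):
        self.modulus = 1 << id_width  # ids range over 0 .. 2**X_ID_WIDTH - 1
        self.next_id = 0
        self.in_flight = set()

    def allocate(self, step=1):
        """Pick the id for a new issue transaction; the id becomes in-flight."""
        new_id = self.next_id
        if new_id in self.in_flight:
            raise RuntimeError("id is still in flight and cannot be reused yet")
        self.in_flight.add(new_id)
        # The increment may be greater than one; the value wraps at the maximum.
        self.next_id = (new_id + step) % self.modulus
        return new_id

    def release(self, old_id):
        """Call when the id stops being in-flight (retracted, rejected and committed, or result done)."""
        self.in_flight.discard(old_id)
```

With X_ID_WIDTH = 3 and an increment of 3, successive calls to allocate yield 0, 3, 6 and then 1, since 9 wraps modulo 8.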

Multiple Harts

The interface can be used in systems with multiple harts (hardware threads). This includes scenarios with multiple CPUs and multi-threaded implementations of CPUs. RISC-V distinguishes between harts using hartid, which we also introduce to the interface. It is required to identify the source of the offloaded instruction, as multiple harts might be able to offload via a shared interface. No duplicates of the combination of hartid and id may be in flight at any time within one instance of the interface. Any state within the coprocessor (e.g. custom CSRs) must be duplicated according to the number of harts (indicated by the X_NUM_HARTS parameter). Execution units may be shared among threads of the coprocessor, and conflicts around such resources must be managed by the coprocessor.

Note

The interface can be used in scenarios where the CPU is superscalar, i.e. it can issue more than one instruction per cycle. In such scenarios, the coprocessor is usually required to also be able to accept more than one instruction per cycle. Our expectation is that implementers will duplicate the interface according to the issue width.

Compressed interface

Table 4 describes the compressed interface signals.

Table 4 Compressed interface signals

| Signal | Type | Direction (CPU) | Description |
|---|---|---|---|
| compressed_valid | logic | output | Compressed request valid. Request to uncompress a compressed instruction. |
| compressed_ready | logic | input | Compressed request ready. The transactions signaled via compressed_req and compressed_resp are accepted when compressed_valid and compressed_ready are both 1. |
| compressed_req | x_compressed_req_t | output | Compressed request packet. |
| compressed_resp | x_compressed_resp_t | input | Compressed response packet. |

Table 5 describes the x_compressed_req_t type.

Table 5 Compressed request type

| Signal | Type | Description |
|---|---|---|
| instr | logic [15:0] | Offloaded compressed instruction. |
| hartid | hartid_t | Identification of the hart offloading the instruction. |

The instr[15:0] signal is used to signal compressed instructions that are considered illegal by CPU itself. A coprocessor can provide an uncompressed instruction in response to receiving this.

A compressed request transaction is defined as the combination of all compressed_req signals during which compressed_valid is 1 and the hartid remains unchanged. A CPU is allowed to retract its compressed request transaction before it is accepted with compressed_ready = 1 and it can do so in the following ways:

  • Set compressed_valid = 0.

  • Keep compressed_valid = 1, but change the hartid signal (and if desired change the other signals in compressed_req).

The signals in compressed_req are valid when compressed_valid is 1. These signals remain stable during a compressed request transaction (if hartid changes while compressed_valid remains 1, then a new compressed request transaction has started).

Table 6 describes the x_compressed_resp_t type.

Table 6 Compressed response type

| Signal | Type | Description |
|---|---|---|
| instr | logic [31:0] | Uncompressed instruction. |
| accept | logic | Is the offloaded compressed instruction (id) accepted by the coprocessor? |

The signals in compressed_resp are valid when compressed_valid and compressed_ready are both 1. There are no stability requirements.

The CPU will attempt to offload every compressed instruction that it does not recognize as a legal instruction itself. CPU might also attempt to offload compressed instructions that it does recognize as legal instructions itself.

The CPU shall cause an illegal instruction fault when attempting to execute (commit) an instruction that:

  • is considered to be valid by the CPU and accepted by the coprocessor (accept = 1).

  • is considered neither to be valid by the CPU nor accepted by the coprocessor (accept = 0).
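The two fault conditions above amount to requiring that exactly one party claims the instruction. A minimal Python sketch of that decision follows; the function name raises_illegal_instruction is hypothetical.

```python
def raises_illegal_instruction(cpu_valid, coproc_accept):
    """Return True if committing the instruction must cause an illegal instruction fault.

    cpu_valid:     the CPU itself considers the instruction valid.
    coproc_accept: the coprocessor signaled accept = 1 for the instruction.
    """
    # Fault when both claim the instruction, or when neither does.
    return bool(cpu_valid) == bool(coproc_accept)
```

An instruction therefore executes without this fault only when it is valid in the CPU and not accepted by the coprocessor, or accepted by the coprocessor and not valid in the CPU.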

The accept signal of the compressed interface merely indicates that the coprocessor accepts the compressed instruction as an instruction that it implements and translates into its uncompressed counterpart. Typically an accepted transaction over the compressed interface will be followed by a corresponding transaction over the issue interface, but there is no requirement on the CPU to do so (as the instructions offloaded over the compressed interface and issue interface are allowed to be speculative). An instruction is considered accepted for offload only when an accept is signaled over the issue interface.

The coprocessor shall not take the mstatus based extension context status (see [RISC-V-PRIV]) into account when generating the accept signal on its compressed interface (but it shall take it into account when generating the accept signal on its issue interface).

Issue interface

Table 7 describes the issue interface signals.

Table 7 Issue interface signals

| Signal | Type | Direction (CPU) | Description |
|---|---|---|---|
| issue_valid | logic | output | Issue request valid. Indicates that CPU wants to offload an instruction. |
| issue_ready | logic | input | Issue request ready. The transaction signaled via issue_req and issue_resp is accepted when issue_valid and issue_ready are both 1. |
| issue_req | x_issue_req_t | output | Issue request packet. |
| issue_resp | x_issue_resp_t | input | Issue response packet. |

Table 8 describes the x_issue_req_t type.

Table 8 Issue request type

| Signal | Type | Description |
|---|---|---|
| instr | logic [31:0] | Offloaded instruction. |
| hartid | hartid_t | Identification of the hart offloading the instruction. |
| id | id_t | Identification of the offloaded instruction. |

An issue request transaction is defined as the combination of all issue_req signals during which issue_valid is 1 and the hartid remains unchanged. A CPU is allowed to retract its issue request transaction before it is accepted with issue_ready = 1 and it can do so in the following ways:

  • Set issue_valid = 0.

  • Keep issue_valid = 1, but change the hartid signal (and if desired change the other signals in issue_req).

The instr, hartid, and id signals are valid when issue_valid is 1. The instr signal remains stable during an issue request transaction.
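The transaction and retraction rules above can be made concrete by classifying a cycle-by-cycle trace of the issue handshake. The following Python sketch is illustrative only; the function name classify_issue_transactions and the tuple layout of the trace are assumptions made for this example.

```python
def classify_issue_transactions(cycles):
    """Classify issue request transactions in a trace.

    cycles: iterable of (issue_valid, issue_ready, hartid), one tuple per clock cycle.
    Returns a list of (hartid, outcome) with outcome "accepted" or "retracted".
    A transaction still pending at the end of the trace is not reported.
    """
    outcomes = []
    pending_hart = None  # hartid of the transaction currently waiting for issue_ready
    for valid, ready, hartid in cycles:
        # Retraction: issue_valid dropped, or hartid changed while issue_valid stayed 1.
        if pending_hart is not None and (not valid or hartid != pending_hart):
            outcomes.append((pending_hart, "retracted"))
            pending_hart = None
        if valid:
            if ready:
                outcomes.append((hartid, "accepted"))
                pending_hart = None
            else:
                pending_hart = hartid
    return outcomes
```

The same classification applies to the compressed interface with compressed_valid and compressed_ready in place of the issue signals.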

Table 10 describes the x_issue_resp_t type.

Table 10 Issue response type

| Signal | Type | Description |
|---|---|---|
| accept | logic | Is the offloaded instruction (id) accepted by the coprocessor? |
| writeback | writeregflags_t | Will the coprocessor perform a writeback in the core to rd? Writeback to x0 or the x0, x1 pair is allowed by the coprocessor, but will be ignored by the CPU. A coprocessor must signal writeback as 0 for non-accepted instructions. Writeback to a register pair is only allowed if X_DUALWRITE = 1 and instruction bits [11:7] are even. |
| register_read | readregflags_t | Will the coprocessor require specific registers to be read? A coprocessor may only request an odd register of a pair, if it also requests the even register of a pair. A coprocessor must signal register_read as 0 for non-accepted instructions. |
| ecswrite | logic | Will the coprocessor perform a writeback in the core to mstatus.xs, mstatus.fs, mstatus.vs? A coprocessor must signal ecswrite as 0 for non-accepted instructions. |

The core shall attempt to offload instructions via the issue interface for the following two main scenarios:

  • The instruction is originally non-compressed and it is not recognized as a valid instruction by the CPU’s non-compressed instruction decoder.

  • The instruction is originally compressed and the coprocessor accepted the compressed instruction and provided a 32-bit uncompressed instruction. In this case the 32-bit uncompressed instruction will be attempted for offload even if it matches in the CPU’s non-compressed instruction decoder.

Apart from the above two main scenarios a CPU may also attempt to offload (compressed/uncompressed) instructions that it does recognize as legal instructions itself. In case that both the CPU and the coprocessor accept the same instruction as being valid, the instruction will cause an illegal instruction fault upon execution.

The CPU shall cause an illegal instruction fault when attempting to execute (commit) an instruction that:

  • is considered to be valid by the CPU and accepted by the coprocessor (accept = 1).

  • is considered neither to be valid by the CPU nor accepted by the coprocessor (accept = 0).

A coprocessor can (only) accept an offloaded instruction when:

  • It can handle the instruction (based on decoding instr).

  • There are no structural hazards that would prevent execution.

A transaction is considered offloaded/accepted on the positive edge of clk when issue_valid, issue_ready are asserted and accept is 1. A transaction is considered not offloaded/rejected on the positive edge of clk when issue_valid and issue_ready are asserted while accept is 0.

The signals in issue_resp are valid when issue_valid and issue_ready are both 1. There are no stability requirements.
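Some of the issue response rules from Table 10 can be expressed as a simple consistency check, for example in a testbench monitor. The Python sketch below is hypothetical (the function name check_issue_resp and its argument order are not defined by the specification) and covers only the writeback rules and the zero-on-reject rule; the pairing rule for register_read is not checked because the bit mapping depends on X_DUALREAD.

```python
def check_issue_resp(instr, accept, writeback, register_read, ecswrite, x_dualwrite=0):
    """Return a list of rule violations for one issue response (empty list if none)."""
    errors = []
    if not accept:
        # All side-effect flags must be 0 for non-accepted instructions.
        if writeback or register_read or ecswrite:
            errors.append("writeback, register_read and ecswrite must be 0 when accept is 0")
        return errors
    if writeback & 0b10:  # bit 1 set: writeback to a register pair
        rd = (instr >> 7) & 0x1F  # instruction bits [11:7]
        if not x_dualwrite:
            errors.append("pair writeback requires X_DUALWRITE = 1")
        if not writeback & 0b01:
            errors.append("bit 1 may only be set when bit 0 is also set")
        if rd % 2:
            errors.append("pair writeback requires an even rd")
    return errors
```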

Register interface

Table 12 describes the register interface signals.

Table 12 Register interface signals

| Signal | Type | Direction (CPU) | Description |
|---|---|---|---|
| register_valid | logic | output | Register request valid. Indicates that CPU provides register contents related to an instruction. |
| register_ready | logic | input | Register request ready. The transaction signaled via register is accepted when register_valid and register_ready are both 1. |
| register | x_register_t | output | Register packet. |

Table 13 describes the x_register_t type.

Table 13 Register type

| Signal | Type | Description |
|---|---|---|
| hartid | hartid_t | Identification of the hart offloading the instruction. |
| id | id_t | Identification of the offloaded instruction. |
| rs[X_NUM_RS-1:0] | logic [X_RFR_WIDTH-1:0] | Register file source operands for the offloaded instruction. |
| rs_valid | readregflags_t | Validity of the register file source operand(s). If register pairs are supported, the validity is signaled for each register within the pair individually. |
| ecs | logic [5:0] | Extension Context Status ({mstatus.xs, mstatus.fs, mstatus.vs}). |
| ecs_valid | logic | Validity of the Extension Context Status. |

There are two main scenarios for how the register interface is used. They are selected by X_ISSUE_REGISTER_SPLIT:

  1. X_ISSUE_REGISTER_SPLIT = 0: A register transaction can be started in the same clock cycle as the issue transaction (issue_valid = register_valid, issue_ready = register_ready, issue_req.hartid = register.hartid and issue_req.id = register.id). In this case, the CPU will speculatively provide all possible source registers via register.rs when they become available (signalled via the respective rs_valid signals). The coprocessor will delay accepting the instruction until all necessary registers are provided, and only then assert issue_ready and register_ready. The rs_valid bits are not required to be stable during the transaction. Each bit can transition from 0 to 1, but is not allowed to transition back to 0 during a transaction. A coprocessor is not expected to wait for all rs_valid bits to be 1, but only for those registers it intends to read. The rs signals are only required to be stable during the part of a transaction in which these signals are considered to be valid. The ecs_valid bit is not required to be stable during the transaction. It can transition from 0 to 1, but is not allowed to transition back to 0 during a transaction. The ecs signal is only required to be stable during the part of a transaction in which this signal is considered to be valid.

  2. X_ISSUE_REGISTER_SPLIT = 1: For a CPU which splits the issue and register interface into subsequent pipeline stages (e.g. because it has a dedicated read registers (RR) stage), the registers will be provided after the issue transaction has completed. The CPU initiates the register transaction once all registers are available. If the coprocessor is able to accept multiple issue transactions before receiving the registers, the register transaction can occur in a different order. This allows the CPU to reorder instructions based on the availability of operands. The coprocessor is always expected to be ready to retrieve its operands via the register interface after accepting the issue of an instruction. Therefore, register_ready is tied to 1. The register_valid signal will be 1 for one cycle, and rs_valid is guaranteed to be equal to the corresponding issue_resp.register_read. Thus, a coprocessor can ignore rs_valid in this case and a CPU may choose to not implement the signal. The same applies to the ecs and ecs_valid signals.

In both scenarios, the following applies: The hartid, id, ecs_valid and rs_valid signals are valid when register_valid is 1. The rs signal is only considered valid when register_valid is 1 and the corresponding bit in rs_valid is 1 as well. The ecs signal is only considered valid when register_valid is 1 and ecs_valid is 1 as well.
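For the X_ISSUE_REGISTER_SPLIT = 0 scenario, the coprocessor's readiness condition and the monotonicity rule for rs_valid can be sketched in Python as follows. Both function names are hypothetical, and the flags are modeled as plain integers used as bit vectors of type readregflags_t.

```python
def operands_ready(rs_valid, register_read):
    """True when every register the coprocessor intends to read is marked valid.

    rs_valid:      bit vector driven by the CPU (register.rs_valid).
    register_read: bit vector of registers the instruction needs (issue_resp.register_read).
    """
    return (rs_valid & register_read) == register_read


def rs_valid_is_monotonic(samples):
    """Check that no rs_valid bit falls from 1 back to 0 across the samples of one transaction."""
    return all((prev & ~cur) == 0 for prev, cur in zip(samples, samples[1:]))
```

A coprocessor modeled this way would keep issue_ready and register_ready low until operands_ready returns True, without waiting for bits it does not need.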

The rs[X_NUM_RS-1:0] signals provide the register file operand(s) to the coprocessor. In case that XLEN = X_RFR_WIDTH, then the regular register file operands corresponding to rs1, rs2 or rs3 are provided. In case XLEN != X_RFR_WIDTH (i.e. XLEN = 32 and X_RFR_WIDTH = 64), then the rs[X_NUM_RS-1:0] signals provide two 32-bit register file operands per index (corresponding to even/odd register pairs) with the even register specified in rs1, rs2 or rs3. The register file operand for the even register file index is provided in the lower 32 bits; the register file operand for the odd register file index is provided in the upper 32 bits. When reading from the x0, x1 pair, then a value of 0 is returned for the entire operand. The X_DUALREAD parameter defines whether dual read is supported and for which register file sources it is supported.
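The packing of an even/odd register pair into one 64-bit operand (XLEN = 32, X_RFR_WIDTH = 64) can be illustrated with the Python sketch below. The function read_operand and the list-based register file are assumptions made for this example, and regfile[0] is assumed to hold 0.

```python
MASK32 = 0xFFFFFFFF


def read_operand(regfile, n, dual):
    """Return the value driven on one rs entry for source register index n.

    regfile: list of 32 register values (regfile[0] is assumed to be 0).
    dual:    True if this source operand is read as an even/odd pair.
    """
    if not dual:
        return regfile[n] & MASK32
    if n % 2:
        raise ValueError("dual read requires an even register index")
    if n == 0:
        return 0  # reading the x0, x1 pair returns 0 for the entire operand
    even = regfile[n] & MASK32      # even register goes in the lower 32 bits
    odd = regfile[n + 1] & MASK32   # odd register goes in the upper 32 bits
    return (odd << 32) | even
```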

The ecs signal provides the Extension Context Status from the mstatus CSR to the coprocessor.

Commit interface

Table 14 describes the commit interface signals.

Table 14 Commit interface signals

| Signal | Type | Direction (CPU) | Description |
|---|---|---|---|
| commit_valid | logic | output | Commit request valid. Indicates that CPU has valid commit or kill information for an offloaded instruction. There is no corresponding ready signal (it is implicit and assumed 1). The coprocessor shall be ready to observe the commit_valid and commit_kill signals at any time coincident with or after an issue transaction initiation. |
| commit | x_commit_t | output | Commit packet. |

Table 15 describes the x_commit_t type.

Table 15 Commit packet type

| Signal | Type | Description |
|---|---|---|
| hartid | hartid_t | Identification of the hart offloading the instruction. |
| id | id_t | Identification of the offloaded instruction. Valid when commit_valid is 1. |
| commit_kill | logic | If commit_valid is 1 and commit_kill is 0, then the core guarantees that the offloaded instruction (id) and any older (i.e. preceding) instructions are no longer speculative, will not get killed (e.g. due to misspeculation or an exception in a preceding instruction), and are allowed to be committed. If commit_valid is 1 and commit_kill is 1, then the offloaded instruction (id) and any newer (i.e. succeeding) instructions shall be killed in the coprocessor and the coprocessor must guarantee that the related instructions do/did not change architectural state. |

The commit_valid signal will be 1 for exactly one clk cycle. It is not required that a commit transaction is performed for each offloaded instruction individually. Instructions can be signalled to be non-speculative or to be killed in batch. For example, signalling the oldest instruction to be killed is equivalent to requesting a flush of the coprocessor. After a commit transaction with commit_kill = 1, the earliest instruction that is not considered killed is one whose successful issue transaction starts at least one clock cycle later.
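The batch semantics of a commit transaction can be sketched as follows in Python. This is a simplified model for a single hart: the function name apply_commit and the data layout are hypothetical, and a commit id that the coprocessor does not track is simply ignored here, which is a simplification of the tolerance requirement described in this section.

```python
def apply_commit(order, states, commit_id, commit_kill):
    """Apply one commit transaction to the coprocessor's view of in-flight instructions.

    order:  list of ids in issue order, oldest first.
    states: dict mapping id to "speculative", "committed" or "killed" (updated in place).
    """
    if commit_id not in order:
        return  # simplification: ids not tracked by this model are ignored
    pos = order.index(commit_id)
    # commit_kill = 0 covers the id and all older ids; commit_kill = 1 covers the id and all newer ids.
    affected = order[pos:] if commit_kill else order[:pos + 1]
    new_state = "killed" if commit_kill else "committed"
    for i in affected:
        # Instructions already marked killed or committed ignore later commit transactions.
        if states[i] == "speculative":
            states[i] = new_state
```

For example, with ids 5, 6, 7, 8 issued in that order, a commit for id 6 with commit_kill = 0 marks 5 and 6 as committed, and a later commit for id 7 with commit_kill = 1 marks 7 and 8 as killed.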

Note

If an instruction is marked in the coprocessor as killed or committed, the coprocessor shall ignore any subsequent commit transaction related to that instruction.

Note

A coprocessor must be tolerant to any possible commit.id, whether this represents an in-flight instruction or not. In this case, the coprocessor may still need to process the request by considering the relevant instructions (either preceding or succeeding) as no longer speculative or to be killed. This behavior supports scenarios in which more than one coprocessor is connected to an issue interface.

A CPU is required to mark every instruction that has completed the issue transaction as either killed or non-speculative. This includes accepted (issue_resp.accept = 1) and rejected instructions (issue_resp.accept = 0).

A coprocessor does not have to wait for commit_valid to become asserted. It can speculate that an offloaded accepted instruction will not get killed, but in case this speculation turns out to be wrong because the instruction actually did get killed, then the coprocessor must undo any of its internal architectural state changes that are due to the killed instruction.

A coprocessor is not allowed to perform speculative result transactions and shall therefore never initiate a result transaction for instructions that have not yet received a commit transaction with commit_kill = 0. The earliest point at which a coprocessor can initiate a result handshake for an instruction is therefore the cycle in which commit_valid = 1 and commit_kill = 0 for that instruction.

The signals in commit are valid when commit_valid is 1.

Memory (request/response) interface

The memory (request/response) interface is not included in this version of the specification.

Memory result interface

The memory result interface is not included in this version of the specification.

Result interface

Table 21 describes the result interface signals.

Table 21 Result interface signals

| Signal | Type | Direction (CPU) | Description |
|---|---|---|---|
| result_valid | logic | input | Result request valid. Indicates that the coprocessor has a valid result (write data or exception) for an offloaded instruction. |
| result_ready | logic | output | Result request ready. The result signaled via result is accepted by the core when result_valid and result_ready are both 1. |
| result | x_result_t | input | Result packet. |

The coprocessor shall provide results to the core via the result interface. A coprocessor is allowed to provide results to the core in an out of order fashion. A coprocessor is only allowed to provide a result for an instruction once the core has indicated (via the commit interface) that this instruction is allowed to be committed. Each accepted offloaded (committed and not killed) instruction shall have exactly one result transaction (even if no data needs to be written back to the CPU’s register file). No result transaction shall be performed for instructions which have not been accepted for offload or for instructions that have been killed.

Table 22 describes the x_result_t type.

Table 22 Result packet type

| Signal | Type | Description |
|---|---|---|
| hartid | hartid_t | Identification of the hart offloading the instruction. |
| id | id_t | Identification of the offloaded instruction. |
| data | logic [X_RFW_WIDTH-1:0] | Register file write data value(s). |
| rd | logic [4:0] | Register file destination address(es). |
| we | writeregflags_t | Register file write enable(s). |
| ecswe | logic [2:0] | Write enables for mstatus.xs, mstatus.fs, mstatus.vs. |
| ecsdata | logic [5:0] | Write data value for {mstatus.xs, mstatus.fs, mstatus.vs}. |
| exc | logic | Did the instruction cause a synchronous exception? |
| exccode | logic [5:0] | Exception code. |
| dbg | logic | Did the instruction cause a debug trigger match with mcontrol.timing = 0? |
| err | logic | Did the instruction cause a bus error? |

A result transaction starts in the cycle that result_valid = 1 and ends in the cycle that both result_valid = 1 and result_ready = 1. The signals in result are valid when result_valid is 1. The signals in result shall remain stable during a result transaction.

we is 2 bits wide when XLEN = 32 and X_RFW_WIDTH = 64, and 1 bit wide otherwise. The CPU shall ignore writeback to x0. When a dual writeback is performed to the x0, x1 pair, the entire write shall be ignored, i.e. neither x0 nor x1 shall be written by the CPU. For an instruction instance, the we signal must be the same as issue_resp.writeback. The CPU is not required to check that these signals match.

Note

issue_resp.writeback and result.we carry the same information. Nevertheless, result.we is provided to simplify the CPU logic. Without this signal, the CPU would have to look this information up based on the instruction id.

If ecswe[2] is 1, then the value in ecsdata[5:4] is written to mstatus.xs. If ecswe[1] is 1, then the value in ecsdata[3:2] is written to mstatus.fs. If ecswe[0] is 1, then the value in ecsdata[1:0] is written to mstatus.vs. The writes to the stated mstatus bitfields will take into account any WARL rules that might exist for these bitfields in the CPU.
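The mapping between ecswe, ecsdata and the three mstatus bitfields can be sketched as follows. The function name apply_ecs and the dictionary representation of the bitfields are illustrative assumptions, and CPU-specific WARL rules are deliberately not modeled.

```python
def apply_ecs(fields, ecswe, ecsdata):
    """Return an updated copy of the mstatus bitfields {xs, fs, vs}.

    fields  -- dict with integer entries "xs", "fs" and "vs" (2 bits each)
    ecswe   -- 3-bit write enable, ordered {xs, fs, vs}
    ecsdata -- 6-bit write data, ordered {xs, fs, vs}
    """
    updated = dict(fields)
    if ecswe & 0b100:
        updated["xs"] = (ecsdata >> 4) & 0b11  # ecsdata[5:4]
    if ecswe & 0b010:
        updated["fs"] = (ecsdata >> 2) & 0b11  # ecsdata[3:2]
    if ecswe & 0b001:
        updated["vs"] = ecsdata & 0b11         # ecsdata[1:0]
    # WARL legalization, if any, would be applied by the CPU at this point.
    return updated
```

With all fields initially 0, apply_ecs(fields, 0b010, 0b111111) only changes fs (to 3), because only ecswe[1] is set.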

Interface dependencies

The following rules apply to the relative ordering of the interface handshakes:

  • The compressed interface transactions are in program order (possibly a subset) and the CPU will at least attempt to offload instructions that it does not consider to be valid itself.

  • The issue interface transactions are in program order (possibly a subset) and the CPU will at least attempt to offload instructions that it does not consider to be valid itself.

  • Every issue interface transaction has an associated register interface transaction. It is not required for register transactions to be in the same order as the issue transactions.

  • Every issue interface transaction (whether accepted or not) has an associated commit interface transaction and both interfaces use a matching transaction ordering.

  • If an offloaded instruction is accepted and allowed to commit, then for each such instruction one result transaction must occur via the result interface (even if no writeback needs to happen to the core’s register file). The transaction ordering on the result interface does not have to correspond to the transaction ordering on the issue interface.

  • A commit interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.

  • A result interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.

  • A result interface handshake cannot be initiated before the corresponding commit interface handshake is initiated (and the instruction is allowed to commit). It is allowed to be initiated at the same time or later.

  • A result interface handshake cannot be (or have been) initiated for killed instructions.
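Several of the ordering rules above lend themselves to a simple trace check, for example in a testbench scoreboard. The sketch below is a hedged illustration only: the event format, the function name check_trace and the assumptions of a single hart and ids that are unique within the trace are not defined by this specification.

```python
def check_trace(events):
    """Check a recorded trace against some of the interface dependencies.

    events -- list of (cycle, kind, instr_id, flag) tuples, where kind is
              "issue" (flag = accepted), "commit" (flag = killed) or
              "result" (flag unused). The cycle is the one in which the
              handshake was initiated.
    Returns a list of violation messages (empty if none were found).
    """
    issue, commit, result = {}, {}, {}
    tables = {"issue": issue, "commit": commit, "result": result}
    violations = []
    for cycle, kind, iid, flag in events:
        if kind == "result" and iid in result:
            violations.append(f"id {iid}: more than one result")
        tables[kind][iid] = (cycle, flag)

    for iid in result:
        if iid not in issue:
            violations.append(f"id {iid}: result without issue")

    for iid, (issue_cycle, accepted) in issue.items():
        if iid not in commit:
            violations.append(f"id {iid}: issue without commit")
            continue
        commit_cycle, killed = commit[iid]
        if commit_cycle < issue_cycle:
            violations.append(f"id {iid}: commit before issue")
        needs_result = accepted and not killed
        if needs_result and iid not in result:
            violations.append(f"id {iid}: missing result")
        if not needs_result and iid in result:
            violations.append(f"id {iid}: result for an instruction that "
                              "was not accepted or was killed")
        if iid in result and result[iid][0] < commit_cycle:
            violations.append(f"id {iid}: result before commit")
    return violations
```

A result that is never initiated before its commit, combined with a commit that is never initiated before its issue, also covers the rule that a result cannot precede the corresponding issue handshake.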

Handshake rules

The following handshake pairs exist on the eXtension interface:

  • compressed_valid with compressed_ready.

  • issue_valid with issue_ready.

  • register_valid with register_ready.

  • commit_valid with implicit always ready signal.

  • result_valid with result_ready.

The only rule related to valid and ready signals is that:

  • A transaction is considered accepted on the positive clk edge when both valid and (implicit or explicit) ready are 1.

Specifically note the following:

  • The valid signals are allowed to be retracted by a CPU (e.g. in case that the related instruction is killed in the CPU’s pipeline before the corresponding ready is signaled).

  • A new transaction can be started by a CPU by changing the id signal and keeping the valid signal asserted (thereby possibly terminating a previous transaction before it completed).

  • The valid signals are not allowed to be retracted by a coprocessor (e.g. once result_valid is asserted it must remain asserted until the handshake with result_ready has been performed). A new transaction can therefore not be started by a coprocessor by just changing the id signal and keeping the valid signal asserted if no ready has been received yet for the original transaction. The cycle after receiving the ready signal, a next (back-to-back) transaction is allowed to be started by just keeping the valid signal high and changing the id to that of the next transaction.

  • The ready signals are allowed to be 1 when the corresponding valid signal is not asserted.
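The acceptance rule and the restriction on coprocessor-driven valid signals can be modeled cycle by cycle. The following is a minimal sketch for the result channel; the sample format and the function name check_result_channel are hypothetical.

```python
def check_result_channel(samples):
    """Model the result handshake as sampled on each positive clk edge.

    samples -- list of (result_valid, result_ready, id) tuples, one per cycle
    Returns (accepted_ids, violations).
    """
    accepted, violations = [], []
    pending = None  # id of a transaction that is valid but not yet accepted
    for cycle, (valid, ready, iid) in enumerate(samples):
        if pending is not None:
            if not valid:
                violations.append(f"cycle {cycle}: result_valid retracted")
            elif iid != pending:
                violations.append(f"cycle {cycle}: id changed before ready")
        if valid and ready:
            # Accepted when both valid and ready are 1 on the clock edge.
            accepted.append(iid)
            pending = None
        elif valid:
            pending = iid
        else:
            pending = None
    return accepted, violations
```

In the sequence (0, 1, 0), (1, 0, 7), (1, 1, 7), (1, 1, 8) the transactions with id 7 and id 8 are accepted back to back without any violation, and ready being 1 while valid is 0 in the first cycle is harmless.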

Signal dependencies

A CPU shall not have combinatorial paths from its eXtension interface input signals to its eXtension interface output signals, except for the following allowed paths:

  • paths from result_valid, result to rs, rs_valid.

Note

The above implies that the non-compressed instruction instr[31:0] received via the compressed interface is not allowed to combinatorially feed into the issue interface’s instr[31:0] instruction.

A coprocessor is allowed (and expected) to have combinatorial paths from its eXtension interface input signals to its eXtension interface output signals. In order to prevent combinatorial loops the following combinatorial paths are not allowed in a coprocessor:

  • paths from rs, rs_valid to result_valid, result.

Note

The above implies that a coprocessor has a pipeline stage separating the register file operands from its result generating circuit (similar to the separation between decode stage and execute stage found in many CPUs).

Note

As a CPU is allowed to retract transactions on its compressed and issue interfaces, the compressed_ready and issue_ready signals will have to depend on signals received from the CPU in a combinatorial manner (otherwise these ready signals might be signaled for the wrong id).
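The reason for forbidding the rs, rs_valid to result_valid, result paths in a coprocessor can be made concrete by treating combinatorial paths as edges in a directed graph and searching for a cycle. This is an illustrative sketch only: the function name has_combinational_loop and the coarse signal-group names used as graph nodes are assumptions.

```python
def has_combinational_loop(paths):
    """Return True if the (source, sink) combinatorial paths form a cycle."""
    graph = {}
    for src, dst in paths:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: a combinatorial loop exists
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Path the CPU is allowed to have: result_valid, result -> rs, rs_valid.
cpu_paths = [("result", "rs")]
# A coprocessor path from issue inputs to issue_ready is acceptable.
legal_system = cpu_paths + [("issue_req", "issue_ready")]
# Adding the forbidden coprocessor path rs -> result closes a loop.
illegal_system = legal_system + [("rs", "result")]
```

Here has_combinational_loop(legal_system) is False, while has_combinational_loop(illegal_system) is True.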

Handshake dependencies

In order to avoid system level deadlock both the CPU and the coprocessor shall obey the following rules:

  • The valid signal of a transaction shall not be dependent on the corresponding ready signal.

  • Transactions related to an earlier part of the instruction flow shall not depend on transactions with the same id related to a later part of the instruction flow. The instruction flow is defined from earlier to later as follows:

    • compressed transaction

    • issue transaction

    • register transaction

    • commit transaction

    • result transaction.

  • Transactions with an earlier issued id shall not depend on transactions with a later issued id (e.g. a coprocessor is not allowed to delay generating result_valid = 1 because it first wants to see commit_valid = 1 for a newer instruction).

Note

The words depend and dependent refer to logical relationships, which are broader than combinatorial relationships.

Appendix

This appendix contains several useful, non-normative pieces of information that help with implementing the eXtension interface.

SystemVerilog example

In the src folder of this project, the file https://github.com/openhwgroup/core-v-xif/blob/main/src/core_v_xif.sv contains a non-normative realization of this specification based on SystemVerilog interfaces. Of course the use of SystemVerilog (interfaces) is not mandatory.

Coprocessor recommendations

It is recommended (but not required) that a coprocessor follow these suggestions to maximize its re-use potential:

  • Avoid using opcodes that are reserved or already used by RISC-V International unless for supporting a standard RISC-V extension.

  • Make it easy to change opcode assignments such that a coprocessor can easily be updated if it conflicts with another coprocessor.

  • Clearly document the supported and required parameter values.

Timing recommendations

The integration of the eXtension interface will vary from CPU to CPU, and each integration will thus require its own set of timing constraints.

CV32E40X eXtension timing budget shows the recommended timing budgets for the coprocessor and (optional) interconnect for the case in which a coprocessor is paired with the CV32E40X (https://github.com/openhwgroup/cv32e40x) processor. As is shown in that timing budget, the coprocessor only receives a small part of the timing budget on the paths through xif_issue_if.issue_req.rs*. This enables the coprocessor to source its operands directly from the CV32E40X register file bypass network, thereby preventing stall cycles in case an offloaded instruction depends on the result of a preceding non-offloaded instruction. This implies that, if a coprocessor is intended for pairing with the CV32E40X, it is beneficial timing-wise if the coprocessor does not operate directly on the rs* source inputs, but registers them instead. To maximize the usability of a coprocessor with various CPUs, such registers could be made optional via a parameter.

Verification

A UVM agent for the interface was developed for the verification of CVA6. It can be accessed under https://github.com/openhwgroup/core-v-verif/tree/master/lib/uvm_agents/uvma_cvxif.