eXtension Interface
The eXtension interface enables extending the CPU with (custom or standardized) instructions without the need to change the RTL of the CPU itself. An extension can be provided in a separate module external to the CPU and is integrated at system level by connecting it to the eXtension interface.
The eXtension interface provides low latency (tightly integrated) read and write access to the CPU register file. All opcodes which are not used (i.e. considered to be invalid) by the CPU can be used for extensions. It is recommended however that custom instructions do not use opcodes that are reserved/used by RISC-V International.
The eXtension interface enables extension of the CPU with:
Control-Transfer type instructions (e.g. branches and jumps) are not supported via the eXtension interface.
CV-X-IF
The terms eXtension interface and CV-X-IF are used interchangeably.
Parameters
The CV-X-IF specification contains two kinds of parameters. The first kind is configured for the coprocessor. The CPU might not support every possible value of such a parameter, in which case the CPU determines the legal values.
The second kind of parameter is a system parameter, i.e. it is determined based on the configuration of the CPU and the coprocessor.
This includes X_ID_WIDTH and X_HARTID_WIDTH.
Name | Type/Range | Default | Description |
---|---|---|---|
X_NUM_RS | int unsigned (2..3) | 2 | Number of register file read ports that can be used by the eXtension interface. Legal values are determined by the CPU. |
X_ID_WIDTH | int unsigned (3..32) | 4 | Identification ( |
X_RFR_WIDTH | int unsigned (32, 64) | 32 | Register file read access width for the eXtension interface. Legal values are determined by the CPU. Must be at least XLEN. If XLEN = 32, then the legal values are 32 and 64 (e.g. for RV32P). If XLEN = 64, then the legal value is (only) 64. |
X_RFW_WIDTH | int unsigned (32, 64) | 32 | Register file write access width for the eXtension interface. Legal values are determined by the CPU. Must be at least XLEN. If XLEN = 32, then the legal values are 32 and 64 (e.g. for RV32D). If XLEN = 64, then the legal value is (only) 64. |
X_NUM_HARTS | int unsigned (1..2^MXLEN) | 1 | Number of harts (hardware threads) associated with the interface. Legal values are determined by the CPU. |
X_HARTID_WIDTH | int unsigned (1..MXLEN) | 1 | Width of |
X_MISA | logic [25:0] | 32’b0 | MISA extensions implemented on the eXtension interface. Legal values are determined by the CPU. |
X_DUALREAD | int unsigned (0..3) | 0 | Is dual read supported? 0: No, 1: Yes, for |
X_DUALWRITE | int unsigned (0..1) | 0 | Is dual write supported? 0: No, 1: Yes. Legal values are determined by the CPU. |
X_ISSUE_REGISTER_SPLIT | int unsigned (0..1) | 0 | Are the issue interface and register interface split? 0: No, 1: Yes. Legal values are determined by the CPU. If 1, registers are provided after the issue of the instruction. If 0, registers are provided at the same time as issue. |
The CPU shall set the misa.Extensions field to the bitwise OR of its own Extensions and the X_MISA parameter.
Not all bits of misa.Extensions will be legal for a coprocessor to set, e.g. because the extension is already implemented in the CPU or because it cannot be implemented as part of a coprocessor (such as privileged extensions).
Note
A CPU shall clearly document which X_MISA values it can support and there is no requirement that a CPU can support all possible X_MISA values. For example, if a CPU only supports machine mode, then it is not reasonable to expect that the CPU will additionally support user mode by just setting the X_MISA[20] (U bit) to 1.
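The combination rule for misa.Extensions can be illustrated with a minimal sketch. The helper names (`ext_bit`, `misa_extensions`) and the example extension sets are hypothetical and only serve the illustration; the sketch assumes the usual RISC-V convention that bit 0 corresponds to extension A and bit 25 to extension Z.

```python
def ext_bit(letter: str) -> int:
    """Bit mask of a single-letter extension (A -> bit 0, ..., Z -> bit 25)."""
    return 1 << (ord(letter.upper()) - ord("A"))


def misa_extensions(cpu_extensions: int, x_misa: int) -> int:
    """Bitwise OR of the CPU's own Extensions bits and the X_MISA parameter."""
    mask = (1 << 26) - 1  # the Extensions field is 26 bits wide
    return (cpu_extensions | x_misa) & mask


# Hypothetical example: an RV32IMC CPU with a coprocessor providing F.
cpu = ext_bit("I") | ext_bit("M") | ext_bit("C")
copro = ext_bit("F")
combined = misa_extensions(cpu, copro)
```

Whether a given X_MISA bit is legal is still decided by the CPU; the sketch does not model that check.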
Additionally, the following type definitions are defined to improve readability of the specification and ensure consistency between the interfaces:
Name |
Definition |
Description |
---|---|---|
|
logic [X_NUM_RS+X_DUALREAD-1:0] |
Vector with a flag per possible source register. This depends upon the number of read ports and their ability to read register pairs. The bit positions map to registers as follows: Low indices correspond to low operand numbers, and the even part of the pair has a lower index than the odd one. |
|
logic [X_DUALWRITE:0] |
Bit vector indicating destination registers for write back.
The width depends on the ability to perform dual write.
If |
|
logic [X_ID_WIDTH-1:0] |
Identification of the offloaded instruction. See Identification for details on the identifiers |
|
logic [X_HARTID_WIDTH-1:0] |
Identification of the hart offloading the instruction.
Only relevant in multi-hart systems. Hart IDs are not required to be numbered continuously.
The hart ID would usually correspond to |
Major features
The major features of CV-X-IF are:
Minimal requirements on extension instruction encoding.
If an extension instruction relies on reading from or writing to the CPU’s general purpose register file, then the standard RISC-V bitfield locations for rs1, rs2, rs3, rd as used for non-compressed instructions ([RISC-V-UNPRIV]) must be used. Bitfields for unused read or write operands can be fully repurposed. Extension instructions can either use the compressed or uncompressed instruction format. For offloading compressed instructions the coprocessor must provide the CPU with the related non-compressed instructions.
Support for dual write-back instructions (optional, based on X_DUALWRITE).
CV-X-IF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file write-backs. Dual write-back is supported for even-odd register pairs (Xn and Xn+1 with n being an even number extracted from instruction bits [11:7]).
Dual register file write-back is only supported for XLEN = 32.
Support for dual read instructions (per source operand) (optional, based on X_DUALREAD).
CV-X-IF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file reads. Dual read is supported for even-odd register pairs. Dual read can therefore provide up to six 32-bit operands per instruction.
When a dual read is performed with n = 0, the entire operand is 0, i.e. x1 shall not need to be accessed by the CPU.
Dual register file read is only supported for XLEN = 32.
Support for ternary operations.
CV-X-IF optionally supports ISA extensions implementing instructions which use three source operands. RISC-V [RISC-V-UNPRIV] can implement ternary operations using the R-type instruction format (using rd as rs3) or with the R4-type instruction format.
Support for instruction speculation.
CV-X-IF indicates whether offloaded instructions are allowed to be committed (or should be killed).
Note
The interface does not provide a mechanism for providing and synchronizing the Extension Context Status (ECS, see [RISC-V-PRIV]). ECS might be needed if an extension has context that needs to be switched upon a task switch. Ensuring that the behavior of the overall system is compliant to [RISC-V-PRIV] is the responsibility of an integrator. It is the intention that future versions of this specification provide a general mechanism to deal with ECS.
CV-X-IF consists of the following interfaces:
Compressed interface. Signaling of compressed instruction to be offloaded.
Issue (request/response) interface. Signaling of the uncompressed instruction to be offloaded.
Register interface. Signaling of the register file source operands for the offloaded instruction.
Commit interface. Signaling of control signals related to whether instructions can be committed or should be killed.
Result interface. Signaling of the instruction result(s).
Operating principle
The CPU will attempt to offload every (compressed or non-compressed) instruction that it does not recognize as a legal instruction itself. In case of a compressed instruction the coprocessor must first provide the CPU with a matching uncompressed (i.e. 32-bit) instruction using the compressed interface. The CPU then attempts to offload this non-compressed instruction via the issue interface.
Offloading of the (non-compressed, 32-bit) instructions happens via the issue interface.
The external coprocessor can decide to accept or reject the instruction offload. If it accepts, the coprocessor will further handle the instruction. If it rejects, the CPU will raise an illegal instruction exception.
The CPU provides the required register file operand(s) to the coprocessor via the register interface.
If an offloaded instruction uses any of the register file sources rs1, rs2, then these are always encoded in instruction bits [19:15] and [24:20], respectively.
If an offloaded instruction uses the register file source rs3, then it is encoded in instruction bits [31:27] if the instruction uses one of the major opcodes MADD, MSUB, NMSUB, or NMADD (R4-type).
Otherwise, rs3 is expected to be encoded in bits [11:7].
Note
The fused multiply add instructions of the floating point unit make use of the R4 instruction format.
As this format consumes significant encoding space, other standard and custom extensions are expected to follow the R-type encoding, multiplexing rd and rs3.
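The operand field locations described above can be sketched as a small decoder. This is an illustration only: the function name `source_registers` and the example encodings are hypothetical, and the opcode values are the standard RISC-V major opcodes for MADD, MSUB, NMSUB and NMADD.

```python
# Major opcodes (instruction bits [6:0]) that use the R4-type format.
R4_OPCODES = {0x43, 0x47, 0x4B, 0x4F}  # MADD, MSUB, NMSUB, NMADD


def source_registers(instr: int) -> dict:
    """Extract the candidate rs1/rs2/rs3 indices from a 32-bit instruction."""
    opcode = instr & 0x7F
    rs1 = (instr >> 15) & 0x1F  # bits [19:15]
    rs2 = (instr >> 20) & 0x1F  # bits [24:20]
    if opcode in R4_OPCODES:
        rs3 = (instr >> 27) & 0x1F  # bits [31:27] for R4-type
    else:
        rs3 = (instr >> 7) & 0x1F   # bits [11:7], shared with rd
    return {"rs1": rs1, "rs2": rs2, "rs3": rs3}


# Hypothetical encodings: an R4-type word and a custom-0 (0x0B) word.
r4_word = (5 << 27) | (3 << 20) | (2 << 15) | (1 << 7) | 0x43
custom_word = (3 << 20) | (2 << 15) | (9 << 7) | 0x0B
```

Which of the three fields an instruction actually uses is decided by the coprocessor; the decoder only shows where each field would be found.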
The coprocessor only needs to wait for the register file operands that a specific instruction actually uses. The coprocessor informs the core to which register(s) in the register file it will write-back. The CPU uses this information to track data dependencies between instructions.
Offloaded instructions are speculative; the CPU has not necessarily committed to them yet and might decide to kill them (e.g. because they are in the shadow of a taken branch or because they are flushed due to an exception in an earlier instruction). Via the commit interface the CPU will inform the coprocessor whether an offloaded instruction needs to be killed or whether the CPU guarantees that the instruction is no longer speculative and is allowed to be committed.
The final result of an accepted offloaded instruction can be written back into the coprocessor itself or into the CPU’s register file. Either way, the result interface is used to signal to the CPU that the instruction has completed. Apart from a possible write-back into the register file, the result interface transaction is for example used in the CPU to increment the minstret CSR, to implement the fence instructions and to judge if instructions before a WFI instruction have fully completed (so that sleep mode can be entered if needed).
In short: From a functional perspective it should not matter whether an instruction is handled inside the CPU or inside a coprocessor. In both cases the instructions need to obey the same instruction dependency rules, memory consistency rules, load/store address checks, fences, etc.
Interfaces
This section describes the interfaces of CV-X-IF. Port directions are described as seen from the perspective of the CPU.
The coprocessor will have opposite pin directions.
Stated signal names are not mandatory, but it is highly recommended to at least include the stated names as part of actual signal names. It is for example allowed to add prefixes and/or postfixes (e.g. x_ prefix or _i, _o postfixes) or to use different capitalization. A name mapping should be provided if non-obvious renaming is applied.
Clocking and Signal Stability
The interfaces are required to be synchronous to a common clock (clk).
The signals of the interface are sampled on the positive edge of clk.
When the specification of the interface transactions refers to the stability of a signal, the following definition applies. A signal is considered stable if two consecutive samples of the signal have the same value. A signal’s value may change between the samples and still be considered stable.
Identification
Most interfaces of CV-X-IF use a signal called id, which serves as a unique identification number for offloaded instructions.
The same id value shall be used for all transaction packets on all interfaces that logically relate to the same instruction.
An id value can be reused after an earlier instruction related to the same id value is no longer considered in-flight.
The id values for in-flight offloaded instructions are required to be unique.
The id values are required to be incremental from one issue transaction to the next.
The increment may be greater than one.
If the next id would be greater than the maximum value (2**X_ID_WIDTH - 1), the value of id wraps.
A new id value is not allowed to be greater than the id of the oldest in-flight instruction, if a wrap has occurred since the oldest in-flight instruction was issued.
If the oldest in-flight instruction is \(id_o\), and the newest is \(id_n\), then the next instruction with \(id_{n+1}\) must satisfy the following conditions:
The first condition applies to cases where \(id_n\) has not wrapped since the oldest in-flight instruction was issued, and the second to cases where one wrap occurred between \(id_o\) and \(id_n\).
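The formulas themselves are not reproduced in this text. A reconstruction that follows from the rules stated above (incremental ids, wrap at the maximum value, and no new id greater than the oldest in-flight id after a wrap) would read as follows; it should be treated as an assumption rather than as the normative wording:

\[
id_n \ge id_o:\quad id_{n+1} > id_n \;\lor\; id_{n+1} < id_o
\]

\[
id_n < id_o:\quad id_n < id_{n+1} < id_o
\]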
The coprocessor is not required to check the validity of id values under these constraints.
This has to be guaranteed by design of the CPU.
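Although the coprocessor does not have to check these constraints, a verification environment might want to. The following is a minimal sketch of such a check, derived from the id rules in this section (incremental ids, a single wrap, and uniqueness among in-flight instructions). The function name and argument names are hypothetical.

```python
def next_id_is_legal(id_oldest: int, id_newest: int, id_next: int) -> bool:
    """Check a candidate id against the oldest and newest in-flight ids."""
    if id_newest >= id_oldest:
        # No wrap since the oldest in-flight instruction was issued: the next
        # id either keeps increasing, or wraps and stays below the oldest id.
        return id_next > id_newest or id_next < id_oldest
    # One wrap has occurred: the next id must lie strictly between the
    # newest id and the oldest in-flight id.
    return id_newest < id_next < id_oldest
```

For example, with the oldest id at 12 and the newest at 14, the ids 15 and 2 (after a wrap) pass the check, whereas 13 does not.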
Note
IDs are not required to be consecutive (the increment may be greater than one) in order to support scenarios in which a coprocessor does not see the entire instruction stream, e.g. because offloaded instructions are routed towards different coprocessors.
To make sure feasible id values are available, X_ID_WIDTH needs to be sufficiently large.
This can be achieved by calculating the maximum id increase during the lifetime of the longest executing instruction.
id values can only be introduced by the issue interface.
An id becomes in-flight in the first cycle that issue_valid is 1 for that id.
An id ends being in-flight when one of the following scenarios applies:
the corresponding issue request transaction is retracted.
the corresponding issue request transaction is not accepted and the corresponding commit handshake has been performed.
the corresponding result transaction has been performed.
For the purpose of relative identification, an instruction is considered to be preceding another instruction, if it was accepted in an issue transaction at an earlier time. The other instruction is thus succeeding the earlier one.
Multiple coprocessors
This specification defines a point-to-point connection between a CPU and a coprocessor, that is defined in a way that facilitates the integration of multiple coprocessors. The combined interface of the coprocessors must adhere to this specification and thus must behave like a single coprocessor from the CPU point of view. Any implementation is correct, if the CPU is not able to determine that multiple coprocessors are connected. For recommendations, please refer to Recommendations for implementing multiple coprocessors on a shared interface
Multiple Harts
The interface can be used in systems with multiple harts (hardware threads).
This includes scenarios with multiple CPUs and multi-threaded implementations of CPUs.
RISC-V distinguishes between harts using hartid, which we also introduce to the interface.
It is required to identify the source of the offloaded instruction, as multiple harts might be able to offload via a shared interface.
No duplicates of the combination of hartid and id may be in flight at any time within one instance of the interface.
Any state within the coprocessor (e.g. custom CSRs) must be duplicated according to the number of harts (indicated by the X_NUM_HARTS parameter).
Execution units may be shared among threads of the coprocessor, and conflicts around such resources must be managed by the coprocessor.
Note
The interface can be used in scenarios where the CPU is superscalar, i.e. it can issue more than one instruction per cycle. In such scenarios, the coprocessor is usually required to also be able to accept more than one instruction per cycle. Our expectation is that implementers will duplicate the interface according to the issue width.
Compressed interface
Table 6 describes the compressed interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
compressed_valid | logic | output | Compressed request valid. Request to uncompress a compressed instruction. |
compressed_ready | logic | input | Compressed request ready. The transactions signaled via |
compressed_req | x_compressed_req_t | output | Compressed request packet. |
compressed_resp | x_compressed_resp_t | input | Compressed response packet. |
Table 7 describes the x_compressed_req_t type.
Signal | Type | Description |
---|---|---|
instr | logic [15:0] | Offloaded compressed instruction. |
hartid | | Identification of the hart offloading the instruction. |
The instr[15:0] signal is used to signal compressed instructions that are considered illegal by the CPU itself. A coprocessor can provide an uncompressed instruction in response to receiving this.
Note
It is not required for a CPU to ensure that the offloaded instruction is a valid 16-bit encoding.
A compressed request transaction is defined as the combination of all compressed_req signals during which compressed_valid is 1 and compressed_req remains unchanged.
A CPU is allowed to retract its compressed request transaction before it is accepted with compressed_ready = 1 and it can do so in the following ways:
Set compressed_valid = 0.
Keep compressed_valid = 1, but change any of the signals in compressed_req.
The signals in compressed_req are valid when compressed_valid is 1. These signals remain stable during a compressed request transaction.
Table 8 describes the x_compressed_resp_t type.
Signal | Type | Description |
---|---|---|
instr | logic [31:0] | Uncompressed instruction. |
accept | logic | Is the offloaded compressed instruction ( |
The signals in compressed_resp are valid when compressed_valid and compressed_ready are both 1. There are no stability requirements.
The CPU will attempt to offload every compressed instruction that it does not recognize as a legal instruction itself. A CPU might also attempt to offload compressed instructions that it does recognize as legal instructions itself.
A coprocessor may only accept valid 16-bit instructions, i.e. bits [1:0] must not be binary 11.
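A coprocessor's compressed response can be sketched as a lookup guarded by the bits [1:0] check. This is a behavioural illustration under stated assumptions, not an implementation: `compressed_response`, the `expansions` table and the encodings in it are hypothetical, and the returned tuple stands in for the instr and accept fields of compressed_resp.

```python
def compressed_response(instr16: int, expansions: dict) -> tuple:
    """Return (instr, accept) for a 16-bit instruction offered by the CPU."""
    if (instr16 & 0b11) == 0b11:
        # Bits [1:0] == 11 marks a non-compressed encoding: never accept.
        return (0, False)
    if instr16 in expansions:
        # The coprocessor implements this instruction and translates it.
        return (expansions[instr16], True)
    return (0, False)


# Hypothetical mapping from one custom 16-bit encoding to a 32-bit one.
expansions = {0x8000: 0x0000000B}
```

Accepting here only means that a translation is provided; the instruction counts as offloaded only once the issue interface accepts it.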
The CPU shall cause an illegal instruction fault when attempting to execute (commit) an instruction that:
is considered to be valid by the CPU and accepted by the coprocessor (accept = 1).
is considered neither to be valid by the CPU nor accepted by the coprocessor (accept = 0).
The accept signal of the compressed interface merely indicates that the coprocessor accepts the compressed instruction as an instruction that it implements and translates into its uncompressed counterpart.
Typically an accepted transaction over the compressed interface will be followed by a corresponding transaction over the issue interface, but there is no requirement on the CPU to do so (as the instructions offloaded over the compressed interface and issue interface are allowed to be speculative). An instruction is considered accepted for offload only when an accept is signaled over the issue interface.
Explicitly, the coprocessor shall not execute the instruction after receiving it via the compressed interface.
The coprocessor shall not take the mstatus based extension context status (see [RISC-V-PRIV]) into account when generating the accept signal on its compressed interface (but it shall take it into account when generating the accept signal on its issue interface).
Issue interface
Table 9 describes the issue interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
issue_valid | logic | output | Issue request valid. Indicates that CPU wants to offload an instruction. |
issue_ready | logic | input | Issue request ready. The transaction signaled via |
issue_req | x_issue_req_t | output | Issue request packet. |
issue_resp | x_issue_resp_t | input | Issue response packet. |
Table 10 describes the x_issue_req_t type.
Signal | Type | Description |
---|---|---|
instr | logic [31:0] | Offloaded instruction. |
hartid | | Identification of the hart offloading the instruction. |
id | | Identification of the offloaded instruction. |
An issue request transaction is defined as the combination of all issue_req signals during which issue_valid is 1, and the id and hartid remain unchanged.
A CPU is allowed to retract its issue request transaction before it is accepted with issue_ready = 1 and it can do so in the following ways:
Set issue_valid = 0.
Keep issue_valid = 1, but change the id or hartid signal (and if desired change the other signals in issue_req).
The instr, hartid, and id signals are valid when issue_valid is 1.
The instr signal remains stable during an issue request transaction.
Table 12 describes the x_issue_resp_t type.
Signal |
Type |
Description |
---|---|---|
|
logic |
Is the offloaded instruction ( |
|
Will the coprocessor perform a write-back in the CPU to |
|
|
Will the coprocessor require specific registers to be read? A coprocessor may only request an odd register of a pair if it also requests the even register of that pair. |
The CPU shall attempt to offload instructions via the issue interface for the following two main scenarios:
The instruction is originally non-compressed and it is not recognized as a valid instruction by the CPU’s non-compressed instruction decoder.
The instruction is originally compressed and the coprocessor accepted the compressed instruction and provided a 32-bit uncompressed instruction. In this case the 32-bit uncompressed instruction will be attempted for offload even if it matches in the CPU’s non-compressed instruction decoder.
Apart from the above two main scenarios a CPU may also attempt to offload (compressed/uncompressed) instructions that it does recognize as legal instructions itself. In case that both the CPU and the coprocessor accept the same instruction as being valid, the instruction will cause an illegal instruction fault upon execution.
In all cases, the CPU must decode the instruction. The CPU shall cause an illegal instruction fault when attempting to execute (commit) an instruction that:
is considered to be valid by the CPU and accepted by the coprocessor (accept = 1).
is considered neither to be valid by the CPU nor accepted by the coprocessor (accept = 0).
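The two fault conditions above reduce to a simple rule: the instruction is legal only if exactly one of the CPU and the coprocessor claims it. A minimal sketch, with a hypothetical function name:

```python
def raises_illegal_instruction(cpu_valid: bool, copro_accept: bool) -> bool:
    """True if committing the instruction must cause an illegal instruction fault."""
    # Fault when both claim the instruction, or when neither does.
    return cpu_valid == copro_accept
```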
A coprocessor can delay accepting an instruction via issue_ready in the presence of structural hazards that would prevent execution.
A coprocessor can (only) accept an offloaded instruction when it can handle the instruction (based on decoding instr).
A transaction is considered offloaded/accepted on the positive edge of clk when issue_valid and issue_ready are asserted and accept is 1.
A transaction is considered not offloaded/rejected on the positive edge of clk when issue_valid and issue_ready are asserted while accept is 0.
The signals in issue_resp are valid when issue_valid and issue_ready are both 1. There are no stability requirements.
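The outcome of an issue handshake, as sampled on a positive clock edge, can be summarised in a few lines. The function name and the returned strings are hypothetical labels chosen for this sketch.

```python
def issue_outcome(issue_valid: bool, issue_ready: bool, accept: bool) -> str:
    """Classify the issue interface state sampled on a positive edge of clk."""
    if not (issue_valid and issue_ready):
        # No handshake yet; issue_resp (including accept) is not valid.
        return "pending"
    return "accepted" if accept else "rejected"
```

Note that accept is only meaningful when both issue_valid and issue_ready are 1, which is why the sketch ignores it otherwise.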
Register interface
Table 15 describes the register interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
register_valid | logic | output | Register request valid. Indicates that CPU provides register contents related to an instruction. |
register_ready | logic | input | Register request ready. The transaction signaled via |
register | x_register_t | output | Register packet. |
Table 16 describes the x_register_t type.
Signal |
Type |
Description |
---|---|---|
|
Identification of the hart offloading the instruction. |
|
|
Identification of the offloaded instruction. |
|
|
logic [X_RFR_WIDTH-1:0] |
Register file source operands for the offloaded instruction. |
|
Validity of the register file source operand(s). If register pairs are supported, the validity is signaled for each register within the pair individually. |
There are two main scenarios for how the register interface will be used. They are selected by X_ISSUE_REGISTER_SPLIT:
X_ISSUE_REGISTER_SPLIT = 0: A register transaction can be started in the same clock cycle as the issue transaction (issue_valid = register_valid, issue_ready = register_ready, issue_req.hartid = register.hartid and issue_req.id = register.id). In this case, the CPU will speculatively provide all possible source registers via register.rs when they become available (signalled via the respective rs_valid signals). The coprocessor will delay accepting the instruction until all necessary registers are provided, and only then assert issue_ready and register_ready. The rs_valid bits are not required to be stable during the transaction. Each bit can transition from 0 to 1, but is not allowed to transition back to 0 during a transaction. A coprocessor is not expected to wait for all rs_valid bits to be 1, but only for those registers it intends to read. The rs signals are only required to be stable during the part of a transaction in which these signals are considered to be valid.
X_ISSUE_REGISTER_SPLIT = 1: For a CPU which splits the issue and register interface into subsequent pipeline stages (e.g. because it has a dedicated read registers (RR) stage), the registers will be provided after the issue transaction has completed. The CPU initiates the register transaction once all registers are available. If the coprocessor is able to accept multiple issue transactions before receiving the registers, the register transactions can occur in a different order. This allows the CPU to reorder instructions based on the availability of operands. The coprocessor is always expected to be ready to retrieve its operands via the register interface after accepting the issue of an instruction. Therefore, register_ready is tied to 1. The register_valid signal will be 1 for one cycle, and rs_valid is guaranteed to be equal to the corresponding issue_resp.register_read. Thus, a coprocessor can ignore rs_valid in this case and a CPU may choose to not implement the signal.
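For the X_ISSUE_REGISTER_SPLIT = 0 case, the coprocessor's decision to assert ready and the rule on rs_valid transitions can be sketched with bit masks. The function names are hypothetical; `register_read` stands for the set of operands the decoded instruction needs, with one bit per operand in the same positions as rs_valid.

```python
def missing_operands(register_read: int, rs_valid: int) -> int:
    """Bit mask of operands the instruction needs but the CPU has not provided yet."""
    return register_read & ~rs_valid


def coprocessor_ready(register_read: int, rs_valid: int) -> bool:
    """Ready may be asserted once every needed operand is flagged valid."""
    return missing_operands(register_read, rs_valid) == 0


def rs_valid_step_is_legal(previous: int, current: int) -> bool:
    """Within one transaction a bit may rise from 0 to 1 but must not fall back."""
    return (previous & ~current) == 0
```

An instruction that reads only rs1 (mask 0b001) can therefore be accepted as soon as bit 0 of rs_valid is set, regardless of the other bits.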
In both scenarios, the following applies:
A register transaction is defined as the combination of all register signals during which register_valid is 1, and the id and hartid remain unchanged.
A CPU is allowed to retract its register transaction before it is accepted with register_ready = 1 and it can do so in the following ways:
Set register_valid = 0.
Keep register_valid = 1, but change the id or hartid signal (and if desired change the other signals in register).
The hartid, id, and rs_valid signals are valid when register_valid is 1.
The rs signal is only considered valid when register_valid is 1 and the corresponding bit in rs_valid is 1 as well.
The rs[X_NUM_RS-1:0] signals provide the register file operand(s) to the coprocessor. In case that XLEN = X_RFR_WIDTH, then the regular register file operands corresponding to rs1, rs2 or rs3 are provided. In case XLEN != X_RFR_WIDTH (i.e. XLEN = 32 and X_RFR_WIDTH = 64), then the rs[X_NUM_RS-1:0] signals provide two 32-bit register file operands per index (corresponding to even/odd register pairs) with the even register specified in rs1, rs2 or rs3. The register file operand for the even register file index is provided in the lower 32 bits; the register file operand for the odd register file index is provided in the upper 32 bits. When reading from the x0, x1 pair, then a value of 0 is returned for the entire operand.
The X_DUALREAD parameter defines whether dual read is supported and for which register file sources it is supported.
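How a single rs entry is formed for XLEN = 32 can be sketched as follows. The function name, the `dual_read` flag and the example register contents are hypothetical; the sketch assumes that the register index passed in is even whenever a pair is read, since the behaviour for an odd index is not described here.

```python
MASK32 = 0xFFFFFFFF


def read_operand(regfile: list, rs: int, dual_read: bool) -> int:
    """Value driven on one rs entry for a 32-bit register file."""
    if not dual_read:
        return regfile[rs] & MASK32
    if rs == 0:
        # x0/x1 pair: the entire 64-bit operand reads as 0.
        return 0
    low = regfile[rs] & MASK32        # even register in the lower 32 bits
    high = regfile[rs + 1] & MASK32   # odd register in the upper 32 bits
    return (high << 32) | low


# Hypothetical register contents: x0 = 0, xN = 0x1000 + N.
regfile = [0] + [0x1000 + i for i in range(1, 32)]
```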
Commit interface
Table 18 describes the commit interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
commit_valid | logic | output | Commit request valid. Indicates that CPU has valid commit or kill information for an offloaded instruction. There is no corresponding ready signal (it is implicit and assumed 1). The coprocessor shall be ready to observe the |
commit | x_commit_t | output | Commit packet. |
Table 19 describes the x_commit_t type.
Signal | Type | Description |
---|---|---|
hartid | | Identification of the hart offloading the instruction. |
id | | Identification of the offloaded instruction. Valid when |
commit_kill | logic | If |
The commit_valid signal will be 1 for exactly one clk cycle.
It is not required that a commit transaction is performed for each offloaded instruction individually.
Instructions can be signalled to be non-speculative or to be killed in batch.
E.g. signalling the oldest instruction to be killed is equivalent to requesting a flush of the coprocessor.
The first instruction to be considered not-to-be-killed after a commit transaction with commit_kill as 1 is at earliest an instruction with successful issue transaction starting at least one clock cycle later.
Note
If an instruction is marked in the coprocessor as killed or committed, the coprocessor shall ignore any subsequent commit transaction related to that instruction.
Note
A coprocessor must be tolerant to any possible commit.id, whether this represents an in-flight instruction or not.
In this case, the coprocessor may still need to process the request by considering the relevant instructions (either preceding or succeeding) as no longer speculative or to be killed.
This behavior supports scenarios in which more than one coprocessor is connected to an issue interface.
A CPU is required to mark every instruction that has completed the issue transaction as either killed or non-speculative.
This includes accepted (issue_resp.accept = 1) and rejected instructions (issue_resp.accept = 0).
A coprocessor does not have to wait for commit_valid to become asserted. It can speculate that an offloaded accepted instruction will not get killed, but in case this speculation turns out to be wrong because the instruction actually did get killed, then the coprocessor must undo any of its internal architectural state changes that are due to the killed instruction.
A coprocessor is not allowed to perform speculative result transactions and shall therefore never initiate a result transaction for instructions that have not yet received a commit transaction with commit_kill = 0. The earliest point at which a coprocessor can initiate a result handshake for an instruction is therefore the cycle in which commit_valid = 1 and commit_kill = 0 for that instruction.
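The gating of result transactions on commit information can be sketched with a small bookkeeping class. Everything here is hypothetical and simplified: the class and method names are made up, the batch rule (a kill affects the named instruction and all succeeding ones, a non-kill commit affects the named instruction and all preceding ones) is an assumption based on the batch and flush description above, commit ids that are not in flight are simply ignored, and id reuse is not modelled.

```python
class CommitTracker:
    """Tracks whether accepted instructions are speculative, committed or killed."""

    def __init__(self):
        self.order = []   # ids in issue (program) order
        self.state = {}   # id -> "speculative" | "committed" | "killed"

    def issue(self, instr_id: int) -> None:
        self.order.append(instr_id)
        self.state[instr_id] = "speculative"

    def commit(self, instr_id: int, commit_kill: bool) -> None:
        if instr_id not in self.state:
            return  # simplification: tolerate ids that are not in flight
        pos = self.order.index(instr_id)
        # Assumed batch rule: kill -> this and younger; commit -> this and older.
        affected = self.order[pos:] if commit_kill else self.order[:pos + 1]
        for i in affected:
            # Instructions already marked ignore any later commit transaction.
            if self.state[i] == "speculative":
                self.state[i] = "killed" if commit_kill else "committed"

    def may_send_result(self, instr_id: int) -> bool:
        """A result transaction may only start for a committed instruction."""
        return self.state.get(instr_id) == "committed"


tracker = CommitTracker()
for instr_id in (1, 2, 3):
    tracker.issue(instr_id)
tracker.commit(2, commit_kill=False)  # ids 1 and 2 become non-speculative
tracker.commit(3, commit_kill=True)   # id 3 is killed
tracker.commit(3, commit_kill=False)  # ignored: id 3 is already marked
```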
The signals in commit are valid when commit_valid is 1.
Memory (request/response) interface
The memory (request/response) interface is not included in this version of the specification.
Memory result interface
The memory result interface is not included in this version of the specification.
Result interface
Table 25 describes the result interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
result_valid | logic | input | Result request valid. Indicates that the coprocessor has a valid result (write data or exception) for an offloaded instruction. |
result_ready | logic | output | Result request ready. The result signaled via |
result | x_result_t | input | Result packet. |
The coprocessor shall provide results to the CPU via the result interface. A coprocessor is allowed to provide results to the CPU in an out of order fashion. A coprocessor is only allowed to provide a result for an instruction once the CPU has indicated (via the commit interface) that this instruction is allowed to be committed. Each accepted offloaded (committed and not killed) instruction shall have exactly one result transaction (even if no data needs to be written back to the CPU’s register file). No result transaction shall be performed for instructions which have not been accepted for offload or for instructions that have been killed.
Table 26 describes the x_result_t type.
Signal |
Type |
Description |
---|---|---|
|
Identification of the hart offloading the instruction. |
|
|
Identification of the offloaded instruction. |
|
|
logic [X_RFW_WIDTH-1:0] |
Register file write data value(s). |
|
logic [4:0] |
Register file destination address(es). |
|
Register file write enable(s). |
A result transaction starts in the cycle that result_valid = 1 and ends in the cycle that both result_valid = 1 and result_ready = 1. The signals in result are valid when result_valid is 1. The signals in result shall remain stable during a result transaction.
we is 2 bits wide when XLEN = 32 and X_RFW_WIDTH = 64, and 1 bit wide otherwise. The CPU shall ignore write-back to x0.
When a dual write-back is performed to the x0, x1 pair, the entire write shall be ignored, i.e. neither x0 nor x1 shall be written by the CPU.
For an instruction instance, the we
signal must be the same as issue_resp.write-back
.
The CPU is not required to check that these signals match.
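The write-back rules for x0 and for the x0, x1 pair can be sketched as a CPU-side behavioral model. The function name apply_result is hypothetical. The sketch also assumes that a dual write-back is signaled with both we bits set, that it targets the even/odd pair (rd, rd+1), and that the low XLEN bits of data go to rd; none of these assumptions is stated in this section.

```python
def apply_result(regs, rd, data, we, xlen=32, rfw_width=64):
    """Apply one result packet to regs (a list of 32 integers), in place."""
    # we is 2 bits wide only when XLEN = 32 and X_RFW_WIDTH = 64.
    we_bits = 2 if (xlen == 32 and rfw_width == 64) else 1
    if not 0 <= we < (1 << we_bits):
        raise ValueError(f"we={we:#b} does not fit in {we_bits} bit(s)")
    mask = (1 << xlen) - 1

    if we == 0b11:
        # ASSUMPTION: dual write-back targets the even/odd pair (rd, rd+1).
        if rd % 2:
            raise ValueError("dual write-back modeled for even rd only")
        if rd == 0:
            return  # x0, x1 pair: the entire write is ignored
        regs[rd] = data & mask                # ASSUMPTION: low half goes to rd
        regs[rd + 1] = (data >> xlen) & mask  # ASSUMPTION: high half to rd+1
    elif we == 0b01:
        if rd != 0:  # write-back to x0 is ignored
            regs[rd] = data & mask
    elif we == 0b10:
        raise ValueError("we=0b10 is not modeled in this sketch")
    # we == 0: the result transaction carries no register write
```

With these assumptions, a dual write-back with rd = 0 leaves both x0 and x1 untouched, while the same packet with rd = 4 updates x4 and x5.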
Interface dependencies
The following rules apply to the relative ordering of the interface handshakes:
The compressed interface transactions are in program order (possibly a subset) and the CPU will at least attempt to offload compressed instructions that it does not consider to be valid itself.
The issue interface transactions are in program order (possibly a subset) and the CPU will at least attempt to offload instructions that it does not consider to be valid itself.
Every issue interface transaction has an associated register interface transaction, if the instruction is not killed before the register transaction. It is not required for register transactions to be in the same order as the issue transactions.
A register interface transaction cannot be initiated before the corresponding issue interface handshake is initiated.
If X_ISSUE_REGISTER_SPLIT = 0, it must be initiated at the same time.
If X_ISSUE_REGISTER_SPLIT = 1, it can only be initiated after the corresponding issue interface handshake is completed.
Every issue interface transaction (whether accepted or not) must be marked as non-speculative or to be killed by a commit interface transaction.
If an offloaded instruction is accepted and allowed to commit, then for each such instruction one result transaction must occur via the result interface (even if no write-back needs to happen to the CPU’s register file). The transaction ordering on the result interface does not have to correspond to the transaction ordering on the issue interface.
A commit interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
Note
There is no required ordering between commit and register transactions in case of X_ISSUE_REGISTER_SPLIT = 1.
In this case, implementations must tolerate both orderings: commit before register and register before commit.
A result interface handshake cannot be initiated before the corresponding register interface handshake is initiated. It is allowed to be initiated at the same time or later.
A result interface handshake cannot be initiated before the corresponding instruction has been marked as non-speculative by a commit transaction. It is allowed to be initiated at the same time or later.
A result interface handshake cannot be (or have been) initiated for killed instructions.
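Several of the ordering rules above can be checked mechanically on a recorded trace. The following Python sketch assumes a single hart, unique instruction ids within the trace, and a hypothetical event format (cycle, kind, id, flag) in which cycle is the cycle the handshake was initiated. The function name check_trace is likewise an assumption. The program-order rules and the X_ISSUE_REGISTER_SPLIT timing rules are not covered.

```python
from collections import defaultdict


def check_trace(events):
    """Check (cycle, kind, instr_id, flag) events; return a list of violations.

    kind is "issue" (flag = accepted), "register", "commit" (flag = kill)
    or "result". flag is ignored for "register" and "result".
    """
    issue, register, commit = {}, {}, {}
    results = defaultdict(list)
    for cycle, kind, iid, flag in events:
        if kind == "issue":
            issue[iid] = (cycle, bool(flag))
        elif kind == "register":
            register[iid] = cycle
        elif kind == "commit":
            commit[iid] = (cycle, bool(flag))
        elif kind == "result":
            results[iid].append(cycle)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    errors = []
    # Register and commit cannot be initiated before the issue handshake.
    for iid, cycle in register.items():
        if iid not in issue or cycle < issue[iid][0]:
            errors.append(f"id {iid}: register before issue")
    for iid, (cycle, _) in commit.items():
        if iid not in issue or cycle < issue[iid][0]:
            errors.append(f"id {iid}: commit before issue")
    for iid in results:
        if iid not in issue:
            errors.append(f"id {iid}: result without issue")

    for iid, (_, accepted) in issue.items():
        if iid not in commit:
            errors.append(f"id {iid}: neither committed nor killed")
            continue
        commit_cycle, killed = commit[iid]
        # Exactly one result if accepted and committed, none otherwise.
        expected = 1 if accepted and not killed else 0
        got = results.get(iid, [])
        if len(got) != expected:
            errors.append(f"id {iid}: expected {expected} result(s), got {len(got)}")
        for cycle in got:
            if cycle < commit_cycle:
                errors.append(f"id {iid}: result before commit")
            if iid not in register or cycle < register[iid]:
                errors.append(f"id {iid}: result before register")
    return errors
```

An empty return value means that none of the modeled rules is violated; it does not prove compliance with the rules the sketch leaves out.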
Handshake rules
The following handshake pairs exist on the eXtension interface:
compressed_valid with compressed_ready.
issue_valid with issue_ready.
register_valid with register_ready.
commit_valid with implicit always ready signal.
result_valid with result_ready.
The only rule related to *_valid and *_ready signals is that:
A transaction is considered accepted on the positive clk edge when both valid and (implicit or explicit) ready are 1.
Note
The *_valid signals are allowed to be retracted by a CPU (e.g. in case that the related instruction is killed in the CPU’s pipeline before the corresponding *_ready is signaled).
It is defined per interface, if and how the CPU can start a new transaction while a transaction is ongoing (*_valid = 1). In most interfaces, it can be started by changing the hartid and/or id signal and keeping the *_valid signal asserted (thereby possibly terminating a previous transaction before it completed).
The *_valid signals are not allowed to be retracted by a coprocessor (e.g. once result_valid is asserted it must remain asserted until the handshake with result_ready has been performed). A new transaction can therefore not be started by a coprocessor by just changing the hartid and/or id signal and keeping the valid signal asserted if no *_ready has been received yet for the original transaction. The cycle after receiving the *_ready signal, a next (back-to-back) transaction is allowed to be started by just keeping the *_valid signal high and changing the hartid and/or id to that of the next transaction.
The *_ready signals are allowed to be 1 when the corresponding *_valid signal is 0.
The *_valid signals are allowed to transition from 0 to 1 independent of the *_ready signals’ states.
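The acceptance rule and the non-retraction rule for coprocessor-driven valid signals can be expressed as two small trace functions. This Python sketch assumes per-cycle lists of 0/1 samples taken at the positive clk edge. The function names are hypothetical.

```python
def accepted_cycles(valid, ready):
    """Cycles in which a transaction is accepted (valid and ready both 1)."""
    return [n for n, (v, r) in enumerate(zip(valid, ready)) if v and r]


def retracted_cycles(valid, ready):
    """Cycles in which valid dropped although the previous cycle had
    valid = 1 without ready, i.e. a handshake was abandoned.

    This is legal for CPU-driven valid signals and illegal for
    coprocessor-driven ones such as result_valid.
    """
    return [
        n
        for n in range(1, len(valid))
        if valid[n - 1] and not ready[n - 1] and not valid[n]
    ]


# The commit interface has an implicit, always-1 ready signal.
commit_valid = [0, 1, 0, 1]
commit_ready = [1] * len(commit_valid)
commit_accepts = accepted_cycles(commit_valid, commit_ready)
```

For a coprocessor-driven interface, a non-empty list from retracted_cycles indicates a violation. Back-to-back transactions show up as consecutive entries in accepted_cycles.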
Signal dependencies
A CPU shall not have combinatorial paths from its eXtension interface input signals to its eXtension interface output signals, except for the following allowed paths:
paths from result_valid, result to register_valid, rs, rs_valid.
Note
The above implies that the non-compressed instruction instr[31:0] received via the compressed interface is not allowed to combinatorially feed into the issue interface’s instr[31:0] instruction.
A coprocessor is allowed (and expected) to have combinatorial paths from its eXtension interface input signals to its eXtension interface output signals. In order to prevent combinatorial loops the following combinatorial paths are not allowed in a coprocessor:
paths from register_valid, rs, rs_valid to result_valid, result.
Note
The above implies that a coprocessor has a pipeline stage separating the register file operands from its result generating circuit (similar to the separation between decode stage and execute stage found in many CPUs).
Note
As a CPU is allowed to retract transactions on its compressed, issue, and register interfaces, the compressed_ready, issue_ready, and register_ready signals will have to depend on signals received from the CPU in a combinatorial manner (otherwise these ready signals might be signaled for the wrong hartid and id).
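To see why the coprocessor path restriction prevents combinatorial loops, the paths can be modeled as a directed graph and searched for cycles. The Python sketch below groups signals into hypothetical nodes: result_if stands for result_valid and result, and register_if stands for register_valid, rs and rs_valid. The grouping and the function name has_comb_loop are assumptions made for illustration.

```python
def has_comb_loop(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = dict.fromkeys(graph, WHITE)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: a combinatorial loop
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Path allowed inside the CPU: result_valid, result -> register_valid, rs, rs_valid.
cpu_paths = [("result_if", "register_if")]
# Path expected inside a coprocessor: ready depends on CPU-driven signals.
copro_paths = [("register_if", "register_ready")]
# Path forbidden inside a coprocessor: register_valid, rs, rs_valid -> result.
forbidden = [("register_if", "result_if")]

legal_system = has_comb_loop(cpu_paths + copro_paths)
illegal_system = has_comb_loop(cpu_paths + copro_paths + forbidden)
```

With only the allowed paths the graph is loop free. Adding the forbidden coprocessor path closes a loop through the CPU's allowed path.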
System level deadlock avoidance
In order to avoid system level deadlock both the CPU and the coprocessor shall obey the following rules:
The valid signal of a transaction shall not be dependent on the corresponding ready signal.
The only allowed dependencies between interfaces for transactions with the same hartid and id are:
Issue may depend on Compressed (e.g. issue_req.instr depends on compressed_resp.instr)
Register may depend on Issue (e.g. register.rs may depend on issue_resp.register_read) and Compressed
Commit may depend on Issue and Compressed
Result may depend on Commit, Register (e.g. result.data may depend on register.rs), Issue (e.g. result.we depends on issue_resp.writeback), and Compressed
Note
In case of X_ISSUE_REGISTER_SPLIT = 0, the issue and register interfaces are coupled.
Because commit depends on issue, it is implied that register also cannot depend on commit.
Transactions with an earlier issued hartid and id shall not depend on transactions with a later issued hartid and id (e.g. a coprocessor is not allowed to delay generating result_valid = 1 because it first wants to see commit_valid = 1 for a newer instruction).
Note
The words depend and dependent refer to logical relationships, which are broader than combinatorial relationships.
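The allowed dependencies between interfaces for one hartid and id form a directed acyclic graph, which is what rules out circular waiting. The Python sketch below encodes them as a table and derives one valid ordering with the standard library's graphlib module (Python 3.9 or later). The names ALLOWED and dependency_allowed are hypothetical.

```python
from graphlib import TopologicalSorter

# For each interface, the set of interfaces it is allowed to depend on
# (for transactions with the same hartid and id).
ALLOWED = {
    "compressed": set(),
    "issue": {"compressed"},
    "register": {"issue", "compressed"},
    "commit": {"issue", "compressed"},
    "result": {"commit", "register", "issue", "compressed"},
}


def dependency_allowed(dependent, on):
    """True if interface `dependent` may depend on interface `on`."""
    return on in ALLOWED[dependent]


# static_order() raises graphlib.CycleError if the table contained a cycle;
# it yields every interface after all interfaces it depends on.
order = list(TopologicalSorter(ALLOWED).static_order())
```

For example, dependency_allowed("result", "commit") holds, while dependency_allowed("commit", "result") and dependency_allowed("register", "commit") do not. The table covers only same-instruction dependencies; the rule on earlier versus later issued instructions is a separate constraint.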
Appendix
This appendix contains several useful, non-normative pieces of information that help implementing the eXtension Interface.
SystemVerilog example
In the src folder of this project, the file https://github.com/openhwgroup/core-v-xif/blob/main/src/core_v_xif.sv contains a non-normative realization of this specification based on SystemVerilog interfaces.
Of course the use of SystemVerilog (interfaces) is not mandatory.
Coprocessor recommendations
A coprocessor is recommended (but not required) to follow these suggestions to maximize its re-use potential:
Avoid using opcodes that are reserved or already used by RISC-V International unless for supporting a standard RISC-V extension.
Make it easy to change opcode assignments such that a coprocessor can easily be updated if it conflicts with another coprocessor.
Clearly document the supported and required parameter values.
Timing recommendations
The integration of the eXtension interface varies from CPU to CPU, so each integration requires its own set of timing constraints.
CV32E40X eXtension timing budget shows the recommended timing budgets for the coprocessor and (optional) interconnect for the case in which a coprocessor is paired with the CV32E40X (https://github.com/openhwgroup/cv32e40x) processor.
As is shown in that timing budget, the coprocessor only receives a small part of the timing budget on the paths through xif_issue_if.issue_req.rs*.
This enables the coprocessor to source its operands directly from the CV32E40X register file bypass network, thereby preventing stall cycles in case an offloaded instruction depends on the result of a preceding non-offloaded instruction. This implies that, if a coprocessor is intended for pairing with the CV32E40X, it will be beneficial timing-wise if the coprocessor does not directly operate on the rs* source inputs, but registers them instead. To maximize utilization of a coprocessor with various CPUs, such registers could be made optional via a parameter.
Verification
A UVM agent for the interface was developed for the verification of CVA6. It can be accessed under https://github.com/openhwgroup/core-v-verif/tree/master/lib/uvm_agents/uvma_cvxif.