eXtension Interface
The eXtension interface enables extending CPU with (custom or standardized) instructions without the need to change the RTL of CPU itself. Extensions can be provided in separate modules external to CPU and are integrated at system level by connecting them to the eXtension interface.
The eXtension interface provides low latency (tightly integrated) read and write access to the CPU register file. All opcodes which are not used (i.e. considered to be invalid) by CPU can be used for extensions. It is recommended however that custom instructions do not use opcodes that are reserved/used by RISC-V International.
The eXtension interface enables extension of CPU with:
Custom ALU type instructions.
Custom load/store type instructions.
Custom CSRs and related instructions.
Control-Transfer type instructions (e.g. branches and jumps) are not supported via the eXtension interface.
CORE-V-XIF
The terms eXtension interface and CORE-V-XIF are used interchangeably.
Parameters
The CORE-V-XIF specification contains the following parameters:
Name | Type/Range | Default | Description |
---|---|---|---|
X_NUM_RS | int (2..3) | 2 | Number of register file read ports that can be used by the eXtension interface. |
X_ID_WIDTH | int (3..32) | 4 | Identification (id) width for the eXtension interface. |
X_MEM_WIDTH | int (32, 64, 128, 256) | 32 | Memory access width for loads/stores via the eXtension interface. |
X_RFR_WIDTH | int (32, 64) | 32 | Register file read access width for the eXtension interface. Must be at least XLEN. If XLEN = 32, then the legal values are 32 and 64 (e.g. for RV32P). If XLEN = 64, then the legal value is (only) 64. |
X_RFW_WIDTH | int (32, 64) | 32 | Register file write access width for the eXtension interface. Must be at least XLEN. If XLEN = 32, then the legal values are 32 and 64 (e.g. for RV32D). If XLEN = 64, then the legal value is (only) 64. |
X_MISA | logic [31:0] | 0x0000_0000 | MISA extensions implemented on the eXtension interface. The CPU determines the legal values for this parameter. |
X_ECS_XS | logic [1:0] | 2’b0 | Initial value for mstatus.xs. |
X_DUALREAD | int (0..3) | 0 | Is dual read supported? 0: No, 1: Yes, for rs1, 2: Yes, for rs1 - rs2, 3: Yes, for rs1 - rs3. Legal values are determined by the CPU. |
X_DUALWRITE | int (0..1) | 0 | Is dual write supported? 0: No, 1: Yes. Legal values are determined by the CPU. |
Note
A CPU shall clearly document which X_MISA values it can support; there is no requirement that a CPU can support all possible X_MISA values. For example, if a CPU only supports machine mode, then it is not reasonable to expect that the CPU will additionally support user mode by just setting the X_MISA[20] (U bit) to 1.
Major features
The major features of CORE-V-XIF are:
Minimal requirements on extension instruction encoding.
If an extension instruction relies on reading from or writing to the core’s general purpose register file, then the standard RISC-V bitfield locations for rs1, rs2, rs3, rd as used for non-compressed instructions ([RISC-V-UNPRIV]) must be used. Bitfields for unused read or write operands can be fully repurposed. Extension instructions can either use the compressed or uncompressed instruction format. For offloading compressed instructions the coprocessor must provide the core with the related non-compressed instructions.
Support for dual writeback instructions (optional, based on X_DUALWRITE).
CORE-V-XIF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file writebacks. Dual writeback is supported for even-odd register pairs (Xn and Xn+1, with n <> 0 and Xn extracted from instruction bits [11:7]). Dual register file writeback is only supported for XLEN = 32.
Support for dual read instructions (per source operand) (optional, based on X_DUALREAD).
CORE-V-XIF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file reads. Dual read is supported for even-odd register pairs (Xn and Xn+1, with Xn extracted from instruction bits [19:15], [24:20] and [31:27], i.e. rs1, rs2 and rs3). Dual read can therefore provide up to six 32-bit operands per instruction. Dual register file read is only supported for XLEN = 32.
Support for ternary operations.
CORE-V-XIF optionally supports ISA extensions implementing instructions which use three source operands. Ternary instructions must be encoded in the R4-type instruction format defined by [RISC-V-UNPRIV].
Support for instruction speculation.
CORE-V-XIF indicates whether offloaded instructions are allowed to be committed (or should be killed).
CORE-V-XIF consists of six interfaces:
Compressed interface. Signaling of compressed instruction to be offloaded.
Issue (request/response) interface. Signaling of the uncompressed instruction to be offloaded including its register file based operands.
Commit interface. Signaling of control signals related to whether instructions can be committed or should be killed.
Memory (request/response) interface. Signaling of load/store related signals (i.e. its transaction request signals). This interface is optional.
Memory result interface. Signaling of load/store related signals (i.e. its transaction result signals). This interface is optional.
Result interface. Signaling of the instruction result(s).
Operating principle
CPU will attempt to offload every (compressed or non-compressed) instruction that it does not recognize as a legal instruction itself. In case of a compressed instruction the coprocessor must first provide the core with a matching uncompressed (i.e. 32-bit) instruction using the compressed interface. This non-compressed instruction is then attempted for offload via the issue interface.
Offloading of the (non-compressed, 32-bit) instructions happens via the issue interface.
The external coprocessor can decide to accept or reject the instruction offload. If it accepts, the coprocessor will further handle the instruction. If it rejects, the core will raise an illegal instruction exception.
As part of the issue interface transaction the core provides the instruction and required register file operand(s) to the coprocessor. If an offloaded instruction uses any of the register file sources rs1, rs2 or rs3, then these are always encoded in instruction bits [19:15], [24:20] and [31:27] respectively. The coprocessor only needs to wait for the register file operands that a specific instruction actually uses.
The coprocessor informs the core whether an accepted offloaded instruction is a load/store, to which register(s) in the register file it will write back, and whether the offloaded instruction can potentially cause a synchronous exception. The CPU uses this information to reserve the load/store unit, to track data dependencies between instructions, and to properly deal with exceptions caused by offloaded instructions.
Offloaded instructions are speculative; the CPU has not necessarily committed to them yet and might decide to kill them (e.g. because they are in the shadow of a taken branch or because they are flushed due to an exception in an earlier instruction). Via the commit interface the core informs the coprocessor whether an offloaded instruction needs to be killed or whether the core guarantees that the instruction is no longer speculative and is allowed to be committed.
In case an accepted offloaded instruction is a load or store, then the coprocessor will use the load/store unit(s) in CPU to actually perform the load or store. The coprocessor provides the memory request transaction details (e.g. virtual address, write data, etc.) via the memory request interface and CPU will use its PMP/PMA to check if the load or store is actually allowed, and if so, will use its bus interface(s) to perform the required memory transaction and provide the result (e.g. load data and/or fault status) back to the coprocessor via the memory result interface.
The final result of an accepted offloaded instruction can be written back into the coprocessor itself or into the core’s register file. Either way, the result interface is used to signal to the core that the instruction has completed. Apart from a possible writeback into the register file, the result interface transaction is for example used in the core to increment the minstret CSR, to implement the fence instructions and to judge if instructions before a WFI instruction have fully completed (so that sleep mode can be entered if needed).
In short: From a functional perspective it should not matter whether an instruction is handled inside the core or inside a coprocessor. In both cases the instructions need to obey the same instruction dependency rules, memory consistency rules, load/store address checks, fences, etc.
Interfaces
This section describes the six interfaces of CORE-V-XIF. Port directions are described as seen from the perspective of the CPU.
The coprocessor will have opposite pin directions.
Stated signal names are not mandatory, but it is highly recommended to at least include the stated names as part of actual signal names. It is for example allowed to add prefixes and/or postfixes (e.g. x_ prefix or _i, _o postfixes) or to use different capitalization. A name mapping should be provided if non-obvious renaming is applied.
SystemVerilog example
The description in this specification is based on SystemVerilog interfaces. Of course the use of SystemVerilog (interfaces) is not mandatory.
A CPU using the eXtension interface could have the following interface:
module cpu
(
// eXtension interface
if_xif.cpu_compressed xif_compressed_if,
if_xif.cpu_issue xif_issue_if,
if_xif.cpu_commit xif_commit_if,
if_xif.cpu_mem xif_mem_if,
if_xif.cpu_mem_result xif_mem_result_if,
if_xif.cpu_result xif_result_if,
... // Other ports omitted
);
A full example of a CPU with an eXtension interface is the CV32E40X, which can be found at https://github.com/openhwgroup/cv32e40x.
A coprocessor using the eXtension interface could have the following interface:
module coproc
(
// eXtension interface
if_xif.coproc_compressed xif_compressed_if,
if_xif.coproc_issue xif_issue_if,
if_xif.coproc_commit xif_commit_if,
if_xif.coproc_mem xif_mem_if,
if_xif.coproc_mem_result xif_mem_result_if,
if_xif.coproc_result xif_result_if,
... // Other ports omitted
);
A SystemVerilog interface implementation for CORE-V-XIF could look as follows:
interface if_xif
#(
parameter int X_NUM_RS = 2, // Number of register file read ports that can be used by the eXtension interface
parameter int X_ID_WIDTH = 4, // Identification width for the eXtension interface
parameter int X_MEM_WIDTH = 32, // Maximum memory access width for loads/stores via the eXtension interface
parameter int X_RFR_WIDTH = 32, // Register file read access width for the eXtension interface
parameter int X_RFW_WIDTH = 32, // Register file write access width for the eXtension interface
parameter logic [31:0] X_MISA = '0, // MISA extensions implemented on the eXtension interface
parameter logic [ 1:0] X_ECS_XS = '0, // Default value for ``mstatus.xs``
parameter int X_DUALREAD = 0, // Dual register file read
parameter int X_DUALWRITE = 0 // Dual register file write
);
... // typedefs omitted
// Compressed interface
logic compressed_valid;
logic compressed_ready;
x_compressed_req_t compressed_req;
x_compressed_resp_t compressed_resp;
// Issue interface
logic issue_valid;
logic issue_ready;
x_issue_req_t issue_req;
x_issue_resp_t issue_resp;
// Commit interface
logic commit_valid;
x_commit_t commit;
// Memory (request/response) interface
logic mem_valid;
logic mem_ready;
x_mem_req_t mem_req;
x_mem_resp_t mem_resp;
// Memory result interface
logic mem_result_valid;
x_mem_result_t mem_result;
// Result interface
logic result_valid;
logic result_ready;
x_result_t result;
// Modports
modport cpu_issue (
output issue_valid,
input issue_ready,
output issue_req,
input issue_resp
);
modport coproc_issue (
input issue_valid,
output issue_ready,
input issue_req,
output issue_resp
);
... // Further modports omitted
endinterface : if_xif
A full reference implementation of the SystemVerilog interface can be found at https://github.com/openhwgroup/cv32e40x/blob/master/rtl/if_xif.sv.
Identification
The six interfaces of CORE-V-XIF all use a signal called id, which serves as a unique identification number for offloaded instructions. The same id value shall be used for all transaction packets on all interfaces that logically relate to the same instruction. An id value can be reused after an earlier instruction related to the same id value is no longer considered in-flight. The id values for in-flight offloaded instructions are only required to be unique; they are for example not required to be incremental.
id values can only be introduced by the compressed interface and/or the issue interface.
An id becomes in-flight via the compressed interface in the first cycle that compressed_valid is 1 for that id, or via the issue interface in the first cycle that issue_valid is 1 for that id (only if the same id was not already in-flight via the compressed interface).
An id ends being in-flight when one of the following scenarios applies:
the corresponding compressed request transaction is retracted.
the corresponding compressed request transaction is not accepted.
the corresponding issue request transaction is retracted.
the corresponding issue request transaction is not accepted and the corresponding commit handshake has been performed.
the corresponding commit transaction killed the offloaded instruction and no corresponding memory request transaction and/or corresponding memory result transaction is in progress or still needs to be performed.
the corresponding result transaction has been performed.
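Because in-flight id values only need to be unique, a core is free to pick any id that is not currently in-flight. The following is a minimal sketch of one possible selection scheme, assuming the core keeps a bitmap with one bit per id value. The package name, NUM_IDS, free_id_t and first_free_id are hypothetical names used for illustration only; they are not part of CORE-V-XIF.

```systemverilog
// Sketch only: all names in this package are illustrative assumptions.
package xif_id_sketch_pkg;

  localparam int X_ID_WIDTH = 4;               // default from the parameter table
  localparam int NUM_IDS    = 2 ** X_ID_WIDTH; // number of distinct id values

  typedef struct packed {
    logic                  found; // 1 if at least one id is not in-flight
    logic [X_ID_WIDTH-1:0] id;    // lowest id value that is not in-flight
  } free_id_t;

  // in_flight[i] = 1 means that id value i is currently in-flight.
  function automatic free_id_t first_free_id(input logic [NUM_IDS-1:0] in_flight);
    free_id_t res;
    res = '0;
    // Scan downwards so that the lowest free id is the one that is kept.
    for (int i = NUM_IDS - 1; i >= 0; i--) begin
      if (!in_flight[i]) begin
        res.found = 1'b1;
        res.id    = X_ID_WIDTH'(i);
      end
    end
    return res;
  endfunction

endpackage
```

Maintaining the bitmap itself (setting a bit when an id becomes in-flight and clearing it in the scenarios listed above) is left out of this sketch.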
Compressed interface
Table 1 describes the compressed interface signals.
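Signal | Type | Direction (CPU) | Description |
---|---|---|---|
compressed_valid | logic | output | Compressed request valid. Request to uncompress a compressed instruction. |
compressed_ready | logic | input | Compressed request ready. The transactions signaled via compressed_req and compressed_resp are accepted when compressed_valid and compressed_ready are both 1. |
compressed_req | x_compressed_req_t | output | Compressed request packet. |
compressed_resp | x_compressed_resp_t | input | Compressed response packet. |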
Table 2 describes the x_compressed_req_t type.
Signal | Type | Description |
---|---|---|
instr | logic [15:0] | Offloaded compressed instruction. |
mode | logic [1:0] | Privilege level (2’b00 = User, 2’b01 = Supervisor, 2’b10 = Reserved, 2’b11 = Machine). |
id | logic [X_ID_WIDTH-1:0] | Identification number of the offloaded compressed instruction. |
The instr[15:0] signal is used to signal compressed instructions that are considered illegal by the CPU itself. A coprocessor can provide an uncompressed instruction in response to receiving this.
A compressed request transaction is defined as the combination of all compressed_req signals during which compressed_valid is 1 and the id remains unchanged. A CPU is allowed to retract its compressed request transaction before it is accepted with compressed_ready = 1 and it can do so in the following ways:
Set compressed_valid = 0.
Keep compressed_valid = 1, but change the id signal (and if desired change the other signals in compressed_req).
The signals in compressed_req are valid when compressed_valid is 1. These signals remain stable during a compressed request transaction (if id changes while compressed_valid remains 1, then a new compressed request transaction has started).
Table 3 describes the x_compressed_resp_t type.
Signal | Type | Description |
---|---|---|
instr | logic [31:0] | Uncompressed instruction. |
accept | logic | Is the offloaded compressed instruction (id) accepted by the coprocessor? |
The signals in compressed_resp are valid when compressed_valid and compressed_ready are both 1. There are no stability requirements.
The CPU will attempt to offload every compressed instruction that it does not recognize as a legal instruction itself. The CPU might also attempt to offload compressed instructions that it does recognize as legal instructions itself.
The CPU shall cause an illegal instruction fault when attempting to execute (commit) an instruction that:
is considered to be valid by the CPU and accepted by the coprocessor (accept = 1).
is considered neither to be valid by the CPU nor accepted by the coprocessor (accept = 0).
The accept signal of the compressed interface merely indicates that the coprocessor accepts the compressed instruction as an instruction that it implements and translates into its uncompressed counterpart.
Typically an accepted transaction over the compressed interface will be followed by a corresponding transaction over the issue interface, but there is no requirement on the CPU to do so (as the instructions offloaded over the compressed interface and issue interface are allowed to be speculative). Only when an accept is signaled over the issue interface is an instruction considered accepted for offload.
The coprocessor shall not take the mstatus based extension context status into account when generating the accept signal on its compressed interface (but it shall take it into account when generating the accept signal on its issue interface).
Issue interface
Table 4 describes the issue interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
issue_valid | logic | output | Issue request valid. Indicates that the CPU wants to offload an instruction. |
issue_ready | logic | input | Issue request ready. The transaction signaled via issue_req and issue_resp is accepted when issue_valid and issue_ready are both 1. |
issue_req | x_issue_req_t | output | Issue request packet. |
issue_resp | x_issue_resp_t | input | Issue response packet. |
Table 5 describes the x_issue_req_t type.
Signal | Type | Description |
---|---|---|
instr | logic [31:0] | Offloaded instruction. |
mode | logic [1:0] | Privilege level (2’b00 = User, 2’b01 = Supervisor, 2’b10 = Reserved, 2’b11 = Machine). |
id | logic [X_ID_WIDTH-1:0] | Identification of the offloaded instruction. |
rs[X_NUM_RS-1:0] | logic [X_RFR_WIDTH-1:0] | Register file source operands for the offloaded instruction. |
rs_valid | logic [X_NUM_RS-1:0] | Validity of the register file source operand(s). |
ecs | logic [5:0] | Extension Context Status (from the mstatus CSR). |
ecs_valid | logic | Validity of the Extension Context Status. |
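The typedefs are omitted from the if_xif example earlier in this document. As a rough sketch, the issue request packet could be declared as a packed struct along the following lines. The field names follow the signal names used in the text of this section (instr, mode, id, rs, rs_valid, ecs, ecs_valid); the package name is hypothetical, the localparams stand in for the interface parameters and use their default values, and the field order is an assumption.

```systemverilog
// Sketch only: package name, field order and use of localparams are assumptions.
package xif_issue_req_sketch_pkg;

  localparam int X_NUM_RS    = 2;  // default number of register file read ports
  localparam int X_ID_WIDTH  = 4;  // default id width
  localparam int X_RFR_WIDTH = 32; // default register file read access width

  typedef struct packed {
    logic [31:0]                          instr;     // offloaded instruction
    logic [1:0]                           mode;      // privilege level
    logic [X_ID_WIDTH-1:0]                id;        // identification of the instruction
    logic [X_NUM_RS-1:0][X_RFR_WIDTH-1:0] rs;        // one operand per read port
    logic [X_NUM_RS-1:0]                  rs_valid;  // validity per read port
    logic [5:0]                           ecs;       // Extension Context Status
    logic                                 ecs_valid; // validity of ecs
  } x_issue_req_t;

endpackage
```

Declaring rs as a two-dimensional packed array keeps one X_RFR_WIDTH-wide operand per read port, so rs[0] and rs_valid[0] refer to the same operand.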
An issue request transaction is defined as the combination of all issue_req signals during which issue_valid is 1 and the id remains unchanged. A CPU is allowed to retract its issue request transaction before it is accepted with issue_ready = 1 and it can do so in the following ways:
Set issue_valid = 0.
Keep issue_valid = 1, but change the id signal (and if desired change the other signals in issue_req).
The instr, mode, id, ecs, ecs_valid and rs_valid signals are valid when issue_valid is 1. The rs signal is only considered valid when issue_valid is 1 and the corresponding bit in rs_valid is 1 as well. The ecs signal is only considered valid when issue_valid is 1 and ecs_valid is 1 as well.
The instr and mode signals remain stable during an issue request transaction. The rs_valid bits are not required to be stable during the transaction. Each bit can transition from 0 to 1, but is not allowed to transition back to 0 during a transaction. The rs signals are only required to be stable during the part of a transaction in which these signals are considered to be valid. The ecs_valid bit is not required to be stable during the transaction. It can transition from 0 to 1, but is not allowed to transition back to 0 during a transaction. The ecs signal is only required to be stable during the part of a transaction in which this signal is considered to be valid.
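Some of the stability rules above lend themselves to being checked with SystemVerilog assertions in a testbench. The following is a minimal sketch, not a complete protocol checker: it only covers the stability of instr and mode and the rule that rs_valid bits may not fall during a transaction. The module name, the flattened port list and the active-low rst_n reset are assumptions made for illustration.

```systemverilog
// Sketch only: module name, ports and reset polarity are assumptions.
module xif_issue_stability_sketch #(
  parameter int X_NUM_RS   = 2,
  parameter int X_ID_WIDTH = 4
) (
  input logic                  clk,
  input logic                  rst_n,
  input logic                  issue_valid,
  input logic                  issue_ready,
  input logic [X_ID_WIDTH-1:0] id,
  input logic [31:0]           instr,
  input logic [1:0]            mode,
  input logic [X_NUM_RS-1:0]   rs_valid
);

  default clocking cb @(posedge clk); endclocking
  default disable iff (!rst_n);

  // Matches in the second of two consecutive cycles that belong to the same,
  // not yet accepted, issue request transaction (valid stays 1, id unchanged).
  sequence s_txn_continues;
    (issue_valid && !issue_ready) ##1 (issue_valid && (id == $past(id)));
  endsequence

  // instr and mode remain stable during an issue request transaction.
  a_instr_mode_stable: assert property (
    s_txn_continues |-> ((instr == $past(instr)) && (mode == $past(mode)))
  );

  // rs_valid bits may rise, but shall not fall back to 0 during a transaction.
  a_rs_valid_no_fall: assert property (
    s_txn_continues |-> (($past(rs_valid) & ~rs_valid) == '0)
  );

endmodule
```

A retraction (dropping issue_valid or changing id) ends the matched sequence, so neither assertion fires across the start of a new transaction.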
The rs[X_NUM_RS-1:0] signals provide the register file operand(s) to the coprocessor. In case that XLEN = X_RFR_WIDTH, then the regular register file operands corresponding to rs1, rs2 or rs3 are provided. In case XLEN != X_RFR_WIDTH (i.e. XLEN = 32 and X_RFR_WIDTH = 64), then the rs[X_NUM_RS-1:0] signals provide two 32-bit register file operands per index (corresponding to even/odd register pairs) with the even register specified in rs1, rs2 or rs3. The register file operand for the even register file index is provided in the lower 32 bits; the register file operand for the odd register file index is provided in the upper 32 bits. The X_DUALREAD parameter defines whether dual read is supported and for which register file sources it is supported.
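For the XLEN = 32, X_RFR_WIDTH = 64 case, a coprocessor has to split each 64-bit rs entry into its even and odd operand. A minimal sketch of that unpacking is shown here; the package, type and function names are hypothetical and only illustrate the bit layout described above.

```systemverilog
// Sketch only: names are illustrative assumptions, not part of CORE-V-XIF.
package xif_dualread_sketch_pkg;

  typedef struct packed {
    logic [31:0] odd;  // operand of the odd register Xn+1 (upper 32 bits of rs)
    logic [31:0] even; // operand of the even register Xn (lower 32 bits of rs)
  } rs_pair_t;

  // Splits one 64-bit rs entry into its even/odd 32-bit operands.
  function automatic rs_pair_t split_rs_pair(input logic [63:0] rs);
    rs_pair_t pair;
    pair.even = rs[31:0];
    pair.odd  = rs[63:32];
    return pair;
  endfunction

endpackage
```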
The ecs signal provides the Extension Context Status from the mstatus CSR to the coprocessor.
Table 6 describes the x_issue_resp_t type.
Signal | Type | Description |
---|---|---|
accept | logic | Is the offloaded instruction (id) accepted by the coprocessor? |
writeback | logic | Will the coprocessor perform a writeback in the core to rd? |
dualwrite | logic | Will the coprocessor perform a dual writeback in the core to rd and rd+1? |
dualread | logic [2:0] | Will the coprocessor require dual reads from rs1/rs2/rs3 and rs1+1/rs2+1/rs3+1? |
loadstore | logic | Is the offloaded instruction a load/store instruction? A coprocessor must signal loadstore as 0 for non-accepted instructions. |
ecswrite | logic | Will the coprocessor perform a writeback in the core to the Extension Context Status bits of mstatus? |
exc | logic | Can the offloaded instruction possibly cause a synchronous exception in the coprocessor itself? A coprocessor must signal exc as 0 for non-accepted instructions. |
The core shall attempt to offload instructions via the issue interface for the following two main scenarios:
The instruction is originally non-compressed and it is not recognized as a valid instruction by the CPU’s non-compressed instruction decoder.
The instruction is originally compressed and the coprocessor accepted the compressed instruction and provided a 32-bit uncompressed instruction. In this case the 32-bit uncompressed instruction will be attempted for offload even if it matches in the CPU’s non-compressed instruction decoder.
Apart from the above two main scenarios a CPU may also attempt to offload (compressed/uncompressed) instructions that it does recognize as legal instructions itself. In case that both the CPU and the coprocessor accept the same instruction as being valid, the instruction will cause an illegal instruction fault upon execution.
The CPU shall cause an illegal instruction fault when attempting to execute (commit) an instruction that:
is considered to be valid by the CPU and accepted by the coprocessor (accept = 1).
is considered neither to be valid by the CPU nor accepted by the coprocessor (accept = 0).
A coprocessor can (only) accept an offloaded instruction when:
It can handle the instruction (based on decoding instr).
The required source registers are marked valid by the offloading core (issue_valid is 1 and the required bit(s) in rs_valid are 1).
A transaction is considered offloaded/accepted on the positive edge of clk when issue_valid and issue_ready are asserted and accept is 1. A transaction is considered not offloaded/rejected on the positive edge of clk when issue_valid and issue_ready are asserted while accept is 0.
The signals in issue_resp are valid when issue_valid and issue_ready are both 1. There are no stability requirements.
Commit interface
Table 7 describes the commit interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
commit_valid | logic | output | Commit request valid. Indicates that the CPU has valid commit or kill information for an offloaded instruction. There is no corresponding ready signal (it is implicit and assumed 1). The coprocessor shall be ready to observe the commit_valid and commit_kill signals at any time. |
commit | x_commit_t | output | Commit packet. |
Note
The CPU shall perform a commit transaction for every issue transaction, independent of the accept value of the issue transaction. A coprocessor shall ignore the commit_kill signal for instructions that it did not accept. A CPU can signal either commit_kill = 0 or commit_kill = 1 for non-accepted instructions.
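Table 8 describes the x_commit_t type.
Signal | Type | Description |
---|---|---|
id | logic [X_ID_WIDTH-1:0] | Identification of the offloaded instruction. Valid when commit_valid is 1. |
commit_kill | logic | Shall an offloaded instruction be killed? If commit_valid is 1 and commit_kill is 0, then the core guarantees that the offloaded instruction (id) is no longer speculative and is allowed to be committed. If commit_valid is 1 and commit_kill is 1, then the offloaded instruction (id) shall be killed in the coprocessor. |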
The commit_valid signal will be 1 for exactly one clk cycle for every instruction offloaded to the coprocessor (whether accepted or not). The id value indicates which offloaded instruction is allowed to be committed or is supposed to be killed.
For each offloaded and accepted instruction the core is guaranteed to (eventually) signal that such an instruction is either no longer speculative and can be committed (commit_valid is 1 and commit_kill is 0) or that the instruction must be killed (commit_valid is 1 and commit_kill is 1).
A coprocessor does not have to wait for commit_valid to become asserted. It can speculate that an offloaded accepted instruction will not get killed, but in case this speculation turns out to be wrong because the instruction actually did get killed, then the coprocessor must undo any of its internal architectural state changes that are due to the killed instruction.
A coprocessor is allowed to perform speculative memory request transactions, but then it must be aware that the CPU can signal a failure for speculative memory request transactions to certain memory regions. A coprocessor shall never initiate memory request transactions for instructions that have already been killed at least a clk cycle earlier. If a memory request transaction or memory result transaction is already in progress at the time that the CPU signals commit_kill = 1, then these transaction(s) will complete as normal (although the information contained within the memory response and memory result shall be ignored by the coprocessor).
A coprocessor is not allowed to perform speculative result transactions and shall therefore never initiate a result transaction for instructions that have not yet received a commit transaction with commit_kill = 0. The earliest point at which a coprocessor can initiate a result handshake for an instruction is therefore the cycle in which commit_valid = 1 and commit_kill = 0 for that instruction.
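One way a coprocessor could enforce the rule on result transactions is to remember, per id, whether a commit transaction with commit_kill = 0 has been observed. The following is a minimal sketch under several assumptions: the module name, the ports result_id, result_done and result_allowed, and the active-low rst_n reset are hypothetical, and the logic that actually drives result_valid is left out.

```systemverilog
// Sketch only: module name, ports and reset polarity are assumptions.
module xif_result_gate_sketch #(
  parameter int X_ID_WIDTH = 4
) (
  input  logic                  clk,
  input  logic                  rst_n,
  // Commit interface as observed by the coprocessor
  input  logic                  commit_valid,
  input  logic [X_ID_WIDTH-1:0] commit_id,
  input  logic                  commit_kill,
  // Coprocessor-internal signals (hypothetical)
  input  logic [X_ID_WIDTH-1:0] result_id,     // id whose result is ready to be returned
  input  logic                  result_done,   // result handshake completed for result_id
  output logic                  result_allowed // a result handshake may be initiated
);

  // One bit per id: set once the core has committed that instruction.
  logic [2**X_ID_WIDTH-1:0] committed_q;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      committed_q <= '0;
    end else begin
      if (commit_valid && !commit_kill) committed_q[commit_id] <= 1'b1;
      // Clear after the result handshake so that the id can be reused.
      if (result_done) committed_q[result_id] <= 1'b0;
    end
  end

  // Allowed from the cycle of the commit transaction itself onwards.
  assign result_allowed = committed_q[result_id] ||
                          (commit_valid && !commit_kill && (commit_id == result_id));

endmodule
```

The combinational term in result_allowed reflects that the commit cycle itself is already a legal cycle to start the result handshake.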
The signals in commit are valid when commit_valid is 1.
Memory (request/response) interface
Table 9 describes the memory (request/response) interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
mem_valid | logic | input | Memory (request/response) valid. Indicates that the coprocessor wants to perform a memory transaction for an offloaded instruction. |
mem_ready | logic | output | Memory (request/response) ready. The memory (request/response) signaled via mem_req is accepted by the CPU when mem_valid and mem_ready are both 1. |
mem_req | x_mem_req_t | input | Memory request packet. |
mem_resp | x_mem_resp_t | output | Memory response packet. Response to memory request (e.g. PMA check response). Note that this is not the memory result. |
Table 10 describes the x_mem_req_t type.
Signal | Type | Description |
---|---|---|
id | logic [X_ID_WIDTH-1:0] | Identification of the offloaded instruction. |
addr | logic [31:0] | Virtual address of the memory transaction. |
mode | logic [1:0] | Privilege level (2’b00 = User, 2’b01 = Supervisor, 2’b10 = Reserved, 2’b11 = Machine). |
we | logic | Write enable of the memory transaction. |
size | logic [2:0] | Size of the memory transaction. 0: byte, 1: 2 bytes (halfword), 2: 4 bytes (word), 3: 8 bytes (doubleword), 4: 16 bytes, 5: 32 bytes, 6: Reserved, 7: Reserved. |
be | logic [X_MEM_WIDTH/8-1:0] | Byte enables for memory transaction. |
attr | logic [1:0] | Memory transaction attributes. attr[0] = modifiable (0 = not modifiable, 1 = modifiable). attr[1] = unaligned (0 = aligned, 1 = unaligned). |
wdata | logic [X_MEM_WIDTH-1:0] | Write data of a store memory transaction. |
last | logic | Is this the last memory transaction for the offloaded instruction? |
spec | logic | Is the memory transaction speculative? |
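As a rough sketch, the memory request packet could be written as a packed struct along these lines. The field names follow the signal names used in the text of this section (id, addr, mode, we, size, be, attr, wdata, last, spec); the package name is hypothetical, the localparams stand in for the interface parameters and use their default values, and the field order is an assumption.

```systemverilog
// Sketch only: package name, field order and use of localparams are assumptions.
package xif_mem_req_sketch_pkg;

  localparam int X_ID_WIDTH  = 4;  // default id width
  localparam int X_MEM_WIDTH = 32; // default memory access width

  typedef struct packed {
    logic [X_ID_WIDTH-1:0]    id;    // identification of the offloaded instruction
    logic [31:0]              addr;  // virtual (byte) address
    logic [1:0]               mode;  // privilege level
    logic                     we;    // write enable
    logic [2:0]               size;  // 0: byte, 1: halfword, 2: word, ...
    logic [X_MEM_WIDTH/8-1:0] be;    // one byte enable per byte lane
    logic [1:0]               attr;  // attr[0]: modifiable, attr[1]: unaligned
    logic [X_MEM_WIDTH-1:0]   wdata; // write data for stores
    logic                     last;  // last transaction for this instruction
    logic                     spec;  // speculative transaction
  } x_mem_req_t;

endpackage
```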
The memory request interface can be used by the coprocessor to initiate data side memory read or memory write transactions. All memory transactions, no matter if they are initiated by CPU itself or by a coprocessor via the memory request interface, are treated equally. Specifically this equal treatment applies to:
PMA checks and attribution
PMU usage
MMU usage
Misaligned load/store exception handling
Write buffer usage
As for non-offloaded load or store instructions it is assumed that execute permission is never required for offloaded load or store instructions.
If desired a coprocessor can always avoid performing speculative loads or stores (as indicated by spec = 1) by waiting for the commit interface to signal that the offloaded instruction is no longer speculative before issuing the memory request.
Whether a load or store is treated as being speculative or not by the CPU shall only depend on the spec signal. Specifically, the CPU shall ignore whatever value it might have communicated via commit_kill with respect to whether it treats a memory request as speculative or not. A coprocessor is allowed to signal spec = 1 without taking the commit transaction into account (so for example even after commit_kill = 0 has already been signaled).
The addr signal indicates the (byte) start address of the memory transaction. Transactions on the memory (request/response) interface cannot cross an X_MEM_WIDTH (bus width) boundary.
The be signal indicates on what byte lanes to expect valid data for both read and write transactions. be[n] determines the validity of data bits 8*n+7:8*n. There are no limitations on the allowed be values.
The size signal indicates the size of the memory transaction. size shall reflect a naturally aligned range of byte lanes to be used in a transaction. The size of a transaction shall not exceed the maximum memory access width (memory bus width) as determined by X_MEM_WIDTH.
The addr signal shall be consistent with the be signal, i.e. if the maximum memory access width (memory bus width) is 2^N bytes (N=2,3,4,5) and the lowest set bit in be is at index IDX, then addr[N-1:0] shall be at most IDX.
When for example performing a transaction that uses the middle two bytes on a 32-bit wide memory interface, the following (equivalent) be, size, addr[1:0] combinations can be used:
be = 4’b0110, size = 3’b010, addr[1:0] = 2’b00.
be = 4’b0110, size = 3’b010, addr[1:0] = 2’b01.
Note that a word transfer is needed in this example because the two bytes transferred are not halfword aligned.
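The consistency rule between addr and be can be expressed as a small check, for example for use in a testbench. The sketch below assumes a 32-bit memory interface (N = 2); the package and function names are hypothetical, and treating an all-zero be as consistent with any address is an assumption of this sketch, not a statement of the specification.

```systemverilog
// Sketch only: names and the handling of be == 0 are assumptions.
package xif_mem_check_sketch_pkg;

  localparam int X_MEM_WIDTH = 32;                // default memory access width
  localparam int NUM_BYTES   = X_MEM_WIDTH / 8;   // 2^N byte lanes
  localparam int N           = $clog2(NUM_BYTES); // N = 2 for a 32-bit interface

  // Returns 1 if addr[N-1:0] is at most the index of the lowest set bit in be.
  function automatic bit addr_be_consistent(
    input logic [31:0]          addr,
    input logic [NUM_BYTES-1:0] be
  );
    int unsigned lowest;
    lowest = NUM_BYTES; // assumption: with no byte enable set, nothing is flagged
    // Scan downwards so that the lowest set byte enable index is the one kept.
    for (int i = NUM_BYTES - 1; i >= 0; i--) begin
      if (be[i]) lowest = i;
    end
    return (addr[N-1:0] <= lowest);
  endfunction

endpackage
```

With be = 4’b0110 the lowest set bit is at index 1, so the check passes for addr[1:0] = 2’b00 and 2’b01 as in the example above, and fails for addr[1:0] = 2’b10.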
Unaligned (i.e. non naturally aligned) transactions are supported over the memory (request/response) interface using the be signal. Not all unaligned memory operations can however be performed as single transactions on the memory (request/response) interface. Specifically, if an unaligned memory operation crosses an X_MEM_WIDTH boundary, then it shall be broken into multiple transactions on the memory (request/response) interface by the coprocessor.
The attr signal indicates the attributes of the memory transaction.
attr[0] indicates whether the transaction is a modifiable transaction. This bit shall be set if the transaction results from modifications already done in the coprocessor (e.g. merging, splitting, or using a transaction size larger than strictly needed (without changing the active byte lanes)) or if the coprocessor allows such modifications of this transaction at the system level. The CPU shall check whether a modifiable transaction to the requested address is allowed or not (and respond with an appropriate synchronous exception via the memory response interface if needed). An example of a modified transaction is performing a (merged) word transaction as opposed to doing four byte transactions (assuming the natively intended memory operations are byte operations).
attr[1] indicates whether the natively intended memory operation(s) resulting in this transaction is naturally aligned or not (0: aligned, 1: unaligned). In case that an unaligned native memory operation requires multiple memory request interface transactions, then the coprocessor is responsible for splitting the unaligned native memory operation into multiple transactions on the memory request interface, each of them having both attr[0] = 1 and attr[1] = 1. The CPU shall check whether an unaligned transaction to the requested address is allowed or not (and respond with an appropriate synchronous exception via the memory response interface if needed).
Note
Even though the coprocessor is allowed, and sometimes even mandated, to split transactions, this does not mean that split transactions will not result in exceptions. Whether a split transaction is allowed (and makes it onto the external CPU bus interface) or will lead to an exception is determined by the CPU (e.g. by its PMA). No matter if the coprocessor already split a transaction or not, further splitting might be required within the CPU itself (depending on whether a transaction on the memory (request/response) interface can be handled as a single transaction on the CPU’s native bus interface or not). In general a CPU is allowed to make any modification to a memory (request/response) interface transaction as long as it is in accordance with the modifiable physical memory attribute for the concerned address region.
A memory request transaction starts in the cycle that mem_valid
= 1 and ends in the cycle that both mem_valid
= 1 and mem_ready
= 1. The signals in mem_req
are
valid when mem_valid
is 1. The signals in mem_req
shall remain stable during a memory request transaction, except that wdata
is only required to remain stable during
memory request transactions in which we
is 1.
A coprocessor may issue multiple memory request transactions for an offloaded accepted load/store instruction. The coprocessor
shall signal last
= 0 if it intends to issue a following memory request transaction with the same id
and it shall signal
last
= 1 otherwise. Once a coprocessor signals last
= 1 for a memory request transaction it shall not issue further memory
request transactions for the same id
.
Normally a sequence of memory request transactions ends with a
transaction that has last
= 1. However, if a coprocessor receives exc
= 1 or dbg
= 1 via the memory response interface in response to a non-last memory request transaction,
then it shall issue no further memory request transactions for the same instruction (id
). Similarly, after having received commit_kill = 1 no further memory request transactions shall
be issued by a coprocessor for the same instruction (id
).
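The rules for last and for early termination can be illustrated with a small behavioral model. This is only a sketch: the function name issue_requests and the representation of responses as (exc, dbg) pairs are assumptions made for the example, and termination due to commit_kill = 1 is not modeled.

```python
from typing import Iterable, List, Tuple


def issue_requests(planned: int, responses: Iterable[Tuple[bool, bool]]) -> List[bool]:
    """Return the 'last' flag of every request that is actually issued.

    planned   -- number of transactions the coprocessor intends to issue for one id
    responses -- (exc, dbg) pairs from the memory response, one per issued request
    """
    issued = []
    response_iter = iter(responses)
    for index in range(planned):
        last = index == planned - 1  # last = 1 only on the final planned request
        issued.append(last)
        exc, dbg = next(response_iter)
        if exc or dbg:
            # No further requests for this id, even if last was 0.
            break
    return issued
```

In this model a sequence of three planned requests normally carries last = 0, 0, 1, but it stops after the second request if that request receives exc = 1.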
A coprocessor shall never initiate memory request transaction(s) for offloaded non-accepted instructions.
A coprocessor shall never initiate memory request transaction(s) for offloaded non-load/store instructions (loadstore
= 0).
A coprocessor shall never initiate non-speculative memory request transaction(s) unless in the same cycle as, or in a cycle after, receiving a commit transaction with commit_kill
= 0.
A coprocessor shall never initiate speculative memory request transaction(s) on cycles after a cycle in which it receives commit_kill
= 1 via the commit transaction.
A coprocessor shall initiate memory request transaction(s) for offloaded accepted load/store instructions that receive commit_kill
= 0 via the commit transaction.
A CPU shall always (eventually) complete any memory request transaction by signaling mem_ready
= 1 (also for transactions that relate to killed instructions).
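The conditions above on when a coprocessor may initiate a memory request can be summarized as a single predicate. The sketch below is an interpretation of those rules, not part of the specification; the Commit enumeration and the function name may_initiate_mem_req are hypothetical.

```python
from enum import Enum


class Commit(Enum):
    PENDING = 0            # no commit transaction received yet
    COMMITTED = 1          # commit_kill = 0 received (this cycle or earlier)
    KILLED_THIS_CYCLE = 2  # commit_kill = 1 received in the current cycle
    KILLED_EARLIER = 3     # commit_kill = 1 received in an earlier cycle


def may_initiate_mem_req(accepted: bool, loadstore: bool, spec: bool, commit: Commit) -> bool:
    """Return True if initiating a memory request is permitted in this cycle."""
    if not accepted or not loadstore:
        # Never for non-accepted or non-load/store instructions.
        return False
    if spec:
        # Speculative requests are only ruled out on cycles after a kill.
        return commit is not Commit.KILLED_EARLIER
    # Non-speculative requests need commit_kill = 0, same cycle or earlier.
    return commit is Commit.COMMITTED
```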
Table 11 describes the x_mem_resp_t
type.
Signal | Type | Description |
---|---|---|
exc | logic | Did the memory request cause a synchronous exception? |
exccode | logic [5:0] | Exception code. |
dbg | logic | Did the memory request cause a debug trigger match with mcontrol.timing = 0? |
The exc
is used to signal synchronous exceptions resulting from the memory request transaction defined in mem_req
.
The dbg
is used to signal a debug trigger match with mcontrol.timing
= 0 resulting from the memory request transaction defined in mem_req
.
In case of a synchronous exception or debug trigger match with before timing no corresponding transaction will be performed over the memory result (mem_result_valid
) interface.
A synchronous exception will lead to a trap in CPU unless the corresponding instruction is killed. exccode
provides the least significant bits of the exception
code bitfield of the mcause
CSR. Similarly a debug trigger match with before timing will lead to debug mode entry in CPU unless the corresponding instruction is killed.
A coprocessor shall take care that an instruction that causes exc
= 1 or dbg
= 1 does not cause (coprocessor local) side effects that are prohibited in the context of synchronous
exceptions or debug trigger match with before timing. Furthermore, if a result interface handshake will occur for this same instruction, then the exc
, exccode
and dbg
information shall be passed onto that handshake as well. It is the responsibility of the CPU to make sure that (precise) synchronous exception entry and debug entry with before timing
is achieved (possibly by killing following instructions that either are already offloaded or are in its own pipeline). A coprocessor shall not itself use the exc
or dbg
information to
kill following instructions in its pipeline.
The signals in mem_resp
are valid when mem_valid
and mem_ready
are both 1. There are no stability requirements.
If mem_resp
relates to an instruction that has been killed, then the CPU is allowed to signal any value in mem_resp
and the coprocessor shall ignore the value received via mem_resp
.
The memory response and hence the memory request/response handshake may get delayed in case that the CPU splits a memory (request/response) interface transaction into multiple transactions on its native bus interface. Once it is known that the first, or any following, access results in a synchronous exception, the handshake can be performed immediately. Otherwise, the handshake is performed only once it is known that none of the split transactions result in a synchronous exception.
The memory (request/response) interface is optional. If it is included, then the memory result interface shall also be included.
Memory result interface
Table 12 describes the memory result interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
mem_result_valid | logic | output | Memory result valid. Indicates that CPU has a valid memory result for the corresponding memory request. There is no corresponding ready signal (it is implicit and assumed 1). The coprocessor must be ready to accept mem_result whenever mem_result_valid is 1. |
mem_result | x_mem_result_t | output | Memory result packet. |
Table 13 describes the x_mem_result_t
type.
Signal | Type | Description |
---|---|---|
id | logic [X_ID_WIDTH-1:0] | Identification of the offloaded instruction. |
rdata | logic [X_MEM_WIDTH-1:0] | Read data of a read memory transaction. Only used for reads. |
err | logic | Did the instruction cause a bus error? |
dbg | logic | Did the read data cause a debug trigger match with mcontrol.timing = 0? |
The memory result interface is used to provide a result from CPU to the coprocessor for every memory transaction (i.e. for both read and write transactions). No memory result transaction is performed for instructions that led to a synchronous exception or debug trigger match with before timing as signaled via the memory (request/response) interface. Otherwise, one memory result transaction is performed per memory (request/response) transaction (even for killed instructions).
Memory result transactions are provided by the CPU in the same order (with matching id
) as the memory (request/response) transactions are received. The err signal indicates whether a bus error occurred. The dbg signal indicates whether a debug trigger match with before timing occurred on rdata (for a read transaction only).
A coprocessor shall take care that an instruction that causes dbg
= 1 does not cause (coprocessor local) side effects that are prohibited in the context of
debug trigger match with before timing. A coprocessor is allowed to treat err
= 1 as an imprecise exception (i.e. it is not mandatory to prevent (coprocessor local)
side effects based on the err
signal).
Furthermore, if a result interface handshake will occur for this same instruction, then the err
and dbg
information shall be passed onto that handshake as well. It is the responsibility of the CPU to make sure that (precise) debug entry with before timing is achieved (possibly by killing following instructions that either are already offloaded or are in its own pipeline). Upon receiving err
= 1 via the result interface handshake the CPU shall signal an (imprecise) NMI.
A coprocessor shall not itself use the err
or dbg
information to kill following instructions in its pipeline.
If mem_result
relates to an instruction that has been killed, then the CPU is allowed to signal any value in mem_result
and the coprocessor shall ignore the value received via mem_result
.
From a CPU’s point of view each memory request transaction has an associated memory result transaction (except if a synchronous exception or debug trigger match with before timing is signaled via the memory (request/response) interface). The same is not true for a coprocessor as it can receive memory result transactions for instructions that it did not accept and for which it did not issue a memory request transaction. Such memory result transactions shall be ignored by a coprocessor. In case that a coprocessor did issue a memory request transaction, then it is guaranteed to receive a corresponding memory result transaction (which it must be ready to accept).
Note
The above asymmetry can only occur at system level when multiple coprocessors are connected to a processor via some interconnect network. CORE-V-XIF
in itself
is a point-to-point connection, but its definition is written with CORE-V-XIF
interconnect network(s) in mind.
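One way a coprocessor could tolerate memory result transactions that belong to other coprocessors is to keep track of the results it still expects per id. The following is a minimal sketch under the assumption that an id uniquely identifies an in-flight instruction across the whole system; the class and method names are hypothetical.

```python
from collections import Counter


class MemResultTracker:
    """Tracks which memory results this coprocessor is still waiting for."""

    def __init__(self) -> None:
        self.expected = Counter()  # id -> number of results still expected

    def on_mem_response(self, req_id: int, exc: bool, dbg: bool) -> None:
        """Call when one of this coprocessor's own request handshakes completes."""
        if not (exc or dbg):
            # No memory result follows a response with exc = 1 or dbg = 1.
            self.expected[req_id] += 1

    def on_mem_result(self, req_id: int) -> bool:
        """Return True if the result is ours, False if it must be ignored."""
        if self.expected[req_id] > 0:
            self.expected[req_id] -= 1
            return True
        return False
```

Because there is no ready signal on the memory result interface, a real implementation would have to perform this check without ever stalling the incoming result.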
The signals in mem_result
are valid when mem_result_valid
is 1.
The memory result interface is optional. If it is included, then the memory (request/response) interface shall also be included.
Result interface
Table 14 describes the result interface signals.
Signal | Type | Direction (CPU) | Description |
---|---|---|---|
result_valid | logic | input | Result request valid. Indicates that the coprocessor has a valid result (write data or exception) for an offloaded instruction. |
result_ready | logic | output | Result request ready. The result signaled via result is accepted by the CPU when result_valid and result_ready are both 1. |
result | x_result_t | input | Result packet. |
The coprocessor shall provide results to the core via the result interface. A coprocessor is allowed to provide results to the core in an out of order fashion. A coprocessor is only allowed to provide a result for an instruction once the core has indicated (via the commit interface) that this instruction is allowed to be committed. Each accepted offloaded (committed and not killed) instruction shall have exactly one result transaction (even if no data needs to be written back to the CPU’s register file). No result transaction shall be performed for instructions which have not been accepted for offload or for instructions that have been killed.
Table 15 describes the x_result_t
type.
Signal | Type | Description |
---|---|---|
id | logic [X_ID_WIDTH-1:0] | Identification of the offloaded instruction. |
data | logic [X_RFW_WIDTH-1:0] | Register file write data value(s). |
rd | logic [4:0] | Register file destination address(es). |
we | logic [X_RFW_WIDTH/XLEN-1:0] | Register file write enable(s). |
ecswe | logic [2:0] | Write enables for {mstatus.xs, mstatus.fs, mstatus.vs}. |
ecsdata | logic [5:0] | Write data value for {mstatus.xs, mstatus.fs, mstatus.vs}. |
exc | logic | Did the instruction cause a synchronous exception? |
exccode | logic [5:0] | Exception code. |
dbg | logic | Did the instruction cause a debug trigger match with mcontrol.timing = 0? |
err | logic | Did the instruction cause a bus error? |
A result transaction starts in the cycle that result_valid
= 1 and ends in the cycle that both result_valid
= 1 and result_ready
= 1. The signals in result
are
valid when result_valid
is 1. The signals in result
shall remain stable during a result transaction, except that data
is only required to remain stable during
result transactions in which we
is not 0.
The exc
is used to signal synchronous exceptions.
A synchronous exception shall lead to a trap in the CPU (unless dbg
= 1 at the same time). exccode
provides the least significant bits of the exception
code bitfield of the mcause
CSR. we
shall be driven to 0 by the coprocessor for synchronous exceptions.
The CPU shall kill potentially already offloaded instructions to guarantee precise exception behavior.
The err
is used to signal a bus error.
A bus error shall lead to an (imprecise) NMI in the CPU.
The dbg
is used to signal a debug trigger match with mcontrol.timing
= 0. This signal is only used to signal debug trigger matches received earlier via
a corresponding memory (request/response) transaction or memory result transaction.
The trigger match shall lead to a debug entry in the CPU.
The CPU shall kill potentially already offloaded instructions to guarantee precise debug entry behavior.
we
is 2 bits wide when XLEN
= 32 and X_RFW_WIDTH
= 64, and 1 bit wide otherwise. If we
is 2 bits wide, then we[1]
is only allowed to be 1 if we[0]
is 1 as well (i.e. for
dual writeback).
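The width and legality rule for we can be written down compactly. The helper names below are assumptions made for this sketch; the values follow directly from the rule stated above.

```python
def we_width(xlen: int, x_rfw_width: int) -> int:
    """Number of write-enable bits: 2 only for XLEN = 32 with X_RFW_WIDTH = 64."""
    return 2 if (xlen == 32 and x_rfw_width == 64) else 1


def we_is_legal(we: int, xlen: int, x_rfw_width: int) -> bool:
    """Check a we value against its width and the dual-writeback rule."""
    width = we_width(xlen, x_rfw_width)
    if we < 0 or we >= (1 << width):
        return False
    # we[1] may only be 1 if we[0] is 1 as well, so 0b10 is never legal.
    return we != 0b10
```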
If ecswe[2] is 1, then the value in ecsdata[5:4] is written to mstatus.xs.
If ecswe[1] is 1, then the value in ecsdata[3:2] is written to mstatus.fs.
If ecswe[0] is 1, then the value in ecsdata[1:0] is written to mstatus.vs.
The writes to the stated mstatus
bitfields will take into account any WARL rules that might exist for these bitfields in the CPU.
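The mapping from ecswe and ecsdata to the mstatus bitfields can be sketched as follows. The function name apply_ecs and the dictionary representation of mstatus are assumptions for illustration, and any WARL behavior of the CPU is deliberately left out.

```python
def apply_ecs(mstatus: dict, ecswe: int, ecsdata: int) -> dict:
    """Return a copy of mstatus with the enabled xs/fs/vs fields updated."""
    updated = dict(mstatus)
    # (field name, write-enable bit index, low bit of the 2-bit data slice)
    for field, we_bit, data_lsb in (("xs", 2, 4), ("fs", 1, 2), ("vs", 0, 0)):
        if (ecswe >> we_bit) & 1:
            updated[field] = (ecsdata >> data_lsb) & 0b11
    return updated
```

For example, ecswe = 0b101 with ecsdata = 0b110110 writes 0b11 to xs and 0b10 to vs, and leaves fs unchanged.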
The signals in result
are valid when result_valid
is 1. These signals remain stable during a result transaction.
Interface dependencies
The following rules apply to the relative ordering of the interface handshakes:
The compressed interface transactions are in program order (possibly a subset) and the CPU will at least attempt to offload instructions that it does not consider to be valid itself.
The issue interface transactions are in program order (possibly a subset) and the CPU will at least attempt to offload instructions that it does not consider to be valid itself.
Every issue interface transaction (whether accepted or not) has an associated commit interface transaction and both interfaces use a matching transaction ordering.
If an offloaded instruction is accepted as a loadstore instruction and not killed, then for each such instruction one or more memory transactions must occur via the memory interface. The transaction ordering on the memory interface must correspond to the transaction ordering on the issue interface.
If an offloaded instruction is accepted and allowed to commit, then for each such instruction one result transaction must occur via the result interface (even if no writeback needs to happen to the core’s register file). The transaction ordering on the result interface does not have to correspond to the transaction ordering on the issue interface.
A commit interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
A memory (request/response) interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
Memory result interface transactions cannot be initiated before the corresponding memory request interface handshake is completed. They are allowed to be initiated at the same time as or after completion of the memory request interface handshake. Note that a coprocessor shall be able to tolerate memory result transactions for which it did not perform the corresponding memory request handshake itself.
A result interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
A result interface handshake cannot be initiated before the corresponding commit interface handshake is initiated (and the instruction is allowed to commit). It is allowed to be initiated at the same time or later.
A memory (request/response) interface handshake cannot be initiated for instructions that were killed in an earlier cycle.
A memory result interface handshake shall occur for every memory (request/response) interface handshake unless the response has exc = 1 or dbg = 1.
A result interface handshake cannot be (or have been) initiated for killed instructions.
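A subset of the ordering rules above can be checked per instruction with a simple function over cycle numbers. This is only a sketch under simplifying assumptions: every handshake is taken to be initiated and completed in the same cycle, only one memory transaction per instruction is considered, and the function name and messages are hypothetical.

```python
from typing import List, Optional


def ordering_violations(issue: int, commit: int, killed: bool,
                        mem_req: Optional[int] = None,
                        mem_result: Optional[int] = None,
                        result: Optional[int] = None) -> List[str]:
    """Arguments are the cycle numbers of the handshakes for one instruction."""
    problems = []
    if commit < issue:
        problems.append("commit before issue")
    if mem_req is not None and mem_req < issue:
        problems.append("memory request before issue")
    if mem_req is not None and killed and mem_req > commit:
        problems.append("memory request after kill")
    if mem_result is not None and mem_req is not None and mem_result < mem_req:
        problems.append("memory result before memory request")
    if result is not None:
        if killed:
            problems.append("result for killed instruction")
        elif result < commit:
            problems.append("result before commit")
    return problems
```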
Handshake rules
The following handshake pairs exist on the eXtension interface:
compressed_valid with compressed_ready.
issue_valid with issue_ready.
commit_valid with implicit always ready signal.
mem_valid with mem_ready.
mem_result_valid with implicit always ready signal.
result_valid with result_ready.
The only rule related to valid and ready signals is that:
A transaction is considered accepted on the positive
clk
edge when both valid and (implicit or explicit) ready are 1.
Specifically note the following:
The valid signals are allowed to be retracted by a CPU (e.g. in case that the related instruction is killed in the CPU’s pipeline before the corresponding ready is signaled).
A new transaction can be started by a CPU by changing the id signal and keeping the valid signal asserted (thereby possibly terminating a previous transaction before it completed).
The valid signals are not allowed to be retracted by a coprocessor (e.g. once mem_valid is asserted it must remain asserted until the handshake with mem_ready has been performed). A new transaction can therefore not be started by a coprocessor by just changing the id signal and keeping the valid signal asserted if no ready has been received yet for the original transaction. The cycle after receiving the ready signal, a next (back-to-back) transaction is allowed to be started by just keeping the valid signal high and changing the id to that of the next transaction.
The ready signals are allowed to be 1 when the corresponding valid signal is not asserted.
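The handshake rule and the coprocessor-side restriction on retracting valid can be modeled on a trace of (valid, ready) samples, one per positive clk edge. This is a minimal sketch; the function names are assumptions, and changes of id during a pending transaction are not modeled.

```python
from typing import List, Tuple

Cycle = Tuple[bool, bool]  # (valid, ready) sampled at a positive clk edge


def count_accepted(trace: List[Cycle]) -> int:
    """A transaction is accepted on every edge where valid and ready are both 1."""
    return sum(1 for valid, ready in trace if valid and ready)


def coprocessor_valid_retracted(trace: List[Cycle]) -> bool:
    """True if valid was dropped before its handshake (illegal for a coprocessor)."""
    pending = False  # valid was asserted and no ready has been seen yet
    for valid, ready in trace:
        if pending and not valid:
            return True
        pending = valid and not ready
    return False
```

The same trace with a retracted valid would be legal if it were driven by a CPU, since a CPU is allowed to retract its valid signals.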
Signal dependencies
A CPU shall not have combinatorial paths from its eXtension interface input signals to its eXtension interface output signals, except for the following allowed paths:
paths from result_valid, result to rs, rs_valid.
paths from mem_valid, mem_req to mem_ready, mem_resp.
Note
The above implies that the non-compressed instruction instr[31:0]
received via the compressed interface is not allowed
to combinatorially feed into the issue interface’s instr[31:0]
instruction.
A coprocessor is allowed (and expected) to have combinatorial paths from its eXtension interface input signals to its eXtension interface output signals. In order to prevent combinatorial loops the following combinatorial paths are not allowed in a coprocessor:
paths from rs, rs_valid to result_valid, result.
paths from mem_ready, mem_resp to mem_valid, mem_req.
Note
The above implies that a coprocessor has a pipeline stage separating the register file operands from its result generating circuit (similar to the separation between decode stage and execute stage found in many CPUs).
Note
As a CPU is allowed to retract transactions on its compressed and issue interfaces, the compressed_ready
and issue_ready
signals will have to
depend on signals received from the CPU in a combinatorial manner (otherwise these ready signals might be signaled for the wrong id
).
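The reason for the prohibited coprocessor paths in this section can be illustrated by treating signal groups as nodes of a graph and combinatorial paths as edges: a loop exists if a group can reach itself. In the sketch below the group names are shorthand chosen for the example (result for result_valid and result, rs for rs and rs_valid, mem_req for mem_valid and mem_req, mem_resp for mem_ready and mem_resp), and the function name is hypothetical.

```python
from typing import Dict, List, Set

# Combinatorial paths a CPU is allowed to have (input group -> output group).
CPU_PATHS = {"result": ["rs"], "mem_req": ["mem_resp"]}


def has_comb_loop(coproc_paths: Dict[str, List[str]]) -> bool:
    """Return True if CPU and coprocessor paths together form a loop."""
    graph: Dict[str, List[str]] = {}
    for paths in (CPU_PATHS, coproc_paths):
        for src, dsts in paths.items():
            graph.setdefault(src, []).extend(dsts)

    def reaches(start: str, node: str, seen: Set[str]) -> bool:
        # Depth-first search for a path from node back to start.
        for nxt in graph.get(node, []):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False

    return any(reaches(node, node, set()) for node in graph)
```

A coprocessor path from rs to result closes a loop with the allowed CPU path from result to rs, which is why that path is prohibited.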
Handshake dependencies
In order to avoid system level deadlock both the CPU and the coprocessor shall obey the following rules:
The valid signal of a transaction shall not be dependent on the corresponding ready signal.
Transactions related to an earlier part of the instruction flow shall not depend on transactions with the same id related to a later part of the instruction flow. The instruction flow is defined from earlier to later as follows: compressed transaction, issue transaction, commit transaction, memory (request/response) transaction, memory result transaction, result transaction.
Transactions with an earlier issued id shall not depend on transactions with a later issued id (e.g. a coprocessor is not allowed to delay generating issue_ready = 1 because it first wants to see result_ready = 1 for an older instruction).
Note
The use of the words depend and dependent relates to logical relationships, which is broader than combinatorial relationships.
CPU recommendations
Coprocessor recommendations
A coprocessor is recommended (but not required) to follow the following suggestions to maximize its re-use potential:
Avoid using opcodes that are reserved or already used by RISC-V International unless for supporting a standard RISC-V extension.
Make it easy to change opcode assignments such that a coprocessor can easily be updated if it conflicts with another coprocessor.
Clearly document the supported and required parameter values.
Clearly document the supported and required interfaces (the memory (request/response) interface and memory result interface are optional).
Timing recommendations
The integration of the eXtension interface will vary from CPU to CPU, and thus require its own set of timing constraints.
CV32E40X eXtension timing budget shows the recommended timing budgets
for the coprocessor and (optional) interconnect for the case in which a coprocessor is paired with the CV32E40X (https://github.com/openhwgroup/cv32e40x) processor.
As is shown in that timing budget, the coprocessor only receives a small part of the timing budget on the paths through xif_issue_if.issue_req.rs*
.
This enables the coprocessor to source its operands directly from the CV32E40X register file bypass network, thereby preventing stall cycles in case an
offloaded instruction depends on the result of a preceding non-offloaded instruction. This implies that, if a coprocessor is intended for pairing with the CV32E40X,
it will be beneficial timing wise if the coprocessor does not directly operate on the rs*
source inputs, but registers them instead. To maximize utilization of a coprocessor with various CPUs, such registers could be made optional via a parameter.