Load-Store-Unit (LSU)
The Load-Store Unit (LSU) of the core takes care of accessing the data memory. Loads and stores on words (32 bit), half words (16 bit) and bytes (8 bit) are supported. The CV32E40P data interface can have up to 2 outstanding transactions; there is no FIFO to allow more outstanding requests.
Table 65 describes the signals that are used by the LSU.
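Signal | Direction | Description
data_addr_o | output | Address
data_req_o | output | Request valid, will stay high until data_gnt_i is high
data_gnt_i | input | The other side accepted the request.
data_we_o | output | Write Enable, high for writes, low for reads. Sent together with data_req_o
data_be_o | output | Byte Enable. Is set for the bytes to write/read, sent together with data_req_o
data_wdata_o | output | Data to be written to memory, sent together with data_req_o
data_rvalid_i | input | Response valid. Set high when data_rdata_i is valid, and also to signal the end of the response phase of a write.
data_rdata_i | input | Data read from memory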
Misaligned Accesses
The LSU never raises address-misaligned exceptions. When the effective address of a load or store is not naturally aligned to the referenced datatype (i.e., on a four-byte boundary for word accesses, and a two-byte boundary for halfword accesses), the access is performed as two bus transactions if the data item crosses a word boundary. A single load/store instruction is therefore performed as two bus transactions in the following scenarios:
Load/store of a word for a non-word-aligned address
Load/store of a halfword crossing a word address boundary
In both cases the transfer corresponding to the lowest address is performed first. All other scenarios can be handled with a single bus transaction.
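The splitting rule above can be illustrated with a small behavioural model. This is only a sketch, not the RTL: the function name split_access is hypothetical, byte-enable bit i is assumed to select byte lane i of the 32-bit word, and addresses are reported word-aligned as a simplification.

```python
def split_access(addr: int, size: int) -> list[tuple[int, int]]:
    """Return (word_address, byte_enable) for each bus transaction.

    size is 1, 2 or 4 bytes. Bit i of byte_enable selects byte lane i
    (an assumed lane mapping used only for this illustration).
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    offset = addr & 0x3
    word_addr = addr & ~0x3 & 0xFFFFFFFF
    # Byte lanes touched, laid out as if two consecutive words were visible.
    lanes = ((1 << size) - 1) << offset
    first_be = lanes & 0xF
    second_be = (lanes >> 4) & 0xF
    transactions = [(word_addr, first_be)]
    if second_be:
        # The data item crosses a word boundary: lowest address goes first.
        transactions.append(((word_addr + 4) & 0xFFFFFFFF, second_be))
    return transactions
```

With this model, a word access at 0x1001 and a halfword access at 0x1003 each produce two transactions, while a halfword access at 0x1001 stays within one word and produces a single transaction.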
Protocol
The CV32E40P data interface does not implement the following optional OBI signals: auser, wuser, aid, rready, err, ruser, rid. These signals can be thought of as being tied off as specified in the OBI specification.
Note
Transaction Ordering: As mentioned above, the data interface can generate up to 2 outstanding transactions. The OBI specification states that links are always in-order from the master's point of view. Because the data interface does not generate a transaction ID (aid), the interconnect infrastructure must ensure that responses come back in the same order the requests were sent, adding its own additional information where needed.
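One way an interconnect could add its own ordering information is to tag each granted request internally and hold back responses that arrive early. The sketch below is purely illustrative: the class InOrderResponder and its methods are hypothetical and do not describe any particular interconnect.

```python
from collections import deque


class InOrderResponder:
    """Hypothetical helper that releases responses in request order."""

    def __init__(self) -> None:
        self._order = deque()   # internal tags, in the order requests were granted
        self._ready = {}        # tag -> read data for responses that arrived early
        self._next_tag = 0

    def request_granted(self) -> int:
        """Record a granted request and return its internal tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._order.append(tag)
        return tag

    def response_arrived(self, tag: int, rdata: int) -> list[int]:
        """Store a response; return all responses that can now be delivered."""
        self._ready[tag] = rdata
        delivered = []
        # Deliver only while the oldest outstanding request has its response.
        while self._order and self._order[0] in self._ready:
            delivered.append(self._ready.pop(self._order.popleft()))
        return delivered
```

If the response to the second request arrives first, it is held until the response to the first request is available, and both are then delivered in request order.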
The OBI protocol that is used by the LSU to communicate with a memory works as follows.
The LSU provides a valid address on data_addr_o, control information on data_we_o and data_be_o (as well as write data on data_wdata_o in case of a store) and sets data_req_o high. The memory sets data_gnt_i high as soon as it is ready to serve the request. This may happen at any time, even before the request was sent. After a request has been granted, the address phase signals (data_addr_o, data_we_o, data_be_o and data_wdata_o) may be changed in the next cycle by the LSU, as the memory is assumed to already have processed and stored that information. After granting a request, the memory answers with data_rvalid_i set high if data_rdata_i is valid. This may happen one or more cycles after the request has been granted. Note that data_rvalid_i must also be set high to signal the end of the response phase for a write transaction (although data_rdata_i has no meaning in that case). When multiple granted requests are outstanding, it is assumed that the memory requests will be kept in-order and one data_rvalid_i will be signalled for each of them, in the order they were issued.
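The handshake described above can be sketched as a cycle-level model. This is a simplified illustration under stated assumptions, not a description of the actual hardware: the function simulate, its parameters, the fixed response latency, and the choice to free an outstanding slot in the same cycle as its response are all assumptions of this sketch.

```python
from collections import deque

MAX_OUTSTANDING = 2  # limit stated for the CV32E40P data interface


def simulate(requests, gnt_pattern, latency=1, max_cycles=100):
    """Cycle-level sketch of the address and response phases.

    requests    : list of (addr, we, be, wdata) tuples, issued in order
    gnt_pattern : function cycle -> bool, modelling data_gnt_i
    latency     : cycles between grant and data_rvalid_i (at least 1)
    Returns a list of (cycle, event, addr) tuples.
    """
    pending = deque(requests)
    in_flight = deque()  # (response_cycle, addr), kept in issue order
    log = []
    for cycle in range(max_cycles):
        # Response phase: one data_rvalid_i per granted request, in order.
        if in_flight and in_flight[0][0] <= cycle:
            _, addr = in_flight.popleft()
            log.append((cycle, "rvalid", addr))
        # Address phase: data_req_o stays high until data_gnt_i is seen.
        req = bool(pending) and len(in_flight) < MAX_OUTSTANDING
        if req and gnt_pattern(cycle):
            # Reads and writes are treated alike: both end with an rvalid.
            addr, _we, _be, _wdata = pending.popleft()
            log.append((cycle, "grant", addr))
            in_flight.append((cycle + latency, addr))
        if not pending and not in_flight:
            break
    return log
```

For example, with a memory that always grants and a latency of 5 cycles, the model issues the first two of three requests in cycles 0 and 1, then waits until the first response in cycle 5 before issuing the third.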
Figure 8, Figure 9, Figure 10 and Figure 11 show example timing diagrams of the protocol.
Post-Incrementing Load and Store Instructions
This section is only valid if COREV_PULP = 1
Post-incrementing load and store instructions perform a load/store operation from/to the data memory while at the same time increasing the base address by the specified offset. For the memory access, the base address without offset is used.
Post-incrementing loads and stores reduce the number of instructions required to execute code with regular data access patterns, which are typically found in loops. These instructions embed the address increment in the memory access instruction itself and remove the need for separate pointer-update instructions. Coupled with the hardware loop extension, they allow the loop overhead to be reduced significantly.
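The semantics of a post-incrementing load can be sketched with a small software model. The names load_post_increment and sum_words, the register numbers, and the word-addressed dictionary used as memory are all hypothetical and chosen only for illustration.

```python
def load_post_increment(memory: dict[int, int], regs: list[int],
                        rd: int, rs1: int, offset: int) -> None:
    """Model of a post-incrementing word load: rd = mem[rs1]; rs1 += offset."""
    base = regs[rs1]
    value = memory[base]                      # access uses the base address without offset
    regs[rs1] = (base + offset) & 0xFFFFFFFF  # base register is updated afterwards
    regs[rd] = value


def sum_words(memory: dict[int, int], base: int, count: int) -> tuple[int, int]:
    """Sum count words starting at base; return (sum, final pointer value)."""
    regs = [0] * 32
    regs[10] = base
    total = 0
    for _ in range(count):
        # One instruction both loads the element and advances the pointer.
        load_post_increment(memory, regs, rd=11, rs1=10, offset=4)
        total += regs[11]
    return total, regs[10]
```

In the loop of sum_words, no separate pointer-update step is needed, which mirrors how the post-incrementing form removes one instruction per iteration.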