Performance Counters
Ibex implements performance counters according to the RISC-V Privileged Specification, version 1.11 (see Hardware Performance Monitor, Section 3.1.11).
The performance counters are placed inside the Control and Status Registers (CSRs) and can be accessed with the CSRRW(I) and CSRRS/C(I) instructions.
Ibex implements the clock cycle counter mcycle(h), the retired instruction counter minstret(h), as well as the 29 event counters mhpmcounter3(h) - mhpmcounter31(h) and the corresponding event selector CSRs mhpmevent3 - mhpmevent31, and the mcountinhibit CSR to individually enable/disable the counters.
mcycle(h) and minstret(h) are always available and 64 bits wide.
The mhpmcounter performance counters are optional (unavailable by default) and parametrizable in width.
Event Selector
The following events can be monitored using the performance counters of Ibex.
Event ID/Bit | Event Name | Event Description |
---|---|---|
0 | NumCycles | Number of cycles |
2 | NumInstrRet | Number of instructions retired |
3 | NumCyclesLSU | Number of cycles waiting for data memory |
4 | NumCyclesIF | Cycles waiting for instruction fetches, i.e., number of instructions wasted due to non-ideal caching |
5 | NumLoads | Number of data memory loads. Misaligned accesses are counted as two accesses |
6 | NumStores | Number of data memory stores. Misaligned accesses are counted as two accesses |
7 | NumJumps | Number of unconditional jumps (j, jal, jr, jalr) |
8 | NumBranches | Number of branches (conditional) |
9 | NumBranchesTaken | Number of taken branches (conditional) |
10 | NumInstrRetC | Number of compressed instructions retired |
11 | NumCyclesWFI | Cycles waiting in WFI instruction |
12 | NumCyclesDivWait | Cycles waiting for divide to complete |
The event selector CSRs mhpmevent3 - mhpmevent31 define which of these events are counted by the event counters mhpmcounter3(h) - mhpmcounter31(h).
If a specific bit in an event selector CSR is set to 1, this means that events with this ID are being counted by the counter associated with that selector CSR.
If an event selector CSR is 0, this means that the corresponding counter is not counting any event.
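The bit-per-event encoding described above can be decoded with a few shifts. The following is a minimal sketch in C, not code from the Ibex repository: the helper names (ibex_event_name, ibex_selector_counts_event, ibex_selected_events) are hypothetical, and the value passed in is assumed to have been read from one of the mhpmevent CSRs beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IBEX_NUM_EVENT_IDS 13u

/* Event names indexed by event ID/bit, copied from the table above.
 * ID 1 does not appear in the table, so it has no name here. */
static const char *const ibex_event_names[IBEX_NUM_EVENT_IDS] = {
    "NumCycles", NULL, "NumInstrRet", "NumCyclesLSU", "NumCyclesIF",
    "NumLoads", "NumStores", "NumJumps", "NumBranches", "NumBranchesTaken",
    "NumInstrRetC", "NumCyclesWFI", "NumCyclesDivWait"
};

/* Name of an event, or NULL if the ID is not listed in the table. */
static const char *ibex_event_name(unsigned event_id)
{
    return (event_id < IBEX_NUM_EVENT_IDS) ? ibex_event_names[event_id] : NULL;
}

/* Nonzero if the selector value has the bit for event_id set. */
static int ibex_selector_counts_event(uint32_t mhpmevent, unsigned event_id)
{
    assert(event_id < 32);
    return (int)((mhpmevent >> event_id) & 1u);
}

/* Stores the IDs of all selected events in ids[] (room for 32 entries)
 * and returns how many were found. A selector value of 0 yields 0. */
static size_t ibex_selected_events(uint32_t mhpmevent, unsigned ids[32])
{
    size_t n = 0;
    for (unsigned id = 0; id < 32; id++) {
        if (ibex_selector_counts_event(mhpmevent, id)) {
            ids[n++] = id;
        }
    }
    return n;
}
```

For a selector value of 0x0000_0008, ibex_selected_events reports the single ID 3, which ibex_event_name maps to NumCyclesLSU.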
Controlling the counters from software
By default, all available counters are enabled after reset.
They can be individually enabled/disabled by overwriting the corresponding bit in the mcountinhibit CSR at address 0x320 as described in the RISC-V Privileged Specification, version 1.11 (see Machine Counter-Inhibit CSR, Section 3.1.13).
A set bit stops the corresponding counter and a cleared bit lets it count.
In particular, to enable/disable mcycle(h), bit 0 must be written. For minstret(h), it is bit 2. For event counter mhpmcounterX(h), it is bit X.
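The bit assignment above can be sketched as a handful of mask helpers. This is an illustrative sketch only: the macro and function names are hypothetical, and the helpers merely compute the value to be written to mcountinhibit. The write itself would be done with a CSR instruction on the target. Per the RISC-V Privileged Specification, a set bit inhibits the counter and a cleared bit lets it increment.

```c
#include <assert.h>
#include <stdint.h>

#define MCOUNTINHIBIT_CY (1u << 0) /* mcycle(h) */
#define MCOUNTINHIBIT_IR (1u << 2) /* minstret(h) */

/* Bit mask for event counter mhpmcounterX(h), with X in 3..31. */
static uint32_t mcountinhibit_hpm_bit(unsigned x)
{
    assert(x >= 3 && x <= 31);
    return (uint32_t)1 << x;
}

/* New mcountinhibit value with the given counters stopped (bits set). */
static uint32_t mcountinhibit_stop(uint32_t current, uint32_t counters)
{
    return current | counters;
}

/* New mcountinhibit value with the given counters running (bits cleared). */
static uint32_t mcountinhibit_start(uint32_t current, uint32_t counters)
{
    return current & ~counters;
}
```

Because CSRRS and CSRRC set and clear individual bits, the masks returned by mcountinhibit_hpm_bit can also be used directly as operands of those instructions, which avoids a separate read-modify-write sequence.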
The lower 32 bits of all counters can be accessed through the base register, whereas the upper 32 bits are accessed through the h-register.
Reads to all these registers are non-destructive.
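Since a 64-bit value is read as two separate 32-bit halves, a carry from the lower into the upper half can occur between the two reads while the counter is running. A common way to handle this is to read the upper half twice and retry if it changed. The sketch below illustrates the pattern in C under stated assumptions: csr_read_fn and read_counter64 are hypothetical names, and the sim_* functions are a host-side stand-in for the CSR reads so that the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor type: returns the current value of one 32-bit CSR. */
typedef uint32_t (*csr_read_fn)(void);

/* Reads a running 64-bit counter through its base and h-register.
 * The upper half is read before and after the lower half. If it changed,
 * a carry happened in between and the read is repeated. */
static uint64_t read_counter64(csr_read_fn read_lo, csr_read_fn read_hi)
{
    uint32_t hi, lo, hi_again;
    assert(read_lo != NULL && read_hi != NULL);
    do {
        hi = read_hi();
        lo = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side stand-in for a running counter (demonstration only):
 * every read of the lower half advances the counter by one. */
static uint64_t sim_counter;

static uint32_t sim_read_lo(void)
{
    uint32_t value = (uint32_t)sim_counter;
    sim_counter += 1;
    return value;
}

static uint32_t sim_read_hi(void)
{
    return (uint32_t)(sim_counter >> 32);
}
```

On the target, the two accessors would wrap CSR reads of, for example, mcycle and mcycleh. Alternatively, the counter can be stopped through mcountinhibit before reading, in which case a single pass is sufficient.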
Parametrization at synthesis time
The mcycle(h) and minstret(h) counters are always available and 64 bits wide.
The event counters mhpmcounter3(h) - mhpmcounter31(h) are parametrizable.
Their width can be parametrized between 1 and 64 bits through the WidthMHPMCounters parameter, which defaults to 40 bit wide counters.
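A counter narrower than 64 bits wraps around sooner, which matters when computing the difference between two samples. The following sketch shows one way to compute such a difference in C. It assumes that a counter of width WidthMHPMCounters wraps modulo 2^width and that fewer than 2^width events occur between the two samples. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the implemented bits of a counter that is width bits wide. */
static uint64_t counter_mask(unsigned width)
{
    assert(width >= 1 && width <= 64);
    return (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* Number of events between two samples of the same counter.
 * Unsigned subtraction followed by masking gives the right result even
 * if the counter wrapped once between start and end. */
static uint64_t counter_delta(uint64_t start, uint64_t end, unsigned width)
{
    return (end - start) & counter_mask(width);
}
```

With the default width of 40 bits, a sample of 0xFF_FFFF_FFF0 followed by a sample of 0x10 yields a difference of 0x20 events.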
The number of available event counters mhpmcounterX(h) can be controlled via the NumMHPMCounters parameter.
By default (NumMHPMCounters set to 0), no counters are available to software.
Set NumMHPMCounters to a value between 1 and 10 to make the counters mhpmcounter3(h) - mhpmcounter12(h) available as listed below.
Setting NumMHPMCounters to values larger than 10 does not result in any more performance counters.
Unavailable counters always read 0.
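The availability rule above can be written as a small predicate, for example for test code that should skip counters that are not implemented. This is a sketch with hypothetical names: num_mhpm_counters stands for the value of the NumMHPMCounters parameter, and max_effective for the largest value of that parameter that still adds counters.

```c
#include <assert.h>

/* Nonzero if event counter mhpmcounterX(h) is available to software.
 * Counters are allocated contiguously starting at mhpmcounter3, and
 * values of num_mhpm_counters above max_effective add nothing. */
static int mhpmcounter_available(unsigned x, unsigned num_mhpm_counters,
                                 unsigned max_effective)
{
    unsigned n = (num_mhpm_counters < max_effective) ? num_mhpm_counters
                                                     : max_effective;
    assert(x >= 3 && x <= 31);
    return x < 3 + n;
}
```

Software that cannot know the synthesis parameters could instead probe a counter at run time, since unavailable counters always read 0.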
The association of events with the mhpmcounter registers is hardwired as listed in the following table.
Event Counter | CSR Address | Event ID/Bit | Event Name |
---|---|---|---|
mcycle(h) | 0xB00 (0xB80) | 0 | NumCycles |
minstret(h) | 0xB02 (0xB82) | 2 | NumInstrRet |
mhpmcounter3(h) | 0xB03 (0xB83) | 3 | NumCyclesLSU |
mhpmcounter4(h) | 0xB04 (0xB84) | 4 | NumCyclesIF |
mhpmcounter5(h) | 0xB05 (0xB85) | 5 | NumLoads |
mhpmcounter6(h) | 0xB06 (0xB86) | 6 | NumStores |
mhpmcounter7(h) | 0xB07 (0xB87) | 7 | NumJumps |
mhpmcounter8(h) | 0xB08 (0xB88) | 8 | NumBranches |
mhpmcounter9(h) | 0xB09 (0xB89) | 9 | NumBranchesTaken |
mhpmcounter10(h) | 0xB0A (0xB8A) | 10 | NumInstrRetC |
mhpmcounter11(h) | 0xB0B (0xB8B) | 11 | NumCyclesWFI |
mhpmcounter12(h) | 0xB0C (0xB8C) | 12 | NumCyclesDivWait |
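The CSR addresses in this table follow a regular pattern: the base register of counter X sits at 0xB00 + X and its h-register 0x80 above that. A minimal C sketch of this mapping, with hypothetical function names, could look as follows.

```c
#include <assert.h>
#include <stdint.h>

/* CSR address of the lower 32 bits of a counter.
 * x = 0 is mcycle, x = 2 is minstret, x = 3..31 is mhpmcounterX.
 * Index 1 is not used by any counter in the table. */
static uint16_t counter_csr_addr(unsigned x)
{
    assert(x <= 31 && x != 1);
    return (uint16_t)(0xB00u + x);
}

/* CSR address of the upper 32 bits (the h-register) of the same counter. */
static uint16_t counter_h_csr_addr(unsigned x)
{
    return (uint16_t)(counter_csr_addr(x) + 0x80u);
}
```

For example, counter_csr_addr(10) gives 0xB0A and counter_h_csr_addr(10) gives 0xB8A, matching the NumInstrRetC row.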
Similarly, the event selector CSRs are hardwired as follows. The remaining event selector CSRs are tied to 0, i.e., no events are counted by the corresponding counters.
Event Selector | CSR Address | Reset Value | Event ID/Bit |
---|---|---|---|
mhpmevent3 | 0x323 | 0x0000_0008 | 3 |
mhpmevent4 | 0x324 | 0x0000_0010 | 4 |
mhpmevent5 | 0x325 | 0x0000_0020 | 5 |
mhpmevent6 | 0x326 | 0x0000_0040 | 6 |
mhpmevent7 | 0x327 | 0x0000_0080 | 7 |
mhpmevent8 | 0x328 | 0x0000_0100 | 8 |
mhpmevent9 | 0x329 | 0x0000_0200 | 9 |
mhpmevent10 | 0x32A | 0x0000_0400 | 10 |
mhpmevent11 | 0x32B | 0x0000_0800 | 11 |
mhpmevent12 | 0x32C | 0x0000_1000 | 12 |
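The rows of this table can be derived from the selector index alone: selector X sits at address 0x320 + X, and its hardwired value has only bit X set. The sketch below expresses this in C. The function names are hypothetical, and num_available stands for the number of event counters that are actually implemented, so that the remaining selectors evaluate to 0.

```c
#include <assert.h>
#include <stdint.h>

/* CSR address of event selector mhpmeventX, with X in 3..31. */
static uint16_t mhpmevent_csr_addr(unsigned x)
{
    assert(x >= 3 && x <= 31);
    return (uint16_t)(0x320u + x);
}

/* Hardwired value of mhpmeventX: bit X for an implemented counter,
 * 0 for the remaining selectors, which are tied to 0. */
static uint32_t mhpmevent_reset_value(unsigned x, unsigned num_available)
{
    assert(x >= 3 && x <= 31);
    return (x < 3 + num_available) ? ((uint32_t)1 << x) : 0u;
}
```

A test bench could loop over X and compare the value read from each selector CSR against mhpmevent_reset_value.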
FPGA Targets
For FPGA targets the performance counters constitute a particularly large structure. Implementing the maximum of 29 event counters at widths of 32, 48 and 64 bits results in relative logic utilizations of the core of 100%, 111% and 129%, respectively. The relative numbers of flip-flops are 100%, 125% and 150%. It is recommended to implement event counters of 32 bit width where possible.
For Xilinx FPGA devices featuring the DSP48E1 DSP slice or similar, counter logic can be absorbed into the DSP slice for widths up to 48 bits.
The resulting relative logic utilizations with respect to the non-DSP 32 bit counter implementation are 83% and 89% respectively for 32 and 48 bit DSP counters.
This comes at the expense of 1 DSP slice per counter.
For 32 bit counters only, the corresponding flip-flops can be incorporated into the DSP’s output pipeline register, resulting in a reduction of the number of flip-flops to 50%.
In order to infer DSP slices for performance counters, define the preprocessor variable FPGA_XILINX.