From e3cce76694bb47a20097786647e5ed01d03888fc Mon Sep 17 00:00:00 2001
From: Damien Pretet
Date: Wed, 20 Dec 2023 22:07:56 +0100
Subject: [PATCH] Start atomic operation support

---
 doc/atomic_ops.md      | 98 ++++++++++++++++++++++++++++++++++++++++++
 doc/axi_id_ordering.md | 71 ++++++++++++++++++++++++++++++
 doc/project_mgt_hw.md  |  5 +--
 3 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 doc/atomic_ops.md
 create mode 100644 doc/axi_id_ordering.md

diff --git a/doc/atomic_ops.md b/doc/atomic_ops.md
new file mode 100644
index 0000000..0467287
--- /dev/null
+++ b/doc/atomic_ops.md
@@ -0,0 +1,98 @@
+# Atomic Operations Support
+
+## Overview
+
+The aim of this development (started from v1.6.1) is to support the atomic operation
+instructions. Atomic operations bring the synchronization techniques required by kernels. The
+goal for FRISCV is to be able to boot a kernel like FreeRTOS or Linux (without MMU) and make the
+core a platform for real-world use cases.
+
+From [OS dev wiki](https://wiki.osdev.org/Atomic_operation):
+
+    An atomic operation is an operation that will always be executed without any other process
+    being able to read or change state that is read or changed during the operation. It is
+    effectively executed as a single step, and is an important quality in a number of algorithms
+    that deal with multiple independent processes, both in synchronization and algorithms that
+    update shared data without requiring synchronization.
+
+For a single-core system:
+
+    If an operation requires multiple CPU instructions, then it may be interrupted in the middle
+    of executing. If this results in a context switch (or if the interrupt handler refers to data
+    that was being used) then atomicity could be compromised. It is possible to use any standard
+    locking technique (e.g. a spinlock) to prevent this, but may be inefficient. If it is
+    possible, disabling interrupts may be the most efficient method of ensuring atomicity
+    (although note that this may increase the worst-case interrupt latency, which could be
+    problematic if it becomes too long).
+
+For a multi-core system:
+
+    In multiprocessor systems, ensuring atomicity exists is a little harder. It is still possible
+    to use a lock (e.g. a spinlock) the same as on single processor systems, but merely using a
+    single instruction or disabling interrupts will not guarantee atomic access. You must also
+    ensure that no other processor or core in the system attempts to access the data you are
+    working with.
+
+In summary, an atomic operation is useful to:
+- synchronize threads running on a single core
+- synchronize the cores of a SoC
+- ensure a memory location can be read then updated in any situation, including exception handling
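+
+To illustrate the synchronization techniques this support enables, a spinlock can be built from a
+single atomic swap, as in the minimal C sketch below. This example is purely illustrative and is
+not part of the FRISCV code base; built for an RV32 target with the `A` extension, the
+`__atomic_exchange_n()` builtin is typically lowered by GCC/Clang to an `amoswap.w` instruction,
+which is exactly the kind of read-modify-write the core will have to execute atomically.
+
+```c
+/* Minimal spinlock sketch (illustrative only, not part of FRISCV).
+ * With -march=rv32ima, __atomic_exchange_n() typically maps to amoswap.w. */
+#include <stdint.h>
+
+typedef volatile uint32_t spinlock_t;
+
+static inline void spin_lock(spinlock_t *lock)
+{
+    /* Atomically swap 1 into *lock; spin while the previous value was 1. */
+    while (__atomic_exchange_n(lock, 1u, __ATOMIC_ACQUIRE) != 0u)
+        ;
+}
+
+static inline void spin_unlock(spinlock_t *lock)
+{
+    /* Release the lock with an atomic store of 0. */
+    __atomic_store_n(lock, 0u, __ATOMIC_RELEASE);
+}
+```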
+
+Atomic operations will be implemented in a dedicated processing unit and in the load/store stage
+(`memfy`). The Atomic Operation unit (`AMO`) will issue read/write requests to the load/store
+stage (`memfy`) with a specific and unique ID. The dCache stage will also be updated to better
+support `ACACHE`, slightly change the `AID` handling and put in place exclusive access support.
+
+## Design Plan
+
+- Document and list all AXI usage and limitations in the IP.
+
+### AXI Ordering
+
+The core stages `memfy` and `dCache` will be updated regarding `AID` usage. Please refer to the
+[AMBA AXI ID & ordering notes](./axi_id_ordering.md) for further details on `AID` usage and the
+ordering model.
+
+### Atomic Operation Execution Overview
+
+When the `amo` unit receives an atomic operation, it proceeds as follows (a behavioral sketch of
+this flow is given after the list):
+- it reserves its `rs1`/`rs2`/`rd` registers in the processing scheduler
+- it issues to `memfy` a read request to the memory location with:
+    - a specific `AID` (e.g. `0x50`), dedicated to exclusive accesses
+    - `ALOCK=0x1`, making the request an `exclusive access`
+    - `ACACHE=0x0`, making the request `non-cacheable` and `non-bufferable`
+- it executes the atomic operation
+- it issues to `memfy` a request with the same attributes as the read operation, either:
+    - a write request to update the memory location, or
+    - a read request to release the memory location
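+
+The C sketch below only models this read-modify-write flow to make the sequence concrete. It is a
+behavioral illustration, not RTL and not part of the FRISCV code base: `axi_req_t` and the
+`memfy_read()`/`memfy_write()` hooks are hypothetical stand-ins for the `memfy` interface, only
+the write-back path is modelled, and the attribute values are the ones listed above.
+
+```c
+/* Behavioral sketch of the AMO flow above (illustrative only, not RTL).
+ * Types and function names are hypothetical; the attribute values are
+ * the ones listed above. Only the write-back path is modelled. */
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#define AMO_AID    0x50u  /* ID dedicated to exclusive accesses */
+#define AMO_ALOCK  0x1u   /* exclusive access                   */
+#define AMO_ACACHE 0x0u   /* non-cacheable / non-bufferable     */
+
+typedef struct {
+    uint32_t addr;   /* AADDR  */
+    uint32_t id;     /* AID    */
+    uint32_t lock;   /* ALOCK  */
+    uint32_t cache;  /* ACACHE */
+} axi_req_t;
+
+/* Toy memory standing in for the memfy/dCache path. */
+static uint32_t mem[16];
+
+static uint32_t memfy_read(const axi_req_t *req)
+{
+    return mem[req->addr % 16];
+}
+
+static void memfy_write(const axi_req_t *req, uint32_t data)
+{
+    mem[req->addr % 16] = data;
+}
+
+/* AMO read-modify-write: returns the old value (destined for rd). */
+static uint32_t amo_execute(uint32_t addr, uint32_t rs2,
+                            uint32_t (*op)(uint32_t, uint32_t))
+{
+    const axi_req_t req = { addr, AMO_AID, AMO_ALOCK, AMO_ACACHE };
+
+    uint32_t old = memfy_read(&req);  /* exclusive read               */
+    uint32_t upd = op(old, rs2);      /* execute the atomic operation */
+    memfy_write(&req, upd);           /* exclusive write-back         */
+    return old;
+}
+
+static uint32_t amo_add(uint32_t a, uint32_t b) { return a + b; }
+
+int main(void)
+{
+    mem[4] = 10u;
+    uint32_t rd = amo_execute(4u, 5u, amo_add);          /* models amoadd.w     */
+    printf("rd=%" PRIu32 " mem=%" PRIu32 "\n", rd, mem[4]); /* rd=10 mem=15     */
+    return 0;
+}
+```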
+
+### AMO Unit
+
+The `AMO` unit will be able:
+- to execute only one exclusive access at a time, as a `device` access
+- to support all the RISC-V atomic operations
+
+### Processing Unit
+
+TBD
+
+### Memfy Unit
+
+- Issues the exclusive accesses with a single, dedicated ID, so in-order, `non-cacheable` and
+  `non-bufferable`
+- Can handle exclusive accesses and normal accesses together for best bandwidth
+- Should be able to manage completion reordering (possible enhancement)
+
+### dCache Unit
+
+Needs to support exclusive access:
+- the OoO stage should manage exclusive accesses in a dedicated LUT
+- don't substitute the ID of an exclusive access
+- an exclusive access is a `device` access (no cache)
+
+Outside the scope of exclusive accesses, the cache should be able to manage different IDs without
+substituting them all the time, for better performance. Reordering should be done only across
+different IDs.
+
+## Test Plan
+
+- Check an atomic operation can't be stopped while the control unit manages async/sync exceptions
diff --git a/doc/axi_id_ordering.md b/doc/axi_id_ordering.md
new file mode 100644
index 0000000..d4cc1ea
--- /dev/null
+++ b/doc/axi_id_ordering.md
@@ -0,0 +1,71 @@
+# AMBA AXI ID & Ordering
+
+## AXI Transaction Identifier
+
+### Overview
+
+The AXI protocol includes AXI ID transaction identifiers. A Manager can use these to identify
+separate transactions that must be returned in order. All transactions with a given AXI ID value
+must remain ordered, but there is no restriction on the ordering of transactions with different
+ID values.
+
+A single physical port can support out-of-order transactions by acting as a number of logical
+ports, each handling its transactions in order.
+
+By using AXI IDs, a Manager can issue transactions without waiting for earlier transactions to
+complete. This can improve system performance, because it enables parallel processing of
+transactions.
+
+There is no requirement for Subordinates or Managers to use AXI transaction IDs. Managers and
+Subordinates can process one transaction at a time, in which case transactions are processed in
+the order they are issued.
+
+Subordinates are required to reflect the AXI ID received from a Manager on the appropriate BID or
+RID response.
+
+### Read Data Ordering
+
+The Subordinate must ensure that the RID value of any returned data matches the ARID value of the
+address that it is responding to.
+
+The interconnect must ensure that the read data from a sequence of transactions with the same
+ARID value targeting different Subordinates is received by the Manager in the order in which it
+issued the addresses.
+
+The read data reordering depth is the number of addresses pending in the Subordinate that can be
+reordered. A Subordinate that processes all transactions in order has a read data reordering
+depth of one. The read data reordering depth is a static value that must be specified by the
+designer of the Subordinate.
+
+There is no mechanism that a Manager can use to determine the read data reordering depth of a
+Subordinate.
+
+### Write Data Ordering
+
+A Manager must issue write data in the same order that it issues the transaction addresses.
+
+An interconnect that combines write transactions from different Managers must ensure that it
+forwards the write data in address order.
+
+### Interconnect Use of Transaction Identifiers
+
+When a Manager is connected to an interconnect, the interconnect appends additional bits to the
+ARID, AWID and WID identifiers that are unique to that Manager port. This has two effects:
+
+- Managers do not have to know what ID values are used by other Managers, because the
+  interconnect makes the ID values used by each Manager unique by appending the Manager number to
+  the original identifier.
+- The ID identifier at a Subordinate interface is wider than the ID identifier at a Manager
+  interface.
+
+For responses, the interconnect uses the additional bits of the xID identifier to determine which
+Manager port the response is destined for. The interconnect removes these bits of the xID
+identifier before passing the xID value to the correct Manager port.
+
+#### Master
+
+#### Slave
+
+#### Interconnect
diff --git a/doc/project_mgt_hw.md b/doc/project_mgt_hw.md
index d0f7ff4..393fe98 100644
--- a/doc/project_mgt_hw.md
+++ b/doc/project_mgt_hw.md
@@ -5,9 +5,6 @@
 - [X] Support U-mode
 - [X] Support PMP/PMA
 - [X] https://github.com/eembc/coremark
-  - [ ] Advanced Interrupt controller
-  - [ ] AXI ERR handling
-  - [ ] AXI EXOKAY handling
 - [ ] Atomic operations
   - stage to execute the instruction, controlling ldst Stages
   - memfy exposes two interfaces for requests.
@@ -26,6 +23,7 @@ Any new features should be carefully study to ensure a proper exception and inte
 
 ## Memory
 
+- [ ] Bus fault to route on exceptions https://lists.riscv.org/g/tech-privileged/topic/80351141
 - [ ] Better manage ACACHE attribute
   - [ ] Correct value driven from memfy
   - [ ] Use it correctly across the cache
@@ -55,6 +53,7 @@ Any new features should be carefully study to ensure a proper exception and inte
 
 ## Cache Stages
 
+- [ ] Add dedicated RAM for cache, not connected through AXI interconnect
 - [ ] AXI4 + Wrap mode for read
 - [ ] Support datapath adaptation from memory controller
 - [ ] Narrow transfer support?