From e3cce76694bb47a20097786647e5ed01d03888fc Mon Sep 17 00:00:00 2001
From: Damien Pretet
Date: Wed, 20 Dec 2023 22:07:56 +0100
Subject: [PATCH] Start atomic operation support

---
 doc/atomic_ops.md      | 98 ++++++++++++++++++++++++++++++++++++++++++
 doc/axi_id_ordering.md | 71 ++++++++++++++++++++++++++++++
 doc/project_mgt_hw.md  |  5 +--
 3 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 doc/atomic_ops.md
 create mode 100644 doc/axi_id_ordering.md

diff --git a/doc/atomic_ops.md b/doc/atomic_ops.md
new file mode 100644
index 0000000..0467287
--- /dev/null
+++ b/doc/atomic_ops.md
@@ -0,0 +1,98 @@
+# Atomic Operations Support
+
+## Overview
+
+The aim of this development (started from v1.6.1) is to support the atomic operation
+instructions. Atomic operations bring the synchronization techniques required by kernels. The
+goal for FRISCV is to be able to boot a kernel like FreeRTOS or Linux (without MMU) and make the
+core a platform for real-world use cases.
+
+From [OS dev wiki](https://wiki.osdev.org/Atomic_operation):
+
+    An atomic operation is an operation that will always be executed without any other process
+    being able to read or change state that is read or changed during the operation. It is
+    effectively executed as a single step, and is an important quality in a number of algorithms
+    that deal with multiple independent processes, both in synchronization and algorithms that
+    update shared data without requiring synchronization.
+
+For a single-core system:
+
+    If an operation requires multiple CPU instructions, then it may be interrupted in the middle
+    of executing. If this results in a context switch (or if the interrupt handler refers to data
+    that was being used) then atomicity could be compromised. It is possible to use any standard
+    locking technique (e.g. a spinlock) to prevent this, but may be inefficient. If it is
+    possible, disabling interrupts may be the most efficient method of ensuring atomicity
+    (although note that this may increase the worst-case interrupt latency, which could be
+    problematic if it becomes too long).
+
+For a multi-core system:
+
+    In multiprocessor systems, ensuring atomicity exists is a little harder. It is still possible
+    to use a lock (e.g. a spinlock) the same as on single processor systems, but merely using a
+    single instruction or disabling interrupts will not guarantee atomic access. You must also
+    ensure that no other processor or core in the system attempts to access the data you are
+    working with.
+
+In summary, an atomic operation is useful to:
+- synchronize threads running on a single core
+- synchronize the cores of a SoC
+- ensure a memory location can be read then updated in any situation, including exception handling
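+
+To illustrate the synchronization techniques this support enables, a spinlock can be built from a
+single atomic swap, as in the minimal C sketch below. This example is purely illustrative and is
+not part of the FRISCV code base; built for an RV32 target with the `A` extension, the
+`__atomic_exchange_n()` builtin is typically lowered by GCC/Clang to an `amoswap.w` instruction,
+which is exactly the kind of read-modify-write the core will have to execute atomically.
+
+```c
+/* Minimal spinlock sketch (illustrative only, not part of FRISCV).
+ * With -march=rv32ima, __atomic_exchange_n() typically maps to amoswap.w. */
+#include <stdint.h>
+
+typedef volatile uint32_t spinlock_t;
+
+static inline void spin_lock(spinlock_t *lock)
+{
+    /* Atomically swap 1 into *lock; spin while the previous value was 1. */
+    while (__atomic_exchange_n(lock, 1u, __ATOMIC_ACQUIRE) != 0u)
+        ;
+}
+
+static inline void spin_unlock(spinlock_t *lock)
+{
+    /* Release the lock with an atomic store of 0. */
+    __atomic_store_n(lock, 0u, __ATOMIC_RELEASE);
+}
+```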
+
+Atomic operations will be implemented in a dedicated processing unit and in the load/store stage
+(`memfy`). The Atomic Operation unit (`AMO`) will issue read/write requests to the load/store
+stage (`memfy`) with a specific and unique ID. The dCache stage will also be updated to better
+support `ACACHE`, slightly change the `AID` handling and put in place exclusive access support.
+
+## Design Plan
+
+- Document and list all AXI usage and limitations in the IP.
+
+### AXI Ordering
+
+The core stages `memfy` and `dCache` will be updated regarding `AID` usage. Please refer to the
+[AMBA AXI ID & ordering notes](./axi_id_ordering.md) for further details on `AID` usage and the
+ordering model.
+
+### Atomic Operation Execution Overview
+
+When the `amo` unit receives an atomic operation, it proceeds as follows (a behavioral sketch of
+this flow is given after the list):
+- it reserves its `rs1`/`rs2`/`rd` registers in the processing scheduler
+- it issues to `memfy` a read request to the memory location with:
+    - a specific `AID` (e.g. `0x50`), dedicated to exclusive accesses
+    - `ALOCK=0x1`, making the request an `exclusive access`
+    - `ACACHE=0x0`, making the request `non-cacheable` and `non-bufferable`
+- it executes the atomic operation
+- it issues to `memfy` a request with the same attributes as the read operation, either:
+    - a write request to update the memory location, or
+    - a read request to release the memory location
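+
+The C sketch below only models this read-modify-write flow to make the sequence concrete. It is a
+behavioral illustration, not RTL and not part of the FRISCV code base: `axi_req_t` and the
+`memfy_read()`/`memfy_write()` hooks are hypothetical stand-ins for the `memfy` interface, only
+the write-back path is modelled, and the attribute values are the ones listed above.
+
+```c
+/* Behavioral sketch of the AMO flow above (illustrative only, not RTL).
+ * Types and function names are hypothetical; the attribute values are
+ * the ones listed above. Only the write-back path is modelled. */
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#define AMO_AID    0x50u  /* ID dedicated to exclusive accesses */
+#define AMO_ALOCK  0x1u   /* exclusive access                   */
+#define AMO_ACACHE 0x0u   /* non-cacheable / non-bufferable     */
+
+typedef struct {
+    uint32_t addr;   /* AADDR  */
+    uint32_t id;     /* AID    */
+    uint32_t lock;   /* ALOCK  */
+    uint32_t cache;  /* ACACHE */
+} axi_req_t;
+
+/* Toy memory standing in for the memfy/dCache path. */
+static uint32_t mem[16];
+
+static uint32_t memfy_read(const axi_req_t *req)
+{
+    return mem[req->addr % 16];
+}
+
+static void memfy_write(const axi_req_t *req, uint32_t data)
+{
+    mem[req->addr % 16] = data;
+}
+
+/* AMO read-modify-write: returns the old value (destined for rd). */
+static uint32_t amo_execute(uint32_t addr, uint32_t rs2,
+                            uint32_t (*op)(uint32_t, uint32_t))
+{
+    const axi_req_t req = { addr, AMO_AID, AMO_ALOCK, AMO_ACACHE };
+
+    uint32_t old = memfy_read(&req);  /* exclusive read               */
+    uint32_t upd = op(old, rs2);      /* execute the atomic operation */
+    memfy_write(&req, upd);           /* exclusive write-back         */
+    return old;
+}
+
+static uint32_t amo_add(uint32_t a, uint32_t b) { return a + b; }
+
+int main(void)
+{
+    mem[4] = 10u;
+    uint32_t rd = amo_execute(4u, 5u, amo_add);          /* models amoadd.w     */
+    printf("rd=%" PRIu32 " mem=%" PRIu32 "\n", rd, mem[4]); /* rd=10 mem=15     */
+    return 0;
+}
+```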
+
+### AMO Unit
+
+The `AMO` unit will be able:
+- to execute only one exclusive access at a time, as a `device` access
+- to support all the RISC-V atomic operations
+
+### Processing Unit
+
+TBD
+
+### Memfy Unit
+
+- Issues the exclusive accesses with a single, dedicated ID, so in-order, `non-cacheable` and
+  `non-bufferable`
+- Can handle exclusive accesses and normal accesses together for best bandwidth
+- Should be able to manage completion reordering (possible enhancement)
+
+### dCache Unit
+
+Needs to support exclusive access:
+- the OoO stage should manage exclusive accesses in a dedicated LUT
+- don't substitute the ID of an exclusive access
+- an exclusive access is a `device` access (no cache)
+
+Outside the scope of exclusive accesses, the cache should be able to manage different IDs without
+substituting them all the time, for better performance. Reordering should be done only across
+different IDs.
+
+## Test Plan
+
+- Check an atomic operation can't be stopped while the control unit manages async/sync exceptions
diff --git a/doc/axi_id_ordering.md b/doc/axi_id_ordering.md
new file mode 100644
index 0000000..d4cc1ea
--- /dev/null
+++ b/doc/axi_id_ordering.md
@@ -0,0 +1,71 @@
+# AMBA AXI ID & Ordering
+
+## AXI Transaction Identifier
+
+### Overview
+
+The AXI protocol includes AXI ID transaction identifiers. A Manager can use these to identify
+separate transactions that must be returned in order. All transactions with a given AXI ID value
+must remain ordered, but there is no restriction on the ordering of transactions with different
+ID values.
+
+A single physical port can support out-of-order transactions by acting as a number of logical
+ports, each handling its transactions in order.
+
+By using AXI IDs, a Manager can issue transactions without waiting for earlier transactions to
+complete. This can improve system performance, because it enables parallel processing of
+transactions.
+
+There is no requirement for Subordinates or Managers to use AXI transaction IDs. Managers and
+Subordinates can process one transaction at a time, in which case transactions are processed in
+the order they are issued.
+
+Subordinates are required to reflect the AXI ID received from a Manager on the appropriate BID or
+RID response.
+
+### Read Data Ordering
+
+The Subordinate must ensure that the RID value of any returned data matches the ARID value of the
+address that it is responding to.
+
+The interconnect must ensure that the read data from a sequence of transactions with the same
+ARID value targeting different Subordinates is received by the Manager in the order in which it
+issued the addresses.
+
+The read data reordering depth is the number of addresses pending in the Subordinate that can be
+reordered. A Subordinate that processes all transactions in order has a read data reordering
+depth of one. The read data reordering depth is a static value that must be specified by the
+designer of the Subordinate.
+
+There is no mechanism that a Manager can use to determine the read data reordering depth of a
+Subordinate.
+
+### Write Data Ordering
+
+A Manager must issue write data in the same order that it issues the transaction addresses.
+
+An interconnect that combines write transactions from different Managers must ensure that it
+forwards the write data in address order.
+
+### Interconnect Use of Transaction Identifiers
+
+When a Manager is connected to an interconnect, the interconnect appends additional bits to the
+ARID, AWID and WID identifiers that are unique to that Manager port. This has two effects:
+
+- Managers do not have to know what ID values are used by other Managers, because the
+  interconnect makes the ID values used by each Manager unique by appending the Manager number to
+  the original identifier.
+- The ID identifier at a Subordinate interface is wider than the ID identifier at a Manager
+  interface.
+
+For responses, the interconnect uses the additional bits of the xID identifier to determine which
+Manager port the response is destined for. The interconnect removes these bits of the xID
+identifier before passing the xID value to the correct Manager port.
+
+#### Master
+
+#### Slave
+
+#### Interconnect
diff --git a/doc/project_mgt_hw.md b/doc/project_mgt_hw.md
index d0f7ff4..393fe98 100644
--- a/doc/project_mgt_hw.md
+++ b/doc/project_mgt_hw.md
@@ -5,9 +5,6 @@
 - [X] Support U-mode
 - [X] Support PMP/PMA
 - [X] https://github.com/eembc/coremark
-  - [ ] Advanced Interrupt controller
-  - [ ] AXI ERR handling
-  - [ ] AXI EXOKAY handling
 - [ ] Atomic operations
   - stage to execute the instruction, controlling ldst Stages
   - memfy exposes two interfaces for requests.
@@ -26,6 +23,7 @@ Any new features should be carefully study to ensure a proper exception and inte
 
 ## Memory
 
+- [ ] Bus fault to route on exceptions https://lists.riscv.org/g/tech-privileged/topic/80351141
 - [ ] Better manage ACACHE attribute
   - [ ] Correct value driven from memfy
   - [ ] Use it correctly across the cache
@@ -55,6 +53,7 @@ Any new features should be carefully study to ensure a proper exception and inte
 
 ## Cache Stages
 
+- [ ] Add dedicated RAM for cache, not connected through AXI interconnect
 - [ ] AXI4 + Wrap mode for read
 - [ ] Support datapath adaptation from memory controller
 - [ ] Narrow transfer support?