
Conversation

@colluca
Collaborator

@colluca colluca commented Jul 14, 2025

This PR adds support for reduction operations to the Snitch cluster.

On the narrow interface, we add a separate CSR that sets the user field of the narrow AXI interface.
The reduction/multicast (or collective communication) operation is encoded in the AXI user field of the narrow interface and is handled outside of the cluster.

On the wide interface, a custom instruction was added to the iDMA that sets the user field in its wide AXI requests.

All collective communication operations are handled outside of the cluster, even if the TCDM is selected as the destination. They are diverted to the external network interface from which they are then rerouted to the TCDM inside the cluster.
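The routing rule described above can be sketched as a small host-side C model. This is only an illustration, not the RTL in this PR: the address map constants, the `route_request` name, and the convention that a nonzero user field marks a collective operation are all assumptions made for the example (the actual user-field encoding is still listed as a TODO).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address map: placeholder values, not the real cluster layout. */
#define TCDM_BASE 0x10000000u
#define TCDM_SIZE 0x00020000u

typedef enum { ROUTE_TCDM, ROUTE_EXTERNAL } route_t;

/* Assumption: any nonzero user field denotes a collective operation. */
static bool is_collective(uint64_t user) { return user != 0; }

/* True if the address falls inside the (hypothetical) TCDM window. */
static bool in_tcdm(uint32_t addr) {
    return addr >= TCDM_BASE && (addr - TCDM_BASE) < TCDM_SIZE;
}

/* Collective requests always leave the cluster, even when they target the
 * TCDM; all other requests are routed by address alone. */
static route_t route_request(uint32_t addr, uint64_t user) {
    if (is_collective(user)) {
        return ROUTE_EXTERNAL;
    }
    return in_tcdm(addr) ? ROUTE_TCDM : ROUTE_EXTERNAL;
}
```

In this model, a plain store to a TCDM address stays local, while the same store with a nonzero user field is sent to the external interface, matching the behaviour described for the demux at the CC level.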

TODO: Clarify what assumptions are made on the user field, e.g. collective op in LSBs, etc.

In detail:

  • Bump AXI (Improved route decoding for collectiv operation colluca/axi#1)
  • Add user field to the reqrsp interface (changes contained in hw/reqrsp_interface directory)
  • Add user field to the tcdm interface (changes contained in hw/tcdm_interface directory)
  • Bump iDMA to v0.6.5 replacing dmmcast instruction with dmuser
  • Replace CSR_MCAST with CSR_USER_LOW and add CSR_USER_HIGH
  • Update riscv-opcodes to reflect said changes
  • Update Snitch decoder and LSU accordingly
  • Add a demux at the CC level to reroute collective communication requests outside the cluster (even if they would be directed to the TCDM)
  • Fix oseda version
  • Add user_narrow_t and user_dma_t parameters to cluster, and derive user widths from these
  • Add snrt_fence function
  • Fix wakeup routine
  • Fix CLS pointer initialization
  • Use narrow multicast in snrt_inter_cluster_barrier()
  • Partially address #264 (Make bootrom depend on cluster configuration) by abstracting the bootrom start address
  • Add cluster_base_offset parameter to know the full (aligned) address space of the cluster
  • Extend tests.mk to support tests from different directories. Update system integration docs accordingly. Also extend it to provide a PHONY target for each test.
  • Fix sn_include_deps when overriding variables on the command-line. Bump pymakeutils to include respective fix in list-dependent-make-targets.
  • Extend experiment utils for use in Picobello (i.e. implement callbacks for commands that differ between systems) and to build software in parallel.
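As a rough illustration of how a user value might map onto the CSR_USER_LOW and CSR_USER_HIGH pair listed above, the following host-side C sketch splits a 64-bit value into two 32-bit halves. It assumes that each CSR is 32 bits wide and that CSR_USER_LOW holds bits 31:0 while CSR_USER_HIGH holds bits 63:32; the PR does not state this, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the two CSR values. */
typedef struct {
    uint32_t low;   /* assumed contents of CSR_USER_LOW  (bits 31:0)  */
    uint32_t high;  /* assumed contents of CSR_USER_HIGH (bits 63:32) */
} user_csr_pair_t;

/* Split a 64-bit user value into the two 32-bit halves. */
static user_csr_pair_t split_user(uint64_t user) {
    user_csr_pair_t p = { (uint32_t)(user & 0xFFFFFFFFu),
                          (uint32_t)(user >> 32) };
    return p;
}

/* Reassemble the 64-bit user value from the two halves. */
static uint64_t join_user(user_csr_pair_t p) {
    return ((uint64_t)p.high << 32) | p.low;
}
```

On the target, each half would be written with a CSR write instruction rather than returned in a struct; the sketch only shows the bit manipulation.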

TODO

  • Merge colluca/axi#1 (Improved route decoding for collectiv operation) and bump.
  • Do not instantiate different XBAR IPs, just use different configurations.
  • Do we need the CollectiveWidth parameter, in addition to the user types?
  • Double check grammar in all comments
  • Explain gaps in reduction opcode encoding
  • Add corresponding external function declaration for every inline definition
  • Check new DMA function names are consistent with the others

@colluca colluca changed the title from "Narrow reduction" to "Support narrow reduction operations" Jul 14, 2025
@colluca colluca changed the title from "Support narrow reduction operations" to "treewide: Support narrow reduction operations" Jul 14, 2025
@colluca colluca force-pushed the narrow_reduction branch from daaa34d to d651676 July 15, 2025 16:03
@colluca colluca force-pushed the narrow_reduction branch 4 times, most recently from af6df0a to 068dd7a September 21, 2025 15:39
@colluca colluca force-pushed the narrow_reduction branch 2 times, most recently from 5453b0c to 47f46e1 September 23, 2025 17:05
@colluca colluca force-pushed the narrow_reduction branch 4 times, most recently from bdfd98d to b8ec8fd September 26, 2025 08:06
@colluca colluca force-pushed the narrow_reduction branch 4 times, most recently from b679cdd to fca047f October 5, 2025 17:40
colluca and others added 28 commits October 21, 2025 12:02
Requires allocating the barrier pointer in the communicator in L1.

4 participants