-
Notifications
You must be signed in to change notification settings - Fork 13
Open
Description
AMOS represents an innovative approach that leverages automated Mapping Generation and performance optimization to enhance the utilization of emerging hardware units like TensorCore. I have encountered some implementation challenges that I seek guidance on.
- in computing compute latency, the intrinsic latency, a fixed value, can be approximated using hardware models. The resulting latency is then multiplied by the trip counts of sequential loops, which operate in a sequential manner not tethered to parallel cores. An inquiry arises: why is this sequencing necessary?
- Operations like tiling, fusion, and other scheduling actions typically precede tensorization, leading to the generation of parallel code. Moreover, scheduling adjustments may introduce variations in the number of software iterations. How should this fluctuation be addressed, and what is the current efficacy of the mapping generation process?
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels