Some Qs about implementation

AMOS is an innovative approach that uses automated mapping generation and performance optimization to make better use of emerging hardware units such as TensorCore. I have a few questions about the implementation and would appreciate some guidance.
1. When computing the compute latency, the intrinsic latency is a fixed value that can be approximated with a hardware model. That latency is then multiplied by the trip counts of the sequential loops, i.e. the loops that run one iteration after another and are not bound to parallel cores. My question is: why is this sequencing necessary?
2. Tiling, fusion, and other scheduling steps typically come before tensorization and produce parallel code. Scheduling changes can also alter the number of software iterations. How should this variation be handled, and how effective is the mapping generation process at present?
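
To make both questions concrete, here is a minimal sketch of the latency model as I understand it from the description above. It is not taken from the AMOS code base: the function names, the intrinsic latency of 32 cycles, and the loop extent and tile sizes are all hypothetical values chosen for illustration.

```python
from math import ceil, prod


def compute_latency(intrinsic_latency, sequential_trip_counts):
    """Intrinsic latency multiplied by the trip counts of all sequential loops.

    Assumption: loops bound to parallel cores are excluded from the list.
    """
    return intrinsic_latency * prod(sequential_trip_counts)


def trip_count(extent, tile):
    """Iterations left in a loop of the given extent after tiling by `tile`."""
    return ceil(extent / tile)


# Hypothetical numbers, for illustration only.
INTRINSIC_LATENCY = 32  # cycles per intrinsic call (assumed, not measured)
K = 1024                # extent of a sequential reduction loop (assumed)

# The same loop under two schedules: a different tile size changes the
# sequential trip count, and therefore the estimated compute latency.
latency_tile16 = compute_latency(INTRINSIC_LATENCY, [trip_count(K, 16)])
latency_tile32 = compute_latency(INTRINSIC_LATENCY, [trip_count(K, 32)])

print(latency_tile16, latency_tile32)
```

In this sketch the estimate depends entirely on which loops are counted as sequential and on the tile sizes chosen by the schedule, which is the variation question 2 asks about.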

Some Qs about implementation #8
