Performance difference with simple_mode enabled? #5

@LeiWang1999

Hi all, could you kindly explain the difference between auto-tensorize and auto-tensorize-v4? From the amos-gemm benchmark, the performance of these two strategies looks quite similar:

| M | K | N | amos-1000-step-fp16-simple (ms) | amos-1000-step-fp16 (ms) |
|---|---|---|---|---|
| 2 | 2 | 2 | Failed to Run | Failed to Run |
| 4 | 4 | 4 | Failed to Run | Failed to Run |
| 8 | 8 | 8 | Failed to Run | Failed to Run |
| 16 | 16 | 16 | 0.004545906 | 0.003936828 |
| 32 | 32 | 32 | 0.004610093 | 0.004310548 |
| 64 | 64 | 64 | 0.004638971 | 0.004614832 |
| 128 | 128 | 128 | 0.005128772 | 0.005059945 |
| 256 | 256 | 256 | 0.006975747 | 0.007367229 |
| 512 | 512 | 512 | 0.018055338 | 0.016287096 |
| 1024 | 1024 | 1024 | 0.066839093 | 0.071785023 |
| 2048 | 2048 | 2048 | 0.382059749 | 0.336489417 |
| 4096 | 4096 | 4096 | 2.00519422 | 2.252330443 |
| 8192 | 8192 | 8192 | 21.62599663 | 18.10944683 |
| 16384 | 16384 | 16384 | 111.4660256 | 132.6751751 |
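
To put a number on "quite similar", here is a minimal sketch that computes the signed relative gap between the two columns for each size that ran. The timings are copied from the table above. The names `rows`, `simple_ms`, `plain_ms`, and `relative_gap` are my own shorthand for illustration and are not part of AMOS; `plain_ms` simply means the column without the `-simple` suffix.

```python
# Timings in milliseconds, copied from the table above.
# Each tuple is (M = K = N, simple_ms, plain_ms).
# Sizes 2, 4 and 8 are omitted because they failed to run.
rows = [
    (16, 0.004545906, 0.003936828),
    (32, 0.004610093, 0.004310548),
    (64, 0.004638971, 0.004614832),
    (128, 0.005128772, 0.005059945),
    (256, 0.006975747, 0.007367229),
    (512, 0.018055338, 0.016287096),
    (1024, 0.066839093, 0.071785023),
    (2048, 0.382059749, 0.336489417),
    (4096, 2.00519422, 2.252330443),
    (8192, 21.62599663, 18.10944683),
    (16384, 111.4660256, 132.6751751),
]


def relative_gap(simple_ms, plain_ms):
    """Signed gap of the simple run relative to the plain run, in percent.

    Positive means the simple run was slower, negative means it was faster.
    """
    return (simple_ms - plain_ms) / plain_ms * 100.0


for size, simple_ms, plain_ms in rows:
    gap = relative_gap(simple_ms, plain_ms)
    print(f"{size:>6}  simple vs plain: {gap:+7.2f}%")
```

The sign of the gap flips from one size to the next, so neither column is consistently faster in this run.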
