Fixes #3607.

In the fusion reported in #3607, there's a proposed fusion segment as shown below:

```
**Segmenter** Considering fusion:
T14_g_double[iS27{2}, rS28{24000}](Avg), T15_l_double[iS29{2}, rS30{24000}](Var), T16_l_nvfuser_index_t[iS31{2}, rS32{24000}](Count) = Welford ( T2_g_double[iS3{2}, iS4{24000}](Avg), allreduce = false )
d59 = (double)(24000);
d61 = double(1) * d59;
d65 = (double)(0);
d67 = d61 - d65;
d69 = (double)(0);
b71 = d67 >= d69;
d73 = (double)(0);
d75 = where(b71, d67, d73);
d80 = reciprocal(d75);
T17_l_double[iS33{2}] = T15_l_double[iS29{2}, rS30{24000}] * d80;
T23_l_double[iS44{2}, bS45{1}] = broadcast( T17_l_double[iS33{2}] )
T25_l_double[iS48{2}, bS49{1}] = T23_l_double[iS44{2}, bS45{1}] + double(1.0000000000000001e-05);
T26_g_double[iS50{2}, bS51{1}] = rsqrt(T25_l_double[iS48{2}, bS49{1}]);
T27_l_double[iS52{2}, bS53{1}] = Set( T26_g_double[iS50{2}, bS51{1}], cache_op=Streaming )
T28_g_double[iS54{2}, bS55{1 ex 24000}] = expand( T27_l_double[iS52{2}, bS53{1}], {2, 24000} )
T32_g_double[iS62{2}, iS63{24000}] = T28_g_double[iS54{2}, bS55{1 ex 24000}] * T13_g_double[iS25{2}, iS26{24000}];
T37_l_double[iS72{2}, iS73{24000}] = -T32_g_double[iS62{2}, iS63{24000}];
T38_g_double[iS74{2}, rS75{24000}] = reduction( T37_l_double[iS72{2}, iS73{24000}], op = add, initial value = double(0), allreduce = false )
```

When `canScheduleCompileTime` of the transpose scheduler is called, `FindAllMappedDims` is used with `T32` as the reference. Since its innermost dimension is not connected with `T15` due to the reduction, `propagateSibling` hits the assertion at https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/utils.cpp#L1476.

This PR avoids the assertion by checking the existence of reduction first. We may also want to remove the assertion.
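For illustration only, the idea of checking for a reduction before running the dimension-mapping step can be sketched as follows. This is a self-contained toy, not the nvFuser code: `AxisKind`, `ToyTensor`, `hasReductionAxis`, and `canScheduleAsTransposeSketch` are hypothetical names, and the assumption that the check simply rejects the segment when any reduction axis is present is mine, not something stated in the PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for IR types (not the real nvFuser API).
// In the dump above, iS.. axes are iteration, rS.. reduction, bS.. broadcast.
enum class AxisKind { Iteration, Reduction, Broadcast };

struct ToyTensor {
  std::string name;
  std::vector<AxisKind> axes;
};

// Returns true if any axis of the tensor is a reduction axis.
bool hasReductionAxis(const ToyTensor& tv) {
  return std::any_of(tv.axes.begin(), tv.axes.end(), [](AxisKind k) {
    return k == AxisKind::Reduction;
  });
}

// Sketch of an early-out: reject the segment as soon as a reduction is found,
// so the mapping analysis is never reached for such segments.
bool canScheduleAsTransposeSketch(const std::vector<ToyTensor>& tensors) {
  for (const auto& tv : tensors) {
    if (hasReductionAxis(tv)) {
      return false;
    }
  }
  // The dimension-mapping analysis would run here in a real scheduler.
  return true;
}
```

With this ordering, a segment containing a tensor shaped like `T15` (one iteration axis, one reduction axis) is rejected up front, while a purely pointwise segment still proceeds to the mapping step.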