@zrr1999 (Member) commented Nov 7, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

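All values computed from blockDim-related variables (that is, declarations whose right-hand side matches the pattern at least twice) now use int64 variables.

The following rule was applied: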

id: int64-type-cuda-index
language: cpp
files:
  - paddle/phi/kernels/**
ignores:
  - paddle/phi/kernels/legacy/**
  - paddle/phi/kernels/xpu/**
  - paddle/phi/kernels/custom/**
  - paddle/phi/kernels/sparse/**
severity: warning
message: Prefer int64_t for CUDA indexing (threadIdx/blockDim/blockIdx) to avoid overflow on large problems
rule:
  any:
    # int/int32_t/auto x = <expr with threadIdx|blockDim|blockIdx>;
    - all:
        - pattern: $T $VAR = $RIGHT
    # direct-initializer: T x(<RIGHT>);
    - all:
        - pattern: $T $VAR($RIGHT)
constraints:
  T:
    regex: ^(int|int32_t|auto)$
  RIGHT:
    regex: \b(threadIdx|blockDim|blockIdx.*){2,}\b
fix: |
  int64_t $VAR = $RIGHT;
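
To illustrate which right-hand sides the `RIGHT` constraint accepts, here is a small C++ sketch that runs the same pattern through `std::regex`. It assumes the rule engine agrees with ECMAScript regex semantics on this pattern, which the PR does not confirm; `RightConstraintMatches` and `CheckRightConstraintExamples` are hypothetical helper names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: reports whether the RIGHT constraint's regex finds a
// match anywhere in the text of a right-hand-side expression.
// Assumption: ECMAScript semantics stand in for the rule engine's dialect.
bool RightConstraintMatches(const std::string& rhs) {
  static const std::regex pattern(
      R"(\b(threadIdx|blockDim|blockIdx.*){2,}\b)");
  return std::regex_search(rhs, pattern);
}

// A few sample expressions and the result observed under that assumption.
void CheckRightConstraintExamples() {
  // blockIdx, then (via `.*`) another built-in later in the expression.
  assert(RightConstraintMatches("blockIdx.x * blockDim.x + threadIdx.x"));
  assert(RightConstraintMatches("blockDim.x * blockIdx.x + threadIdx.x"));
  // A single built-in never satisfies the {2,} repetition.
  assert(!RightConstraintMatches("threadIdx.x"));
  // Two built-ins without blockIdx are not adjacent, so nothing matches.
  assert(!RightConstraintMatches("threadIdx.x + blockDim.x"));
}
```

Under these assumptions, only the `blockIdx` alternative carries `.*`, so in practice a match needs `blockIdx` followed later in the expression by another of the three names.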

pcard-93269

paddle-bot (bot) commented Nov 7, 2025

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests. See the Paddle CI Manual for details.

@zrr1999 requested a review from Copilot on November 7, 2025 09:31

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR converts CUDA kernel thread index variables from int to int64_t to prevent potential integer overflow in large-scale tensor computations.

Key changes:

  • Thread index variables (idx, idy, id, tid, etc.) changed from int to int64_t across 60+ GPU kernel files
  • Formatting improvement: blank lines added after variable declarations for better readability
  • Affects kernels in core operations (embedding, convolution, normalization, optimizer, etc.)
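
The overflow these changes target can be sketched on the host. CUDA's `blockIdx`, `blockDim`, and `threadIdx` have unsigned 32-bit components, so the example below uses a hypothetical `Dim3` stand-in and hypothetical function names to mirror the two declaration styles. The third function, which widens an operand before multiplying, is not part of the rule applied in this PR and is shown only for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for CUDA's built-in index variables,
// whose components are unsigned 32-bit integers.
struct Dim3 {
  std::uint32_t x;
};

// Mirrors `int idx = blockIdx.x * blockDim.x + threadIdx.x;`.
// The sum is computed in unsigned 32-bit arithmetic, then narrowed to a
// signed 32-bit value (wraps modulo 2^32 on mainstream compilers).
std::int32_t IndexAsInt32(Dim3 block_idx, Dim3 block_dim, Dim3 thread_idx) {
  return static_cast<std::int32_t>(block_idx.x * block_dim.x + thread_idx.x);
}

// Mirrors `int64_t idx = blockIdx.x * blockDim.x + threadIdx.x;`.
// The right-hand side is unchanged, so it is still 32-bit arithmetic; only
// the final conversion is widened.
std::int64_t IndexAsInt64(Dim3 block_idx, Dim3 block_dim, Dim3 thread_idx) {
  return block_idx.x * block_dim.x + thread_idx.x;
}

// Comparison only (not from the PR): widen one operand before multiplying
// so the whole expression is evaluated in 64 bits.
std::int64_t IndexWidenedFirst(Dim3 block_idx, Dim3 block_dim,
                               Dim3 thread_idx) {
  return static_cast<std::int64_t>(block_idx.x) * block_dim.x + thread_idx.x;
}

void CheckIndexExamples() {
  const Dim3 block_dim{512};
  const Dim3 thread_idx{5};

  // 2^22 blocks of 512 threads: the product is exactly 2^31.
  const Dim3 block_a{4194304};
  assert(IndexAsInt32(block_a, block_dim, thread_idx) < 0);
  assert(IndexAsInt64(block_a, block_dim, thread_idx) == 2147483653LL);

  // 2^23 blocks of 512 threads: the product is 2^32 and wraps to 0 in
  // 32-bit arithmetic, regardless of the declared type.
  const Dim3 block_b{8388608};
  assert(IndexAsInt64(block_b, block_dim, thread_idx) == 5);
  assert(IndexWidenedFirst(block_b, block_dim, thread_idx) == 4294967301LL);
}
```

In this sketch, declaring the variable as `int64_t` preserves indices in the range [2^31, 2^32), which a signed 32-bit variable would turn negative. Products of 2^32 or more still wrap unless an operand is widened before the multiplication.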

Reviewed Changes

Copilot reviewed 68 out of 68 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| paddle/phi/kernels/selected_rows/gpu/lookup_table_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/selected_rows/gpu/lookup_table_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/selected_rows/gpu/adamw_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/selected_rows/gpu/adam_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/impl/llm_int8_matmul_kernel_impl.h | Changed loop indices linear_index and linear_idx to int64_t |
| paddle/phi/kernels/gpudnn/mha_cudnn_frontend.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/yolo_box_post_kernel.cu | Changed 3D thread indices x_id, y_id, z_id to int64_t |
| paddle/phi/kernels/gpu/yolo_box_head_kernel.cu | Changed 3D thread indices x_id, y_id, z_id to int64_t |
| paddle/phi/kernels/gpu/weighted_sample_neighbors_kernel.cu | Changed thread index i to int64_t |
| paddle/phi/kernels/gpu/viterbi_decode_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/triu_indices_kernel.cu | Changed linear index to int64_t |
| paddle/phi/kernels/gpu/tril_indices_kernel.cu | Changed linear index to int64_t |
| paddle/phi/kernels/gpu/top_p_sampling_kernel.cu | Changed thread index idx to int64_t in setup kernels |
| paddle/phi/kernels/gpu/slogdeterminant_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/shuffle_channel.h | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/row_conv_kernel.cu | Changed dimension index d to int64_t, relocated comment |
| paddle/phi/kernels/gpu/row_conv_grad_kernel.cu | Changed dimension index d to int64_t, relocated comment |
| paddle/phi/kernels/gpu/radam_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/psroi_pool_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/psroi_pool_grad_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/partial_sum_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/gpu/partial_concat_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/gpu/partial_concat_grad_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/gpu/nadam_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/multinomial_kernel.cu | Changed thread indices id and sample to int64_t |
| paddle/phi/kernels/gpu/multiclass_nms3_kernel.cu | Changed loop variable i to int64_t |
| paddle/phi/kernels/gpu/lookup_table_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/lookup_table_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/lars_momentum_kernel.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/gumbel_softmax_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/group_norm_kernel.cu | Changed channel indices ci to int64_t |
| paddle/phi/kernels/gpu/fused_token_prune_kernel.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/embedding_with_scaled_gradient_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/embedding_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/embedding_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/embedding_grad_add_to_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/edit_distance_kernel.cu | Changed thread index idx to int64_t across multiple kernels |
| paddle/phi/kernels/gpu/determinant_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/depthwise_conv.h | Changed thread/loop indices to int64_t across multiple kernels |
| paddle/phi/kernels/gpu/ctc_align_kernel.cu | Changed thread index ind to int64_t |
| paddle/phi/kernels/gpu/cross_entropy_bwd_w_downcast.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/correlation_grad_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/batch_norm_kernel.cu | Changed thread indices and loop variables to int64_t |
| paddle/phi/kernels/gpu/batch_norm_grad_kernel.cu | Changed thread indices and loop variables to int64_t |
| paddle/phi/kernels/gpu/argsort_grad_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/apply_per_channel_scale_kernel.cu | Changed column offset to int64_t |
| paddle/phi/kernels/gpu/affine_channel_kernel.cu | Changed global thread index to int64_t |
| paddle/phi/kernels/gpu/affine_channel_grad_kernel.cu | Changed global thread index to int64_t |
| paddle/phi/kernels/gpu/adamax_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/adagrad_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/fusion/gpu/quant_dequant_kernel.h | Changed 2D indices n_id and m_id to int64_t |
| paddle/phi/kernels/fusion/gpu/fused_softmax_mask_utils.h | Changed sequence and offset indices to int64_t |
| paddle/phi/kernels/fusion/gpu/fused_softmax_mask_kernel.cu | Changed data indices to int64_t |
| paddle/phi/kernels/fusion/gpu/block_multi_head_attention_kernel.cu | Changed 2D indices n_id and m_id to int64_t |
| paddle/phi/kernels/fusion/gpu/block_attn.h | Changed linear loop index to int64_t |
| paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_util.cu | Changed 2D indices m_i and n_i to int64_t |
| paddle/phi/kernels/funcs/weight_dequant_functor.h | Changed tile index to int64_t |
| paddle/phi/kernels/funcs/sync_batch_norm_utils.h | Changed global and loop indices to int64_t |
| paddle/phi/kernels/funcs/sparse/scatter.cu.h | Changed thread index tid to int64_t |
| paddle/phi/kernels/funcs/sparse/flatten_indices.cu.h | Changed thread index tid to int64_t |
| paddle/phi/kernels/funcs/scatter.cu.h | Changed element index to int64_t |
| paddle/phi/kernels/funcs/quant_dequant.h | Changed 2D indices and element indices to int64_t |
| paddle/phi/kernels/funcs/norm_utils.cu.h | Changed global thread index to int64_t |
| paddle/phi/kernels/funcs/math_function.cu | Changed loop variable to int64_t |
| paddle/phi/kernels/funcs/fc_functor.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/funcs/fake_quantize_functor.cu | Changed block/thread index bid to int64_t |
| paddle/phi/kernels/funcs/detail/gru_gpu_kernel.h | Changed column index COL to int64_t |
| paddle/phi/kernels/funcs/broadcast_function.h | Changed thread offset to int64_t |

@zrr1999 force-pushed the big-tensor/cuda-index branch from 7d800d0 to 6f3e545 on November 7, 2025 11:05