[large tensor] Use int64_t for CUDA indexing to avoid overflow #76303
base: develop
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Pull Request Overview
This PR converts CUDA kernel thread index variables from int to int64_t to prevent potential integer overflow in large-scale tensor computations.
Key changes:
- Thread index variables (idx, idy, id, tid, etc.) changed from `int` to `int64_t` across 60+ GPU kernel files
- Formatting improvement: blank lines added after variable declarations for better readability
- Affects kernels in core operations (embedding, convolution, normalization, optimizer, etc.)
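The overflow that motivates these changes can be reproduced on the host. The following is a minimal sketch, not code from this PR: the function names and parameters are hypothetical stand-ins for CUDA's built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x`, which are 32-bit unsigned values on the device. It assumes a platform where `unsigned int` is 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical stand-ins for blockIdx.x, blockDim.x and threadIdx.x.
// All three are 32-bit unsigned values, as they are on the device.

// The whole expression is evaluated in 32 bits, so the product wraps
// modulo 2^32 once it no longer fits.
inline std::uint32_t LinearIndex32(std::uint32_t block_idx,
                                   std::uint32_t block_dim,
                                   std::uint32_t thread_idx) {
  return block_idx * block_dim + thread_idx;
}

// Only the result type is 64-bit. The product is still computed in
// 32 bits and wraps before it is widened.
inline std::int64_t LinearIndexWidenedResult(std::uint32_t block_idx,
                                             std::uint32_t block_dim,
                                             std::uint32_t thread_idx) {
  std::int64_t idx = block_idx * block_dim + thread_idx;
  return idx;
}

// The first operand is widened, so the multiplication and the addition
// are both carried out in 64 bits.
inline std::int64_t LinearIndex64(std::uint32_t block_idx,
                                  std::uint32_t block_dim,
                                  std::uint32_t thread_idx) {
  return static_cast<std::int64_t>(block_idx) * block_dim + thread_idx;
}
```

With block index 4194304 (2^22), block size 1024 (2^10) and thread index 7, the first two functions return 7, while `LinearIndex64` returns 4294967303. Whether a particular kernel needs the explicit cast depends on how its operands are declared, so each call site should be checked individually.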
Reviewed Changes
Copilot reviewed 68 out of 68 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| paddle/phi/kernels/selected_rows/gpu/lookup_table_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/selected_rows/gpu/lookup_table_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/selected_rows/gpu/adamw_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/selected_rows/gpu/adam_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/impl/llm_int8_matmul_kernel_impl.h | Changed loop indices linear_index and linear_idx to int64_t |
| paddle/phi/kernels/gpudnn/mha_cudnn_frontend.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/yolo_box_post_kernel.cu | Changed 3D thread indices x_id, y_id, z_id to int64_t |
| paddle/phi/kernels/gpu/yolo_box_head_kernel.cu | Changed 3D thread indices x_id, y_id, z_id to int64_t |
| paddle/phi/kernels/gpu/weighted_sample_neighbors_kernel.cu | Changed thread index i to int64_t |
| paddle/phi/kernels/gpu/viterbi_decode_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/triu_indices_kernel.cu | Changed linear index to int64_t |
| paddle/phi/kernels/gpu/tril_indices_kernel.cu | Changed linear index to int64_t |
| paddle/phi/kernels/gpu/top_p_sampling_kernel.cu | Changed thread index idx to int64_t in setup kernels |
| paddle/phi/kernels/gpu/slogdeterminant_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/shuffle_channel.h | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/row_conv_kernel.cu | Changed dimension index d to int64_t, relocated comment |
| paddle/phi/kernels/gpu/row_conv_grad_kernel.cu | Changed dimension index d to int64_t, relocated comment |
| paddle/phi/kernels/gpu/radam_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/psroi_pool_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/psroi_pool_grad_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/partial_sum_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/gpu/partial_concat_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/gpu/partial_concat_grad_kernel.cu | Changed thread index id to int64_t |
| paddle/phi/kernels/gpu/nadam_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/multinomial_kernel.cu | Changed thread indices id and sample to int64_t |
| paddle/phi/kernels/gpu/multiclass_nms3_kernel.cu | Changed loop variable i to int64_t |
| paddle/phi/kernels/gpu/lookup_table_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/lookup_table_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/lars_momentum_kernel.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/gumbel_softmax_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/group_norm_kernel.cu | Changed channel indices ci to int64_t |
| paddle/phi/kernels/gpu/fused_token_prune_kernel.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/embedding_with_scaled_gradient_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/embedding_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/embedding_grad_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/embedding_grad_add_to_kernel.cu | Changed loop index idy to int64_t |
| paddle/phi/kernels/gpu/edit_distance_kernel.cu | Changed thread index idx to int64_t across multiple kernels |
| paddle/phi/kernels/gpu/determinant_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/depthwise_conv.h | Changed thread/loop indices to int64_t across multiple kernels |
| paddle/phi/kernels/gpu/ctc_align_kernel.cu | Changed thread index ind to int64_t |
| paddle/phi/kernels/gpu/cross_entropy_bwd_w_downcast.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/gpu/correlation_grad_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/batch_norm_kernel.cu | Changed thread indices and loop variables to int64_t |
| paddle/phi/kernels/gpu/batch_norm_grad_kernel.cu | Changed thread indices and loop variables to int64_t |
| paddle/phi/kernels/gpu/argsort_grad_kernel.cu | Changed thread index to int64_t |
| paddle/phi/kernels/gpu/apply_per_channel_scale_kernel.cu | Changed column offset to int64_t |
| paddle/phi/kernels/gpu/affine_channel_kernel.cu | Changed global thread index to int64_t |
| paddle/phi/kernels/gpu/affine_channel_grad_kernel.cu | Changed global thread index to int64_t |
| paddle/phi/kernels/gpu/adamax_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/gpu/adagrad_kernel.cu | Changed thread index idx to int64_t |
| paddle/phi/kernels/fusion/gpu/quant_dequant_kernel.h | Changed 2D indices n_id and m_id to int64_t |
| paddle/phi/kernels/fusion/gpu/fused_softmax_mask_utils.h | Changed sequence and offset indices to int64_t |
| paddle/phi/kernels/fusion/gpu/fused_softmax_mask_kernel.cu | Changed data indices to int64_t |
| paddle/phi/kernels/fusion/gpu/block_multi_head_attention_kernel.cu | Changed 2D indices n_id and m_id to int64_t |
| paddle/phi/kernels/fusion/gpu/block_attn.h | Changed linear loop index to int64_t |
| paddle/phi/kernels/fusion/cutlass/conv2d/conv2d_util.cu | Changed 2D indices m_i and n_i to int64_t |
| paddle/phi/kernels/funcs/weight_dequant_functor.h | Changed tile index to int64_t |
| paddle/phi/kernels/funcs/sync_batch_norm_utils.h | Changed global and loop indices to int64_t |
| paddle/phi/kernels/funcs/sparse/scatter.cu.h | Changed thread index tid to int64_t |
| paddle/phi/kernels/funcs/sparse/flatten_indices.cu.h | Changed thread index tid to int64_t |
| paddle/phi/kernels/funcs/scatter.cu.h | Changed element index to int64_t |
| paddle/phi/kernels/funcs/quant_dequant.h | Changed 2D indices and element indices to int64_t |
| paddle/phi/kernels/funcs/norm_utils.cu.h | Changed global thread index to int64_t |
| paddle/phi/kernels/funcs/math_function.cu | Changed loop variable to int64_t |
| paddle/phi/kernels/funcs/fc_functor.cu | Changed thread index tid to int64_t |
| paddle/phi/kernels/funcs/fake_quantize_functor.cu | Changed block/thread index bid to int64_t |
| paddle/phi/kernels/funcs/detail/gru_gpu_kernel.h | Changed column index COL to int64_t |
| paddle/phi/kernels/funcs/broadcast_function.h | Changed thread offset to int64_t |
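Several rows above concern loop variables rather than one-off thread indices. As an illustration only, the sketch below emulates on the host what a single thread does in a grid-stride loop. `StrideWalk` and `WalkGridStride` are hypothetical names, and the loop shape is an assumption about the general pattern, not a copy of any kernel in this PR. The point is that the loop variable, the stride and the bound all need to be 64-bit once the element count exceeds the range of `int`.

```cpp
#include <cstdint>

// Summary of one emulated thread's walk over the data.
struct StrideWalk {
  std::int64_t count;  // number of elements visited
  std::int64_t last;   // largest index visited, or -1 if none
};

// Emulates one thread of a grid-stride loop: it starts at `first`
// (its global thread index) and advances by `stride` (the total number
// of threads in the grid) until it reaches `n` elements.
inline StrideWalk WalkGridStride(std::int64_t first, std::int64_t stride,
                                 std::int64_t n) {
  StrideWalk walk{0, -1};
  for (std::int64_t i = first; i < n; i += stride) {
    ++walk.count;
    walk.last = i;
  }
  return walk;
}
```

For `first = 5`, `stride = 1073741824` (2^30) and `n = 3221225472` (3 * 2^30), the thread visits three elements and the last index is 2147483653, which is already above the largest value a 32-bit `int` can hold (2147483647).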
7d800d0 to 6f3e545
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
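All values computed from blockDim-related variables (that is, expressions whose right-hand side matches such a variable at least twice) now use int64 variables.
Apply the following rule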
pcard-93269
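The same reasoning carries over to 2D indices derived from block dimensions, such as the `n_id` and `m_id` pairs listed in the review table. The sketch below is a hypothetical host-side illustration, assuming a row-major layout. `GlobalCoord` and `FlatOffset` are made-up helper names, not functions from Paddle. Each coordinate fits comfortably in 32 bits, but the flattened offset does not.

```cpp
#include <cstdint>

// Hypothetical helper: global coordinate along one axis, computed in
// 64 bits from 32-bit block index, block size and thread index.
inline std::int64_t GlobalCoord(std::uint32_t block_idx,
                                std::uint32_t block_dim,
                                std::uint32_t thread_idx) {
  return static_cast<std::int64_t>(block_idx) * block_dim + thread_idx;
}

// Hypothetical helper: row-major offset of element (row, col) in a
// matrix with `num_cols` columns. All operands are 64-bit, so the
// product cannot wrap for realistic tensor sizes.
inline std::int64_t FlatOffset(std::int64_t row, std::int64_t col,
                               std::int64_t num_cols) {
  return row * num_cols + col;
}
```

For example, `GlobalCoord(3125, 32, 0)` gives row 100000 and `GlobalCoord(1562, 32, 15)` gives column 49999. With 50000 columns the flattened offset is 5000049999, which exceeds both the signed and unsigned 32-bit ranges.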