Skip to content

Conversation

@Aggarwal-Raghav
Copy link
Contributor

@Aggarwal-Raghav Aggarwal-Raghav commented Nov 6, 2025

What changes were proposed in this pull request?

Plesae check HIVE-29307

Why are the changes needed?

(int) waves is converting 1.7 (default) => 1 causing less number of container to launch.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

In Prod scenario

@Aggarwal-Raghav
Copy link
Contributor Author

CC @abstractdog , can you please provide your thoughts on this?

splits =
inputFormat.getSplits(
jobConf,
numSplits.orElse((int) Math.min(availableSlots * waves, Integer.MAX_VALUE)));
Copy link
Member

@deniskuzZ deniskuzZ Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not simply (int) (availableSlots * waves) as done in SplitGrouper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it can be done and I'm also in favour of that. It just that, multiplyExact protects from overflow hence kept it. will update it after CI outcome.

@abstractdog
Copy link
Contributor

This fix makes sense to me overall — split waves is indeed a float, so truncating it to an int is clearly a bug.

It’s worth noting that waves is buried deep inside the split generation logic, making it an expert-level setting. So it’s not surprising that this issue hasn’t been fixed until now (based on customer escalation history).

mm_all.q.out change also makes sense, it's the opposite of that was done in HIVE-19703

I wish we have a much cleaner unit test to reflect how these settings affect the number of splits, but it's not necessarily the scope of this PR

@Aggarwal-Raghav
Copy link
Contributor Author

This fix makes sense to me overall — split waves is indeed a float, so truncating it to an int is clearly a bug.

It’s worth noting that waves is buried deep inside the split generation logic, making it an expert-level setting. So it’s not surprising that this issue hasn’t been fixed until now (based on customer escalation history).

mm_all.q.out change also makes sense, it's the opposite of that was done in HIVE-19703

I wish we have a much cleaner unit test to reflect how these settings affect the number of splits, but it's not necessarily the scope of this PR

Thanks for the reply @abstractdog . There is 1 UT failure mm_dp.q. I'm debugging that, seems relevant. I'll see if a junit test is possible for the test the number of containers launched logic.

@Aggarwal-Raghav Aggarwal-Raghav changed the title Wrong Split computation causing less number of container launch HIVE-29307: Incorrect split calculation causing less container to launch Nov 7, 2025
@Aggarwal-Raghav Aggarwal-Raghav marked this pull request as ready for review November 7, 2025 18:49
@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 7, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants