@Aggarwal-Raghav
Contributor

@Aggarwal-Raghav Aggarwal-Raghav commented Nov 13, 2025

What changes were proposed in this pull request?

TEZ-4661 Include commons-collections3.x in hive-exec jar

Why are the changes needed?

In Tez-1.0.0-SNAPSHOT, Hadoop has been upgraded to 3.4.2, and the Hadoop dependencies in the Tez project no longer ship commons-collections-3.x. Hive, however, still depends on commons-collections3.x both directly and through third-party dependencies such as opencsv and commons-beanutils. At the very least, the import statements should be migrated to commons-collections4.x.

Does this PR introduce any user-facing change?

No

How was this patch tested?

On local setup

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Nov 13, 2025

Error Stacktrace:
Screenshot 2025-11-13 at 9 40 36 PM

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Nov 13, 2025

Migrating Hive from commons-collections3.x to 4.x has a few prerequisites:

  1. Migrate hadoop to 3.4.2
  2. Migrate opencsv to 5.12.0
  3. Migrate commons-beanutils from 1.x to 2.x. NOTE: 2.x is not officially released; only a milestone version is available: https://issues.apache.org/jira/browse/BEANUTILS-532?focusedCommentId=17908246&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17908246
  4. Decide how to handle the atlas-intg dependency, which also brings in commons-collections-3.x
  5. Please check HIVE-28486: Upgrade commons-collections to commons-collections4 due t… #5588 review comments for more details

@Aggarwal-Raghav Aggarwal-Raghav marked this pull request as ready for review November 18, 2025 11:45
@Aggarwal-Raghav Aggarwal-Raghav changed the title Shaded commons-collections3.x in hive-exec jar for Tez-1.0.0 to prevent ClassNotFound Exception HIVE-29326: Shade commons-collections3.x in hive-exec jar for Tez-1.0.0 to prevent ClassNotFound Exception Nov 18, 2025
@Aggarwal-Raghav
Contributor Author

CC @abstractdog

@deniskuzZ
Member

is that required for Tez-1.0 release?

ql/pom.xml Outdated
<include>org.apache.thrift:libthrift</include>
<include>org.apache.thrift:libfb303</include>
<include>org.datanucleus:javax.jdo</include>
<include>commons-collections:commons-collections</include>
Member

@deniskuzZ deniskuzZ Nov 26, 2025

Could we add a TODO, i.e. drop once migrated to commons-collections-4.x?

Contributor Author

Sure. I just want to highlight that both commons-lang 2.x and 3.x are shipped, even though commons-lang2.x is a banned import in the parent pom.xml. Cleaning up that shading is a separate issue, though.

My thinking was to update the imports to commons-collections4.x as in #5588, shade commons-collections3.x in the ql pom.xml, and add a TODO statement :-)

Member

do you plan to update the imports in this PR?

Contributor Author

Will do and update PR shortly.

@Aggarwal-Raghav
Contributor Author

is that required for Tez-1.0 release?

Yes, once we upgrade Tez to 1.x in Hive, we'll hit this issue. Since Hive has both commons-collections3.x and 4.x, a quick fix is to update all the imports to commons-collections4.x, as done in #5588.

Based on the stacktrace, it is failing here:

if (CollectionUtils.isEmpty(childOperators)) {

But we still need commons-collections3.x on the Tez container classpath. My reasoning is that if any third-party jar such as opencsv needs commons-collections3.x, it will fail with the same error. Hence shading in hive-exec seems unavoidable.
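
To make the failure mode above concrete, here is a minimal, hypothetical probe (not part of this PR) that reports whether the 3.x and 4.x CollectionUtils classes are visible to the current class loader. The class name CollectionsClasspathProbe and its helper methods are made up for illustration; only Class.forName and the two Apache package names are taken as given.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive or Tez.
public class CollectionsClasspathProbe {

    // Returns true if the named class can be located by this class's class loader.
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, CollectionsClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Checks the 3.x and 4.x CollectionUtils classes, in that order.
    public static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : new String[] {
                "org.apache.commons.collections.CollectionUtils",   // commons-collections 3.x
                "org.apache.commons.collections4.CollectionUtils"   // commons-collections4 4.x
        }) {
            result.put(name, isLoadable(name));
        }
        return result;
    }

    public static void main(String[] args) {
        probe().forEach((name, found) ->
                System.out.println(name + " -> " + (found ? "found" : "MISSING")));
    }
}
```

Running something like this with the same classpath as the Tez container would show whether 3.x is still visible there. The result depends entirely on the classpath it is run with, so it says nothing about any other environment.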

Member

@deniskuzZ deniskuzZ left a comment

LGTM +1

@Aggarwal-Raghav
Contributor Author

Force-pushed again to resolve merge conflicts.
As the PR includes changes from #5588, I have added Co-authored-by: P Eshwitha Sai saieshwitha999@gmail.com

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Nov 26, 2025

Updated commons-collections4 from 4.1 to 4.4 to be in sync with hadoop
Verified in packaging and standalone packaging that both commons-collections3.x and 4.x are shipped.
Screenshot 2025-11-26 at 10 32 58 PM

Screenshot 2025-11-26 at 10 33 02 PM

@Aggarwal-Raghav
Contributor Author

is this a new one https://sonarcloud.io/project/issues?impactSeverities=MEDIUM&sinceLeakPeriod=true&issueStatuses=OPEN%2CCONFIRMED&pullRequest=6181&id=apache_hive&open=AZrCd_yiXWASHw-aup3S?

Even without my patch, it was unused. I used regex replace in IntelliJ, so I missed it :-(
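
For illustration, the kind of regex-based import rewrite mentioned above could be sketched as follows. This is an assumption about the approach, not the actual IntelliJ replace used in this PR, and CollectionsImportRewriter is a hypothetical name. Note that a text rewrite like this cannot detect unused imports, and it will not compile cleanly wherever 4.x renamed or removed a class, so each touched file still needs a review.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive.
public class CollectionsImportRewriter {

    // Matches only 3.x imports: the literal dot right after "collections"
    // means "org.apache.commons.collections4." is left untouched.
    private static final Pattern V3_IMPORT = Pattern.compile(
            "^(\\s*import\\s+(?:static\\s+)?)org\\.apache\\.commons\\.collections\\.",
            Pattern.MULTILINE);

    // Rewrites every commons-collections 3.x import in the given source text to 4.x.
    public static String rewrite(String source) {
        return V3_IMPORT.matcher(source).replaceAll("$1org.apache.commons.collections4.");
    }

    public static void main(String[] args) {
        String before = String.join("\n",
                "import org.apache.commons.collections.CollectionUtils;",
                "import org.apache.commons.collections4.MapUtils;",
                "import java.util.List;");
        System.out.println(rewrite(before));
    }
}
```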

@Aggarwal-Raghav
Contributor Author

Removed unused imports from the touched files

@deniskuzZ
Member

deniskuzZ commented Nov 28, 2025

LGTM, just to confirm:
if I run mvn dependency:tree, commons-collections won't be there, only commons-collections4. However, for Tez we'll include commons-collections in the tar.gz. Right?

@Aggarwal-Raghav
Contributor Author

LGTM, just to confirm:
if I run mvn dependency:tree, commons-collections won't be there, only commons-collections4. However, for Tez we'll include commons-collections in the tar.gz. Right?

No. I will provide details in a few minutes.

@Aggarwal-Raghav
Contributor Author

@deniskuzZ , please find the dependency tree and explanation: dependency_tree.txt

Dependency Updates
Previous State: Hive shipped with commons-collections 3.2.2 and 4.1.
New State after this PR: Hive will ship with commons-collections 3.2.2 and 4.4.

We cannot fully migrate to commons-collections 4.x at this time due to transitive dependencies. Libraries such as hadoop-3.4.1, commons-beanutils, accumulo, atlas, and opencsv still require commons-collections 3.x. To support these third-party dependencies, Hive will continue to ship both version 3.x and 4.x.

Based on the stacktrace attached in the description, the ClassNotFound was thrown by Hive at:

if (CollectionUtils.isEmpty(childOperators)) {

That's why the import statements were changed from 3.x to 4.x.

It's possible that the import changes alone are enough and we don't need to shade commons-collections-3.x. To be on the safe side, though, shading is done in case any code path running inside the Tez container uses one of the third-party dependencies (accumulo, beanutils, opencsv) that rely on commons-collections3.x; without it, ClassNotFound would be thrown.

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Nov 28, 2025

If there are any concerns about shading, we can wait for the Tez 1.0.0 release and the hadoop-3.4.2 upgrade and then revisit it, but the import changes definitely make sense at this stage.

@deniskuzZ
Member

If there are any concerns about shading, we can wait for the Tez 1.0.0 release and the hadoop-3.4.2 upgrade and then revisit it, but the import changes definitely make sense at this stage.

I’m in favor of waiting

@Aggarwal-Raghav
Contributor Author

If I update the PR to remove the shading changes, will it be ok?

@Aggarwal-Raghav Aggarwal-Raghav changed the title HIVE-29326: Shade commons-collections3.x in hive-exec jar for Tez-1.0.0 to prevent ClassNotFound Exception HIVE-29326: Move imports to commons-collections4.x to prevent ClassNotFound in Tez-1.0.0 Nov 28, 2025
@deniskuzZ
Member

If I update the PR to remove the shading changes, will it be ok?

Where is the shading part in this PR? I don't see any hive-exec pom changes

@Aggarwal-Raghav
Contributor Author

If I update the PR to remove the shading changes, will it be ok?

Where is the shading part in this PR? I don't see any hive-exec pom changes

It's because I removed them in https://github.com/apache/hive/compare/ab7894593ae83656d72645c3d3f497805d70a6c2..15b1f9f348a3be417c0a0b92f6ea47fcd291d095

@deniskuzZ
Member

Without shading, LGTM. We can merge and come back to this before the next release; hopefully Tez-1.0 will be out by then.

…tFound in Tez-1.0.0

As part of this PR, the following changes have been made:
- Migrated commons-collections3 imports to commons-collections4
- Upgraded commons-collections4 version to 4.4 to be in sync with hadoop

Co-authored-by: P Eshwitha Sai <saieshwitha999@gmail.com>