-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29326: Move imports to commons-collections4.x to prevent ClassNotFound in Tez-1.0.0 #6181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
In order to migrate from commons-collections3.x to 4.x in hive, few prerequisites are there:
|
|
CC @abstractdog |
|
is that required for Tez-1.0 release? |
ql/pom.xml
Outdated
| <include>org.apache.thrift:libthrift</include> | ||
| <include>org.apache.thrift:libfb303</include> | ||
| <include>org.datanucleus:javax.jdo</include> | ||
| <include>commons-collections:commons-collections</include> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add TODO, i.e. drop once migrated to commons-collection-4.x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. But just want to highlight that commons-lang 2.x and 3.x are shipped even commons-lang2.x is banned import in parent pom.xml. But cleanup of this shading is separate issue.
My thinking was to updated the imports to commons-collections4.x like #5588 and shade commons-collections3.x in ql pom.xml + add TODO statement :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you plan to update the imports in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do and update PR shortly.
yes, once we upgrade tez to 1.x in hive, we'll face this issue. As Hive has both commons-collection3.x and 4.x, a quick fix can be to update all the imports to commons-collections4.x as done in #5588 Based on stacktrace, it failing here
But we need commons-collection3.x in tez container classother. My reasoning is, if any 3rd party jar like opencsv needs commons-collection3.x , then it will fail with same error. Hence shading in hive-exec seems unavoidable. |
deniskuzZ
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
83f4064 to
dea6be1
Compare
dea6be1 to
7a1ffc7
Compare
|
Forced push again to resolve merge conflicts. |
without my patch also, it was unused. I used regex replace from intellIJ, so missed it :-( |
|
Removed unused imports from the touched files |
|
LGTM, just to confirm: |
No. Will provide details in few minutes |
|
@deniskuzZ , please find the dependency tree and explanation: dependency_tree.txt Dependency Updates We cannot fully migrate to commons-collections 4.x at this time due to transitive dependencies. Libraries such as Based on the stacktrace attached in description, ClassNotFound was thrown by Hive
That's why import stament change from 3.x to 4.x is done. Its possible that with only import changes we're good and don't need to shade commons-collection-3.x but to be on safe side shading is done in-case if any codeflow, while running insite tez container, make use of these 3rd party dependency (accumulo, beanutils, opencsv) which used commons-collectcions3.x then ClassNotFound will be thrown. |
|
If there are any concerns on shading then we can wait for tez 1.0.0 release and hadoop-3.4.2 upgrade and then we can revisit shading but import changes make sense to be at this stage for sure. |
I’m in favor of waiting |
|
If i update the PR removing shading changes, will it be ok? |
ab78945 to
15b1f9f
Compare
15b1f9f to
c64ff93
Compare
Where is the shading part in this PR? i don't see any hive-exec pom changes |
its because i removed them in https://github.com/apache/hive/compare/ab7894593ae83656d72645c3d3f497805d70a6c2..15b1f9f348a3be417c0a0b92f6ea47fcd291d095 |
|
Without shading LGTM, we can merge and come back to this before next release, hopefully Tez-1.0 would be out by that time |
…tFound in Tez-1.0.0 As part of this PR, following changes has been done: - Migrated commons-collections3 import to commons-collections4 - Upgraded commons-collections4 version to 4.4 to be in sync with hadoop Co-authored-by: P Eshwitha Sai <saieshwitha999@gmail.com>
c64ff93 to
0b68cdd
Compare
|






What changes were proposed in this pull request?
TEZ-4661 Include commons-collections3.x in hive-exec jar
Why are the changes needed?
In Tez-1.0.0-SNAPSHOT, hadoop has been upgraded to 3.4.2 and hadoop dependencies in tez project has stopped shipping commons-collections-3.x. But hive still depends on commons-collections3.x directly as well as through third-party dependency like opencsv, commons-beanutils etc. It's better to migrate the import statement to commons-collection4.x at the very least
Does this PR introduce any user-facing change?
No
How was this patch tested?
On local setup