Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1669: [C++] Deprecate HDFS support #1885

Closed
wants to merge 2 commits into from
Closed

Conversation

wgtmac
Copy link
Member

@wgtmac wgtmac commented Apr 10, 2024

What changes were proposed in this pull request?

Mark readHdfsFile as deprecated.

Why are the changes needed?

Reading ORC on HDFS was introduced in #134 without any test. It has not been updated for 7 years and updating libhdfspp will result in extra dependency like boost library. Staying at an old version of libhdfspp will also prohibit us from updating other libraries like protobuf.

How was this patch tested?

It does not need test.

Was this patch authored or co-authored using generative AI tooling?

No.

@wgtmac
Copy link
Member Author

wgtmac commented Apr 10, 2024

@wgtmac
Copy link
Member Author

wgtmac commented Apr 10, 2024

Also cc @omalley in case there is any blocker to this.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned here, Apache ORC follows Semantic Versioning, @wgtmac .

It's too bad. Apache ORC community cannot remove it at ORC 2.x because ORC 2.0.0 is released already and we follow Semantic Versioning policy. The best thing we can do is to deprecate it at Apache ORC 2.0.1.

Since we cannot remove it in any ORC 2.x releases, please don't claim any removal. Shall we change like the following?

- [[deprecated("readHdfsFile is deprecated in 2.0.1 and will be removed in 2.1.0")]] std::
+ [[deprecated("readHdfsFile is deprecated in 2.0.1")]] std::

@dongjoon-hyun
Copy link
Member

cc @williamhyun , @pavibhai , @deshanxiao , too.

@wgtmac
Copy link
Member Author

wgtmac commented Apr 11, 2024

Does it mean that we cannot upgrade libprotobuf like #1857 in 2.x? Or at least we can choose appropriate protobuf version according to the BUILD_LIBHDFSPP option. @dongjoon-hyun

@stiga-huang
Copy link
Contributor

+1 for choosing the protobuf version based on the BUILD_LIBHDFSPP option. It might have less complaints.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 11, 2024

I'm not sure what you mean by that. May I ask what is broken in libprotobuf PR, #1857, specifically , @wgtmac ?

Does it mean that we cannot upgrade libprotobuf like #1857 in 2.x? Or at least we can choose appropriate protobuf version according to the BUILD_LIBHDFSPP option.

Apache ORC 2 has two-layer protections.

FYI, Semantic Versioning is defined here. You can do whatever you want as long as you follow the policy. In this PR, my concern is the part of a backward compatible manner of this policy. In general, Removal is not an option for backward compatibility in general. That's the reason why I told you that we can do only Deprecatation. Deprecation doesn't mean Removal.

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes

@wgtmac
Copy link
Member Author

wgtmac commented Apr 11, 2024

#1857 only changes from 3.5.1 to 3.21.12. The major version is the same. Do you mean they break anything there?

Yes, libhdfspp does not link with 3.21.12. For example:

[ 84%] Linking CXX executable orc-test
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::GetFsStatusRequestProto::~GetFsStatusRequestProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:9833: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::DisallowSnapshotResponseProto::~DisallowSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:23777: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::AllowSnapshotResponseProto::~AllowSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:23487: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::RenameSnapshotResponseProto::~RenameSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:23197: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o): in function `hadoop::hdfs::DeleteSnapshotResponseProto::~DeleteSnapshotResponseProto()':
/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:24090: undefined reference to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()'
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(namenode_operations.cc.o):/root/build/libhdfspp_ep-prefix/src/libhdfspp_ep-build/lib/proto/ClientNamenodeProtocol.pb.h:4055: more undefined references to `google::protobuf::internal::ZeroFieldsBase::~ZeroFieldsBase()' follow
/usr/bin/ld: ../libs/thirdparty/libhdfspp_ep-install/lib/libhdfspp_static.a(ClientNamenodeProtocol.pb.cc.o):

@dongjoon-hyun
Copy link
Member

Although we don't want to touch that file, I guess we are able to update the source code like we did in the following PR. WDTY?

@wgtmac
Copy link
Member Author

wgtmac commented Apr 11, 2024

Right, I feel very reluctant to touch that file. I'll look into it later.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request Apr 13, 2024
### What changes were proposed in this pull request?
Mark readHdfsFile as deprecated.

### Why are the changes needed?
Reading ORC on HDFS was introduced in #134 without any test. It has not been updated for 7 years and updating libhdfspp will result in extra dependency like boost library. Staying at an old version of libhdfspp will also prohibit us from updating other libraries like protobuf.

### How was this patch tested?
It does not need test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #1885 from wgtmac/ORC-1669.

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cea0629)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun added this to the 2.0.1 milestone Apr 13, 2024
@wgtmac
Copy link
Member Author

wgtmac commented Apr 14, 2024

Thanks @dongjoon-hyun!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants