Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java impl does not fully adhere to the usage contract of ColumnVector.noNulls and ColumnVector.isNull #1470

Open
guiyanakuang opened this issue Apr 15, 2023 · 4 comments

Comments

@guiyanakuang
Copy link
Member

The comments in ColumnVector indicate that we should only use the isNull array to determine null values when noNulls is set to false. This helps us reuse ColumnVector, for example, during reuse when noNulls transitions from true to true or false to true, we don't need to set isNull, which can help improve performance.

Unfortunately, there are some counterexamples in the Java impl, such as in BitFieldReader.java, where we directly read isNull without checking noNulls first. To ensure correctness, Java resets noNulls and isNull every time, as seen in TreeReaderFactory.java.

This issue originates from the discussion of ORC-1408, whether the ORC Java version needs to modify all counterexamples to adhere to the contract. I'm inclined to make the changes to be consistent with the C++ impl and to avoid unnecessary isNull setting, although I haven't verified how much performance improvement it could bring.

@guiyanakuang
Copy link
Member Author

@wgtmac
Copy link
Member

wgtmac commented Apr 15, 2023

IMHO, it is safe to make the change if the writer and reader can promise the contract to the downstream consumers (e.g. Apache Spark and Apache Hive). The ORC java library can break the promise in the internal implementation.

@deshanxiao
Copy link
Contributor

+1 for the proposal.

It seems that there is no compatibility issue, just a performance optimization and a more standardized use of Hive classes.

@pavibhai
Copy link
Contributor

pavibhai commented Apr 17, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants