Releases: apache/orc

v1.7.7

19 Nov 19:12

Milestone

Changelog

Bug

  • ORC-1283 ENABLE_INDEXES does not take effect

Test

Task

  • ORC-1256 Publish tests jar to maven central
  • ORC-1268 Set CMP0135 policy for CMake 3.24+

v1.8.0

03 Sep 23:31
f2bb463

Milestone

Changelog

New Feature and Notable Changes

  • ORC-450 Support selecting list indices without materializing list items
  • ORC-824 Add column statistics for List and Map
  • ORC-1004 Java ORC writer supports the selection vector
  • ORC-1075 Support reading ORC files with no column statistics
  • ORC-1125 Support decoding decimals in RLE
  • ORC-1136 Optimize reads by combining multiple reads without significant separation into a single read
  • ORC-1138 Seek vs Read Optimization
  • ORC-1172 Add row count limit config for one stripe
  • ORC-1212 Upgrade protobuf-java to 3.17.3
  • ORC-1220 Set min.hadoop.version to 2.7.3
  • ORC-1248 Redefine Hadoop dependency for Apache ORC 1.8.0
  • ORC-1256 Publish test-jar to maven central
  • ORC-1260 Publish shaded-protobuf classifier artifacts

Improvement

  • ORC-825 Use Empty Array For Collections toArray
  • ORC-826 Do Not Use Collection Contains/Get
  • ORC-828 Improve Fetch Data Set Process
  • ORC-829 Optimize Serialization percentileBits
  • ORC-831 Do Not Copy String When Flushing Dictionary
  • ORC-833 RunLengthIntegerReaderV2 Calculate Batch Size Once
  • ORC-834 Do Not Convert to String in DecimalFromTimestampTreeReader
  • ORC-835 Cache TRUE/FALSE Bytes in StringGroupFromBooleanTreeReader
  • ORC-836 StringGroupFromDoubleTreeReader Use Double toString
  • ORC-837 Reuse HiveDecimalWritable in ConvertTreeReaderFactory
  • ORC-838 Simplify compareTo/equals/putBuffer of ByteBufferAllocatorPool
  • ORC-840 Remove Superfluous Array Fill in RecordReaderImpl
  • ORC-841 Remove Superfluous Array Fill in StringHashTableDictionary
  • ORC-842 Remove newKey from StringHashTableDictionary
  • ORC-844 Improve hashCode Methods
  • ORC-847 Do Not Create Empty Array in StringGroupFromBinaryTreeReader
  • ORC-852 Allow DynamicByteArray to Return a ByteBuffer
  • ORC-853 Optimize writeDouble Implementation
  • ORC-855 Remove Unused isRepeating from RunLengthIntegerReaderV2
  • ORC-865 Bump opencsv from 3.9 to 5.5.1
  • ORC-883 Dependency Audit and QA
  • ORC-897 Optimize loop termination condition in readerIsCompatible method
  • ORC-935 Bump commons-csv from 1.8 to 1.9.0
  • ORC-937 Replace deprecated method
  • ORC-958 Convert command support overwrite option
  • ORC-969 Evaluate SearchArguments using file and stripe level stats
  • ORC-975 Avoid double counting closestFixedBits in percentileBits method
  • ORC-982 Extract checkstyle to a single file, help newcomers check code style
  • ORC-988 Bump opencsv from 5.5.1 to 5.5.2
  • ORC-992 Reached max repeat length, we can directly decide to use DELTA encoding
  • ORC-1005 Make the Java and C++ implementations of determineEncoding in RunLengthIntegerWriterV2 consistent
  • ORC-1007 Fix a warning from the shade plugin
  • ORC-1013 Renaming a parameter in constructors of TreeWriter's derived classes
  • ORC-1014 Add details when we get IOExceptions from file system
  • ORC-1020 Improve orc::RleDecoderV2::nextDirect
  • ORC-1027 Filter processing to allow filter injections that cannot be represented via SArgs
  • ORC-1047 Handle quoted field names during string schema parsing
  • ORC-1077 Remove commons-codec dependency and use java.util.Base64
  • ORC-1099 Extend ReadIntent to support MAP and UNION type
  • ORC-1101 Improve malformed STRUCT handling
  • ORC-1122 Add buffer to decode the whole run in RleDecoderV2
  • ORC-1137 Improve float/double conversion in DoubleColumnReader::next()
  • ORC-1149 Bump slf4j.version to 1.7.36
  • ORC-1150 Improve RowReaderImpl::computeBatchSize()
  • ORC-1152 Support encoding short decimals in RLEv2
  • ORC-1156 Update opencsv to 5.6
  • ORC-1163 Bump zookeeper from 3.7.0 to 3.8.0
  • ORC-1169 Use Hadoop 3.3.2 on Java 17+
  • ORC-1178 Use hadoop 3.3.3 on Java 17+

Bug

  • ORC-845 Fix NPE in DynamicIntArray toString
  • ORC-929 Fix NaN at orc-tools 'meta' command
  • ORC-1129 The build of tool-test should depend on cpp tools
  • ORC-1159 Crash when the last stripe is skipped
  • ORC-1242 Bump threeten-extra to 1.7.1

Test

  • ORC-860 Add dependabot
  • ORC-864 Bump jackson.version from 2.12.2 to 2.12.4
  • ORC-877 Bump junit-vintage-engine from 5.7.0 to 5.7.2
  • ORC-888 Bump objenesis from 3.1 to 3.2
  • ORC-905 Add an integration test for example
  • ORC-917 Bump mockito-core from 3.7.0 to 3.11.2
  • ORC-919 Spark bench objenesis should be the same as Spark.
  • ORC-920 Use junit.version and mockito.version property and bump junit to 5.7.2
  • ORC-925 Simplify assertions
  • ORC-928 Bump checkstyle from 8.44 to 8.45.1
  • ORC-932 Bump byte-buddy from 1.10.19 to 1.11.12
  • ORC-934 Add integration tests for Java bench
  • ORC-940 Use Hadoop 3.3.1 in bench module
  • ORC-955 Add Javadoc generation GitHub Action job
  • ORC-963 Build benchmark module always for integration testing
  • ORC-966 Bump byte-buddy from 1.11.12 to 1.11.13
  • ORC-967 Bump mockito.version from 3.11.2 to 3.12.1
  • ORC-986 Bump mockito.version from 3.12.1 to 3.12.4
  • ORC-987 Bump jackson.version from 2.12.4 to 2.12.5
  • ORC-1001 Bump maven-enforcer-plugin to 3.0.0
  • ORC-1019 Remove redundant jackson dependencies
  • ORC-1022 Bump byte-buddy from 1.11.13 to 1.11.19
  • ORC-1038 Bump mockito.version from 3.12.4 to 4.0.0
  • ORC-1074 Bump byte-buddy from 1.11.19 to 1.12.6
  • ORC-1079 Add Linux clang GitHub Action job
  • ORC-1085 Bump auto-service from 1.0 to 1.0.1
  • ORC-1089 Add test cases verifying writers with selected vector
  • ORC-1104 Use Spark 3.2.1 in benchmark
  • ORC-1107 Fix NPE at benchmark data schema loading
  • ORC-1110 Bump mockito.version from 4.0.0 to 4.3.1
  • ORC-1126 Bump byte-buddy from 1.12.6 to 1.12.8
  • ORC-1139 Benchmark for Seek vs Read
  • ORC-1141 Bump mockito.version from 4.3.1 to 4.4.0
  • ORC-1145 Add Java 18 to GitHub Action CI.
  • ORC-1153 Bump byte-buddy from 1.12.8 to 1.12.9
  • ORC-1157 Update guava to 31.1-jre
  • ORC-1168 Update byte-buddy to 1.12.10
  • ORC-1177 Upgrade mockito.version to 4.5.1
  • ORC-1179 Upgrade checkstyle to 10.2 on Java 11+
  • ORC-1187 Use main instead of master in merge_orc_pr.py
  • ORC-1194 Bump mockito.version to 4.6.0
  • ORC-1195 Bump checkstyle to 10.3
  • ORC-1196 Add spark benchmark integration tests to GHA
  • ORC-1197 Bump mockito.version from 4.6.0 to 4.6.1
  • ORC-1201 Remove Debian 9 from Docker Tests
  • ORC-1203 Bump maven-enforcer-plugin to 3.1.0
  • ORC-1206 Bump netty-all to 4.1.78.Final
  • ORC-1207 Upgrade Spark to 3.3.0
  • ORC-1208 Bump byte-buddy to 1.12.12
  • ORC-1209 Bump checkstyle to 10.3.1
  • ORC-1234 Upgrade objenesis to 3.2 in Spark benchmark
  • ORC-1236 Bump checkstyle to 10.3.2
  • ORC-1243 Bump byte-buddy to 1.12.13
  • ORC-1253 Add Fedora 37 docker test
  • ORC-1254 Add spotbugs check

Task

  • ORC-868 Pin gson to 2.2.4
  • ORC-869 Pin jmh 1.20
  • ORC-872 Bump kryo-shaded from 3.0.3 to 4.0.2
  • ORC-874 Bump zookeeper from 3.6.2 to 3.7.0
  • ORC-884 Bump jettison from 1.1 to 1.4.1
  • ORC-887 Remove ORC Twitter link from news page
  • ORC-890 Pin minimum support Hadoop version to 2.2.0
  • ORC-892 Pin scala-library to 2.12.10
  • ORC-898 Bump threeten-extra from 1.5.0 to 1.7.0
  • ORC-899 Archive Apache ORC 1.4.x in releases page
  • ORC-900 Update doap_orc.rdf for Apache Projects page
  • ORC-908 Use https instead of http for website links in pom.xml
  • ORC-914 Pin maven-dependency-plugin to 3.1.2
  • ORC-916 Bump annotations from 17.0.0 to 21.0.1
  • ORC-918 Pin protobuf-java to 2.5.0
  • ORC-923 Bump apache from 23 to 24
  • ORC-946 Unified json library
  • ORC-949 Add CustomImportOrder rule
  • ORC-956 Bump annotations from 21.0.1 to 22.0.0
  • ORC-977 Update webpages and TestVectorOrcFile.java to be more neutral
  • ORC-1045 Bump commons-cli to 1.5
  • ORC-1056 Bump annotations from 22.0.0 to 23.0.0
  • ORC-1103 Use Maven 3.8.4
  • ORC-1140 Documentation for Seek vs Read
  • ORC-1158 Add notification settings to .asf.yaml
  • ORC-1162 Fix Apache Project Website Checks Warning
  • ORC-1165 Enable GitHub Action in branch-1.8
  • ORC-1166 Enable snapshot publishing in branch-1.8
  • ORC-1171 Skip build and test on docker and site updates
  • ORC-1173 Pin jodd-core to 3.5.2
  • ORC-1176 Upgrade maven-jar-plugin to 3.2.2
  • ORC-1185 Add merge_orc_pr.py
  • ORC-1210 Upgrade maven to 3.8.6
  • ORC-1216 Pin org.jetbrains.annotations dependency to 17.0.0
  • ORC-1211 Upgrade maven-assembly-plugin to 3.4.0
  • ORC-1214 Bump maven-assembly-plugin to 3.4.1
  • ORC-1217 Downgrade org.jetbrains.annotations to 17.0.0
  • ORC-1223 Move DirectDecompressWrapper to org.apache.orc.impl
  • ORC-1224 Move getDecompressor to HadoopShimsCurrent
  • ORC-1226 Add a deprecation warning for Hadoop 2.7.2 and below
  • ORC-1229 Move KeyProviderImpl to org.apache.orc.impl
  • ORC-1230 Move encryption utility functions to HadoopShimsCurrent
  • ORC-1246 Revamp ORC Website
  • ORC-1247 Improve Apache ORC website and docs
  • ORC-1249 Move site/_docs/releases.md to site/releases/index.md
  • ORC-1255 Fix ORC website navbar highlight
  • ORC-1257 Publish multi-architecture ORC-dev docker images
  • ORC-1261 Rename shaded pattern com.google.protobuf25 to org.apache.orc.protobuf
  • ORC-1263 Add decimal type to ORC Website
  • ORC-1221 Move NullKeyProvider to org.apache.orc.impl

v1.7.6

18 Aug 05:59
7ff749a

Milestone

Changelog

Bug Fixes

  • ORC-1204: ORC MapReduce writer to flush when long arrays
  • ORC-1205: nextVector should invoke ensureSize when reusing vectors
  • ORC-1215: Remove a wrong NotNull annotation on value of setAttribute
  • ORC-1222: Upgrade tools.hadoop.version to 2.10.2
  • ORC-1227: Use Constructor.newInstance instead of Class.newInstance
  • ORC-1228: Fix setAttribute to handle null value

Tests

  • ORC-932: Bump byte-buddy from 1.10.19 to 1.11.12 (#842)
  • ORC-1169: Use Hadoop 3.3.2 on Java 17+ (#1113)
  • ORC-1178: Use Hadoop 3.3.3 on Java 17+ (#1129)
  • ORC-1193: Bump parquet.version to 1.12.3
  • ORC-1207: Upgrade Spark to 3.3.0
  • ORC-1210: Upgrade maven to 3.8.6
  • ORC-1234: Upgrade objenesis to 3.2 in Spark benchmark
  • ORC-1235: Bump avro.version to 1.11.1
  • ORC-1240: Update site README to use apache/orc-dev DockerHub image
  • ORC-1241: Use apache/orc-dev DockerHub repository in Docker tests
  • ORC-1244: Upgrade byte-buddy to 1.12.13 in branch-1.7
  • ORC-1245: Use Hadoop 3.3.4 on Java 17+ and benchmark

Documentation

  • MINOR: Update DOAP with new releases (#1127)
  • ORC-900: Update doap_orc.rdf for Apache Projects page (#806)
  • ORC-1231: Update supported OS list in building.md
  • ORC-1237: Remove a wrong image link to article-footer.png
  • ORC-1238: Update DOAP with 1.7.5

Task

  • ORC-1185: Add merge_orc_pr.py
  • ORC-1187: Use main instead of master in merge_orc_pr.py
  • ORC-1213: Use https in ThirdpartyToolchain.cmake
  • ORC-1226: Add a deprecation warning for Hadoop 2.7.2 and below

v1.7.5

16 Jun 19:28
56a02d1

Milestone

Changelog

Bug Fixes

  • ORC-1151: [C++] Fix ColumnWriter for non-UTC Timestamp columns (#1088)
  • ORC-1160: [C++] Fix seekToRow can't seek within selected row group (#1102)
  • ORC-1133: [C++] Fix csv-import tool options
  • ORC-1183: Upgrade gson to 2.9.0
  • ORC-1186: Limit family in aarch64 profile
  • ORC-1188: Fix ORC_PREFER_STATIC_ZLIB

Improvements

  • ORC-1198: Add a new PhysicalFsWriter constructor with FSDataOutputStream parameter
  • ORC-1199: Use Google mirror of Maven Central as the primary

Tests

  • ORC-1155: Add Ubuntu 22.04 to docker tests (#1093)
  • ORC-1154: Bump hive.version from 3.1.2 to 3.1.3 (#1090)
  • ORC-1161: Add MacOS 12 and remove MacOS 10
  • ORC-1174: Add Ubuntu 22.04 to GitHub Action (#1128)
  • ORC-1182: Use slf4j-simple instead of deprecated slf4j-log4j12
  • ORC-1184: Use Hadoop 3.3.3 in benchmark module
  • ORC-1189: Update README.md and help command message in benchmark module and .gitignore
  • ORC-1190: Fix ORCWriterBenchMark dumpDir initialization
  • ORC-1191: Updated TLC Taxi Benchmark Dataset
  • ORC-1192: Use orc.zstd instead of orc.none (#1144)
  • ORC-1196: Add Spark benchmark integration tests to GHA
  • ORC-1201: Remove Debian 9 from Docker Tests

Documentation

  • MINOR: Add ASF verification instruction link (#1134)

v1.6.14

14 Apr 21:54
9d22674

Milestone

Changelog

Bug Fixes

  • ORC-1121: Fix column conversion check bug which causes column filters don't work (#1055)
  • ORC-1146: Float category missing check if the statistic sum is a finite value (#1078)
  • ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values (#1082)

Tests

  • ORC-1016: Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1113: Remove CentOS 8 from docker-based tests (#1040)

v1.7.4

16 Apr 03:35
5a1b27b

Milestone

Changelog

Bug Fixes

  • ORC-1120: Remove C++ library limitation about write version (#1054)
  • ORC-1121: Fix column conversion check bug which causes column filters don't work (#1055)
  • ORC-1127: [C++] add missing version of UNSTABLE-PRE-2.0 (#1064)
  • ORC-1146: Float category missing check if the statistic sum is a finite value (#1078)
  • ORC-1147: Use isNaN instead of isFinite to determine the contain NaN values (#1082)

Improvements

Tests

Documentation

v1.7.3

10 Feb 07:23

Milestone

Changelog

Bug Fixes

  • ORC-1060: Reduce memory usage when vectorized reading dictionary string encoding columns (#971)
  • ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail (#979)
  • ORC-1067: [C++] Upgrade ZSTD to 1.5.1 (#981)
  • ORC-1078: Row group end offset doesn't accommodate all the blocks (#996)
  • ORC-1081: Fix heap-use-after-free in SearchArgumentBuilderImpl::end() (#998)
  • ORC-1087: [C++] Handle unloaded seek positions when seeking in an uncompressed chunk (#1008)
  • ORC-1092: [C++] Upgrade LZ4 to version 1.9.3 (#1012)
  • ORC-1102: [C++] Upgrade ZSTD to 1.5.2 (#1026)

Improvements (orc-tools)

  • ORC-1055: [C++] Add the timezone option for the csv-import tool (#975)
  • ORC-1082: Improve FileDump and JsonFileDump to be robust on missing column statistics (#1003)
  • ORC-1098: [C++] Support specifying type ids or column names in cpp tools (#1020)

Documentation

Task

Tests

v1.6.13

10 Feb 07:04

Milestone

Changelog

Bug Fixes

  • ORC-1065: Fix IndexOutOfBoundsException in ReaderImpl.extractFileTail (#979)
  • ORC-1078: Row group end offset doesn’t accommodate all the blocks (#996)

Tests

v1.7.2

10 Feb 07:05

Milestone

Changelog

Bug Fixes

  • ORC-492: Avoid potential ArrayIndexOutOfBoundsException when getting WriterVersion (#961)
  • ORC-1041: Use memcpy during LZO decompression (#958)
  • ORC-1053: Fix time zone offset precision in the convert tool, where converting LocalDateTime to Timestamp was not consistent with the internal default precision of ORC (#967)
  • ORC-1059: Align findColumns behaviour between 1.6 and 1.7 release (#972)

Improvements (orc-tools)

  • ORC-1012: Support specifying columns in orc-scan (#921)
  • ORC-1017: Add sizes tool to determine and display the sizes of each column in a set of files. (#925)
  • ORC-1023: Support writing bloom filters in ConvertTool (#933)

Tests

  • ORC-915: Remove io.netty.netty from Spark benchmark (#822)
  • ORC-938: Bump netty-all from 4.1.42.Final to 4.1.66.Final (#819)
  • ORC-948: Add hive benchmark integration tests (#860)
  • ORC-957: Bump netty-all from 4.1.66.Final to 4.1.67.Final (#870)
  • ORC-1021: Add -fno-omit-frame-pointer in DEBUG and RELWITHDEBINFO builds (#932)
  • ORC-1051: Update benchmark dependencies (#964)

v1.7.1

10 Feb 07:05

Milestone

Changelog

Bug Fixes

  • ORC-879 - Flaky Test for TestJsonReader
  • ORC-1008 - Overflow detection code is incorrect in IntegerColumnStatisticsImpl
  • ORC-1009 - [C++] Missing string include causes build failure with MSVC++
  • ORC-1015 - Update OrcFile.WriterOptions::memory javadoc
  • ORC-1016 - Use openssl@1.1 in GitHub Action MacOS CIs
  • ORC-1024 - BloomFilter hash computation is inconsistent between Java and C++ clients
  • ORC-1029 - Could not load 'org.apache.orc.DataMask.Provider' when using orc encryption and spark executor with multi cores!
  • ORC-1030 - Java Tools Recover File command does not accurately find OrcFile.MAGIC
  • ORC-1034 - The search byte array algorithm is incorrectly implemented in FileDump.java
  • ORC-1035 - backupDataPath may be incorrect in recoverFile
  • ORC-1039 - Make FileDump.recoverFile handle side files only if they exist

Test

  • ORC-1000 - Use Java 17 in GitHub Action
  • ORC-1002 - Add java17 profile for Java17 unit testing
  • ORC-1010 - Bump tzdata from tzdata-2020e-1.tar.xz to tzdata-2021b-1.tar.xz
  • ORC-1011 - Activate java17 profile automatically
  • ORC-1032 - Bump parquet.version from 1.12.0 to 1.12.2
  • ORC-1036 - Due to tzdata upgrade, the fixed download links in CI are often not working
  • ORC-1037 - Bump spark.version from 3.1.2 to 3.2.0
  • ORC-1040 - Add Debian 11 docker test
  • ORC-1042 - Ignore unused-function C++ compile warning on CentOS 7
  • ORC-1043 - Fix C++ conversion compilation error in CentOS 7