2018-06-15 (GCS 1.9.0, BQ 0.13.0)
Changelog
Cloud Storage connector:
- Update all dependencies to latest versions.
- Delete metadata cache functionality because Cloud Storage has strong native list operation consistency already. Deleted properties: fs.gs.metadata.cache.enable, fs.gs.metadata.cache.type, fs.gs.metadata.cache.directory, fs.gs.metadata.cache.max.age.info.ms, fs.gs.metadata.cache.max.age.entry.ms
- Decrease default value for max requests per batch from 1,000 to 30.
- Make max requests per batch value configurable with property: fs.gs.max.requests.per.batch (default: 30)
- Support Hadoop 3.
- Change Maven project structure to be better compatible with IDEs.
- Delete deprecated GoogleHadoopGlobalRootedFileSystem.
- Fix thread leaks that were occurring when YARN log aggregation uploaded logs to GCS.
- Add interface through which user can directly provide the access token.
- Add more retries and error handling in GoogleCloudStorageReadChannel, to make it more resilient to network errors; also add a property to allow users to specify number of retries on low level GCS HTTP requests in case of server errors and I/O errors.
- Add properties to allow users to specify connect timeout and read timeout on low level GCS HTTP requests.
- Include prefix/directory objects metadata into storage.objects.list requests response to improve performance (i.e. set includeTrailingDelimiter parameter for storage.objects.list GCS requests to true).
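
When upgrading to GCS connector 1.9.0, it may help to check existing settings against the property changes listed above. The following is a minimal sketch, not part of the connector: the class GcsConnectorUpgradeCheck and its methods are hypothetical, and settings are modeled with a plain java.util.Properties object rather than a Hadoop Configuration. Only the property names and the default of 30 come from this changelog. The batch count assumes requests are split into consecutive batches of at most the configured size, which the changelog does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical upgrade helper for illustration; not shipped with the connector. */
public class GcsConnectorUpgradeCheck {

  // Metadata cache properties deleted in GCS connector 1.9.0 (taken from the changelog).
  public static final List<String> REMOVED_KEYS = Arrays.asList(
      "fs.gs.metadata.cache.enable",
      "fs.gs.metadata.cache.type",
      "fs.gs.metadata.cache.directory",
      "fs.gs.metadata.cache.max.age.info.ms",
      "fs.gs.metadata.cache.max.age.entry.ms");

  public static final String MAX_BATCH_KEY = "fs.gs.max.requests.per.batch";
  public static final int DEFAULT_MAX_BATCH = 30; // new default in 1.9.0

  /** Returns the deleted property names that are still present in the settings. */
  public static List<String> findRemovedKeys(Properties settings) {
    List<String> found = new ArrayList<>();
    for (String key : REMOVED_KEYS) {
      if (settings.containsKey(key)) {
        found.add(key);
      }
    }
    return found;
  }

  /** Reads the max requests per batch, falling back to the 1.9.0 default. */
  public static int maxRequestsPerBatch(Properties settings) {
    String value = settings.getProperty(MAX_BATCH_KEY, String.valueOf(DEFAULT_MAX_BATCH));
    return Integer.parseInt(value);
  }

  /** Batches needed for requestCount requests (ceiling division; assumed splitting). */
  public static int batchCount(int requestCount, int maxPerBatch) {
    return (requestCount + maxPerBatch - 1) / maxPerBatch;
  }

  public static void main(String[] args) {
    Properties settings = new Properties();
    settings.setProperty("fs.gs.metadata.cache.enable", "true"); // stale setting

    System.out.println("Removed keys still set: " + findRemovedKeys(settings));
    int max = maxRequestsPerBatch(settings);
    System.out.println("1000 requests need " + batchCount(1000, max)
        + " batches of at most " + max);
  }
}
```

Under that assumption, 1,000 requests that previously fit in a single batch would be sent as 34 batches with the new default of 30.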
BigQuery connector:
- POM updates for GCS connector 1.9.0.
- Update all dependencies to latest versions.
- Change Maven project structure to be better compatible with IDEs.
- Support Hadoop 3.
- Default BigQueryInputFormats to use unsharded exports and deprecate sharded exports.
- Deprecate BigQueryOutputFormat in favor of IndirectBigQueryOutputFormat.
- Add interface through which user can directly provide the access token.
- Support Cloud KMS key name in the output table spec.