[ENG-30326] Upgrade Hudi version in Presto-hudi module to 1.0.2#1
Open
vamsikarnika wants to merge 68 commits into onehouseinc:master
Conversation
…iterOperator (prestodb#25846) Summary: Pull Request resolved: prestodb#25846

Pass the Operator Context's `RuntimeStats` down into the `TableWriterOperator`'s Page Sink. Specifically, this diff makes the following changes:

a) `TableWriterOperator` passes its `RuntimeStats` into the Page Sink it creates via `PageSinkManager.createPageSink`.
b) When `PageSinkManager.createPageSink` is provided `RuntimeStats`, they are passed into the `Session.toConnectorSession` call, which creates a `FullConnectorSession` instance.
c) When `Session.toConnectorSession` is provided `RuntimeStats`, it passes them into the `FullConnectorSession` instance it constructs.
d) Add a `Builder` to `FullConnectorSession`, which allows providing a `RuntimeStats` instance to `FullConnectorSession` at construction time. `FullConnectorSession.getRuntimeStats()` now returns the `RuntimeStats` that was set at construction time. If none was provided at construction time, `FullConnectorSession.getRuntimeStats()` defaults to returning the `Session` object's `RuntimeStats`, which preserves backwards compatibility.

All changes preserve forward compatibility.

## Context

Without this change, the `FullConnectorSession`'s `RuntimeStats` points to the `Session`'s `RuntimeStats`. All metrics added to the `Session`'s `RuntimeStats` within an operator on the worker side are discarded. That is, all runtime metrics added to the Connector Session's `RuntimeStats` when executing `TableWriterOperator` were being completely discarded. Specifically, at Meta, the stats from our internal filesystem implementation were missing.

Passing the Operator Context's `RuntimeStats` instance down into the Connector Session is the simplest way to fix this. Additionally, since the previous `RuntimeStats` for `TableWriterOperator`'s `FullConnectorSession` were always discarded, we can be confident that replacing them with the `OperatorContext`'s `RuntimeStats` will not break anyone else's code.
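The construction-time override in (d) can be sketched roughly as below. This is illustrative Java only: the class and method names mirror the description, but the simplified `Session`, the `metrics` field, and the builder signatures are assumptions, not the real Presto API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Presto's RuntimeStats
class RuntimeStats {
    final Map<String, Long> metrics = new HashMap<>();
    void add(String name, long value) { metrics.merge(name, value, Long::sum); }
}

// Minimal stand-in for Presto's Session, which owns its own RuntimeStats
class Session {
    final RuntimeStats runtimeStats = new RuntimeStats();
}

class FullConnectorSession {
    private final Session session;
    private final RuntimeStats runtimeStats; // null when not set at construction time

    private FullConnectorSession(Session session, RuntimeStats runtimeStats) {
        this.session = session;
        this.runtimeStats = runtimeStats;
    }

    static Builder builder(Session session) { return new Builder(session); }

    // Falls back to the Session's stats when none were provided at
    // construction time, preserving the old behavior.
    RuntimeStats getRuntimeStats() {
        return runtimeStats != null ? runtimeStats : session.runtimeStats;
    }

    static class Builder {
        private final Session session;
        private RuntimeStats runtimeStats;
        Builder(Session session) { this.session = session; }
        Builder setRuntimeStats(RuntimeStats stats) { this.runtimeStats = stats; return this; }
        FullConnectorSession build() { return new FullConnectorSession(session, runtimeStats); }
    }
}

public class Demo {
    public static void main(String[] args) {
        Session session = new Session();
        RuntimeStats operatorStats = new RuntimeStats();

        // With operator-level stats: metrics land in the operator's RuntimeStats.
        FullConnectorSession withStats =
                FullConnectorSession.builder(session).setRuntimeStats(operatorStats).build();
        withStats.getRuntimeStats().add("bytesWritten", 128);
        System.out.println(operatorStats.metrics.get("bytesWritten"));

        // Without: falls back to the Session's RuntimeStats (backwards compatible).
        FullConnectorSession withoutStats = FullConnectorSession.builder(session).build();
        System.out.println(withoutStats.getRuntimeStats() == session.runtimeStats);
    }
}
```

The point of the fallback is that callers which never call `setRuntimeStats` observe exactly the old behavior.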
Differential Revision: D80675849
There is an existing HiveClientConfig property hive.orc.use-column-names to access ORC files by column names, but no corresponding session property. This commit moves the existing HiveClientConfig property to HiveCommonClientConfig and introduces a session property in HiveCommonSessionProperties. It also makes the corresponding changes in DwrfAggregatedPageSourceFactory, OrcAggregatedPageSourceFactory, OrcSelectivePageSourceFactory, and OrcBatchPageSourceFactory: constructors in those classes no longer take a boolean useOrcColumnNames, and tests that use them have been updated. The Hive connector documentation has been updated, and an integration test has been added to TestHiveDistributedQueries.java. A helper function was created in HiveTestUtils to replace a function in TestHiveIntegrationSmokeTest. Superfluous constructors that take hiveClientConfig were removed from DwrfAggregatedPageSourceFactory.java and OrcAggregatedPageSourceFactory.java, with explicit calls updated in HiveTestUtils.java. An additional test with different column names was added to TestHiveDistributedQueries.java. Closes-Issue: prestodb#24134
The test framework client now receives statement executing results with `clearTransactionId` and `startTransactionId` flags embedded.
Velox provides a function to install the Arrow library, so we don't need to copy and paste the same code here and can reuse it. An EXTRA_ARROW_OPTIONS variable allows passing custom Arrow library build options, so that building Arrow Flight can be requested.
Reuse the existing Velox VarcharType to implement the type Char(n) in protocol. Add a SystemConfig "char-n-type-enabled" to guard this feature. Note this makes the Char(n) type carry the behavior of VarcharType, which differs from the Char(n) type in Presto today, where it has a fixed number of characters. The user could call rpad() if today's behavior is needed.
…#25902) ## Description This PR updates the GitHub Action to publish Maven artifacts using the central publishing method. Since the Maven repository doesn't allow an executable jar (with an embedded shell script) to be published, we create a GitHub release and publish the jars there. Needs the fix in the release branch: prestodb#25900. Sample release for executable jars: https://github.com/unix280/presto/releases ## Motivation and Context ## Impact Release 0.294 ## Test Plan Tested the GitHub release in my repo: https://github.com/unix280/presto/actions/runs/17272968441 Tested the Maven publishing in a local env ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes ``` == NO RELEASE NOTE == ```
@pdabre12 has been voted as module committer for the Presto sidecar module. Also fixed a bug where project committers could not approve some C++ code. Per our contributing guide, project committers must be capable of approving all code (although C++ module committers are preferred for approving and merging C++ code).
…restodb#25687) Summary: Similar to the C++ worker, this adds the endpoint for Java. We won't be using worker-load as, going forward, we will be focusing on the C++ worker only. Differential Revision: D79471792
WriteMapping support for the decimal type is already present for writing values but is missing from the query builder. This PR adds the write function to the query builder's buildSql function.
…bc write mappings These types are missing in the new write mapping interface. If implemented, this will add them back.
Added the `iceberg.engine.hive.lock-enabled` configuration property to enable or disable table locks when Iceberg accesses a Hive table. This can be overridden with the table property `engine.hive.lock-enabled`.
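For illustration, the catalog-level default might look like this in the Iceberg catalog properties file (the file path below is an assumption; the property names are from the description, with the per-table override applied via the `engine.hive.lock-enabled` table property):

```properties
# etc/catalog/iceberg.properties (illustrative path)
# Catalog-wide default: disable Hive table locks for Iceberg commits
iceberg.engine.hive.lock-enabled=false
```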
The map function does not sort a JSON object by its keys, even though the json_parse function sorts the same input. If implemented, this will sort JSON objects. Resolves prestodb#24207
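The key-sorting behavior being proposed can be sketched as below. This is illustrative Java using a `TreeMap`; Presto's real `map`/`json_parse` implementations are structured quite differently, and `sortKeys` is a hypothetical helper, not a Presto function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SortJsonKeys {
    // Returns a copy of a parsed JSON object with entries ordered by key,
    // recursing into nested objects so the whole tree is canonicalized.
    static Map<String, Object> sortKeys(Map<String, Object> jsonObject) {
        Map<String, Object> sorted = new TreeMap<>();
        for (Map.Entry<String, Object> e : jsonObject.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                value = sortKeys(nested);
            }
            sorted.put(e.getKey(), value);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Insertion order is b, a — sorting canonicalizes it to a, b
        Map<String, Object> obj = new LinkedHashMap<>();
        obj.put("b", 2);
        obj.put("a", 1);
        System.out.println(sortKeys(obj).keySet()); // [a, b]
    }
}
```

Canonical key order is what makes the output consistent with `json_parse`, which already sorts the same input.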
Summary: - Add abstract class BuiltInSpecialFunctionNamespaceManager - Add BuiltInNativeFunctionNamespaceManager - Refactor BuiltInPluginFunctionNamespaceManager to extend the abstract class - Deduplicate sidecar function registry logic by moving some of it from the presto-native-sidecar-plugin module to the presto-main-base module - Add function-name conflict logic to FunctionAndTypeManager that overrides SQL built-in functions but does not override Java built-in functions - Add retry logic to fetch the function registry from the worker; the retry interval is every 1 minute. Note: `show functions` will show both built-in functions in the same namespace. This matches the existing behavior of the regular native sidecar namespace enabled with the default presto.default prefix. The `show functions` logic is not addressed in this change; unit tests for `show functions` could be added as well. Tests: Added unit tests that enable the flag for this feature and verify that it overrides the SQL function implementation properly. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```
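The worker-registry retry described above could be sketched like this. Illustrative Java only: `fetchWithRetry`, the failure mode, and the short test delay are assumptions, not the real sidecar API (the commit describes a 1-minute interval).

```java
import java.util.function.Supplier;

public class RegistryFetcher {
    // Retries the supplier until it succeeds, sleeping retryIntervalMillis
    // between attempts; rethrows the last failure if all attempts fail.
    static <T> T fetchWithRetry(Supplier<T> fetch, long retryIntervalMillis, int maxAttempts)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get();
            }
            catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(retryIntervalMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated worker: fails twice (not ready yet), then returns the registry.
        String registry = fetchWithRetry(
                () -> {
                    if (++calls[0] < 3) {
                        throw new RuntimeException("worker not ready");
                    }
                    return "functions";
                },
                10, 5);
        System.out.println(registry + " after " + calls[0] + " attempts");
    }
}
```

A fixed interval is the simplest policy; production code would typically also cap total wait time and log each failed attempt.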
Summary:
Fix to support GCC14 build
- Replace `{}` with an explicit empty container to avoid the following error with optionals:
  error: converting to 'std::in_place_t' from list would use explicit constructor
  `{}` leads to copy-list-initialization, which is not allowed since `std::in_place_t`'s constructor is marked `explicit`.
- Add an include of `<chrono>` in `Duration.h`, as GCC 14 mandates having it
- Correct the include directory path for proxygen
- Ignore errors associated with `template-id-cdtor`, as GCC 14 fails the build for constructors declared with a template-id
Rollback Plan:
```
== NO RELEASE NOTE ==
```
Differential Revision: D80784416
Pulled By: pratikpugalia
Presto-main was split into presto-main and presto-main-base. Update paths in codeowners file to reflect the change.
…stodb#25750) Summary: I added a threshold for logging memory pool allocations in facebookincubator/velox#14437. In this diff I'm adding the corresponding session property to configure the threshold. Differential Revision: D80066283
Co-authored-by: Christian Zentgraf <czentgr@us.ibm.com>
## Description This PR is [the fix from branch release-0.294](prestodb#25900), to fix Maven release issues ## Motivation and Context Merge the fix from the release branch into the master branch ## Impact Newer releases ## Test Plan Tested with release 0.294 ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes ``` == NO RELEASE NOTE == ```
…5357 Fix prestodb#25357 Added a type mapping table for Delta Lake to the PrestoDB documentation Co-Authored-By: Steve Burnett <burnett@pobox.com> Co-Authored-By: Jalpreet Singh Nanda <jalpreetnanda@gmail.com>
Summary: Adds output row stats for sapphire-velox related sink operators Properly close write file on broadcast write Reviewed By: singcha Differential Revision: D81271224
Co-authored-by: Ashish Tadose <ashishtadose@gmail.com> Co-authored-by: Ariel Weisberg <aweisberg@fb.com> Co-authored-by: Jalpreet Singh Nanda (:imjalpreet) <jalpreetnanda@gmail.com>
Currently, the presto-session-property-managers module has gotten large due to the different implementations for file-based and DB-based session property managers. If implemented, this commit will split the module into three: ``presto-session-property-managers-common`` will contain the API for a session property manager along with some tests, ``presto-file-session-property-manager`` will contain the file-based implementation, and ``presto-db-session-property-manager`` will contain the DB-based implementation.
Summary: In this diff I restore the taskWriterCount and taskPartitionedWriterCount values in the system config, which were deleted in diff D80124169. kTaskWriterCount is an important config for tables created by the Impulse connector: it increases the parallelism of data ingestion by creating multiple writer drivers. Differential Revision: D81522582
…#25943) ## Description This PR implements native catalog properties support for Presto Spark, enabling proper configuration and management of catalog properties for native execution processes. **Key Changes:** * **Added `NativeExecutionCatalogProperties` class**: A new class that holds catalog properties for native execution processes, where each catalog generates a separate `<catalog-name>.properties` file. * **Enhanced `WorkerProperty` and `PrestoSparkWorkerProperty`**: Extended the worker property classes to support catalog properties configuration and proper property file generation. * **Updated Native Execution Module**: Modified `NativeExecutionModule` and `NativeExecutionProcess` to integrate catalog properties into the native execution workflow. * **Improved Configuration Integration**: Updated `PrestoSparkModule`, `PrestoSparkServiceFactory` to properly wire the new catalog properties functionality. * **Enhanced Test Coverage**: Added and updated tests in `TestNativeExecutionSystemConfig`, `TestNativeExecutionProcess`, and other test classes to ensure proper catalog properties handling. ## Motivation and Context This change is required to support proper catalog configuration in Presto Spark's native execution mode. Previously, catalog properties were not properly managed for native execution processes, which limited the ability to configure connectors effectively in native mode. ## Test Plan * **Unit Tests**: Added test `NativeExecutionCatalogProperties` * **Integration Tests**: Updated `TestNativeExecutionProcess` and `TestPrestoSparkHttpClient` to verify catalog properties integration ## Contributor checklist - [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). 
- [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [x] Adequate tests were added if applicable. - [x] CI passed. ## Release Notes ``` == NO RELEASE NOTE == ```
…5878) ## Description Fixes a small typo in the AWS dependency install command. ## Motivation and Context ## Impact ## Test Plan ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```
…esto-spark code in presto-native-execution module
…tercept functions (prestodb#25475) (prestodb#25748) Summary:

## Context

Currently we don't enforce that intermediate/return types are the same in the Coordinator and the Prestissimo Worker. Velox creates vectors for intermediate/return results based on a plan that comes from the Coordinator, and Prestissimo then tries to use those vectors without crashing. In practice we had a crash some time ago due to such a mismatch (D74199165), and I added validation to Velox to catch such mismatches early: facebookincubator/velox#13322. But we weren't able to enable it in prod, because the validation failed for the "regr_slope" and "regr_intercept" functions.

## What's changed?

In this diff I'm fixing the "regr_slope" and "regr_intercept" intermediate types. In Java, the `AggregationState` for all of these functions is the same:

```
AggregationFunction("regr_slope")
AggregationFunction("regr_intercept")
AggregationFunction("regr_sxy")
AggregationFunction("regr_sxx")
AggregationFunction("regr_syy")
AggregationFunction("regr_r2")
AggregationFunction("regr_count")
AggregationFunction("regr_avgy")
AggregationFunction("regr_avgx")
```

But in Prestissimo the state storage is more optimal:

```
AggregationFunction("regr_slope")
AggregationFunction("regr_intercept")
```

These two aggregation functions don't have the M2Y field, which is more efficient because we don't waste memory and CPU on a field that isn't needed. So I moved M2Y to an extended class, the same way it works in Velox: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/CovarianceAggregates.cpp#L326-L337 No major changes, mostly just reorganized code.

## Test plan

I tested the `REGR_SLOPE`, `REGR_INTERCEPT` and `REGR_R2` functions since they are heavily used in prod and cover both cases: with and without the M2Y field. For all three `REGR_*` functions I found some prod queries, then:

1. Ran them on the previous Java build
2. Ran them on the new (with this PR) Java build
3. Ran them on the previous Prestissimo build
4. Ran them on the new (with this PR) Prestissimo build

I compared the output results, and they were all identical. With this manual test we covered the `Coordinator -> Java Worker` and `Coordinator -> Prestissimo Worker` integrations.

## Next steps

In this diff I'm applying the same optimization to Java. With this fix, the signatures become the same in Java and Prestissimo, and we will be able to enable the validation.

Differential Revision: D77625566 == NO RELEASE NOTES ==
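The state reorganization described above (moving M2Y out of the base state into an extended class) can be sketched as follows. These are illustrative Java classes with assumed names, not Presto's real `AggregationState` implementations; the online update follows the standard Welford-style co-moment recurrence.

```java
// Base state: only what regr_slope / regr_intercept need.
class CovarianceState {
    long count;
    double meanX;
    double meanY;
    double c2;  // running co-moment: sum((x - meanX) * (y - meanY))
    double m2x; // running sum((x - meanX)^2), the slope denominator

    void update(double x, double y) {
        count++;
        double dx = x - meanX;        // delta against the old meanX
        meanX += dx / count;
        meanY += (y - meanY) / count;
        c2 += dx * (y - meanY);       // uses old dx and the new meanY
        m2x += dx * (x - meanX);      // uses old dx and the new meanX
    }

    double slope() { return c2 / m2x; }
    double intercept() { return meanY - slope() * meanX; }
}

// Only functions that need the y-variance (e.g. regr_r2) pay for m2y.
class ExtendedCovarianceState extends CovarianceState {
    double m2y; // running sum((y - meanY)^2)

    @Override
    void update(double x, double y) {
        double dy = y - meanY;  // delta before the base class updates meanY
        super.update(x, y);
        m2y += dy * (y - meanY);
    }
}

public class RegrDemo {
    public static void main(String[] args) {
        ExtendedCovarianceState s = new ExtendedCovarianceState();
        // Points lie exactly on y = 2x + 1, so slope = 2, intercept = 1, r^2 = 1
        for (double x = 0; x < 5; x++) {
            s.update(x, 2 * x + 1);
        }
        System.out.println(s.slope());
        System.out.println(s.intercept());
        System.out.println((s.c2 * s.c2) / (s.m2x * s.m2y)); // r^2
    }
}
```

The split means slope/intercept accumulators never touch `m2y`, which is exactly the memory/CPU saving the commit message describes.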
yihua
reviewed
Sep 5, 2025
presto-hudi/src/main/java/com/facebook/presto/hudi/split/HudiPartitionSplitGenerator.java
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiRecordCursors.java
yihua
reviewed
Sep 5, 2025
```diff
  HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(conf).setBasePath(table.getPath()).build();
  HoodieTimeline timeline = metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
- String timestamp = timeline.lastInstant().map(HoodieInstant::getTimestamp).orElse(null);
+ String timestamp = timeline.lastInstant().map(HoodieInstant::requestedTime).orElse(null);
```
Something to audit: make sure that for table version 8 and above, the latest instant is returned based on the completion time.
Co-Authored-By: Yuya Ebihara <ebyhry@gmail.com> Co-Authored-By: Pádraig O'Sullivan <osullivan.padraig@gmail.com>
…ecar enabled clusters Adds a new plugin: presto-native-sql-invoked-functions-plugin, which contains all inlined SQL functions except those with overridden native implementations. This plugin is intended to be loaded only in sidecar-enabled clusters.
Changes adapted from Trino PRs #11336 and #12910. Original commits: 9e8d51ad45f57267f5f7fa6bf8e8c4ec56103dda f0508a7ab420449c6e2960ecf1d0a8d7058242da Author: kasiafi

Modifications were made to adapt to Presto, including:
- Removal of Node Location from TrinoException
- Added new SemanticErrorCodes
- Changed Void context to SqlPlannerContext in RelationPlanner.java
- Add newUnqualified to Field class with Presto specification
- Add getCanonicalValue to Identifier.java

Co-authored-by: Pratik Joseph Dabre <pdabre12@gmail.com> Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com> Co-authored-by: Xin Zhang <desertsxin@gmail.com>
## Description Update 0.294 release notes about executable jars, related to PR prestodb/prestodb.github.io#303 ## Motivation and Context prestodb/prestodb.github.io#303 ## Impact 0.294 release notes ## Test Plan N/A ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ``` ## Summary by Sourcery Documentation: - Add details on building and running executable JARs in the 0.294 release notes --------- Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> Co-authored-by: Steve Burnett <burnett@pobox.com>
Force-pushed 77e5893 to 3a51460
Description
Upgrade the presto-hudi-bundle version in the presto-hudi module to 1.0.2
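A minimal sketch of what the version bump might look like in the module's build file. This is illustrative only: the dependency coordinates (groupId, artifactId) below are assumptions about how the bundle described above is declared, and the real presto-hudi pom may manage the version through a property instead.

```xml
<!-- presto-hudi/pom.xml (illustrative; real coordinates may differ) -->
<dependency>
    <groupId>org.apache.hudi</groupId>
    <artifactId>presto-hudi-bundle</artifactId>
    <version>1.0.2</version>
</dependency>
```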
Impact
The Presto Hudi connector can read Hudi tables created using Hudi version 1.0.2.
Release Notes