[ENG-30326] Upgrade Hudi version in Presto-hudi module to 1.0.2 #1

Open
vamsikarnika wants to merge 68 commits into onehouseinc:master from vamsikarnika:upgrade_hudi_version

@vamsikarnika (Collaborator) commented Aug 28, 2025

Description

Upgrade the presto-hudi-bundle version in the presto-hudi module to 1.0.2

Impact

The Presto Hudi connector can read Hudi tables created with Hudi 1.0.2.

Release Notes

== NO RELEASE NOTE ==

tanjialiang and others added 18 commits August 26, 2025 12:19
…iterOperator (prestodb#25846)

Summary:
Pull Request resolved: prestodb#25846

Pass the Operator Context's Runtime Stats down into the `TableWriterOperator`'s Page Sink.

Specifically this diff makes the following changes:

a) `TableWriterOperator` passes its `RuntimeStats` into the Page Sink it creates via `PageSinkManager.createPageSink`
b) When the `PageSinkManager.createPageSink` is provided `RuntimeStats`, these `RuntimeStats` are passed into the `Session.toConnectorSession` call, which creates a `FullConnectorSession` instance
c) When `Session.toConnectorSession` is provided `RuntimeStats`, it passes this into the `FullConnectorSession` instance it constructs
d) Add a `Builder` to `FullConnectorSession`, which allows providing a `RuntimeStats` instance to `FullConnectorSession` at construction time. `FullConnectorSession.getRuntimeStats()` now returns the `RuntimeStats` that was set at construction time. If no `RuntimeStats` was provided at construction time, `FullConnectorSession.getRuntimeStats()` falls back to the `Session` object's `RuntimeStats`, which preserves backwards compatibility.

All changes preserve forward-compatibility.
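A minimal sketch of the fallback in (d), using simplified stand-in types. `RuntimeStatsSketch`, `SessionSketch`, and `ConnectorSessionSketch` are hypothetical and only model the relationship described here; they do not reproduce the real `RuntimeStats`, `Session`, or `FullConnectorSession` signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics container (not Presto's RuntimeStats API).
final class RuntimeStatsSketch {
    private final Map<String, Long> metrics = new HashMap<>();

    void addMetricValue(String name, long value) {
        metrics.merge(name, value, Long::sum);
    }

    long get(String name) {
        return metrics.getOrDefault(name, 0L);
    }
}

// Hypothetical stand-in for the query-level Session, which owns its own stats.
final class SessionSketch {
    private final RuntimeStatsSketch runtimeStats = new RuntimeStatsSketch();

    RuntimeStatsSketch getRuntimeStats() {
        return runtimeStats;
    }
}

// Hypothetical connector session whose builder optionally accepts operator-level stats.
final class ConnectorSessionSketch {
    private final SessionSketch session;
    private final RuntimeStatsSketch runtimeStats; // null when none was provided

    private ConnectorSessionSketch(SessionSketch session, RuntimeStatsSketch runtimeStats) {
        this.session = session;
        this.runtimeStats = runtimeStats;
    }

    static Builder builder(SessionSketch session) {
        return new Builder(session);
    }

    // Prefer the stats set at construction time; otherwise fall back to the Session's stats.
    RuntimeStatsSketch getRuntimeStats() {
        return runtimeStats != null ? runtimeStats : session.getRuntimeStats();
    }

    static final class Builder {
        private final SessionSketch session;
        private RuntimeStatsSketch runtimeStats;

        private Builder(SessionSketch session) {
            this.session = session;
        }

        Builder setRuntimeStats(RuntimeStatsSketch runtimeStats) {
            this.runtimeStats = runtimeStats;
            return this;
        }

        ConnectorSessionSketch build() {
            return new ConnectorSessionSketch(session, runtimeStats);
        }
    }
}
```

A session built with operator-level stats routes metrics into the operator's container, while one built without them still reports into the `Session`'s stats, which is the backwards-compatible path.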

## Context

Without this change, the `FullConnectorSession`'s `RuntimeStats` points to the `Session`'s `RuntimeStats`, and any metrics added to those `RuntimeStats` by an operator on the worker side are discarded. In particular, all runtime metrics added to the connector session's `RuntimeStats` while executing `TableWriterOperator` were being discarded entirely.

Specifically, at Meta, the stats from our internal filesystem implementation were missing.

Passing the Operator Context's `RuntimeStats` instance down into Connector Session is the simplest way to fix this.

Additionally, since the previous `RuntimeStats` for `TableWriterOperator`'s `FullConnectorSession` were always discarded, we can be confident that replacing them with the `OperatorContext`'s `RuntimeStats` will not break anyone else's code.

Differential Revision: D80675849
There is an existing HiveClientConfig property hive.orc.use-column-names to access ORC files by column name, but no corresponding session property.
This commit moves the existing HiveClientConfig property to HiveCommonClientConfig and introduces a session property in HiveCommonSessionProperties.
It also implements changes accordingly in DwrfAggregatedPageSourceFactory, OrcAggregatedPageSourceFactory, OrcSelectivePageSourceFactory and OrcBatchPageSourceFactory.
Constructors in those classes no longer take a boolean useOrcColumnNames argument. Tests that use those constructors have also been changed.
The Hive connector documentation has been updated.
An integration test has been added to TestHiveDistributedQueries.java.
A helper function was added to HiveTestUtils to replace a function in TestHiveIntegrationSmokeTest.
Remove superfluous constructors that have hiveClientConfig in parameter list from DwrfAggregatedPageSourceFactory.java and OrcAggregatedPageSourceFactory.java and change explicit calls in HiveTestUtils.java.

Closes-Issue: prestodb#24134
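For context on what reading by column name changes, here is a hypothetical sketch of how a table column could be mapped to an ORC file column under each mode. `OrcColumnMapping.resolve` is illustrative only (not code from `OrcSelectivePageSourceFactory` or the other factories), and the case-insensitive name match is an assumption. Positional mapping is what lets files with generic physical names such as `_col0`, `_col1` be read, while name-based mapping tolerates files whose columns are reordered relative to the table.

```java
import java.util.List;

// Hypothetical helper: picks which ORC file column feeds a given table column.
final class OrcColumnMapping {
    static int resolve(List<String> tableColumns, List<String> fileColumns, String column, boolean useColumnNames) {
        if (useColumnNames) {
            // By name: find the file column with the same name (case-insensitive, as an assumption).
            for (int i = 0; i < fileColumns.size(); i++) {
                if (fileColumns.get(i).equalsIgnoreCase(column)) {
                    return i;
                }
            }
            return -1; // the file does not contain this column
        }
        // By position: the table column's ordinal selects the file column with the same ordinal.
        int position = tableColumns.indexOf(column);
        return position >= 0 && position < fileColumns.size() ? position : -1;
    }
}
```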

Add additional test with different column names to TestHiveDistributedQueries.java
The test framework client now receives statement execution results with
`clearTransactionId` and `startTransactionId` flags embedded.
Velox provides a function to install the Arrow library, so we can reuse it
instead of copying the same code here.
The EXTRA_ARROW_OPTIONS variable allows custom Arrow build options to be
passed, for example to request that Arrow Flight be built.
Reuse the existing Velox VarcharType to implement the Char(n) type in the protocol.
Add a SystemConfig property "char-n-type-enabled" to guard this feature.
Note that this gives the Char(n) type the behavior of VarcharType. This differs
from the current Char(n) behavior in Presto, where values have a fixed number
of characters. Users who need today's behavior can call rpad().
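To illustrate the behavioral difference, the sketch below shows in plain Java the fixed-width value that padding to n characters produces. `CharPadding.rpad` is a hypothetical helper written to mirror the documented behavior of Presto's `rpad(string, size, padstring)` with a space pad (pad on the right, truncate if longer); it counts UTF-16 code units rather than code points, which is a simplification.

```java
// Hypothetical helper mirroring rpad(value, n, ' '): pad on the right, truncate if longer.
final class CharPadding {
    static String rpad(String value, int n) {
        if (value.length() >= n) {
            return value.substring(0, n); // longer input is truncated to n
        }
        StringBuilder padded = new StringBuilder(value);
        while (padded.length() < n) {
            padded.append(' ');
        }
        return padded.toString();
    }
}
```

With Char(n) mapped to VarcharType, `'ab'` in a `char(5)` column would presumably keep length 2; calling `rpad` gives back the padded, five-character form.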
…#25902)

## Description

This PR updates the GitHub Action to publish Maven artifacts using the Central
Publishing method. Because the Maven repository doesn't allow executable JARs
(which embed a shell script) to be published, we create a GitHub release and
publish those JARs there instead.

The release branch needs the same fix:
prestodb#25900

Sample release for executable jars
https://github.com/unix280/presto/releases

## Motivation and Context


## Impact
Release 0.294

## Test Plan
Tested the GitHub release in my repo:
https://github.com/unix280/presto/actions/runs/17272968441
Tested Maven publishing in a local environment

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes

```
== NO RELEASE NOTE ==
```
@pdabre12 has been voted in as a module committer for the Presto sidecar module.

Also, I fixed a bug where project committers could not approve some C++ code. Per our contributing guide, project committers must be capable of approving all code (although C++ module committers are preferred for approving and merging C++ code).
…restodb#25687)

Summary:
Added the endpoint for the Java worker, similar to the C++ worker.
We won't be using the worker-load endpoint on Java workers going forward, as we
will be focusing on the C++ worker only.

Differential Revision: D79471792
WriteMapping support for the decimal type is already present for writing values but is missing from the query builder.
This PR adds the write function to the query builder's `buildSql` function.
…bc write mappings

These types are missing in the new write mapping interface. If implemented, this will add them back.
Added the `iceberg.engine.hive.lock-enabled` configuration property to enable or disable table locks
when Iceberg accesses a Hive table. It can be overridden with the table property
`engine.hive.lock-enabled`.
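A minimal sketch of the precedence described above, assuming the table property simply overrides the catalog-level value when it is present. `HiveLockSetting` is hypothetical; `catalogLockEnabled` stands in for the value of `iceberg.engine.hive.lock-enabled`, whose default is not stated here, and the Iceberg connector's real table-property handling is not shown.

```java
import java.util.Map;

// Hypothetical resolver: the table property, when present, overrides the catalog-level setting.
final class HiveLockSetting {
    static final String TABLE_PROPERTY = "engine.hive.lock-enabled";

    static boolean lockEnabled(boolean catalogLockEnabled, Map<String, String> tableProperties) {
        String override = tableProperties.get(TABLE_PROPERTY);
        // Boolean.parseBoolean treats anything other than "true" (case-insensitive) as false.
        return override == null ? catalogLockEnabled : Boolean.parseBoolean(override);
    }
}
```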
The map function does not sort a JSON object by its keys, even though the
json_parse function sorts the same input.
If implemented, this will sort JSON objects by key.

Resolves prestodb#24207
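The change amounts to putting JSON object keys into a canonical sorted order. Below is a library-free sketch over already-parsed values (nested `Map` and `List` objects), assuming ordering by `String.compareTo`; `JsonKeyOrdering` is hypothetical, and whether this matches the exact key order `json_parse` produces is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that canonicalizes parsed JSON (nested Map/List values) by sorting object keys.
final class JsonKeyOrdering {
    @SuppressWarnings("unchecked")
    static Object sortKeys(Object value) {
        if (value instanceof Map) {
            // TreeMap orders keys by String.compareTo (UTF-16 code unit order).
            Map<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<String, Object> entry : ((Map<String, Object>) value).entrySet()) {
                sorted.put(entry.getKey(), sortKeys(entry.getValue()));
            }
            return sorted;
        }
        if (value instanceof List) {
            // Arrays keep their element order; only nested objects are reordered.
            List<Object> result = new ArrayList<>();
            for (Object element : (List<Object>) value) {
                result.add(sortKeys(element));
            }
            return result;
        }
        return value; // scalars (strings, numbers, booleans, null) are unchanged
    }
}
```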
@vamsikarnika changed the title from "Upgrade hudi version" to "[ENG-30326] Upgrade Hudi version in Presto-hudi module to 1.0.2" Aug 28, 2025
kevintang2022 and others added 11 commits August 28, 2025 10:11
Summary:
- Add abstract class BuiltInSpecialFunctionNamespaceManager
  - Add BuiltInNativeFunctionNamespaceManager
- Refactor BuiltInPluginFunctionNamespaceManager to extend the abstract
class
- Deduplicate sidecar function registry logic by moving some of it to
presto-main-base module from presto-native-sidecar-plugin module
- Add function name conflict logic to FunctionAndTypeManager that
overrides SQL built in functions but does not override Java built in
functions.
- Add retry logic to fetch the function registry from the worker; the retry
interval is 1 minute
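A hypothetical sketch of the name-conflict rule in the list above: native functions fetched from the worker replace SQL built-in functions with the same name but leave Java built-ins in place. `FunctionSource` and `FunctionRegistrySketch` are illustrative names, not `FunctionAndTypeManager` APIs, and the sketch keys functions by name only, whereas real resolution also considers signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical kinds of function implementations (not Presto's actual types).
enum FunctionSource { JAVA_BUILTIN, SQL_BUILTIN, NATIVE }

// Hypothetical registry keyed by function name only.
final class FunctionRegistrySketch {
    private final Map<String, FunctionSource> functions = new HashMap<>();

    void registerBuiltIn(String name, FunctionSource source) {
        functions.put(name, source);
    }

    // Native functions replace SQL built-ins with the same name but never replace
    // Java built-ins. Returns true if the native function was registered.
    boolean registerNative(String name) {
        if (functions.get(name) == FunctionSource.JAVA_BUILTIN) {
            return false;
        }
        functions.put(name, FunctionSource.NATIVE);
        return true;
    }

    FunctionSource resolve(String name) {
        return functions.get(name);
    }
}
```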

Note: `show functions` will show both built-in functions in the same
namespace. This matches the existing behavior of the regular native sidecar
namespace when it is enabled with the default presto.default prefix. The `show
functions` logic is not addressed in this change; unit tests for `show functions`
could be added as well.

Tests:
Added unit tests that enable the flag for this feature and verify that it
properly overrides the SQL function implementation.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== NO RELEASE NOTE ==
```
Summary:
Fix to support GCC14 build

- Replace `{}` with an explicit empty container inside optionals to avoid the following error:

          error: converting to 'std::in_place_t' from list would use explicit constructor

  `{}` leads to copy-list-initialization, which is not allowed because the `std::in_place_t` constructor is marked explicit.

- Add `#include <chrono>` to `Duration.h`, as GCC 14 requires it

- Correct the include directory path for proxygen

- Ignore errors associated with `template-id-cdtor`, since GCC 14 fails the build for constructors declared with a template-id (for example `Foo<T>()`)

Rollback Plan:


```
== NO RELEASE NOTE ==
```


Differential Revision: D80784416

Pulled By: pratikpugalia
Presto-main was split into presto-main and presto-main-base. Update
paths in codeowners file to reflect the change.
…stodb#25750)

Summary:
I added a threshold for logging memory pool allocations in
facebookincubator/velox#14437.
This change adds the corresponding session property to configure
the threshold.

Differential Revision: D80066283
Co-authored-by: Christian Zentgraf <czentgr@us.ibm.com>
## Description

This PR ports the fix from the release-0.294 branch (prestodb#25900) to fix
Maven release issues.

## Motivation and Context
Merge the fix from release branch into master branch

## Impact
Newer releases

## Test Plan
Tested with release 0.294

## Release Notes

```
== NO RELEASE NOTE ==
```
…5357

Fix prestodb#25357
Added type mapping table for Delta Lake to PrestoDB

Co-Authored-By: Steve Burnett <burnett@pobox.com>
Co-Authored-By: Jalpreet Singh Nanda <jalpreetnanda@gmail.com>
Summary:

Adds output row stats for sapphire-velox sink operators.
Properly closes the write file on broadcast write.

Reviewed By: singcha

Differential Revision: D81271224
hantangwangd and others added 13 commits September 4, 2025 00:05
Co-authored-by: Ashish Tadose <ashishtadose@gmail.com>
Co-authored-by: Ariel Weisberg <aweisberg@fb.com>
Co-authored-by: Jalpreet Singh Nanda (:imjalpreet) <jalpreetnanda@gmail.com>
Currently, the presto-session-property-managers module has grown large because it contains both the file-based
and the db-based session property manager implementations. If implemented, this commit will split the module into three:
``presto-session-property-managers-common`` will contain the API implementation for a session property manager
along with some tests, ``presto-file-session-property-manager`` will contain the file-based implementation, and
``presto-db-session-property-manager`` will contain the db-based implementation.
Summary:

In this diff I add the taskWriterCount and taskPartitionedWriterCount values back to the system config; they were deleted in diff D80124169.
kTaskWriterCount is an important config for tables created by the Impulse connector: it increases ingestion parallelism by creating multiple writer drivers.

Differential Revision: D81522582
…#25943)

## Description
This PR implements native catalog properties support for Presto Spark,
enabling proper configuration and management of catalog properties for
native execution processes.

**Key Changes:**

* **Added `NativeExecutionCatalogProperties` class**: A new class that
holds catalog properties for native execution processes, where each
catalog generates a separate `<catalog-name>.properties` file.
    
* **Enhanced `WorkerProperty` and `PrestoSparkWorkerProperty`**:
Extended the worker property classes to support catalog properties
configuration and proper property file generation.
    
* **Updated Native Execution Module**: Modified `NativeExecutionModule`
and `NativeExecutionProcess` to integrate catalog properties into the
native execution workflow.
    
* **Improved Configuration Integration**: Updated `PrestoSparkModule`,
`PrestoSparkServiceFactory` to properly wire the new catalog properties
functionality.
    
* **Enhanced Test Coverage**: Added and updated tests in
`TestNativeExecutionSystemConfig`, `TestNativeExecutionProcess`, and
other test classes to ensure proper catalog properties handling.
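A minimal sketch of the per-catalog file layout described in the first bullet, using `java.util.Properties`. `CatalogPropertiesWriter` is hypothetical and is not the `NativeExecutionCatalogProperties` implementation; the output directory and the `connector.name` entries used with it are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

// Hypothetical writer: one <catalog-name>.properties file per catalog.
final class CatalogPropertiesWriter {
    static void writeCatalogFiles(Path catalogDir, Map<String, Map<String, String>> catalogs) {
        try {
            Files.createDirectories(catalogDir);
            for (Map.Entry<String, Map<String, String>> catalog : catalogs.entrySet()) {
                Properties properties = new Properties();
                properties.putAll(catalog.getValue());
                Path file = catalogDir.resolve(catalog.getKey() + ".properties");
                try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    // store() escapes special characters and prepends a timestamp comment line.
                    properties.store(writer, null);
                }
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `Properties.store` escapes special characters and writes a timestamp comment first, a real implementation may prefer to write `key=value` lines directly.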

## Motivation and Context
This change is required to support proper catalog configuration in
Presto Spark's native execution mode. Previously, catalog properties
were not properly managed for native execution processes, which limited
the ability to configure connectors effectively in native mode.

## Test Plan
* **Unit Tests**: Added tests for `NativeExecutionCatalogProperties`
* **Integration Tests**: Updated `TestNativeExecutionProcess` and
`TestPrestoSparkHttpClient` to verify catalog properties integration

## Contributor checklist

- [x] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [x] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [x] Adequate tests were added if applicable.
- [x] CI passed.

## Release Notes
```
== NO RELEASE NOTE ==
```
…5878)

## Description
Fixes a small typo in the AWS dependency install command.

## Motivation and Context

## Impact

## Test Plan

## Release Notes

```
== NO RELEASE NOTE ==
```
…esto-spark code in presto-native-execution module
…tercept functions (prestodb#25475) (prestodb#25748)

Summary:

## Context
Currently we don't enforce that intermediate/return types are the same in
the Coordinator and the Prestissimo worker.
Velox creates vectors for intermediate/return results based on a plan
that comes from the Coordinator. Prestissimo then has to use those vectors
without crashing.

In practice, we had a crash some time ago due to such a mismatch
(D74199165).
I added validation to Velox to catch this kind of mismatch early:
facebookincubator/velox#13322
But we weren't able to enable it in prod, because the validation failed
for the "regr_slope" and "regr_intercept" functions.

## What's changed?
In this diff I'm fixing "regr_slope" and "regr_intercept" intermediate
types. Basically in Java `AggregationState` for all these functions is
the same:
```
    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
    AggregationFunction("regr_sxy")
    AggregationFunction("regr_sxx")
    AggregationFunction("regr_syy")
    AggregationFunction("regr_r2")
    AggregationFunction("regr_count")
    AggregationFunction("regr_avgy")
    AggregationFunction("regr_avgx")
```
But in Prestissimo the state storage is more efficient:
```
    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
```
These two aggregation functions don't have an M2Y field. This is more
efficient, because we don't spend memory and CPU on a field that
isn't needed.

So I moved M2Y to an extended class, the same way it works in Velox:
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/CovarianceAggregates.cpp#L326-L337

No major changes, mostly just reorganized the code.
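A minimal sketch of the state layout described above, assuming a Welford-style online update. The base state keeps only what `regr_slope` and `regr_intercept` need, and the subclass adds M2Y for functions such as `regr_r2`. `RegrState` and `ExtendedRegrState` are hypothetical Java classes, not the Java `AggregationState` interfaces or the Velox C++ code; partial-aggregation merging is omitted, and returning null for zero variance is a simplification that may not match Presto's edge-case semantics.

```java
// Hypothetical state for regr_slope / regr_intercept: no M2Y field is kept.
class RegrState {
    long count;
    double meanX;
    double meanY;
    double c2;  // running sum of (x - meanX) * (y - meanY)
    double m2x; // running sum of (x - meanX)^2

    // Welford-style online update; arguments follow the regr_*(y, x) order.
    void add(double y, double x) {
        count++;
        double dx = x - meanX;
        meanX += dx / count;
        meanY += (y - meanY) / count;
        c2 += dx * (y - meanY); // old x-delta times y-delta against the new mean
        m2x += dx * (x - meanX);
    }

    // Returns null when the slope is undefined (simplified edge-case handling).
    Double slope() {
        return count == 0 || m2x == 0 ? null : c2 / m2x;
    }

    Double intercept() {
        Double slope = slope();
        return slope == null ? null : meanY - slope * meanX;
    }
}

// Hypothetical extended state for functions such as regr_r2 that also need M2Y.
class ExtendedRegrState extends RegrState {
    double m2y; // running sum of (y - meanY)^2

    @Override
    void add(double y, double x) {
        double dy = y - meanY; // delta against the mean before this update
        super.add(y, x);
        m2y += dy * (y - meanY); // meanY has already been updated by super.add
    }

    Double r2() {
        return count == 0 || m2x == 0 || m2y == 0 ? null : (c2 * c2) / (m2x * m2y);
    }
}
```

Feeding the points (1, 2), (2, 4), (3, 6) as `add(y, x)` yields a slope of 2, an intercept of 0, and an `r2()` of 1.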

## Test plan
I tested `REGR_SLOPE`, `REGR_INTERCEPT` and `REGR_R2` functions since
they are heavily used in prod and cover both cases: with and without M2Y
field.

My test looked like this: for all three `REGR_*` functions I found some prod
queries, then:
1. Ran them on prev Java build
2. Ran them on new (with this PR) Java build
3. Ran them on prev Prestissimo build
4. Ran them on new (with this PR) Prestissimo build

And compared the output results. They all were identical.
With this manual test we covered `Coordinator -> Java Worker` and
`Coordinator -> Prestissimo Worker` integrations.

## Next steps
In this diff I'm trying to apply the same optimization to Java. With
this fix, the signatures will become the same in Java and Prestissimo,
and we will be able to enable the validation.

Differential Revision: D77625566

== NO RELEASE NOTES ==

     HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(conf).setBasePath(table.getPath()).build();
     HoodieTimeline timeline = metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
    -String timestamp = timeline.lastInstant().map(HoodieInstant::getTimestamp).orElse(null);
    +String timestamp = timeline.lastInstant().map(HoodieInstant::requestedTime).orElse(null);
A collaborator commented:

Something to audit: for table version 8 and above, make sure the latest instant is determined by completion time.
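To illustrate why the ordering matters: once instants carry a completion time, an instant requested earlier can finish later than one requested after it, so "latest by requested time" and "latest by completion time" can disagree. The sketch uses a hypothetical `InstantSketch` model rather than Hudi's `HoodieInstant` API, and assumes Hudi's `yyyyMMddHHmmssSSS` timestamp strings, which compare correctly as plain strings.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a completed instant (not Hudi's HoodieInstant API).
final class InstantSketch {
    final String requestedTime;
    final String completionTime;

    InstantSketch(String requestedTime, String completionTime) {
        this.requestedTime = requestedTime;
        this.completionTime = completionTime;
    }
}

final class LatestInstantSketch {
    // Latest by when the action was requested (its start time).
    static Optional<InstantSketch> byRequestedTime(List<InstantSketch> completed) {
        return completed.stream().max(Comparator.comparing((InstantSketch i) -> i.requestedTime));
    }

    // Latest by when the action finished, which the review comment asks for on table version 8+.
    static Optional<InstantSketch> byCompletionTime(List<InstantSketch> completed) {
        return completed.stream().max(Comparator.comparing((InstantSketch i) -> i.completionTime));
    }
}
```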

@yihua (Collaborator) left a comment:

LGTM

xin-zhang2 and others added 13 commits September 4, 2025 20:55
Co-Authored-By: Yuya Ebihara <ebyhry@gmail.com>
Co-Authored-By: Pádraig O'Sullivan <osullivan.padraig@gmail.com>
…ecar enabled clusters

Adds a new plugin, presto-native-sql-invoked-functions-plugin, that contains all inlined SQL functions except those with overridden native
implementations. This plugin is intended to be loaded only in sidecar-enabled clusters.
Changes adapted from trino/PR#11336, 12910
Original commits:
9e8d51ad45f57267f5f7fa6bf8e8c4ec56103dda
f0508a7ab420449c6e2960ecf1d0a8d7058242da
Author: kasiafi

Modifications were made to adapt to Presto, including:
- Removal of Node Location from TrinoException
- Added new SemanticErrorCodes
- Changed Void context to SqlPlannerContext in RelationPlanner.java
- Add newUnqualified to Field class with Presto specification
- Add getCanonicalValue to Identifier.java

Co-authored-by: Pratik Joseph Dabre <pdabre12@gmail.com>
Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Co-authored-by: Xin Zhang <desertsxin@gmail.com>
## Description
Update 0.294 release notes about executable jars, related to PR
prestodb/prestodb.github.io#303

## Motivation and Context
prestodb/prestodb.github.io#303

## Impact
0.294 release notes

## Test Plan
N/A

## Release Notes

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Documentation:
- Add details on building and running executable JARs in the 0.294
release notes

---------

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: Steve Burnett <burnett@pobox.com>