@gggrace14
Description

Add dummy signature for cosine_similarity() on Array(Real).

Motivation and Context

The implementation of cosine_similarity() for Array(Real) input already exists, but it is guarded by the VELOX_ENABLE_FAISS tag. The planner does not recognize the Array(Real) signature, so when the input is of Array(Real) type it adds a CAST to Array(Double), which is very expensive for large inputs.

Impact

Expose the cosine_similarity signature for Array(Real).

Test Plan

Run cosine_similarity() with Array(Real) input column.

== NO RELEASE NOTE ==

@gggrace14 gggrace14 requested a review from a team as a code owner November 7, 2025 00:51
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Nov 7, 2025
sourcery-ai bot commented Nov 7, 2025

Reviewer's Guide

Expose a dummy Java UDF signature for cosine_similarity on Array(Real) to surface the existing C++ implementation and avoid expensive type casts, while renaming the original Array(Double) method to prevent conflicts.

Class diagram for updated MathFunctions cosine_similarity signatures

classDiagram
    class MathFunctions {
        +Double arrayCosineSimilarityDouble(Block leftArray, Block rightArray)
        +Long arrayCosineSimilarityReal(Block leftArray, Block rightArray)
    }
    MathFunctions : <<static>>
    MathFunctions : arrayCosineSimilarityDouble() @ScalarFunction("cosine_similarity") @SqlType(StandardTypes.DOUBLE)
    MathFunctions : arrayCosineSimilarityReal() @ScalarFunction("cosine_similarity") @SqlType(StandardTypes.REAL)

File-Level Changes

Change: Rename existing cosine_similarity method for Array(Double) to avoid signature collision
Details:
  • Change method name from arrayCosineSimilarity to arrayCosineSimilarityDouble
  • Retain SqlType(StandardTypes.DOUBLE) and existing implementation logic
Files: presto-main-base/src/main/java/com/facebook/presto/operator/scalar/MathFunctions.java

Change: Add dummy signature for cosine_similarity on Array(Real)
Details:
  • Introduce arrayCosineSimilarityReal returning Long and annotated with REAL SqlType
  • Register with @ScalarFunction("cosine_similarity") and add @Description
  • Throw PrestoException to defer to C++ implementation in Prestissimo
Files: presto-main-base/src/main/java/com/facebook/presto/operator/scalar/MathFunctions.java
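
The Long return type paired with a REAL SqlType can look odd at first. A plausible reading, assuming the dummy method follows the convention where a 32-bit float travels as its raw bit pattern inside a Java long, is sketched below in plain JDK code. The class and helper names (RealEncodingSketch, encodeReal, decodeReal) are hypothetical and are not part of this PR or of Presto.

```java
// Hypothetical illustration only: shows how a 32-bit REAL value could be
// carried in a Java long as raw IEEE 754 bits. Not taken from the PR.
public final class RealEncodingSketch {
    private RealEncodingSketch() {}

    // Pack the float's raw bits into a long (the int is widened to long).
    public static long encodeReal(float value) {
        return Float.floatToRawIntBits(value);
    }

    // Recover the float by truncating back to the low 32 bits.
    public static float decodeReal(long encoded) {
        return Float.intBitsToFloat((int) encoded);
    }

    public static void main(String[] args) {
        long encoded = encodeReal(2.5f);
        System.out.println(decodeReal(encoded));
    }
}
```

Under that assumption, a function declared with a REAL SqlType would hand back a long holding the encoded bits rather than a Java float.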


@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!



@skyelves skyelves left a comment

Thanks, LGTM

skyelves commented Nov 7, 2025

BTW, looks like adding the full implementation is not too hard

@aditi-pandit aditi-pandit left a comment

@gggrace14 : Thanks for this code.

This is very hacky though. You have 2 options:
i) Implement the Java function
ii) Use native side-car to get the function signatures. Believe Meta team already uses the side-car to some degree for this. @kevintang2022 @amitkdutta

@amitkdutta amitkdutta left a comment

@gggrace14 In Presto clusters at Meta, the sidecar is enabled. We see:

 cosine_similarity       | real        | array(real), array(real)                 | scalar        | true          | presto.default.cosine_similarity                        >
presto:di> select cosine_similarity(array[cast(1.0 as real)], array[cast(2.3 as real)]);
 _col0                
-------
   1.0 
(1 row)

Query 20251107_054901_08862_syaxz, FINISHED, 1 node
Splits: 0 total, 0 done (0.00%)
[Latency: client-side: 141ms, server-side: 44ms] [0 rows, 0B] [0 rows/s, 0B/s]

As @aditi-pandit mentioned, it's not required.

gggrace14 commented Nov 7, 2025

@amitkdutta @kevintang2022 In prod today, the planner on the coordinator does not appear to recognize the array(real) signature of cosine_similarity, which is implemented only in the native worker. The Java coordinator has only the array(double) signature, so the plan casts array(real) to array(double), as the EXPLAIN output below shows.

presto:di> EXPLAIN (TYPE DISTRIBUTED) SELECT COSINE_SIMILARITY(c0, c0) FROM ( VALUES(ARRAY[REAL'1.0', REAL'2.0']) )t(c0);

         - Project[PlanNodeId 4][projectLocality = LOCAL] => [cosine_similarity:double]                                    
                 Estimates: {source: CostBasedSourceInfo, rows: 1 (9B), cpu: 60.00, memory: 0.00, network: 0.00}           
                 cosine_similarity := cosine_similarity(CAST(field AS array(double)), CAST(field AS array(double))) (1:69) 

With this change, the plan doesn't do the cast anymore.

         - Project[PlanNodeId 4][projectLocality = LOCAL] => [cosine_similarity:real]                                   
                 Estimates: {source: CostBasedSourceInfo, rows: 1 (5B), cpu: 56.00, memory: 0.00, network: 0.00}        
                 cosine_similarity := cosine_similarity(field, field) (1:69)  

Do you want to check and/or fix?

gggrace14 commented Nov 8, 2025

The cast is expensive for both CPU and memory for large queries, according to our profiling. To get us unblocked, I will add the Java impl for this array(real) signature.
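
For illustration, a minimal sketch of how cosine similarity could be computed directly over 32-bit floats, without first materializing a widened double copy of each array. This is plain JDK code and not the implementation from this PR: the class name CosineSimilaritySketch, the use of float[] instead of Presto blocks, and the choice to return NaN for a zero-length or zero-norm vector are all assumptions made for the example.

```java
// Hypothetical sketch, not the PR's code: cosine similarity over float
// arrays, widening one element at a time instead of copying the arrays.
public final class CosineSimilaritySketch {
    private CosineSimilaritySketch() {}

    public static float cosineSimilarity(float[] left, float[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        // Accumulate in double for precision; no double[] is allocated.
        double dot = 0.0;
        double normLeft = 0.0;
        double normRight = 0.0;
        for (int i = 0; i < left.length; i++) {
            dot += (double) left[i] * right[i];
            normLeft += (double) left[i] * left[i];
            normRight += (double) right[i] * right[i];
        }
        // Assumption: an empty or all-zero vector yields NaN.
        if (normLeft == 0.0 || normRight == 0.0) {
            return Float.NaN;
        }
        return (float) (dot / Math.sqrt(normLeft * normRight));
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 2.0f};
        System.out.println(cosineSimilarity(a, a));
    }
}
```

The per-element widening keeps memory use constant, which is the cost the CAST to array(double) adds when it copies every input array.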
