-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(native): Add dummy signature for cosine_similarity on Array(Real) #26555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideExpose a dummy Java UDF signature for cosine_similarity on Array(Real) to surface the existing C++ implementation and avoid expensive type casts, while renaming the original Array(Double) method to prevent conflicts. Class diagram for updated MathFunctions cosine_similarity signaturesclassDiagram
class MathFunctions {
+Double arrayCosineSimilarityDouble(Block leftArray, Block rightArray)
+Long arrayCosineSimilarityReal(Block leftArray, Block rightArray)
}
MathFunctions : <<static>>
MathFunctions : arrayCosineSimilarityDouble() @ScalarFunction("cosine_similarity") @SqlType(StandardTypes.DOUBLE)
MathFunctions : arrayCosineSimilarityReal() @ScalarFunction("cosine_similarity") @SqlType(StandardTypes.REAL)
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skyelves
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM
|
BTW, looks like adding the full implementation is not too hard |
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gggrace14 : Thanks for this code.
This is very hacky though. You have 2 options:
i) Implement the Java function
ii) Use native side-car to get the function signatures. Believe Meta team already uses the side-car to some degree for this. @kevintang2022 @amitkdutta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gggrace14 In Presto clusters at meta side car is enabled. We see
cosine_similarity | real | array(real), array(real) | scalar | true | presto.default.cosine_similarity >
presto:di> select cosine_similarity(array[cast(1.0 as real)], array[cast(2.3 as real)]);
_col0
-------
1.0
(1 row)
Query 20251107_054901_08862_syaxz, FINISHED, 1 node
Splits: 0 total, 0 done (0.00%)
[Latency: client-side: 141ms, server-side: 44ms] [0 rows, 0B] [0 rows/s, 0B/s]
As @aditi-pandit mentioned, its not required.
|
@amitkdutta @kevintang2022 In prod today, the planner at coordinator looks not able to recognize the array(real) signature of cosine_similarity, which is only implemented in the native worker. So we can see the plan casting array(real) to array(double), as the Explain below shows. The Java coordinator has only the array(double) signature With this change, the plan doesn't do the cast anymore. Do you want to check and/or fix? |
|
The cast is expensive for both CPU and memory for large queries, according to our profiling. To get us unblocked, I will add the Java impl for this array(real) signature. |
Description
Add dummy signature for cosine_similarity() on Array(Real).
Motivation and Context
The implementation of cosine_similarity() for Array(Real) input is already in place, but behind the tag VELOX_ENABLE_FAISS. This signature with Array(Real) is not recognized by the planner. As a result, when the input is of Array(Real) type, the planner adds a CAST to Array(Double), which is very expensive for large input.
Impact
Expose the cosine_similarity signature for Array(Real).
Test Plan
Run cosine_similarity() with Array(Real) input column.