-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow users to pass customField to /rerank endpoint #303
base: rc/2024-10
Are you sure you want to change the base?
Conversation
This reverts commit d3b7f0e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, I did leave some question and comments about a few things. I feel like I haven't read much TypeScript in a while and am a bit rusty. 😅
src/inference/inference.ts
Outdated
} else { | ||
throw new PineconeArgumentError( | ||
'Documents must be a list of strings or a list of objects' | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for it to be a mix of the two? I'm actually not sure, but I'm assuming we don't allow that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's a good q! Right now, the TS client doesn't account for that. I'll add it in, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean if someone mixed strings and documents in the same request? That would be kinda odd, I don't think we need to bend over backwards to support that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay cool. I think my final else
stmt would actually account for this anyways, no?
The logic is:
- if docs is an array of strings, do something
- else if docs is an array of objs, do something else
- else, error out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the logic should be fine for that atm.
src/inference/inference.ts
Outdated
if (documents.every((doc) => typeof doc === 'string')) { | ||
// Set rankFields to default value of 'text' | ||
computedRankFields = ['text']; | ||
} else if ( | ||
// If`documents` is an array of objects: | ||
documents.every((doc) => typeof doc === 'object' && doc !== null) | ||
) { | ||
// If there is a `text` key (other keys can be present too): | ||
if ( | ||
documents.every((doc) => (doc as { [key: string]: string })['text']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there maybe a more efficient way to parse the documents
that are passed and make some determinations about the contents one time up front rather than calling documents.every
so many times? In the last code path we'd be iterating over everything in the array 3 times which is a bit heavy.
I'd try maybe looping over it once and building some parameters/bools that we use in these if statements rather than iterating repeatedly.
src/inference/inference.ts
Outdated
computedRankFields = ['text']; | ||
} else if ( | ||
// If`documents` is an array of objects: | ||
documents.every((doc) => typeof doc === 'object' && doc !== null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check doc !== null
here but not with the string
case above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added above!
src/inference/inference.ts
Outdated
} | ||
// Standardize documents to ensure they are in object format | ||
const newDocuments = documents.map((doc) => | ||
typeof doc === 'string' ? { text: doc } : doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the text
key here be different if they passed a custom rankFields
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's a valid scenario, here's my thinking:
- If someone passes in an array of strings, there are no keys in their documents
- If there are no keys in their documents, on the backend we artificially transform their documents into an array of key:value pairs, with a
text
key, simply b/c that's what the model needs for reranking - If someone passes in an array of strings and then also passes in a custom field to rerank on... their docs would inherently be missing that custom field (by way of not being key:value pairs), so it would error out
Thoughts @ssmith-pc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fully following the original question since the impl has changed, but:
- If there are no keys in their documents, on the backend we artificially transform their documents into an array of key:value pairs, with a text key, simply b/c
By "on the backend we transform" I think you mean, it's transformed in the client right? This is the right behavior
- If someone passes in an array of strings and then also passes in a custom field to rerank on... their docs would inherently be missing that custom field (by way of not being key:value pairs), so it would error out
Correct, but it's fine if that error is detected in the server and sent back to the client. The client doesn't have to try and detect that.
const params = { inputType: 'text', truncate: 'END' }; | ||
const expected = { inputType: 'text', truncate: 'END' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to test if we're not expecting this to change anything? Maybe I'm unclear on the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird... this must be a holdover from something... I'll remove method (it does nothing currently) + this test. thanks!
{ text: 'hi', customField: 'doc1' }, | ||
{ text: 'bye', customField: 'doc2' }, | ||
]; | ||
expect(pinecone.inference.rerank(model, query, myDocuments)).toBeDefined(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little rusty on ts at this point, if the call fails it would avoid the expect and fail the test, right? Just trying to understand what this is asserting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh man, extremely good catch! I'll change to assert that rerankUnits are >= 1
const expectedError = new PineconeArgumentError( | ||
'You must pass at least one document to rerank' | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Personal preference, but I'd maybe just leave the expected error inline on the expect().toEqual()
. Feels easier to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, @aulorbe, but can you add an example to the README?
Thank you! I think the existing example in the README on the RC branch is still valid, right? Or were you think I should add more to that one? |
Yeah, I think it's still valid. But perhaps we should add rankFields to the second example. Also, I'm not sure if it's worth NOT setting the other rank options in the first example (top_n, return documents, etc.). I'd suggest the first example has everything but rankfields, and the second example shows how to use rankfields to rank on a custom field. |
Cool, will do! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README LGTM, with a nit
Co-authored-by: Jesse Seldess <j_seldess@hotmail.com>
src/inference/inference.ts
Outdated
* should to rank the documents, and additional model-specific parameters. See {@link RerankOptions} for more details. | ||
* @param documents - (Required) An array of documents to rerank. The array can either be an array of strings or | ||
* an array of objects. | ||
* @param options - (Optional) Additional options to send with the reranking request. See {@link RerankOptions} for more details. | ||
* */ | ||
@prerelease('2024-10') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove this pre-release tag now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
src/inference/inference.ts
Outdated
}); | ||
}); | ||
if (!options.rankFields) { | ||
// If each obj in newDocs does not have a 'text' field, throw an error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the comment here should probably read "if any obj does not have... "
But may be more clear as simply:
"Ensure each obj in newDocs has a 'text' field"
src/inference/inference.ts
Outdated
if ( | ||
!newDocuments.every((doc) => typeof doc === 'object' && 'text' in doc) | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server will validate that the documents contain the fields they're supposed to, so it's not entirely necessary to validate on the client. However, if we want to validate on the client, why not also validate the case where a custom rank field was provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah good call, I remove this part actually (the validation part)
src/inference/inference.ts
Outdated
parameters = {}, | ||
rankFields = undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to destructure rankFields and set to undefined? Isn't it undefined by default? I think I don't entirely understand why we can't keep the rankFields = ['text']
default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good call, good call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for iterating and helping me iterate in return! 🚢
const { | ||
topN = documents.length, | ||
returnDocuments = true, | ||
rankFields = ['text'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be left in, right? If they haven't provided a rankFields
array explicitly, we should just default to "text"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Problem
The current implementation of
/rerank
in the TS client does not (correctly) allow users to pass a custom field upon which to rerank.Solution
Allow custom fields!
Please reference this PR to account for all expected functionality: https://github.com/pinecone-io/python-plugin-inference/pull/21/files
Type of Change
Test Plan
CI passes + reviewer xreferences PR above w/functionality introduced in this PR.