Added Query iterator method #80
Conversation
Any updates on this?
@BlackGad will get to it soon, hang tight.
so? any progress so far?
@BlackGad apologies, I'll try to set aside a bit of time next week for this SDK.
so?)
I understand that this is just a side project for you @roji, but could you consider approving the request? This missing functionality is crucial for us to dump data into our internal cold storage, and it's quite cumbersome to maintain an internal fork. Additionally, I believe the community would benefit from having this helpful functionality in C#.
Sorry for not having had time to look at this earlier - but sometimes it can take a while as other things take priority.
I know this was already merged, but I can't say I'm very happy about it given that the PR has zero tests for the new API. I'd be against releasing a new version in this state.
Also see below for some comments which I'd like to see us fix before releasing.
Finally, can you please point to the corresponding Python code which I assume this is based on?
object? pkLastValue;
int processedDuringIterationCount;
var pkFieldsData = response.FieldsData.Single(x => x.FieldId == pkField.FieldID);
if (pkField.DataType == DataType.VarChar)
Use a switch here, with a default case that throws, in case Milvus adds support for a 3rd PK type.
It is a direct port of the logic from the Python client.
But OK, I will add it.
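Since the PR is described as a port of the Python client's logic, the reviewer's suggestion can be sketched in Python as well. This is a minimal, hypothetical sketch (function and type names are illustrative, not the SDK's actual API): branch on the PK data type with an explicit default that raises, so a future third PK type fails loudly instead of silently falling through to one of the existing paths.

```python
def seek_expression(pk_name, pk_type, last_value):
    """Build the 'continue after last seen PK' filter for the next batch."""
    if pk_type == "VarChar":
        # String PKs must be quoted inside the filter expression.
        return f'{pk_name} > "{last_value}"'
    elif pk_type == "Int64":
        return f"{pk_name} > {last_value}"
    else:
        # Default case: fail loudly if Milvus ever adds another PK type.
        raise NotImplementedError(f"Unsupported primary key type: {pk_type}")
```

In C# the same idea maps naturally onto a `switch` expression whose discard arm throws.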
In general, I don't think it makes sense to keep code a certain way just because the python implementation looks a certain way. This is a different project in a different language, and we should be doing things in the best way possible here regardless of what the Python driver looks like.
Added
new Grpc.KeyValuePair {Key = Constants.Iterator, Value = "True"},
new Grpc.KeyValuePair {Key = Constants.ReduceStopForBest, Value = "True"},
new Grpc.KeyValuePair {Key = Constants.BatchSize, Value = batchSize.ToString(CultureInfo.InvariantCulture)},
new Grpc.KeyValuePair {Key = Constants.Offset, Value = 0.ToString(CultureInfo.InvariantCulture)},
nit: why not just "0"? Are you sure we even need to set Offset at all?
It's the same logic, ported from the Python client.
The line you refer to doesn't seem to have anything to do with this - it just checks that the offset is zero; whereas here I'm asking why not just "0" as simpler code.
Oh, do you mean use a constant?
fixed
new Grpc.KeyValuePair {Key = Constants.Limit, Value = Math.Min(batchSize, userLimit).ToString(CultureInfo.InvariantCulture)});

var processedItemsCount = 0;
while (!cancellationToken.IsCancellationRequested)
The cancellation token should be checked internally in InvokeAsync which is called just below - is there any reason for the additional check here? If not, it should be removed.
Because otherwise it would be while (true), so the check is a defensive strategy to make sure the loop will break.
I'd prefer to leave this. What do you think?
Adding the check doesn't ensure that the loop will be broken any more than a simple while (true) would (i.e. the cancellation token may never get triggered). The thing that guarantees that the loop will break is the yield break inside, when there are no more results. And since the cancellation token is checked within InvokeAsync anyway, I think this should be removed.
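The loop shape being debated can be sketched as follows (in Python, since the implementation is described as a port of the Python client; `fetch_batch` is a hypothetical stand-in for the gRPC call). Termination is guaranteed by the break on an empty batch - the analogue of the C# `yield break` - not by any cancellation check at the top of the loop.

```python
def iterate(fetch_batch, batch_size):
    """Yield all rows, fetching one batch at a time, seeking past the last PK."""
    last_pk = None
    while True:
        batch = fetch_batch(last_pk, batch_size)
        if not batch:
            break  # no more results: this is the only exit the loop relies on
        yield from batch
        last_pk = batch[-1]  # remember the last seen PK for the next request
```

Cancellation, as the reviewer notes, is better handled inside the call itself (`InvokeAsync` in the C# code), which throws when the token fires.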
Changed
QueryParameters? parameters = null,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
if ((parameters?.Offset ?? 0) != 0)
Any specific reason not to support Offset? It could simply be set on the very first call, and from that point on just continue with the batching as usual?
It's the same logic ported from Python. In my tests, Milvus behaves strangely when you try to set a non-zero offset.
I'd prefer to understand better if this is an actual Milvus limitation or just an unimplemented feature at the client.
I think the clients should behave the same way. I don't have time to look into it right now, but I saw some strange results when I changed the offset to a non-zero value. I need this code merged into the master branch so I can use it without relying on a fork. Maybe we should defer this to a future release for those who really need it?
new Grpc.KeyValuePair {Key = Constants.ReduceStopForBest, Value = "True"},
new Grpc.KeyValuePair {Key = Constants.BatchSize, Value = batchSize.ToString(CultureInfo.InvariantCulture)},
new Grpc.KeyValuePair {Key = Constants.Offset, Value = 0.ToString(CultureInfo.InvariantCulture)},
new Grpc.KeyValuePair {Key = Constants.Limit, Value = Math.Min(batchSize, userLimit).ToString(CultureInfo.InvariantCulture)});
I'm a bit confused about Limit vs. BatchSize here... If Limit is bigger than BatchSize, they will have the same value (because of the Min here). If Limit is smaller than BatchSize, BatchSize isn't reduced... What's the logic here supposed to be?
Limit and batch size must be equal here. Will fix
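The per-page arithmetic under discussion can be sketched like this (a hypothetical illustration, not the PR's actual code): each request asks the server for at most the batch size, and the final page shrinks so the total fetched never exceeds the user's overall limit.

```python
def page_limits(user_limit, batch_size):
    """Yield the per-request limit for each page until user_limit rows are covered."""
    remaining = user_limit
    while remaining > 0:
        page = min(batch_size, remaining)  # never ask for more than one batch
        yield page
        remaining -= page  # shrink the final page to respect the overall limit
```

Under this scheme the first-page limit is always min(batchSize, userLimit), matching the Min in the quoted C# line, and only the last page is smaller.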
// If user expression is not null, we should use it
{userExpression: not null} => userExpression,
// If user expression is null and pk field is string
{pkField.DataType: DataType.VarChar} => $"{pkField.Name} != ''",
Why is this non-empty string check - and the < {long.MaxValue} check below - needed? Why not just leave Expr empty if the user didn't specify an expression?
@BlackGad what about this?
It is there to guarantee ordering on the first request when the user expression is absent. Again, it is a port from Python :)
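The fallback being described can be sketched as follows (names are hypothetical; shown in Python to mirror the client logic being ported): when no user filter is given, an always-true predicate on the PK is emitted so the server still orders and pages the results by primary key.

```python
LONG_MAX = 2**63 - 1  # long.MaxValue in the C# code

def first_request_expr(user_expr, pk_name, pk_is_varchar):
    """Pick the filter expression for the iterator's first request."""
    if user_expr is not None:
        return user_expr
    # No user filter: use a predicate that matches every row but still
    # forces the server to sort/page by the primary key.
    return f"{pk_name} != ''" if pk_is_varchar else f"{pk_name} < {LONG_MAX}"
```

Whether Milvus actually requires a non-empty Expr for PK-ordered paging, or an empty expression would work just as well, is exactly the open question in this thread.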
Added a Query iterator implementation that behaves in the same manner as the Python client's query iterator.
It would be a pleasure to see this PR merged and released.