Replies: 8 comments
-
Hi @akoumjian. Thanks for the kind words and for trying out the plugin. Very interesting use-case! I've only thought about it for a couple of minutes, but here's one idea that might work with the current functionality: have a separate field per vector. So your docs would look like this:
Use the dynamic mapping feature to specify that any field called
Of course that's not ideal because you have to keep track of your
Accepting a single vector or an array of vectors is interesting. I think I could make it work just in terms of pure mechanics. I'd have to think harder about how scoring should behave when you have more than one vector. Should it return the highest score or the average score, or should it be configurable by a parameter?
I won't get to work on this for at least a couple more weeks, but I'd appreciate any more details you might have about how you'd expect it to behave. If you or anyone in your org wants to take a pass at it, LMK and I'll write up a getting-started guide for devs. I've been meaning to do that anyway.
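To make the scoring question concrete, here is a minimal sketch of what a configurable aggregation could look like. It is plain Python, not plugin code: the function name `aggregate_scores`, the `mode` parameter, and the `"max"` / `"mean"` mode names are all hypothetical and only illustrate the options mentioned above.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Hypothetical aggregation modes; none of these names come from the plugin.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "max": max,    # document score = its best-matching vector
    "mean": mean,  # document score = average over all of its vectors
}


def aggregate_scores(scores: Sequence[float], mode: str = "max") -> float:
    """Collapse the per-vector similarity scores of one document into one score."""
    if not scores:
        raise ValueError("a document needs at least one vector score")
    if mode not in AGGREGATORS:
        raise ValueError(f"unknown mode: {mode!r}")
    return AGGREGATORS[mode](scores)


per_vector_scores = [0.25, 0.75, 0.5]
print(aggregate_scores(per_vector_scores, "max"))   # 0.75
print(aggregate_scores(per_vector_scores, "mean"))  # 0.5
```

With `"max"`, a document is ranked by its single closest vector; with `"mean"`, every vector in the document pulls the score up or down.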
-
Thank you for taking the time to think it over, and for such a quick response. Our current workaround was precisely your suggestion, but I hadn't been clever enough to think of using dynamic mapping combined with matching on the field name to support the type. Instead we had decided to hard-code a limited number of sentence vectors (
As for the scoring, I believe scoring is cumulative when you have multiple matches, but I can't recall for certain, and preliminary searches in the documentation are not helping. I'm not sure yet whether we would be available to take a full try at it, but a getting-started guide for devs would be appreciated for sure.
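For anyone trying the same workaround, the dynamic-mapping idea might be sketched as the index body below. Treat it as an assumption-laden illustration: the `vec_*` field pattern, the template name `sentence_vectors`, and `dims=768` are made up for this example, and the `elastiknn_dense_float_vector` type with its `elastiknn` options block should be checked against the plugin documentation for the version in use.

```python
import json


def build_index_body(field_pattern: str = "vec_*", dims: int = 768) -> dict:
    """Build an index body whose dynamic template maps matching fields to vectors.

    Assumptions (not from this thread): the field pattern, the template name,
    the dimensionality, and the exact vector mapping options.
    """
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    "sentence_vectors": {  # hypothetical template name
                        "match": field_pattern,  # applies to vec_0, vec_1, ...
                        "mapping": {
                            "type": "elastiknn_dense_float_vector",
                            "elastiknn": {"dims": dims},
                        },
                    }
                }
            ]
        }
    }


print(json.dumps(build_index_body(), indent=2))
```

The printed JSON is what would be sent when creating the index, so new fields such as `vec_0` or `vec_17` pick up the vector mapping without being declared one by one.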
-
Good to hear this at least partially works. I won't have time to work on it for at least another week. I'll keep the issue open though. I think it would be a neat feature.
Cumulative would be tricky if you can have different numbers of vectors. What happens when one doc has 100 and another doc has 2? Not really a fair comparison.
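A small numeric sketch of that concern, using made-up similarity scores: summing rewards the document with many weak matches, while the mean or the max does not.

```python
from statistics import mean
from typing import Callable, Sequence

# Made-up per-vector similarity scores, for illustration only.
many_weak = [0.25] * 100   # 100 vectors, each a weak match
few_strong = [0.75, 1.0]   # 2 vectors, both strong matches


def winner(aggregate: Callable[[Sequence[float]], float]) -> str:
    """Return which document ranks first under the given aggregation."""
    if aggregate(many_weak) > aggregate(few_strong):
        return "many_weak"
    return "few_strong"


print(winner(sum))   # many_weak  (25.0 vs 1.75)
print(winner(mean))  # few_strong (0.25 vs 0.875)
print(winner(max))   # few_strong (0.25 vs 1.0)
```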
I just added a developer guide here: https://github.com/alexklibisz/elastiknn/blob/master/developer-guide.md Feel free to ask questions on Gitter or email me at aklibisz@gmail.com if you or someone on your team ends up digging into it. I'm sure there are things I haven't documented thoroughly.
-
Another interesting use-case for this is when you have multiple images, and thus multiple image vectors, for a single product.
-
Hey, @alexklibisz! We have a similar need: storing paragraph-level embeddings for documents. One option for calculating the score would be to take the average of the max value and the mean of the top X matches (since the max and the mean both matter). Have there been any updates on this feature? Would you have any examples of writing custom scoring for this case?
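The proposed formula can be sketched in a few lines of plain Python. This only shows the arithmetic on already-computed per-paragraph scores; the name `blended_score` and the `top_x` parameter are hypothetical, and nothing here is wired into Elasticsearch or the plugin.

```python
from statistics import mean
from typing import Sequence


def blended_score(scores: Sequence[float], top_x: int = 3) -> float:
    """Average of the best score and the mean of the top_x best scores."""
    if not scores:
        raise ValueError("a document needs at least one paragraph score")
    if top_x < 1:
        raise ValueError("top_x must be at least 1")
    top = sorted(scores, reverse=True)[:top_x]  # best scores first
    return (top[0] + mean(top)) / 2


# Top 2 scores are 1.0 and 0.5: max = 1.0, mean = 0.75, blend = 0.875
print(blended_score([1.0, 0.5, 0.0, 0.25], top_x=2))  # 0.875
```

If a document has fewer than `top_x` paragraphs, the slice simply uses all of them, so short documents are not penalized for missing vectors.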
-
Hi Luiza, no - no progress on this feature. Also no examples. I don't have this on my roadmap for the project right now, but would be happy to review and provide guidance on PRs. There's a developer guide in the repo.
-
Thank you. If we decide to go with this approach later using elastiknn, I will share updates here.
-
I don't plan on implementing this. Will happily review if someone else takes a pass at it.
-
I don't know how difficult this would be, since I haven't investigated the elastiknn field implementation. Most fields in Elasticsearch can accept an array of values that you can match and sort against, and we could use the same thing for the dense vector field. Here is our use case:
We have long-form descriptions of objects that we want to index at the sentence-vector level. Each object description therefore has multiple vectors associated with it. When I perform a knn query, I want to get back a document if any of those sentence-vectors is close.
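To pin down the behavior we are after, here is a brute-force sketch in plain Python. It is not how the plugin works or would work; cosine similarity, the threshold, and the names `cosine_similarity` and `matching_objects` are assumptions chosen only to illustrate "return the object if any of its sentence vectors is close".

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def matching_objects(
    query: Sequence[float],
    objects: Dict[str, List[Sequence[float]]],
    threshold: float,
) -> List[str]:
    """Return ids of objects with at least one vector close to the query, best first."""
    hits = []
    for object_id, vectors in objects.items():
        if not vectors:
            continue  # an object without vectors can never match
        best = max(cosine_similarity(query, v) for v in vectors)
        if best >= threshold:
            hits.append((best, object_id))
    return [object_id for _, object_id in sorted(hits, reverse=True)]


objects = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # one sentence vector matches the query
    "b": [[0.0, 1.0]],              # orthogonal to the query
    "c": [[-1.0, 0.0]],             # opposite of the query
}
print(matching_objects([1.0, 0.0], objects, threshold=0.9))  # ['a']
```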
We considered having a separate document for every sentence-vector along with the object id, but this creates a lot of challenges if we want to filter against other object properties.
Thank you for the truly excellent software!