Client Side Embeddings Search #119
Comments
@sshivaditya2019 rfc on time estimate and spec |
This can be accomplished using the natural library[^1], which is highly optimized. The main challenge would be generating the 1024-dimensional embeddings on the client side. Rather than retrieving embeddings from the database, we could use the wink-js embedding model[^2] to generate embeddings for both the query and the entries. These embeddings could be computed at load time, potentially increasing page load time by 15 to 35 seconds or more in some cases, and then used in the search process. Vector-based search may not be particularly beneficial here; instead, heuristic-based retrieval methods, such as NDCG, along with a more effective search algorithm, would likely yield better results. |
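A minimal sketch of what a heuristic, embedding-free search could look like on the client. This is not the implementation from the linked pull request: the `Issue` shape, the function names, the title/body weights, and the 0.5 threshold are all assumptions chosen for illustration. It scores each issue by fuzzy (Levenshtein-based) term matches and sorts by that score.

```typescript
// Hypothetical issue shape; the real UI's types may differ.
interface Issue {
  title: string;
  body: string;
}

// Classic dynamic-programming Levenshtein edit distance (single-row variant).
function levenshtein(a: string, b: string): number {
  const prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = prev[0];
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = prev[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(prev[j] + 1, prev[j - 1] + 1, diagonal + cost);
      diagonal = tmp;
    }
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical strings.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Best fuzzy match of one query term against the words of a text.
function bestTermScore(term: string, text: string): number {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  return words.reduce((best, w) => Math.max(best, similarity(term, w)), 0);
}

// Weighted score: title matches count more than body matches (assumed weights).
function scoreIssue(query: string, issue: Issue, titleWeight = 0.7, bodyWeight = 0.3): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const total = terms.reduce(
    (sum, t) => sum + titleWeight * bestTermScore(t, issue.title) + bodyWeight * bestTermScore(t, issue.body),
    0
  );
  return total / terms.length;
}

// Keep issues above the threshold, most relevant first.
function searchIssues(query: string, issues: Issue[], threshold = 0.5): Issue[] {
  return issues
    .map((issue) => ({ issue, score: scoreIssue(query, issue) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.issue);
}
```

Because every step is plain string arithmetic, nothing has to be downloaded or precomputed at page load, which is the main cost of the embedding approach described above.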
Let's do your recommendation |
I think it will take around a day to set up the heuristic-based search functionality. I'm not sure if there's a gating mechanism for tasks or something similar, but I can incorporate that into this task to create an integrated task recommender, if that's a requirement. @0x4007 rfc |
This is not implemented anywhere now. However, it will soon be implemented based on contributor/collaborator status and priority level (or time level). But that will only be on GitHub and not our UI, I think. We still need to figure that out.
Integrated task recommender sounds very cool on the UI level. I'm on board with exploring this, although as of right now the implementation details are not clear to me. |
/start |
@0x4007 could you assign this issue to me? |
@sshivaditya2019 the deadline is at Sun, Oct 27, 5:30 PM UTC |
@gentlementlegen Not working again. Start is officially our most unreliable plugin. |
Error was
{
  "message": "Validation Failed",
  "errors": [
    {
      "message": "The listed users cannot be searched either because the users do not exist or you do not have permission to view the users.",
      "resource": "Search",
      "field": "q",
      "code": "invalid"
    }
  ],
  "documentation_url": "https://docs.github.com/v3/search/",
  "status": "422"
}
with the search arguments like
{
  "q": "org:ubiquity author:sshivaditya2019 state:open",
  "per_page": 100,
  "order": "desc",
  "sort": "created"
}
URL for reference |
Okay, you should figure out the root problem and fix it. |
I've mentioned this before re: user privacy settings affecting our attempts via GQL and REST, but the root problem is that shiv's account privacy settings are restricted, which we unfortunately don't control. So perhaps we should just assume defaults in this situation and apply the lowest contributor limits, then use an alternative search query for PRs/issues in the network and filter by username, since those would be public under our org settings. I assume it's the assigned issues query that caused it here. |
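The fallback described above could be sketched roughly as follows. This is only an illustration: `OrgItem`, `authorQuery`, `orgWideQuery`, and `filterByAuthor` are hypothetical names, and whether an org-wide query actually avoids the 422 for restricted accounts is an untested assumption.

```typescript
// Minimal shape of an issue/PR item (only the fields used here).
interface OrgItem {
  title: string;
  state: "open" | "closed";
  user: { login: string } | null;
}

// The author-scoped query from the failing request.
function authorQuery(org: string, login: string): string {
  return `org:${org} author:${login} state:open`;
}

// Fallback query without the author qualifier (assumption: this one is not rejected).
function orgWideQuery(org: string): string {
  return `org:${org} state:open`;
}

// Client-side filter by username; GitHub logins are compared case-insensitively.
function filterByAuthor(items: OrgItem[], login: string): OrgItem[] {
  const wanted = login.toLowerCase();
  return items.filter((i) => i.state === "open" && i.user?.login.toLowerCase() === wanted);
}
```

The trade-off is that the org-wide query returns many more items, so pagination and rate limits would need attention before relying on it.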
If it's something the contributor can fix then the solution is to write a detailed error explaining that they can't self assign until they fix their settings, explain exactly what to fix, and then provide a link to where they can fix. |
It is still weird to me that the user privacy affects a search because the profile is public. Can we consider using GQL with issues search instead of the search API? Something like
query($organization: String!, $author: String!) {
  organization(login: $organization) {
    repositories(first: 100) {
      nodes {
        issues(first: 100, states: OPEN, filterBy: {createdBy: $author}) {
          nodes {
            title
            url
            createdAt
          }
        }
      }
    }
  }
}
with
{
  "organization": "ubiquity",
  "author": "sshivaditya2019"
}
would achieve the same result. I don't know if that would resolve the issue but it's worth a try. |
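For a quick local test, the query above could be packaged for GitHub's GraphQL endpoint along these lines. `buildIssuesRequest` and `GraphQlRequest` are hypothetical names for this sketch, and the token is assumed to be one that the organization allows to read its data.

```typescript
// GraphQL document copied from the comment above.
const ISSUES_BY_AUTHOR = `
query($organization: String!, $author: String!) {
  organization(login: $organization) {
    repositories(first: 100) {
      nodes {
        issues(first: 100, states: OPEN, filterBy: {createdBy: $author}) {
          nodes { title url createdAt }
        }
      }
    }
  }
}`;

// Plain description of the HTTP request, kept free of DOM/Node types.
interface GraphQlRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: bundles the query and its variables into one POST request.
function buildIssuesRequest(token: string, organization: string, author: string): GraphQlRequest {
  return {
    url: "https://api.github.com/graphql",
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query: ISSUES_BY_AUTHOR, variables: { organization, author } }),
  };
}
```

The result can be passed to `fetch` as `const { url, ...init } = buildIssuesRequest(token, "ubiquity", "sshivaditya2019"); await fetch(url, init);`. Note that `first: 100` caps both repositories and issues per repository, so a larger organization would need pagination.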
You can test and verify pretty quickly. I suggest you do that and let us know. |
Using the explorer, with my own login for access:
{
  "data": {
    "organization": null
  },
  "errors": [
    {
      "type": "FORBIDDEN",
      "path": [
        "organization",
        "repositories"
      ],
      "extensions": {
        "saml_failure": false
      },
      "locations": [
        {
          "line": 3,
          "column": 5
        }
      ],
      "message": "Although you appear to have the correct authorization credentials, the `ubiquity` organization has enabled OAuth App access restrictions, meaning that data access to third-parties is limited. For more information on these restrictions, including how to enable this app, visit https://docs.github.com/articles/restricting-access-to-your-organization-s-data/"
    }
  ]
} |
If OAuth app access is required to read user data, let's use the app for logging in on devpool.directory. The error can explain that the user needs to sign in on devpool.directory if there is a problem reading their data. |
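One way to turn the `FORBIDDEN` response above into the kind of explanation suggested here is sketched below. The `explainAccessError` name and the message wording are assumptions for illustration only; the response types cover just the fields seen in the explorer output.

```typescript
// Only the fields that appear in the explorer response above.
interface GraphQlError {
  type?: string;
  message: string;
  path?: string[];
}

interface GraphQlResponse<T> {
  data: T | null;
  errors?: GraphQlError[];
}

// Returns a user-facing explanation when access was forbidden, otherwise null.
// The wording is hypothetical, not taken from the plugin.
function explainAccessError<T>(res: GraphQlResponse<T>): string | null {
  const forbidden = res.errors?.find((e) => e.type === "FORBIDDEN");
  if (!forbidden) return null;
  return (
    "We could not read your GitHub data, so this task cannot be self-assigned yet. " +
    "Please sign in on devpool.directory and try again. " +
    `Details from GitHub: ${forbidden.message}`
  );
}
```

Keeping GitHub's original message at the end gives the contributor the exact restriction that was hit, along with the documentation link it contains.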
It'd be better to just test locally. Sorry, I didn't have time to do so today. |
Aye, it likely would be, sorry bud. |
@sshivaditya2019
Contributions Overview
View | Contribution | Count | Reward |
---|---|---|---|
Issue | Task | 1 | 400 |
Issue | Comment | 3 | 17.0465 |
Review | Comment | 17 | 0 |
Conversation Incentives
Comment | Formatting | Relevance | Reward |
---|---|---|---|
This can be accomplished using natural library[^01^], which is h… | 15.48content: content: p: score: 0 elementCount: 3 a: score: 5 elementCount: 2 result: 10 regex: wordCount: 111 wordValue: 0.1 result: 5.48 | 0.85 | 14.658 |
I think it will take around a day to set up the heuristic-based … | 2.87content: content: p: score: 0 elementCount: 2 result: 0 regex: wordCount: 52 wordValue: 0.1 result: 2.87 | 0.75 | 2.1525 |
@0x4007 could you assign this issue to me ? | 0.59content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 8 wordValue: 0.1 result: 0.59 | 0.4 | 0.236 |
Resolves #119 - Adds a new search system, using a fuzzy matchi… | 1.5content: content: p: score: 0 elementCount: 2 ul: score: 1 elementCount: 1 li: score: 0.5 elementCount: 1 result: 1.5 regex: wordCount: 22 wordValue: 0 result: 0 | 0.7 | 0 |
Fixed that. | 0.36content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 2 wordValue: 0.2 result: 0.36 | 0.2 | 0 |
Isn't `filterIssues` an arrow function? I don't think it… | 3.5content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 29 wordValue: 0.2 result: 3.5 | 0.5 | 0 |
The IssueSearch already contains all of the issues; this only re… | 4.5content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 39 wordValue: 0.2 result: 4.5 | 0.8 | 0 |
Moved the `filterIssues` to a separate file. | 0.92content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 6 wordValue: 0.2 result: 0.92 | 0.6 | 0 |
A good example for testing would be "Large language models", ori… | 5.84content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 53 wordValue: 0.2 result: 5.84 | 0.4 | 0 |
Fuzzy search only works with "?" prefixed searches; otherwise, h… | 11.03content: content: p: score: 0 elementCount: 6 ul: score: 1 elementCount: 1 li: score: 0.5 elementCount: 4 result: 3 regex: wordCount: 77 wordValue: 0.2 result: 8.03 | 0.9 | 0 |
The results are not sorted by score, as that would conflict with… | 3.91content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 33 wordValue: 0.2 result: 3.91 | 0.3 | 0 |
The results are sorted by default by their relevance score. If t… | 4.6content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 40 wordValue: 0.2 result: 4.6 | 0.7 | 0 |
@0x4007 @zugdev Can you take a look at this pull? I think it's r… | 3.4content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 28 wordValue: 0.2 result: 3.4 | 0.2 | 0 |
I’m not sure if you’ve left a review, but I don’t see any commen… | 2.55content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 20 wordValue: 0.2 result: 2.55 | 0.1 | 0 |
That should be fixed. I'm not sure why this is happening, but I … | 4.11content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 35 wordValue: 0.2 result: 4.11 | 0.3 | 0 |
I am not sure, if this is usual/related to this pull, but I am n… | 4.01content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 34 wordValue: 0.2 result: 4.01 | 0.2 | 0 |
Are you sure it was this commit ? It doesn't seem like it made a… | 4.11content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 35 wordValue: 0.2 result: 4.11 | 0.1 | 0 |
@0x4007 Is there anything still pending for this PR? I can make … | 2.22content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 17 wordValue: 0.2 result: 2.22 | 0.2 | 0 |
It should be fixed in `94533de`. I changed the way the r… | 1.77content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 13 wordValue: 0.2 result: 1.77 | 0.4 | 0 |
This has been fixed in `aae1a02`. I have rewritten the &… | 2.98content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 24 wordValue: 0.2 result: 2.98 | 0.5 | 0 |
[ 43.408 UUSD ]
@0x4007
Contributions Overview
View | Contribution | Count | Reward |
---|---|---|---|
Issue | Specification | 1 | 16.98 |
Issue | Comment | 8 | 17.173 |
Review | Comment | 3 | 9.255 |
Conversation Incentives
Comment | Formatting | Relevance | Reward |
---|---|---|---|
<img width="699" alt="image" src="https://github.com/user-att… | 5.66content: content: p: score: 0 elementCount: 5 ol: score: 1 elementCount: 1 li: score: 0.5 elementCount: 2 result: 2 regex: wordCount: 69 wordValue: 0.1 result: 3.66 | 1 | 16.98 |
@sshivaditya2019 rfc on time estimate and spec | 1.05content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 7 wordValue: 0.2 result: 1.05 | 0.7 | 0.735 |
Lets do your recommendation | 0.65content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 4 wordValue: 0.2 result: 0.65 | 0.6 | 0.39 |
This is not implemented anywhere now. However it will soon be im… | 7.49content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 71 wordValue: 0.2 result: 7.49 | 0.8 | 5.992 |
@gentlementlegen Not working again Start officially is our most … | 1.54content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 11 wordValue: 0.2 result: 1.54 | 0.5 | 0.77 |
Okay you should figure the root problem and fix | 1.29content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.2 result: 1.29 | 0.7 | 0.903 |
If it's something the contributor can fix then the solution is t… | 4.99content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 44 wordValue: 0.2 result: 4.99 | 0.6 | 2.994 |
You can test and verify pretty quickly. I suggest you do that an… | 2.11content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 16 wordValue: 0.2 result: 2.11 | 0.7 | 1.477 |
If OAuth app access is required to read user data, let's use the… | 4.89content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 43 wordValue: 0.2 result: 4.89 | 0.8 | 3.912 |
What's some good search terms to test? I know before was exact s… | 2.49content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 44 wordValue: 0.1 result: 2.49 | 0.7 | 1.743 |
I think I would feel more comfortable if we can enable fuzzy sea… | 3.02content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 55 wordValue: 0.1 result: 3.02 | 0.8 | 2.416 |
![image](https://github.com/user-attachments/assets/1a63b3fd-044… | 5.32content: content: p: score: 0 elementCount: 1 img: score: 5 elementCount: 1 result: 5 regex: wordCount: 4 wordValue: 0.1 result: 0.32 | 0.3 | 5.096 |
[ 4.474 UUSD ]
@gentlementlegen
Contributions Overview
View | Contribution | Count | Reward |
---|---|---|---|
Issue | Comment | 3 | 4.474 |
Conversation Incentives
Comment | Formatting | Relevance | Reward |
---|---|---|---|
Error was { "message": "Validation Failed"… | 1.75content: content: p: score: 0 elementCount: 4 result: 0 regex: wordCount: 29 wordValue: 0.1 result: 1.75 | 0.85 | 1.4875 |
It is still weird to me that the user privacy affects a search b… | 3.02content: content: p: score: 0 elementCount: 3 result: 0 regex: wordCount: 55 wordValue: 0.1 result: 3.02 | 0.75 | 2.265 |
It'd be better to just test locally, sorry didn't have time to d… | 1.11content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 17 wordValue: 0.1 result: 1.11 | 0.65 | 0.7215 |
[ 9.565 UUSD ]
@Keyrxng
Contributions Overview
View | Contribution | Count | Reward |
---|---|---|---|
Issue | Comment | 3 | 9.565 |
Conversation Incentives
Comment | Formatting | Relevance | Reward |
---|---|---|---|
I've mentioned this before re: user privacy settings affecting o… | 4.71content: content: p: score: 0 elementCount: 2 result: 0 regex: wordCount: 93 wordValue: 0.1 result: 4.71 | 0.8 | 3.768 |
using the [explorer](https://docs.github.com/en/graphql/overview… | 5.77content: content: p: score: 0 elementCount: 1 a: score: 5 elementCount: 1 result: 5 regex: wordCount: 11 wordValue: 0.1 result: 0.77 | 0.9 | 5.693 |
Aye it likely would be sorrry bud | 0.52content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 7 wordValue: 0.1 result: 0.52 | 0.2 | 0.104 |
[ 30.46975 UUSD ]
@zugdev
Contributions Overview
View | Contribution | Count | Reward |
---|---|---|---|
Review | Comment | 26 | 30.46975 |
Conversation Incentives
Comment | Formatting | Relevance | Reward |
---|---|---|---|
I believe you should not remove this `.toLowerCase()` st… | 14.38content: content: p: score: 0 elementCount: 4 ol: score: 1 elementCount: 2 li: score: 0.5 elementCount: 2 img: score: 5 elementCount: 2 result: 13 regex: wordCount: 22 wordValue: 0.1 result: 1.38 | 0.9 | 3.5655 |
`this` isn't valid here because of scope, if you make fi… | 1.28content: content: p: score: 0 elementCount: 2 result: 0 regex: wordCount: 20 wordValue: 0.1 result: 1.28 | 0.8 | 0.256 |
Don't ignore `.husky` | 0.25content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 3 wordValue: 0.1 result: 0.25 | 0.3 | 0.01625 |
why this filter and second map? | 0.46content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 6 wordValue: 0.1 result: 0.46 | 0.6 | 0.074 |
would you mind writing a function in another file to abstract th… | 6.49content: content: p: score: 0 elementCount: 2 img: score: 5 elementCount: 1 result: 5 regex: wordCount: 24 wordValue: 0.1 result: 1.49 | 0.5 | 1.43375 |
Just add `issue.classList.add("active");` above `if … | 0.65content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65 | 0.7 | 0.11125 |
please rename this file and functions to: filter-issues-by-searc… | 1content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 15 wordValue: 0.1 result: 1 | 0.4 | 0.1 |
In current prod search implementation. If you search for "UI", n… | 21.3content: content: p: score: 0 elementCount: 7 img: score: 5 elementCount: 3 result: 15 regex: wordCount: 131 wordValue: 0.1 result: 6.3 | 0.85 | 5.09375 |
TLDR: I like the weights idea and I like the relevance sorting. … | 3.38content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 63 wordValue: 0.1 result: 3.38 | 0.8 | 0.681 |
This looks great, please try removing the URLs before doing the … | 1.22content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 19 wordValue: 0.1 result: 1.22 | 0.75 | 0.23375 |
URL matching seems to work now! ![image](https://github.com/use… | 14.32content: content: p: score: 0 elementCount: 3 img: score: 5 elementCount: 1 a: score: 5 elementCount: 1 result: 10 regex: wordCount: 84 wordValue: 0.1 result: 4.32 | 0.7 | 3.256 |
Interesting. Currently in prod, sorting just ignores what's in t… | 4.1content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 79 wordValue: 0.1 result: 4.1 | 0.6 | 0.62 |
That's great! Apparently it's very good, I'll review code in dep… | 1.11content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 17 wordValue: 0.1 result: 1.11 | 0.4 | 0.1135 |
That's all for me, the rest looks good. | 0.65content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65 | 0.3 | 0.04625 |
I have subbed to notifications in this PR, I'll approve once you… | 1.06content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 16 wordValue: 0.1 result: 1.06 | 0.2 | 0.058 |
It was pending, pardon. | 0.32content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 4 wordValue: 0.1 result: 0.32 | 0.1 | 0.008 |
@0x4007 let me know when you approve this merge. | 0.65content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65 | 0.2 | 0.03 |
Neither can I, @sshivaditya2019 this last commit broke it. | 0.65content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65 | 0.6 | 0.095 |
This does not fix it, it just doesn't reset sorting when you sea… | 11.9content: content: p: score: 0 elementCount: 4 img: score: 5 elementCount: 2 result: 10 regex: wordCount: 32 wordValue: 0.1 result: 1.9 | 0.8 | 2.885 |
ps: this is not the cause I did some quick looking and it might… | 4.45content: content: p: score: 0 elementCount: 6 result: 0 regex: wordCount: 87 wordValue: 0.1 result: 4.45 | 0.5 | 0.55375 |
@sshivaditya2019 [This might be the commit that broke it](https:… | 5.65content: content: p: score: 0 elementCount: 1 a: score: 5 elementCount: 1 result: 5 regex: wordCount: 9 wordValue: 0.1 result: 0.65 | 0.5 | 1.32875 |
[8395fd4](https://04e5dd6a.devpool-directory-ui.pages.dev/) this… | 11.9content: content: p: score: 0 elementCount: 2 a: score: 5 elementCount: 2 result: 10 regex: wordCount: 32 wordValue: 0.1 result: 1.9 | 0.4 | 2.695 |
He is currently busy at a conference. I have tested this a lot, … | 6.07content: content: p: score: 0 elementCount: 5 ol: score: 1 elementCount: 1 li: score: 0.5 elementCount: 3 result: 2.5 regex: wordCount: 67 wordValue: 0.1 result: 3.57 | 0.7 | 1.25225 |
The order is fixed, but you should skip loading animation if sea… | 4.8content: content: p: score: 0 elementCount: 6 result: 0 regex: wordCount: 95 wordValue: 0.1 result: 4.8 | 0.6 | 0.72 |
Hey, I know this has been a long process, sorry. But latest chan… | 21.38content: content: p: score: 0 elementCount: 6 ol: score: 1 elementCount: 2 li: score: 0.5 elementCount: 2 img: score: 5 elementCount: 2 a: score: 5 elementCount: 1 result: 18 regex: wordCount: 63 wordValue: 0.1 result: 3.38 | 0.7 | 5.0965 |
By QA this looks good, I'll simplify some code in another PR. Th… | 1.49content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 24 wordValue: 0.1 result: 1.49 | 0.4 | 0.1465 |
Perhaps we can improve our search experience by:
If running all of these calculations performs poorly, we could potentially compile to wasm.