Client Side Embeddings Search #119

Closed
0x4007 opened this issue Oct 10, 2024 · 20 comments · Fixed by #149

Comments

@0x4007
Member

0x4007 commented Oct 10, 2024

image

Perhaps we can improve our search experience by:

  1. Loading the vector embeddings of every issue and its associated comments from our database
  2. Running a similarity search (e.g. cosine similarity) to rank the issues by relevance

If running all of these calculations performs badly, we could potentially compile the search to wasm.
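The two steps above could be sketched roughly as follows. This is only an illustration, assuming each issue arrives with one precomputed embedding; the `EmbeddedIssue` shape and the function names are hypothetical and not taken from the codebase.

```typescript
// Hypothetical shape: one precomputed embedding per issue, loaded from the database.
interface EmbeddedIssue {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank issues from most to least similar to the query embedding.
function rankBySimilarity(
  query: number[],
  issues: EmbeddedIssue[]
): { id: string; score: number }[] {
  return issues
    .map((issue) => ({ id: issue.id, score: cosineSimilarity(query, issue.embedding) }))
    .sort((left, right) => right.score - left.score);
}
```

The loop is linear in the number of issues times the embedding size, so it may be worth measuring this plain version before reaching for wasm.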

@0x4007
Member Author

0x4007 commented Oct 10, 2024

@sshivaditya2019 rfc on time estimate and spec

@shiv810

shiv810 commented Oct 23, 2024

This can be accomplished using the natural library [1], which is highly optimized. The main challenge would be generating the 1024-dimensional embeddings on the client side.

Rather than retrieving embeddings from the database, we could use the wink-js embedding model [2] to generate embeddings for both the query and the entries. These embeddings could be computed at load time, potentially increasing page load time by 15 to 35 seconds or more in some cases, and then used in the search process.

Vector-based search may not be particularly beneficial here; instead, heuristic-based retrieval methods, such as NDCG, along with a more effective search algorithm, would likely yield better results.

Footnotes

  1. https://github.com/NaturalNode/natural

  2. https://winkjs.org/
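For context, NDCG is most commonly used as a metric for judging a ranking rather than as a retrieval method in itself, so one plausible reading of the suggestion is to use it to compare heuristic rankers. A minimal sketch, assuming each returned issue has a graded relevance label (these labels are hypothetical; nothing in this thread defines them):

```typescript
// DCG@k: sum over positions i (0-based) of relevance_i / log2(i + 2).
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, relevance, i) => sum + relevance / Math.log2(i + 2), 0);
}

// NDCG@k: DCG of the produced ranking divided by the DCG of the ideal
// (descending relevance) ordering of the same labels. Returns 0 if nothing is relevant.
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevances, k) / idealDcg;
}
```

A ranking that already lists the most relevant issues first scores 1, and any other ordering of the same labels scores lower.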

@0x4007
Member Author

0x4007 commented Oct 24, 2024

Let's do your recommendation.

@shiv810

shiv810 commented Oct 25, 2024

I think it will take around a day to set up the heuristic-based search functionality.

I'm not sure if there's a gating mechanism for tasks or something similar, but I can incorporate that into this task to create an integrated task recommender, if that's a requirement. @0x4007 rfc

@0x4007
Member Author

0x4007 commented Oct 25, 2024

> I think it will take around a day to set up the heuristic-based search functionality.

> I'm not sure if there's a gating mechanism for tasks or something similar,

This is not implemented anywhere now. However, it will soon be implemented based on contributor/collaborator status and priority level (or time level).

But I think that will only be on GitHub and not in our UI. We still need to figure that out.

> but I can incorporate that into this task to create an integrated task recommender, if that's a requirement. @0x4007 rfc

An integrated task recommender sounds very cool at the UI level. I'm on board with exploring this, although right now the implementation details are not clear to me.

@shiv810

shiv810 commented Oct 26, 2024

/start

@shiv810

shiv810 commented Oct 26, 2024

@0x4007 could you assign this issue to me?

Contributor

ubiquity-os bot commented Oct 26, 2024

@sshivaditya2019 the deadline is at Sun, Oct 27, 5:30 PM UTC

@0x4007
Member Author

0x4007 commented Oct 26, 2024

/start

@gentlementlegen Not working again.

`/start` is officially our most unreliable plugin.

@gentlementlegen
Member

Error was

{
  "message": "Validation Failed",
  "errors": [
    {
      "message": "The listed users cannot be searched either because the users do not exist or you do not have permission to view the users.",
      "resource": "Search",
      "field": "q",
      "code": "invalid"
    }
  ],
  "documentation_url": "https://docs.github.com/v3/search/",
  "status": "422"
}

with the search arguments like

{
  "q": "org:ubiquity author:sshivaditya2019 state:open",
  "per_page": 100,
  "order": "desc",
  "sort": "created"
}

URL for reference
https://api.github.com/search/issues?q=org%3Aubiquity%20author%3Asshivaditya2019%20state%3Aopen&per_page=100&order=desc&sort=created
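To make the mapping from the search arguments to that URL explicit, here is a small sketch. The `IssueSearchArgs` type and `buildIssueSearchUrl` helper are hypothetical names for illustration; the plugin itself presumably goes through an API client rather than building the URL by hand.

```typescript
// Hypothetical argument shape mirroring the search arguments quoted above.
interface IssueSearchArgs {
  q: string;
  per_page: number;
  order: "asc" | "desc";
  sort: string;
}

// Build the REST search URL; encodeURIComponent turns ":" into %3A and spaces into %20.
function buildIssueSearchUrl(args: IssueSearchArgs): string {
  const params = [
    `q=${encodeURIComponent(args.q)}`,
    `per_page=${args.per_page}`,
    `order=${encodeURIComponent(args.order)}`,
    `sort=${encodeURIComponent(args.sort)}`,
  ];
  return `https://api.github.com/search/issues?${params.join("&")}`;
}
```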

@0x4007
Member Author

0x4007 commented Oct 27, 2024

Okay, you should figure out the root problem and fix it.

@Keyrxng
Contributor

Keyrxng commented Oct 27, 2024

> Okay, you should figure out the root problem and fix it.

I've mentioned this before regarding user privacy settings affecting our attempts via GQL and REST, but the root problem is that shiv's account privacy settings are restricted, which unfortunately we don't control.

So perhaps in this situation we should just assume defaults and apply the lowest contributor limits, then use an alternative search query for PRs/issues in the network and filter by their username, since those would be public under our org settings. I assume it's the assigned issues query that caused it here.
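The "fetch broadly, then filter by username" fallback might look something like the sketch below. The `OrgIssue` shape is a hypothetical simplification of whatever the org-wide query returns, and the function name is made up for illustration.

```typescript
// Hypothetical, simplified issue shape; a real API payload has many more fields.
interface OrgIssue {
  title: string;
  state: "open" | "closed";
  authorLogin: string;
  assigneeLogins: string[];
}

// Keep only open issues that the given user authored or is assigned to.
// Logins are compared case-insensitively.
function filterOpenIssuesByUser(issues: OrgIssue[], username: string): OrgIssue[] {
  const wanted = username.toLowerCase();
  return issues.filter(
    (issue) =>
      issue.state === "open" &&
      (issue.authorLogin.toLowerCase() === wanted ||
        issue.assigneeLogins.some((login) => login.toLowerCase() === wanted))
  );
}
```

Because the filtering happens on our side, it does not depend on the user-scoped search qualifier that failed above, at the cost of fetching more issues than needed.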

@0x4007
Member Author

0x4007 commented Oct 27, 2024

If it's something the contributor can fix, then the solution is to write a detailed error explaining that they can't self-assign until they fix their settings, explain exactly what to fix, and provide a link to where they can fix it.
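As a rough idea of what such an error could look like in code, here is a sketch that recognises the 422 search failure shown earlier and turns it into an actionable message. The function name, the message wording, and the `settingsUrl` parameter are all assumptions; the correct settings link still has to be identified.

```typescript
interface SearchErrorDetail {
  message: string;
  resource: string;
  field: string;
  code: string;
}

interface SearchErrorPayload {
  message: string;
  status: string;
  errors?: SearchErrorDetail[];
}

// Returns a contributor-facing explanation for the "users cannot be searched"
// failure, or null when the payload is some other kind of error.
function explainSelfAssignFailure(
  payload: SearchErrorPayload,
  username: string,
  settingsUrl: string // hypothetical: wherever the contributor can change the setting
): string | null {
  const details = payload.errors ?? [];
  const isUserLookupFailure =
    payload.status === "422" &&
    details.some((d) => d.resource === "Search" && d.field === "q" && d.code === "invalid");
  if (!isUserLookupFailure) return null;
  return [
    `@${username}, you cannot self-assign this task yet because we could not read your issues and pull requests.`,
    "This may be caused by privacy settings on your account that hide your activity from searches.",
    `Please review your settings here: ${settingsUrl} and then run /start again.`,
  ].join("\n");
}
```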

@gentlementlegen
Member

It is still weird to me that user privacy settings affect a search when the profile is public. Can we consider using GQL with an issues query instead of the search API? Something like

query($organization: String!, $author: String!) {
  organization(login: $organization) {
    repositories(first: 100) {
      nodes {
        issues(first: 100, states: OPEN, filterBy: {createdBy: $author}) {
          nodes {
            title
            url
            createdAt
          }
        }
      }
    }
  }
}

with

{
  "organization": "ubiquity",
  "author": "sshivaditya2019"
}

would achieve the same result. I don't know if that would resolve the issue but it's worth a try.
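If the query does work, its nested response would need flattening into a plain list before it can replace the search results. A sketch, with types written by hand to match the selection set above (they are assumptions, not generated schema types):

```typescript
interface IssueNode {
  title: string;
  url: string;
  createdAt: string;
}

// Hand-written to mirror the query: organization -> repositories -> issues.
interface OrgIssuesResponse {
  data: {
    organization: {
      repositories: { nodes: { issues: { nodes: IssueNode[] } }[] };
    } | null;
  };
}

// Collect the issues of every repository into one list.
// A null organization (for example when access is denied) yields an empty list.
function flattenOrgIssues(response: OrgIssuesResponse): IssueNode[] {
  const organization = response.data.organization;
  if (organization === null) return [];
  const all: IssueNode[] = [];
  for (const repo of organization.repositories.nodes) {
    for (const issue of repo.issues.nodes) {
      all.push(issue);
    }
  }
  return all;
}
```

Note that `first: 100` caps both the repositories and the issues per repository, so a complete listing would also need pagination, which this sketch leaves out.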

@0x4007
Member Author

0x4007 commented Oct 28, 2024

You can test and verify pretty quickly. I suggest you do that and let us know.

@Keyrxng
Contributor

Keyrxng commented Oct 28, 2024

Using the explorer, signed in with my own login:

{
  "data": {
    "organization": null
  },
  "errors": [
    {
      "type": "FORBIDDEN",
      "path": [
        "organization",
        "repositories"
      ],
      "extensions": {
        "saml_failure": false
      },
      "locations": [
        {
          "line": 3,
          "column": 5
        }
      ],
      "message": "Although you appear to have the correct authorization credentials, the `ubiquity` organization has enabled OAuth App access restrictions, meaning that data access to third-parties is limited. For more information on these restrictions, including how to enable this app, visit https://docs.github.com/articles/restricting-access-to-your-organization-s-data/"
    }
  ]
}

@0x4007
Member Author

0x4007 commented Oct 28, 2024

If OAuth app access is required to read user data, let's use the app for logging in on devpool.directory.

The error can explain that the user needs to sign in on devpool.directory if there is a problem reading their data.
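Detecting that case could be as simple as checking the GraphQL `errors` array for a `FORBIDDEN` entry, as in the response above. This is a sketch only; the type and function names and the message text are placeholders.

```typescript
interface GraphQlErrorEntry {
  type?: string;
  message: string;
  path?: string[];
}

interface GraphQlResult {
  data: unknown;
  errors?: GraphQlErrorEntry[];
}

// Returns a sign-in prompt when the response contains a FORBIDDEN error,
// or null when the data could be read.
function signInPromptIfForbidden(result: GraphQlResult): string | null {
  const hasForbidden = (result.errors ?? []).some((entry) => entry.type === "FORBIDDEN");
  if (!hasForbidden) return null;
  return "We could not read your data from GitHub. Please sign in on devpool.directory and try again.";
}
```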

@gentlementlegen
Member

It'd be better to just test locally. Sorry, I didn't have time to do so today.

@Keyrxng
Contributor

Keyrxng commented Oct 28, 2024

> It'd be better to just test locally. Sorry, I didn't have time to do so today.

Aye, it likely would be. Sorry, bud.

Contributor

ubiquity-os bot commented Nov 15, 2024

 [ 417.0465 UUSD ] 

@sshivaditya2019
Contributions Overview
View | Contribution | Count | Reward
Issue | Task | 1 | 400
Issue | Comment | 3 | 17.0465
Review | Comment | 17 | 0
Conversation Incentives
Comment | Formatting | Relevance | Reward
This can be accomplished using natural library[^01^], which is h…
15.48
content:
  content:
    p:
      score: 0
      elementCount: 3
    a:
      score: 5
      elementCount: 2
  result: 10
regex:
  wordCount: 111
  wordValue: 0.1
  result: 5.48
Relevance: 0.85 | Reward: 14.658
I think it will take around a day to set up the heuristic-based …
2.87
content:
  content:
    p:
      score: 0
      elementCount: 2
  result: 0
regex:
  wordCount: 52
  wordValue: 0.1
  result: 2.87
Relevance: 0.75 | Reward: 2.1525
@0x4007 could you assign this issue to me ?
0.59
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 8
  wordValue: 0.1
  result: 0.59
Relevance: 0.4 | Reward: 0.236
Resolves #119 - Adds a new search system, using a fuzzy matchi…
1.5
content:
  content:
    p:
      score: 0
      elementCount: 2
    ul:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 1
  result: 1.5
regex:
  wordCount: 22
  wordValue: 0
  result: 0
Relevance: 0.7 | Reward: 0
Fixed that.
0.36
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 2
  wordValue: 0.2
  result: 0.36
Relevance: 0.2 | Reward: 0
Isn't `filterIssues` an arrow function? I don't think it…
3.5
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 29
  wordValue: 0.2
  result: 3.5
Relevance: 0.5 | Reward: 0
The IssueSearch already contains all of the issues; this only re…
4.5
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 39
  wordValue: 0.2
  result: 4.5
Relevance: 0.8 | Reward: 0
Moved the `filterIssues` to a separate file.
0.92
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 6
  wordValue: 0.2
  result: 0.92
Relevance: 0.6 | Reward: 0
A good example for testing would be "Large language models", ori…
5.84
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 53
  wordValue: 0.2
  result: 5.84
Relevance: 0.4 | Reward: 0
Fuzzy search only works with "?" prefixed searches; otherwise, h…
11.03
content:
  content:
    p:
      score: 0
      elementCount: 6
    ul:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 4
  result: 3
regex:
  wordCount: 77
  wordValue: 0.2
  result: 8.03
Relevance: 0.9 | Reward: 0
The results are not sorted by score, as that would conflict with…
3.91
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 33
  wordValue: 0.2
  result: 3.91
Relevance: 0.3 | Reward: 0
The results are sorted by default by their relevance score. If t…
4.6
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 40
  wordValue: 0.2
  result: 4.6
Relevance: 0.7 | Reward: 0
@0x4007 @zugdev Can you take a look at this pull? I think it's r…
3.4
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 28
  wordValue: 0.2
  result: 3.4
Relevance: 0.2 | Reward: 0
I’m not sure if you’ve left a review, but I don’t see any commen…
2.55
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 20
  wordValue: 0.2
  result: 2.55
Relevance: 0.1 | Reward: 0
That should be fixed. I'm not sure why this is happening, but I …
4.11
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 35
  wordValue: 0.2
  result: 4.11
Relevance: 0.3 | Reward: 0
I am not sure, if this is usual/related to this pull, but I am n…
4.01
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 34
  wordValue: 0.2
  result: 4.01
Relevance: 0.2 | Reward: 0
Are you sure it was this commit ? It doesn't seem like it made a…
4.11
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 35
  wordValue: 0.2
  result: 4.11
Relevance: 0.1 | Reward: 0
@0x4007 Is there anything still pending for this PR? I can make …
2.22
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 17
  wordValue: 0.2
  result: 2.22
Relevance: 0.2 | Reward: 0
It should be fixed in `94533de`. I changed the way the r…
1.77
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 13
  wordValue: 0.2
  result: 1.77
Relevance: 0.4 | Reward: 0
This has been fixed in `aae1a02`. I have rewritten the &…
2.98
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 24
  wordValue: 0.2
  result: 2.98
Relevance: 0.5 | Reward: 0

 [ 43.408 UUSD ] 

@0x4007
Contributions Overview
View | Contribution | Count | Reward
Issue | Specification | 1 | 16.98
Issue | Comment | 8 | 17.173
Review | Comment | 3 | 9.255
Conversation Incentives
Comment | Formatting | Relevance | Reward
<img width="699" alt="image" src="https://github.com/user-att…
5.66
content:
  content:
    p:
      score: 0
      elementCount: 5
    ol:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 2
  result: 2
regex:
  wordCount: 69
  wordValue: 0.1
  result: 3.66
Relevance: 1 | Reward: 16.98
@sshivaditya2019 rfc on time estimate and spec
1.05
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 7
  wordValue: 0.2
  result: 1.05
Relevance: 0.7 | Reward: 0.735
Lets do your recommendation
0.65
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 4
  wordValue: 0.2
  result: 0.65
Relevance: 0.6 | Reward: 0.39
This is not implemented anywhere now. However it will soon be im…
7.49
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 71
  wordValue: 0.2
  result: 7.49
Relevance: 0.8 | Reward: 5.992
@gentlementlegen Not working again Start officially is our most …
1.54
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 11
  wordValue: 0.2
  result: 1.54
Relevance: 0.5 | Reward: 0.77
Okay you should figure the root problem and fix
1.29
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 9
  wordValue: 0.2
  result: 1.29
Relevance: 0.7 | Reward: 0.903
If it's something the contributor can fix then the solution is t…
4.99
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 44
  wordValue: 0.2
  result: 4.99
Relevance: 0.6 | Reward: 2.994
You can test and verify pretty quickly. I suggest you do that an…
2.11
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 16
  wordValue: 0.2
  result: 2.11
Relevance: 0.7 | Reward: 1.477
If OAuth app access is required to read user data, let's use the…
4.89
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 43
  wordValue: 0.2
  result: 4.89
Relevance: 0.8 | Reward: 3.912
What's some good search terms to test? I know before was exact s…
2.49
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 44
  wordValue: 0.1
  result: 2.49
Relevance: 0.7 | Reward: 1.743
I think I would feel more comfortable if we can enable fuzzy sea…
3.02
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 55
  wordValue: 0.1
  result: 3.02
Relevance: 0.8 | Reward: 2.416
![image](https://github.com/user-attachments/assets/1a63b3fd-044…
5.32
content:
  content:
    p:
      score: 0
      elementCount: 1
    img:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 4
  wordValue: 0.1
  result: 0.32
Relevance: 0.3 | Reward: 5.096

 [ 4.474 UUSD ] 

@gentlementlegen
Contributions Overview
View | Contribution | Count | Reward
Issue | Comment | 3 | 4.474
Conversation Incentives
Comment | Formatting | Relevance | Reward
Error was```json{ "message": "Validation Failed"…
1.75
content:
  content:
    p:
      score: 0
      elementCount: 4
  result: 0
regex:
  wordCount: 29
  wordValue: 0.1
  result: 1.75
Relevance: 0.85 | Reward: 1.4875
It is still weird to me that the user privacy affects a search b…
3.02
content:
  content:
    p:
      score: 0
      elementCount: 3
  result: 0
regex:
  wordCount: 55
  wordValue: 0.1
  result: 3.02
Relevance: 0.75 | Reward: 2.265
It'd be better to just test locally, sorry didn't have time to d…
1.11
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 17
  wordValue: 0.1
  result: 1.11
Relevance: 0.65 | Reward: 0.7215

 [ 9.565 UUSD ] 

@Keyrxng
Contributions Overview
View | Contribution | Count | Reward
Issue | Comment | 3 | 9.565
Conversation Incentives
Comment | Formatting | Relevance | Reward
I've mentioned this before re: user privacy settings affecting o…
4.71
content:
  content:
    p:
      score: 0
      elementCount: 2
  result: 0
regex:
  wordCount: 93
  wordValue: 0.1
  result: 4.71
Relevance: 0.8 | Reward: 3.768
using the [explorer](https://docs.github.com/en/graphql/overview…
5.77
content:
  content:
    p:
      score: 0
      elementCount: 1
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 11
  wordValue: 0.1
  result: 0.77
Relevance: 0.9 | Reward: 5.693
Aye it likely would be sorrry bud
0.52
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 7
  wordValue: 0.1
  result: 0.52
Relevance: 0.2 | Reward: 0.104
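As a quick sanity check on how these numbers fit together: reading the last line of each entry as a relevance score followed by a reward (an interpretation of the flattened table, not something the bot states), the three per-comment rewards for @Keyrxng are 3.768, 5.693 and 0.104, which add up to the 9.565 shown in the overview. A small sketch of that check, with a hypothetical helper name:

```typescript
// Sum per-comment rewards, rounding to 6 decimals to absorb floating point noise.
function sumRewards(rewards: number[]): number {
  const total = rewards.reduce((sum, reward) => sum + reward, 0);
  return Math.round(total * 1e6) / 1e6;
}

// Per-comment rewards read from the @Keyrxng entries above.
const keyrxngIssueCommentTotal = sumRewards([3.768, 5.693, 0.104]);
```

The same check works for the other contributors, for example 1.4875 + 2.265 + 0.7215 = 4.474 for @gentlementlegen.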

 [ 30.46975 UUSD ] 

@zugdev
Contributions Overview
View | Contribution | Count | Reward
Review | Comment | 26 | 30.46975
Conversation Incentives
Comment | Formatting | Relevance | Reward
I believe you should not remove this `.toLowerCase()` st…
14.38
content:
  content:
    p:
      score: 0
      elementCount: 4
    ol:
      score: 1
      elementCount: 2
    li:
      score: 0.5
      elementCount: 2
    img:
      score: 5
      elementCount: 2
  result: 13
regex:
  wordCount: 22
  wordValue: 0.1
  result: 1.38
Relevance: 0.9 | Reward: 3.5655
`this` isn't valid here because of scope, if you make fi…
1.28
content:
  content:
    p:
      score: 0
      elementCount: 2
  result: 0
regex:
  wordCount: 20
  wordValue: 0.1
  result: 1.28
Relevance: 0.8 | Reward: 0.256
Don't ignore `.husky`
0.25
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 3
  wordValue: 0.1
  result: 0.25
Relevance: 0.3 | Reward: 0.01625
why this filter and second map?
0.46
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 6
  wordValue: 0.1
  result: 0.46
Relevance: 0.6 | Reward: 0.074
would you mind writing a function in another file to abstract th…
6.49
content:
  content:
    p:
      score: 0
      elementCount: 2
    img:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 24
  wordValue: 0.1
  result: 1.49
Relevance: 0.5 | Reward: 1.43375
Just add `issue.classList.add("active");` above `if …
0.65
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 9
  wordValue: 0.1
  result: 0.65
Relevance: 0.7 | Reward: 0.11125
please rename this file and functions to: filter-issues-by-searc…
1
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 15
  wordValue: 0.1
  result: 1
Relevance: 0.4 | Reward: 0.1
In current prod search implementation. If you search for "UI", n…
21.3
content:
  content:
    p:
      score: 0
      elementCount: 7
    img:
      score: 5
      elementCount: 3
  result: 15
regex:
  wordCount: 131
  wordValue: 0.1
  result: 6.3
Relevance: 0.85 | Reward: 5.09375
TLDR: I like the weights idea and I like the relevance sorting. …
3.38
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 63
  wordValue: 0.1
  result: 3.38
Relevance: 0.8 | Reward: 0.681
This looks great, please try removing the URLs before doing the …
1.22
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 19
  wordValue: 0.1
  result: 1.22
Relevance: 0.75 | Reward: 0.23375
URL matching seems to work now! ![image](https://github.com/use…
14.32
content:
  content:
    p:
      score: 0
      elementCount: 3
    img:
      score: 5
      elementCount: 1
    a:
      score: 5
      elementCount: 1
  result: 10
regex:
  wordCount: 84
  wordValue: 0.1
  result: 4.32
Relevance: 0.7 | Reward: 3.256
Interesting. Currently in prod, sorting just ignores what's in t…
4.1
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 79
  wordValue: 0.1
  result: 4.1
Relevance: 0.6 | Reward: 0.62
That's great! Apparently it's very good, I'll review code in dep…
1.11
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 17
  wordValue: 0.1
  result: 1.11
Relevance: 0.4 | Reward: 0.1135
That's all for me, the rest looks good.
0.65
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 9
  wordValue: 0.1
  result: 0.65
Relevance: 0.3 | Reward: 0.04625
I have subbed to notifications in this PR, I'll approve once you…
1.06
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 16
  wordValue: 0.1
  result: 1.06
Relevance: 0.2 | Reward: 0.058
It was pending, pardon.
0.32
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 4
  wordValue: 0.1
  result: 0.32
Relevance: 0.1 | Reward: 0.008
@0x4007 let me know when you approve this merge.
0.65
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 9
  wordValue: 0.1
  result: 0.65
Relevance: 0.2 | Reward: 0.03
Neither can I, @sshivaditya2019 this last commit broke it.
0.65
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 9
  wordValue: 0.1
  result: 0.65
Relevance: 0.6 | Reward: 0.095
This does not fix it, it just doesn't reset sorting when you sea…
11.9
content:
  content:
    p:
      score: 0
      elementCount: 4
    img:
      score: 5
      elementCount: 2
  result: 10
regex:
  wordCount: 32
  wordValue: 0.1
  result: 1.9
Relevance: 0.8 | Reward: 2.885
ps: this is not the cause I did some quick looking and it might…
4.45
content:
  content:
    p:
      score: 0
      elementCount: 6
  result: 0
regex:
  wordCount: 87
  wordValue: 0.1
  result: 4.45
Relevance: 0.5 | Reward: 0.55375
@sshivaditya2019 [This might be the commit that broke it](https:…
5.65
content:
  content:
    p:
      score: 0
      elementCount: 1
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 9
  wordValue: 0.1
  result: 0.65
Relevance: 0.5 | Reward: 1.32875
[8395fd4](https://04e5dd6a.devpool-directory-ui.pages.dev/) this…
11.9
content:
  content:
    p:
      score: 0
      elementCount: 2
    a:
      score: 5
      elementCount: 2
  result: 10
regex:
  wordCount: 32
  wordValue: 0.1
  result: 1.9
Relevance: 0.4 | Reward: 2.695
He is currently busy at a conference. I have tested this a lot, …
6.07
content:
  content:
    p:
      score: 0
      elementCount: 5
    ol:
      score: 1
      elementCount: 1
    li:
      score: 0.5
      elementCount: 3
  result: 2.5
regex:
  wordCount: 67
  wordValue: 0.1
  result: 3.57
Relevance: 0.7 | Reward: 1.25225
The order is fixed, but you should skip loading animation if sea…
4.8
content:
  content:
    p:
      score: 0
      elementCount: 6
  result: 0
regex:
  wordCount: 95
  wordValue: 0.1
  result: 4.8
Relevance: 0.6 | Reward: 0.72
Hey, I know this has been a long process, sorry. But latest chan…
21.38
content:
  content:
    p:
      score: 0
      elementCount: 6
    ol:
      score: 1
      elementCount: 2
    li:
      score: 0.5
      elementCount: 2
    img:
      score: 5
      elementCount: 2
    a:
      score: 5
      elementCount: 1
  result: 18
regex:
  wordCount: 63
  wordValue: 0.1
  result: 3.38
Relevance: 0.7 | Reward: 5.0965
By QA this looks good, I'll simplify some code in another PR. Th…
1.49
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 24
  wordValue: 0.1
  result: 1.49
Relevance: 0.4 | Reward: 0.1465


4 participants