Client Side Embeddings Search #119

0x4007 · 2024-10-10T06:35:59Z

Perhaps we can improve our search experience by:

Loading all the vector embeddings of every issue and its associated comments from our database
Running a similarity search (i.e. cosine) to rank the issues by relevance

If running all of these calculations performs poorly, we could potentially compile to Wasm.
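The two steps above can be sketched roughly as follows. This is only an illustration: the `EmbeddedIssue` shape and the function names are assumptions, since the thread does not say how the embeddings are stored or fetched.

```typescript
// Hypothetical record shape; the thread does not specify the database schema.
interface EmbeddedIssue {
  id: number;
  title: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction, so treat it as unrelated.
  return denominator === 0 ? 0 : dot / denominator;
}

// Score every issue against the query embedding and sort, most similar first.
function rankBySimilarity(
  queryEmbedding: number[],
  issues: EmbeddedIssue[]
): { issue: EmbeddedIssue; score: number }[] {
  return issues
    .map((issue) => ({ issue, score: cosineSimilarity(queryEmbedding, issue.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

The loop is linear in the number of issues times the embedding size, which is the cost that would motivate a Wasm build if it turned out to be too slow in the browser.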

0x4007 · 2024-10-10T06:36:48Z

@sshivaditya2019 rfc on time estimate and spec

shiv810 · 2024-10-23T13:14:38Z

This can be accomplished using the natural library¹, which is highly optimized. The main challenge would be generating the 1024-size embeddings on the client side.

Rather than retrieving embeddings from the database, we could use the wink-js embedding model² to generate embeddings for both the query and the entries. These embeddings could be computed at load time, potentially increasing page load time by 15 to 35 seconds or more in some cases, and then used in the search process.

Vector-based search may not be particularly beneficial here; instead, heuristic-based retrieval with a more effective search algorithm, evaluated against a ranking metric such as NDCG, would likely yield better results.
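As a rough illustration of what heuristic-based retrieval could look like, here is a minimal keyword-scoring sketch. The field weights, the `SearchableIssue` shape, and the function names are all assumptions made for this example; it is not the implementation that was later merged.

```typescript
// Hypothetical issue shape and weights, chosen only for illustration.
interface SearchableIssue {
  title: string;
  body: string;
}

const TITLE_WEIGHT = 2;
const BODY_WEIGHT = 1;

// Score an issue by counting which query terms appear in its title and body.
function heuristicScore(query: string, issue: SearchableIssue): number {
  const terms = query.toLowerCase().split(/\s+/).filter((term) => term.length > 0);
  const title = issue.title.toLowerCase();
  const body = issue.body.toLowerCase();
  let score = 0;
  for (const term of terms) {
    if (title.includes(term)) score += TITLE_WEIGHT;
    if (body.includes(term)) score += BODY_WEIGHT;
  }
  return score;
}

// Keep only matching issues and return them best match first.
function searchIssues<T extends SearchableIssue>(query: string, issues: T[]): T[] {
  return issues
    .map((issue) => ({ issue, score: heuristicScore(query, issue) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.issue);
}
```

Unlike the embedding approach, this needs no model download at page load, which is the trade-off the comment above is pointing at.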

0x4007 · 2024-10-24T23:19:18Z

Let's do your recommendation.

shiv810 · 2024-10-25T03:47:25Z

I think it will take around a day to set up the heuristic-based search functionality.

I'm not sure if there's a gating mechanism for tasks or something similar, but I can incorporate that into this task to create an integrated task recommender, if that's a requirement. @0x4007 rfc

0x4007 · 2024-10-25T21:09:33Z

> I think it will take around a day to set up the heuristic-based search functionality.
>
> I'm not sure if there's a gating mechanism for tasks or something similar,

This is not implemented anywhere now. However, it will soon be implemented based on contributor/collaborator status and priority level (or time level).

But that will only be on GitHub and not our UI, I think. We still need to figure that out.

> but I can incorporate that into this task to create an integrated task recommender, if that's a requirement. @0x4007 rfc

An integrated task recommender sounds very cool at the UI level. I'm on board with exploring this, although as of right now the implementation details are not clear to me.

shiv810 · 2024-10-26T16:14:44Z

/start

shiv810 · 2024-10-26T17:29:22Z

@0x4007 could you assign this issue to me?

ubiquity-os · 2024-10-26T17:30:39Z

@sshivaditya2019 the deadline is at Sun, Oct 27, 5:30 PM UTC

0x4007 · 2024-10-26T17:30:57Z

/start

@gentlementlegen Not working again.

Start is officially our most unreliable plugin.

gentlementlegen · 2024-10-27T13:12:51Z

Error was

{
  "message": "Validation Failed",
  "errors": [
    {
      "message": "The listed users cannot be searched either because the users do not exist or you do not have permission to view the users.",
      "resource": "Search",
      "field": "q",
      "code": "invalid"
    }
  ],
  "documentation_url": "https://docs.github.com/v3/search/",
  "status": "422"
}

with the search arguments like

{
  "q": "org:ubiquity author:sshivaditya2019 state:open",
  "per_page": 100,
  "order": "desc",
  "sort": "created"
}

URL for reference
https://api.github.com/search/issues?q=org%3Aubiquity%20author%3Asshivaditya2019%20state%3Aopen&per_page=100&order=desc&sort=created
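For reference, the search arguments map onto that URL as sketched below. The `IssueSearchArgs` type and `buildIssueSearchUrl` helper are hypothetical names for this example; the plugin itself most likely goes through a GitHub client library rather than building the string by hand.

```typescript
// Hypothetical argument type mirroring the search arguments quoted above.
interface IssueSearchArgs {
  q: string;
  per_page: number;
  order: "asc" | "desc";
  sort: string;
}

// Build the REST search URL; encodeURIComponent turns ":" into %3A and " " into %20.
function buildIssueSearchUrl(args: IssueSearchArgs): string {
  const query = [
    `q=${encodeURIComponent(args.q)}`,
    `per_page=${args.per_page}`,
    `order=${args.order}`,
    `sort=${args.sort}`,
  ].join("&");
  return `https://api.github.com/search/issues?${query}`;
}

const searchUrl = buildIssueSearchUrl({
  q: "org:ubiquity author:sshivaditya2019 state:open",
  per_page: 100,
  order: "desc",
  sort: "created",
});
```

The `author:` qualifier is the part of the query that the 422 response complains about.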

0x4007 · 2024-10-27T13:21:47Z

Okay, you should figure out the root problem and fix it.

Keyrxng · 2024-10-27T13:30:10Z

> Okay, you should figure out the root problem and fix it.

I've mentioned this before re: user privacy settings affecting our attempts via GQL and REST, but the root problem is that shiv's account privacy settings are restricted, which unfortunately we don't control.

So perhaps we should just assume defaults in this situation and apply the lowest contributor limits, then use an alternative search query for PRs/issues in the network and filter by their username, since those would be public under our org settings. I assume it's the assigned-issues query that caused it here.

0x4007 · 2024-10-27T13:35:57Z

If it's something the contributor can fix, then the solution is to write a detailed error explaining that they can't self-assign until they fix their settings, explain exactly what to fix, and then provide a link to where they can fix it.
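A minimal sketch of such an error message is below. Everything here is an assumption for illustration: the `explainStartFailure` name, the wording, and especially `SETTINGS_URL`, since the thread does not identify which setting or settings page is responsible.

```typescript
// Hypothetical error shape, based on the 422 response quoted earlier in the thread.
interface GitHubApiError {
  status: number | string;
  message?: string;
}

// Assumed placeholder link; the thread does not say which settings page applies.
const SETTINGS_URL = "https://github.com/settings/profile";

// Return a contributor-facing explanation for a 422 search failure, or null otherwise.
function explainStartFailure(username: string, error: GitHubApiError): string | null {
  // The quoted response carries the status as a string ("422"), so normalize it.
  if (Number(error.status) !== 422) {
    return null;
  }
  return [
    `@${username} you cannot self-assign yet: GitHub rejected the search for your open issues (422: ${error.message ?? "Validation Failed"}).`,
    "This can happen when your account privacy settings prevent the app from reading your activity.",
    `Please review your settings at ${SETTINGS_URL} and then run /start again.`,
  ].join("\n");
}
```

Returning `null` for other statuses leaves unrelated failures to the existing error handling.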

gentlementlegen · 2024-10-28T06:13:37Z

It is still weird to me that the user privacy affects a search because the profile is public. Can we consider using GQL with issues search instead of the search API? Something like

query($organization: String!, $author: String!) {
  organization(login: $organization) {
    repositories(first: 100) {
      nodes {
        issues(first: 100, states: OPEN, filterBy: {createdBy: $author}) {
          nodes {
            title
            url
            createdAt
          }
        }
      }
    }
  }
}

with

{
  "organization": "ubiquity",
  "author": "sshivaditya2019"
}

would achieve the same result. I don't know if that would resolve the issue but it's worth a try.

0x4007 · 2024-10-28T14:02:44Z

You can test and verify pretty quickly. I suggest you do that and let us know.

Keyrxng · 2024-10-28T15:14:24Z

Using the explorer, with my own login for access, I get:

{
  "data": {
    "organization": null
  },
  "errors": [
    {
      "type": "FORBIDDEN",
      "path": [
        "organization",
        "repositories"
      ],
      "extensions": {
        "saml_failure": false
      },
      "locations": [
        {
          "line": 3,
          "column": 5
        }
      ],
      "message": "Although you appear to have the correct authorization credentials, the `ubiquity` organization has enabled OAuth App access restrictions, meaning that data access to third-parties is limited. For more information on these restrictions, including how to enable this app, visit https://docs.github.com/articles/restricting-access-to-your-organization-s-data/"
    }
  ]
}

0x4007 · 2024-10-28T15:22:02Z

If OAuth app access is required to read user data, let's use the app for logging in on devpool.directory.

The error can explain that the user needs to sign in on devpool.directory if there is a problem reading their data.

gentlementlegen · 2024-10-28T15:23:41Z

It'd be better to just test locally, sorry didn't have time to do so today.

Keyrxng · 2024-10-28T15:29:44Z

> It'd be better to just test locally, sorry didn't have time to do so today.

Aye, it likely would be, sorry bud.

ubiquity-os · 2024-11-15T16:08:39Z

[ 417.0465 UUSD ]
@sshivaditya2019

Contributions Overview

View	Contribution	Count	Reward
Issue	Task	1	400
Issue	Comment	3	17.0465
Review	Comment	17	0

Conversation Incentives

Comment	Formatting	Relevance	Reward
This can be accomplished using natural library[^01^], which is h…	15.48 content: content: p: score: 0 elementCount: 3 a: score: 5 elementCount: 2 result: 10 regex: wordCount: 111 wordValue: 0.1 result: 5.48	0.85	14.658
I think it will take around a day to set up the heuristic-based …	2.87 content: content: p: score: 0 elementCount: 2 result: 0 regex: wordCount: 52 wordValue: 0.1 result: 2.87	0.75	2.1525
@0x4007 could you assign this issue to me ?	0.59 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 8 wordValue: 0.1 result: 0.59	0.4	0.236
Resolves #119 - Adds a new search system, using a fuzzy matchi…	1.5 content: content: p: score: 0 elementCount: 2 ul: score: 1 elementCount: 1 li: score: 0.5 elementCount: 1 result: 1.5 regex: wordCount: 22 wordValue: 0 result: 0	0.7	0
Fixed that.	0.36 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 2 wordValue: 0.2 result: 0.36	0.2	0
Isn't `filterIssues` an arrow function? I don't think it…	3.5 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 29 wordValue: 0.2 result: 3.5	0.5	0
The IssueSearch already contains all of the issues; this only re…	4.5 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 39 wordValue: 0.2 result: 4.5	0.8	0
Moved the `filterIssues` to a separate file.	0.92 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 6 wordValue: 0.2 result: 0.92	0.6	0
A good example for testing would be "Large language models", ori…	5.84 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 53 wordValue: 0.2 result: 5.84	0.4	0
Fuzzy search only works with "?" prefixed searches; otherwise, h…	11.03 content: content: p: score: 0 elementCount: 6 ul: score: 1 elementCount: 1 li: score: 0.5 elementCount: 4 result: 3 regex: wordCount: 77 wordValue: 0.2 result: 8.03	0.9	0
The results are not sorted by score, as that would conflict with…	3.91 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 33 wordValue: 0.2 result: 3.91	0.3	0
The results are sorted by default by their relevance score. If t…	4.6 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 40 wordValue: 0.2 result: 4.6	0.7	0
@0x4007 @zugdev Can you take a look at this pull? I think it's r…	3.4 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 28 wordValue: 0.2 result: 3.4	0.2	0
I’m not sure if you’ve left a review, but I don’t see any commen…	2.55 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 20 wordValue: 0.2 result: 2.55	0.1	0
That should be fixed. I'm not sure why this is happening, but I …	4.11 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 35 wordValue: 0.2 result: 4.11	0.3	0
I am not sure, if this is usual/related to this pull, but I am n…	4.01 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 34 wordValue: 0.2 result: 4.01	0.2	0
Are you sure it was this commit ? It doesn't seem like it made a…	4.11 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 35 wordValue: 0.2 result: 4.11	0.1	0
@0x4007 Is there anything still pending for this PR? I can make …	2.22 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 17 wordValue: 0.2 result: 2.22	0.2	0
It should be fixed in `94533de`. I changed the way the r…	1.77 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 13 wordValue: 0.2 result: 1.77	0.4	0
This has been fixed in `aae1a02`. I have rewritten the &…	2.98 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 24 wordValue: 0.2 result: 2.98	0.5	0

[ 43.408 UUSD ]
@0x4007

Contributions Overview

View	Contribution	Count	Reward
Issue	Specification	1	16.98
Issue	Comment	8	17.173
Review	Comment	3	9.255

Conversation Incentives

Comment	Formatting	Relevance	Reward
<img width="699" alt="image" src="https://github.com/user-att…	5.66 content: content: p: score: 0 elementCount: 5 ol: score: 1 elementCount: 1 li: score: 0.5 elementCount: 2 result: 2 regex: wordCount: 69 wordValue: 0.1 result: 3.66	1	16.98
@sshivaditya2019 rfc on time estimate and spec	1.05 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 7 wordValue: 0.2 result: 1.05	0.7	0.735
Lets do your recommendation	0.65 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 4 wordValue: 0.2 result: 0.65	0.6	0.39
This is not implemented anywhere now. However it will soon be im…	7.49 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 71 wordValue: 0.2 result: 7.49	0.8	5.992
@gentlementlegen Not working again Start officially is our most …	1.54 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 11 wordValue: 0.2 result: 1.54	0.5	0.77
Okay you should figure the root problem and fix	1.29 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.2 result: 1.29	0.7	0.903
If it's something the contributor can fix then the solution is t…	4.99 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 44 wordValue: 0.2 result: 4.99	0.6	2.994
You can test and verify pretty quickly. I suggest you do that an…	2.11 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 16 wordValue: 0.2 result: 2.11	0.7	1.477
If OAuth app access is required to read user data, let's use the…	4.89 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 43 wordValue: 0.2 result: 4.89	0.8	3.912
What's some good search terms to test? I know before was exact s…	2.49 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 44 wordValue: 0.1 result: 2.49	0.7	1.743
I think I would feel more comfortable if we can enable fuzzy sea…	3.02 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 55 wordValue: 0.1 result: 3.02	0.8	2.416
![image](https://github.com/user-attachments/assets/1a63b3fd-044…	5.32 content: content: p: score: 0 elementCount: 1 img: score: 5 elementCount: 1 result: 5 regex: wordCount: 4 wordValue: 0.1 result: 0.32	0.3	5.096

[ 4.474 UUSD ]
@gentlementlegen

Contributions Overview

View	Contribution	Count	Reward
Issue	Comment	3	4.474

Conversation Incentives

Comment	Formatting	Relevance	Reward
Error was```json{ "message": "Validation Failed"…	1.75 content: content: p: score: 0 elementCount: 4 result: 0 regex: wordCount: 29 wordValue: 0.1 result: 1.75	0.85	1.4875
It is still weird to me that the user privacy affects a search b…	3.02 content: content: p: score: 0 elementCount: 3 result: 0 regex: wordCount: 55 wordValue: 0.1 result: 3.02	0.75	2.265
It'd be better to just test locally, sorry didn't have time to d…	1.11 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 17 wordValue: 0.1 result: 1.11	0.65	0.7215

[ 9.565 UUSD ]
@Keyrxng

Contributions Overview

View	Contribution	Count	Reward
Issue	Comment	3	9.565

Conversation Incentives

Comment	Formatting	Relevance	Reward
I've mentioned this before re: user privacy settings affecting o…	4.71 content: content: p: score: 0 elementCount: 2 result: 0 regex: wordCount: 93 wordValue: 0.1 result: 4.71	0.8	3.768
using the [explorer](https://docs.github.com/en/graphql/overview…	5.77 content: content: p: score: 0 elementCount: 1 a: score: 5 elementCount: 1 result: 5 regex: wordCount: 11 wordValue: 0.1 result: 0.77	0.9	5.693
Aye it likely would be sorry bud	0.52 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 7 wordValue: 0.1 result: 0.52	0.2	0.104

[ 30.46975 UUSD ]
@zugdev

Contributions Overview

View	Contribution	Count	Reward
Review	Comment	26	30.46975

Conversation Incentives

Comment	Formatting	Relevance	Reward
I believe you should not remove this `.toLowerCase()` st…	14.38 content: content: p: score: 0 elementCount: 4 ol: score: 1 elementCount: 2 li: score: 0.5 elementCount: 2 img: score: 5 elementCount: 2 result: 13 regex: wordCount: 22 wordValue: 0.1 result: 1.38	0.9	3.5655
`this` isn't valid here because of scope, if you make fi…	1.28 content: content: p: score: 0 elementCount: 2 result: 0 regex: wordCount: 20 wordValue: 0.1 result: 1.28	0.8	0.256
Don't ignore `.husky`	0.25 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 3 wordValue: 0.1 result: 0.25	0.3	0.01625
why this filter and second map?	0.46 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 6 wordValue: 0.1 result: 0.46	0.6	0.074
would you mind writing a function in another file to abstract th…	6.49 content: content: p: score: 0 elementCount: 2 img: score: 5 elementCount: 1 result: 5 regex: wordCount: 24 wordValue: 0.1 result: 1.49	0.5	1.43375
Just add `issue.classList.add("active");` above `if …	0.65 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65	0.7	0.11125
please rename this file and functions to: filter-issues-by-searc…	1 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 15 wordValue: 0.1 result: 1	0.4	0.1
In current prod search implementation. If you search for "UI", n…	21.3 content: content: p: score: 0 elementCount: 7 img: score: 5 elementCount: 3 result: 15 regex: wordCount: 131 wordValue: 0.1 result: 6.3	0.85	5.09375
TLDR: I like the weights idea and I like the relevance sorting. …	3.38 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 63 wordValue: 0.1 result: 3.38	0.8	0.681
This looks great, please try removing the URLs before doing the …	1.22 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 19 wordValue: 0.1 result: 1.22	0.75	0.23375
URL matching seems to work now! ![image](https://github.com/use…	14.32 content: content: p: score: 0 elementCount: 3 img: score: 5 elementCount: 1 a: score: 5 elementCount: 1 result: 10 regex: wordCount: 84 wordValue: 0.1 result: 4.32	0.7	3.256
Interesting. Currently in prod, sorting just ignores what's in t…	4.1 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 79 wordValue: 0.1 result: 4.1	0.6	0.62
That's great! Apparently it's very good, I'll review code in dep…	1.11 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 17 wordValue: 0.1 result: 1.11	0.4	0.1135
That's all for me, the rest looks good.	0.65 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65	0.3	0.04625
I have subbed to notifications in this PR, I'll approve once you…	1.06 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 16 wordValue: 0.1 result: 1.06	0.2	0.058
It was pending, pardon.	0.32 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 4 wordValue: 0.1 result: 0.32	0.1	0.008
@0x4007 let me know when you approve this merge.	0.65 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65	0.2	0.03
Neither can I, @sshivaditya2019 this last commit broke it.	0.65 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 9 wordValue: 0.1 result: 0.65	0.6	0.095
This does not fix it, it just doesn't reset sorting when you sea…	11.9 content: content: p: score: 0 elementCount: 4 img: score: 5 elementCount: 2 result: 10 regex: wordCount: 32 wordValue: 0.1 result: 1.9	0.8	2.885
ps: this is not the cause. I did some quick looking and it might…	4.45 content: content: p: score: 0 elementCount: 6 result: 0 regex: wordCount: 87 wordValue: 0.1 result: 4.45	0.5	0.55375
@sshivaditya2019 [This might be the commit that broke it](https:…	5.65 content: content: p: score: 0 elementCount: 1 a: score: 5 elementCount: 1 result: 5 regex: wordCount: 9 wordValue: 0.1 result: 0.65	0.5	1.32875
[8395fd4](https://04e5dd6a.devpool-directory-ui.pages.dev/) this…	11.9 content: content: p: score: 0 elementCount: 2 a: score: 5 elementCount: 2 result: 10 regex: wordCount: 32 wordValue: 0.1 result: 1.9	0.4	2.695
He is currently busy at a conference. I have tested this a lot, …	6.07 content: content: p: score: 0 elementCount: 5 ol: score: 1 elementCount: 1 li: score: 0.5 elementCount: 3 result: 2.5 regex: wordCount: 67 wordValue: 0.1 result: 3.57	0.7	1.25225
The order is fixed, but you should skip loading animation if sea…	4.8 content: content: p: score: 0 elementCount: 6 result: 0 regex: wordCount: 95 wordValue: 0.1 result: 4.8	0.6	0.72
Hey, I know this has been a long process, sorry. But latest chan…	21.38 content: content: p: score: 0 elementCount: 6 ol: score: 1 elementCount: 2 li: score: 0.5 elementCount: 2 img: score: 5 elementCount: 2 a: score: 5 elementCount: 1 result: 18 regex: wordCount: 63 wordValue: 0.1 result: 3.38	0.7	5.0965
By QA this looks good, I'll simplify some code in another PR. Th…	1.49 content: content: p: score: 0 elementCount: 1 result: 0 regex: wordCount: 24 wordValue: 0.1 result: 1.49	0.4	0.1465

0x4007 added the Priority: 2 (Medium) label Oct 10, 2024

devpool-directory-superintendent bot mentioned this issue Oct 10, 2024

Client Side Embeddings Search ubiquity/devpool-directory#1663

Closed

0x4007 added the Time: <1 Day label Oct 25, 2024

ubiquity-os bot added the Price: 400 USD label Oct 25, 2024

0x4007 assigned shiv810 Oct 26, 2024

Keyrxng mentioned this issue Oct 27, 2024

Contributor Privacy Settings ubiquity-os-marketplace/command-start-stop#68

Closed

gentlementlegen mentioned this issue Oct 28, 2024

Change search queries to GraphQl to avoid permission issues ubiquity-os-marketplace/command-start-stop#70

Closed

shiv810 mentioned this issue Oct 31, 2024

Client Side Searching #149

Merged

zugdev closed this as completed in #149 Nov 15, 2024
