Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client Side Searching #149

Merged
merged 19 commits into from
Nov 15, 2024
Merged

Client Side Searching #149

merged 19 commits into from
Nov 15, 2024

Conversation

shiv810
Copy link

@shiv810 shiv810 commented Oct 31, 2024

Resolves #119

  • Adds a new search system, using a fuzzy matching algorithm combined with Normalized Discounted
    Cumulative Gain (NDCG) for result ranking.

@shiv810 shiv810 requested a review from 0x4007 as a code owner October 31, 2024 16:52
@ubiquity-os-deployer
Copy link

ubiquity-os-deployer bot commented Oct 31, 2024

@0x4007
Copy link
Member

0x4007 commented Oct 31, 2024

What's some good search terms to test?

I know before was exact string matches.

I think it would be really useful to segment the search results like:

At the top would be exact matches

And then below a line would say similar results:

@shiv810
Copy link
Author

shiv810 commented Oct 31, 2024

What's some good search terms to test?

A good example for testing would be "Large language models", original implementation would not give results for that.

I know before was exact string matches.

I think it would be really useful to segment the search results like:

At the top would be exact matches

And then below a line would say similar results:

It's, weighted scoring so the exact match should have the highest score and be at the top most position, or should there be two sections one for exact matches and another for similar results.

@0x4007
Copy link
Member

0x4007 commented Oct 31, 2024

I think I would feel more comfortable if we can enable fuzzy search so we can compare results over time and tweak the parameters. Perhaps you can add a small checkbox next to the search bar?

Alternatively we could prefix the search with a symbol like ?

?large language model

And then it will fuzzy search

@zugdev
Copy link
Collaborator

zugdev commented Oct 31, 2024

In current prod search implementation. If you search for "UI", nothing changes because of UI in ubUIquity:

image

To me it's obvious the title should have a bigger weight, we should not consider body URLs and there should be sorting by relevance. You apparently are weighting title, body, etc... I think what could improve your solution greatly is removing URLs from issue's body when scoring. Strange stuff happens because of URL:

image

The UI bug is actually because of matching with URLs.

image

Scoring should eventually account for beginning of words too. But this seems like a good PR, just handling URLs should fix a lot of weird results. If you find correct weights and pre-processing, the scoring algo will be good enough so exact results will always be on top.

@zugdev
Copy link
Collaborator

zugdev commented Oct 31, 2024

TLDR: I like the weights idea and I like the relevance sorting. We need to clean issue's body to remove URLs, potentially other stuff too, such as code blocks. we could consider beginning of words as a stronger score in an exponential curve, so that if we have issues "banana" and "analphabet", searching "ban" leads to "banana" and "ana" leads to "analphabet".

@zugdev
Copy link
Collaborator

zugdev commented Oct 31, 2024

This looks great, please try removing the URLs before doing the exponential substring scoring, perhaps that'd be overkill.

@shiv810
Copy link
Author

shiv810 commented Nov 1, 2024

Fuzzy search only works with "?" prefixed searches; otherwise, heuristic search is used. URLs are not used for search content anymore. There's an exponential boost for word beginnings, so words matched with similar starting letters would score higher.

Some examples:

  • "ban" → Strongly matches "banner" and "banana" (word starts)
  • "ana" → Strongly matches "analysis" (word start), weakly matches "banana" (middle match)
  • "?ban" → Enables fuzzy search for "ban"-like words
  • "database schema" → Matches text while ignoring URLs in the content

@zugdev
Copy link
Collaborator

zugdev commented Nov 1, 2024

URL matching seems to work now!

image

Take a look at Collaborator Gating Based On Label, as can be seen in the image it's ranked second, though it has nothing to do with UI - probably because it has "Ubiquity" written two times inside a code block. I see you wrote code to ignore code blocks, might not be properly ignoring. I believe Update UBQ Farming UI should be placed higher too, perhaps increase title's weight even more. Though the micro adjusments, this looks very good: nice job.

@shiv810
Copy link
Author

shiv810 commented Nov 1, 2024

The results are not sorted by score, as that would conflict with the sorting-manager. So, the second entry has a relevance weight of 0.07, while the third has a score of 0.67.

@zugdev
Copy link
Collaborator

zugdev commented Nov 1, 2024

Interesting. Currently in prod, sorting just ignores what's in the text field, so if you search and sort it will show not only the filtered by search ones, but all existent issues anyway. Therefore it wouldn't really conflict, but we should make it clear by clearing the text field when user presses a sorting button. Another option is we let it be and never sort by relevance, allowing sorting to act upon search-filtered issues. @0x4007 RFC

@shiv810
Copy link
Author

shiv810 commented Nov 1, 2024

The results are sorted by default by their relevance score. If the you switch to a sorting method, the text box would be cleared. Similarly, if you perform a search while using a sorting method, the sort button will reset.

@zugdev
Copy link
Collaborator

zugdev commented Nov 1, 2024

That's great! Apparently it's very good, I'll review code in depth and circle back.

Copy link
Contributor

ubiquity-os bot commented Nov 4, 2024

@sshivaditya2019, this task has been idle for a while. Please provide an update.

@shiv810
Copy link
Author

shiv810 commented Nov 5, 2024

@0x4007 @zugdev Can you take a look at this pull? I think it's ready for review, but I can't request a review here for some reason.

@zugdev
Copy link
Collaborator

zugdev commented Nov 5, 2024

That's all for me, the rest looks good.

@zugdev
Copy link
Collaborator

zugdev commented Nov 6, 2024

I have subbed to notifications in this PR, I'll approve once you respond the review.

@shiv810
Copy link
Author

shiv810 commented Nov 6, 2024

I have subbed to notifications in this PR, I'll approve once you respond the review.

I’m not sure if you’ve left a review, but I don’t see any comments on the PR.

@zugdev zugdev self-requested a review November 6, 2024 03:54
@zugdev
Copy link
Collaborator

zugdev commented Nov 6, 2024

I’m not sure if you’ve left a review, but I don’t see any comments on the PR.

It was pending, pardon.

@zugdev
Copy link
Collaborator

zugdev commented Nov 6, 2024

@0x4007 let me know when you approve this merge.

@0x4007
Copy link
Member

0x4007 commented Nov 6, 2024

image

Can't type anything

@zugdev
Copy link
Collaborator

zugdev commented Nov 6, 2024

Neither can I, @sshivaditya2019 this last commit broke it.

@shiv810
Copy link
Author

shiv810 commented Nov 6, 2024

That should be fixed. I'm not sure why this is happening, but I was able to replicate the issue only on the Cloudflare builds; the local versions worked fine on the same browser(mobile).

@zugdev
Copy link
Collaborator

zugdev commented Nov 7, 2024

This does not fix it, it just doesn't reset sorting when you search because the if fails. The error is in a _resetSortButtons call

image

so investigate where you call this function. QA:

image

@zugdev
Copy link
Collaborator

zugdev commented Nov 7, 2024

ps: this is not the cause

I did some quick looking and it might be because filterIssues runs on load, given the observer:

// Observer to detect when children are added to the issues container (only once)
const observer = new MutationObserver(() => {
if (issuesContainer.children.length > 0) {
observer.disconnect(); // Stop observing once children are present
if (searchQuery) filterIssues(); // Filter on load if search query exists
}
});
observer.observe(issuesContainer, { childList: true });

And the SortingManager only is created on:

export function generateSortingToolbar() {
const sortingManagerTop = new SortingManager("filters", SORTING_OPTIONS, "top");
sortingManagerTop.render();
const sortingManagerBottom = new SortingManager("filters-bottom", SORTING_OPTIONS, "bottom");
sortingManagerBottom.render();
}

Even though the function above runs in line 20 of home.ts on cloud it might run after filterIssues which implies undefined elements to reset.

@zugdev
Copy link
Collaborator

zugdev commented Nov 7, 2024

@sshivaditya2019 This might be the commit that broke it

@shiv810
Copy link
Author

shiv810 commented Nov 12, 2024

It should be fixed in 94533de. I changed the way the results were sorted.

@zugdev
Copy link
Collaborator

zugdev commented Nov 12, 2024

The order is fixed, but you should skip loading animation if search was derived from the URL as currently in prod:

Open in prod, it's smooth: https://devpool.directory/?search=aa

Screen.Recording.2024-11-12.173611.mp4

In 94533de it lags: https://ba848e44.devpool-directory-ui.pages.dev/?search=pool

Screen.Recording.2024-11-12.173730.mp4

This is because the animation is taking into account the non-displayed in between issues. I will pinpoint where you need to modify in code review in a sec.

@zugdev
Copy link
Collaborator

zugdev commented Nov 14, 2024

Hey, I know this has been a long process, sorry. But latest changes introduced a "unsorting" problem:

  1. Initial order is not restored when unsorted:

Initial state:
image

State after clicking to sort by time until unsorting:
image

  1. If you sort and then search, even though the sort button is visually reset, the sort is still applied
Screencast.from.14-11-2024.17.37.10.webm

@shiv810
Copy link
Author

shiv810 commented Nov 15, 2024

Hey, I know this has been a long process, sorry. But latest changes introduced a "unsorting" problem:

This has been fixed in aae1a02. I have rewritten the filterIssues to make it more consistent with how issues are usually rendered for other sorting methods.

@zugdev
Copy link
Collaborator

zugdev commented Nov 15, 2024

By QA this looks good, I'll simplify some code in another PR. Thank you, for your responsiveness, it was great working with you.

@zugdev zugdev merged commit d0c30b6 into ubiquity:development Nov 15, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client Side Embeddings Search
4 participants