Client Side Searching #149
What are some good search terms to test? I know the previous search used exact string matches. I think it would be really useful to segment the search results: exact matches at the top, and then, below a divider, a section labeled "similar results". |
A good example for testing would be "Large language models", original implementation would not give results for that.
It uses weighted scoring, so the exact match should have the highest score and sit in the topmost position. Or should there be two sections, one for exact matches and another for similar results? |
I think I would feel more comfortable if fuzzy search could be switched on and off, so we can compare results over time and tweak the parameters. Perhaps you can add a small checkbox next to the search bar? Alternatively, we could prefix the search with a symbol like "?", for example "?large language model", and then it will fuzzy search. |
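The prefix idea above could be sketched roughly as follows. This is only an illustration: `SearchMode`, `ParsedQuery`, and `parseQuery` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical sketch: a leading "?" opts a query into fuzzy search.
type SearchMode = "fuzzy" | "default";

interface ParsedQuery {
  mode: SearchMode;
  term: string;
}

function parseQuery(raw: string): ParsedQuery {
  const trimmed = raw.trim();
  if (trimmed.startsWith("?")) {
    // Drop the prefix so it is not treated as part of the search term.
    return { mode: "fuzzy", term: trimmed.slice(1).trim() };
  }
  return { mode: "default", term: trimmed };
}
```

With this shape, "?large language model" would be routed to the fuzzy matcher with the term "large language model", while the same text without the prefix keeps the default behavior.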
In the current prod search implementation, if you search for "UI", nothing changes because of the "UI" in "ubUIquity": To me it's obvious that the title should have a bigger weight, that we should not consider URLs in the body, and that there should be sorting by relevance. You are apparently weighting title, body, etc. I think what could improve your solution greatly is removing URLs from the issue's body when scoring. Strange stuff happens because of URLs: The UI bug is actually caused by matching against URLs. Scoring should eventually account for the beginning of words too. But this seems like a good PR; just handling URLs should fix a lot of weird results. If you find the correct weights and pre-processing, the scoring algo will be good enough that exact results will always be on top. |
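To make the suggestion concrete, here is a minimal sketch of title/body weighting with URLs removed from the body before scoring. The weights, the `Issue` shape, and all function names are assumptions for illustration; the PR's actual values are not shown in this thread.

```typescript
// Illustrative weights only (assumption): title matches count more than body matches.
const TITLE_WEIGHT = 3;
const BODY_WEIGHT = 1;

// Hypothetical minimal issue shape.
interface Issue {
  title: string;
  body: string;
}

// Drop http(s) URLs so that text inside links cannot produce matches.
function stripUrls(text: string): string {
  return text.replace(/https?:\/\/\S+/gi, " ");
}

// Count non-overlapping, case-insensitive occurrences of `term` in `text`.
function countOccurrences(text: string, term: string): number {
  if (term.length === 0) return 0;
  const haystack = text.toLowerCase();
  const needle = term.toLowerCase();
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

function scoreIssue(issue: Issue, term: string): number {
  const titleHits = countOccurrences(issue.title, term);
  const bodyHits = countOccurrences(stripUrls(issue.body), term);
  return TITLE_WEIGHT * titleHits + BODY_WEIGHT * bodyHits;
}
```

Under these assumptions, an issue whose body only links to a "ubiquity" URL no longer scores for "UI", while an issue with "UI" in its title does.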
TLDR: I like the weights idea and I like the relevance sorting. We need to clean the issue's body to remove URLs, and potentially other stuff too, such as code blocks. We could consider the beginning of words as a stronger score on an exponential curve, so that if we have issues "banana" and "analphabet", searching "ban" leads to "banana" and "ana" leads to "analphabet". |
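One way to read the "exponential curve" idea is to let a match's score decay exponentially with its offset inside the word, so matches at the start of a word score highest. The sketch below assumes that reading; `DECAY` and the function names are hypothetical.

```typescript
// DECAY is a hypothetical tuning parameter: larger values punish mid-word matches harder.
const DECAY = 1;

// Score a single word: 1 for a match at the word start, decaying with the match offset.
function wordScore(word: string, term: string): number {
  const position = word.toLowerCase().indexOf(term.toLowerCase());
  if (position === -1) return 0;
  return Math.exp(-DECAY * position);
}

// Sum the per-word scores over whitespace-separated words.
function textScore(text: string, term: string): number {
  if (term.length === 0) return 0;
  return text
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .reduce((sum, word) => sum + wordScore(word, term), 0);
}

// Return only the matching titles, best score first.
function rank(titles: string[], term: string): string[] {
  return titles
    .map((title) => ({ title, score: textScore(title, term) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.title);
}
```

For the titles "banana" and "analphabet", searching "ban" returns only "banana", and searching "ana" puts "analphabet" first because its match starts at offset 0, while the match in "banana" starts at offset 1.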
This looks great. Please try removing the URLs before doing the exponential substring scoring; perhaps that'd be overkill. |
Fuzzy search only works with "?" prefixed searches; otherwise, heuristic search is used. URLs are not used for search content anymore. There's an exponential boost for word beginnings, so words matched with similar starting letters would score higher. Some examples:
|
URL matching seems to work now! Take a look at "Collaborator Gating Based On Label": as can be seen in the image, it's ranked second, though it has nothing to do with UI, probably because it has "Ubiquity" written two times inside a code block. I see you wrote code to ignore code blocks; it might not be ignoring them properly. I believe "Update UBQ Farming UI" should be placed higher too; perhaps increase the title's weight even more. Micro-adjustments aside, this looks very good: nice job. |
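For reference, ignoring code blocks before scoring could look roughly like the sketch below. It is not the PR's code; `stripCode` and the regular expressions are assumptions, and the fence marker is built with `repeat` only to keep the example readable.

```typescript
// Three backticks, built programmatically.
const FENCE = "`".repeat(3);

// Fenced blocks: everything between a pair of fence markers, across lines (non-greedy).
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

// Inline code: a single-backtick span that stays on one line.
const INLINE_CODE = /`[^`\n]*`/g;

// Hypothetical helper: remove code from a Markdown body before scoring it.
function stripCode(markdown: string): string {
  // Fenced blocks go first so their backticks are not misread as inline spans.
  return markdown.replace(FENCED_BLOCK, " ").replace(INLINE_CODE, " ");
}
```

Note that this pattern only matches closed fences; a body with an unterminated fence would keep its code text, which is one case worth testing when results like the one above look off.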
The results are not sorted by score, as that would conflict with the |
Interesting. Currently in prod, sorting just ignores what's in the text field, so if you search and then sort, it shows not only the issues filtered by the search but all existing issues anyway. Therefore it wouldn't really conflict, but we should make it clear by clearing the text field when the user presses a sorting button. Another option is to let it be and never sort by relevance, allowing sorting to act upon search-filtered issues. @0x4007 RFC |
The results are sorted by their relevance score by default. If you switch to a sorting method, the text box is cleared. Similarly, if you perform a search while using a sorting method, the sort button resets. |
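The mutual reset described above can be modeled as a small piece of state, sketched here under assumptions: `ViewState`, `applySearch`, `applySort`, and `orderingFor` are hypothetical names, and sort methods are plain strings because the actual sort options are not listed in this thread.

```typescript
// Hypothetical view state: at most one of search text or sort method is active.
interface ViewState {
  searchText: string;
  sortMethod: string | null; // null means no sort button is active
}

// Performing a search resets any active sort button.
function applySearch(state: ViewState, text: string): ViewState {
  return { ...state, searchText: text, sortMethod: null };
}

// Choosing a sort method clears the search text box.
function applySort(state: ViewState, method: string): ViewState {
  return { ...state, searchText: "", sortMethod: method };
}

// Decide how the list should be ordered for the current state.
function orderingFor(state: ViewState): string {
  if (state.sortMethod !== null) return state.sortMethod;
  return state.searchText.length > 0 ? "relevance" : "default";
}
```

Keeping both transitions in pure functions like these makes it easy to check that search and sort can never be active at the same time.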
That's great! Apparently it's very good, I'll review code in depth and circle back. |
@sshivaditya2019, this task has been idle for a while. Please provide an update. |
That's all for me, the rest looks good. |
I have subscribed to notifications for this PR; I'll approve once you respond to the review. |
I’m not sure if you’ve left a review, but I don’t see any comments on the PR. |
It was pending, pardon. |
@0x4007 let me know when you approve this merge. |
Neither can I, @sshivaditya2019 this last commit broke it. |
That should be fixed. I'm not sure why this was happening, but I was able to replicate the issue only on the Cloudflare builds; the local versions worked fine on the same (mobile) browser. |
P.S.: this is not the cause. I did some quick looking and it might be because of work.ubq.fi/src/home/sorting/sorting-manager.ts, lines 74 to 81 in bd41261
And the SortingManager is only created in work.ubq.fi/src/home/sorting/generate-sorting-buttons.ts, lines 6 to 12 in bd41261
Even though the function above runs in line 20 of |
@sshivaditya2019 This might be the commit that broke it |
with ?, removed links from search consideration and added exp scoring
It should be fixed in |
The order is fixed, but you should skip the loading animation if the search was derived from the URL, as currently in prod. Open in prod, it's smooth: https://devpool.directory/?search=aa [video: Screen.Recording.2024-11-12.173611.mp4] In [video: Screen.Recording.2024-11-12.173730.mp4] This is because the animation takes into account the non-displayed in-between issues. I will pinpoint where you need to modify this in code review in a sec. |
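As a rough illustration of the request above, the initial search term can be read from the query string and used to decide whether to animate. Only the "search" parameter name comes from the prod URL quoted above; `InitialSearch` and `readInitialSearch` are hypothetical names.

```typescript
// Hypothetical result of inspecting the page URL on load.
interface InitialSearch {
  term: string;
  animate: boolean; // false when the search came from the URL
}

function readInitialSearch(queryString: string): InitialSearch {
  // URLSearchParams accepts a leading "?" and decodes the value.
  const term = new URLSearchParams(queryString).get("search") ?? "";
  return { term, animate: term.length === 0 };
}
```

In a browser this would be called with `window.location.search`; for "?search=aa" it yields the term "aa" with animation disabled.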
This has been fixed in |
From QA this looks good; I'll simplify some code in another PR. Thank you for your responsiveness, it was great working with you. |
Resolves #119
Cumulative Gain (NDCG) for result ranking.