Rank for results #16
Comments
Hi @shameelabdulla, thanks for the suggestion. I think the easiest way to do this is after the fact, on the (limited) list of results. You could use an implementation of Levenshtein to calculate the similarity of the result strings and your input string. If you do, I'd welcome an add-on to Fuzzily that does this! |
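A minimal sketch of the after-the-fact ranking suggested above, assuming the results have already been fetched and reduced to plain strings. The helper names `levenshtein`, `deviation` and `rank_by_deviation` are hypothetical and are not part of Fuzzily; the distance is divided by the longer string's length so that 0.0 means an exact match and values near 1.0 mean a large deviation.

```ruby
# Classic dynamic-programming Levenshtein distance (two-row variant).
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: 0.0 means identical (ignoring case), 1.0 means
# every character would have to change.
def deviation(query, candidate)
  longest = [query.length, candidate.length].max
  return 0.0 if longest.zero?

  levenshtein(query.downcase, candidate.downcase) / longest.to_f
end

# Hypothetical helper: re-rank an already fetched list of result strings,
# best (lowest deviation) first.
def rank_by_deviation(query, results)
  results.map { |r| [r, deviation(query, r)] }.sort_by { |_, d| d }
end
```

Because this runs only on the limited list the finder returns, the quadratic cost of the distance calculation should stay small in practice.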
Oh great! I'll add it.
|
This works for me, in _matches_for_trigrams:
def _matches_for_trigrams(trigrams)
  self.
    select('owner_id, owner_type, count(*) AS matches, MAX(score) AS score').
    group('owner_id, owner_type').
    order('matches DESC, score ASC').
    with_trigram(trigrams).
-   map(&:owner)
+   map do |t|
+     t.owner.tap do |o|
+       o.instance_eval "def fuzzily_score; #{t.score}; end"
+     end
+   end
end
Would you like a patch for this? |
It would be great if you could add it as a patch.
|
Here's some code: airblade@8b41888. It's not as clean as my diff above due to having to work around the problem in #18. |
Hi Andy,
|
I had assumed that |
Looking at the code, a trigram's score is simply the length of the word from which it came. The fuzzy finder orders its results by We need a way to normalise the quality of the matches (so they're comparable across models). How about modifying my code above like this: o.instance_eval "def fuzzily_score; #{t.matches / t.score.to_f}; end" – although that doesn't really normalise the results to between 0 and 1. |
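A small sketch of the `matches / score` ratio proposed above, using a hypothetical `RatioRow` struct as a stand-in for one grouped row of the trigram query. The numbers are invented for illustration and are not real Fuzzily output. The last row shows why the ratio is not a true 0..1 normalisation: nothing in the formula bounds `matches` by `score`.

```ruby
# Hypothetical stand-in for one grouped row returned by the trigram query.
RatioRow = Struct.new(:owner, :matches, :score)

# The ratio from the comment above: matching trigrams per unit of length.
def match_ratio(row)
  row.matches / row.score.to_f
end

# Invented sample values, for illustration only.
rows = [
  RatioRow.new("York",     4, 4),
  RatioRow.new("New York", 4, 8),
  RatioRow.new("Yo",       3, 2)  # more matches than length: ratio above 1
]

rows.each do |row|
  puts format("%-10s %.2f", row.owner, match_ratio(row))
end
```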
For normalising the score, how about:
Let's say the search text is
|
Here's another scoring method which I quite like. It keeps the same order in which the results are returned, i.e. the The more matches the better, and the lower the score the better.
|
Trying them one by one with a data set. Will let you know.
|
What is the difference between the result score and the search text score?
|
The score is simply the length of the string. The result score is the length of the result string, and the search text score is the length of the text we're searching for. |
It does not seem to work for the data I have. I'll tell you the problem I am As input (let's call it insane data :) ) I have names of products - But the What I am trying to do is
However, for step 3 I have not yet been able to figure out a proper scoring The following is the data set I tried: Insane name entry: Response from fuzzy with scores: For my requirement, the 5th entry from last should have the highest score. Any
|
@airblade — while having a normalized "matchiness" metric is a hard problem, it looks like your formula works. The reason the "score" is the length of the needle is that you want "York" to rank before "New York" when searching for "York"—between two strings that match as well in terms of number of matching trigrams, you want the shortest one, which will be the "best" match. Implementation-wise, defining extra methods on the fly is a performance killer (it flushes the method cache, so affects an entire application), and it's probably not something you want to use the database to do either. I haven't had much time last week but I'll try to cobble something together this weekend. |
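To illustrate the tie-break described above, here is an in-memory sketch of the `matches DESC, score ASC` ordering used by the query earlier in the thread. `OrderRow` is a hypothetical struct and the match counts are invented; only the sort rule comes from the thread.

```ruby
# Hypothetical stand-in for a grouped query row; score is the length of
# the indexed string, as explained earlier in the thread.
OrderRow = Struct.new(:owner, :matches, :score)

# In-memory equivalent of ORDER BY matches DESC, score ASC.
def order_rows(rows)
  rows.sort_by { |r| [-r.matches, r.score] }
end

# Invented sample data for a search for "York".
rows = [
  OrderRow.new("New York",  4, 8),
  OrderRow.new("Yorkshire", 3, 9),
  OrderRow.new("York",      4, 4)
]

ordered = order_rows(rows).map(&:owner)
puts ordered.inspect
```

With equal match counts, the shorter "York" sorts ahead of "New York", and the row with fewer matches comes last regardless of length.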
@airblade I tried a combination score suggested by you [(x.matches / Seems to work. Analysing the results.
|
As a (late) update to this, I can't use the code directly as it has no test and also has a performance issue—it adds methods on the fly, which kills the method cache in Ruby < 2.1. Working on an alternate solution based on @airblade's formula. |
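One possible way to attach a score without defining methods at runtime is to wrap each owner in a delegator that carries the score. This is only a sketch under stated assumptions, not the solution the maintainer went on to write: `ScoredResult` is a hypothetical class, and a plain string stands in for the owner record.

```ruby
require "delegate"

# Hypothetical wrapper: forwards every call to the wrapped owner and adds
# a fuzzily_score reader, so no singleton methods are defined per result.
class ScoredResult < SimpleDelegator
  attr_reader :fuzzily_score

  def initialize(owner, fuzzily_score)
    super(owner)
    @fuzzily_score = fuzzily_score
  end
end

# A String stands in for the owner record in this sketch.
result = ScoredResult.new("York", 1.0)
puts result.upcase         # delegated to the wrapped string
puts result.fuzzily_score  # provided by the wrapper
```

The trade-off is that callers receive a wrapper rather than the record itself; `__getobj__` returns the underlying object when it is needed.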
I've been checking out the fuzzily gem and it helps greatly. It would be great if there were a rank for the suggestions returned. I know that the best suggestion is the first result. If there were a way to give a score to each suggestion (say 0 => exact match, 0.2 => deviates to some extent, 0.9 => deviates to a great extent), it would be really great.