Advanced search #423

Open
sirtoobii opened this issue Jun 15, 2022 · 5 comments

Comments

@sirtoobii

sirtoobii commented Jun 15, 2022

Currently, search terms are split on whitespace (keeping double-quoted phrases together), each term is escaped with Regexp.escape(), and the results are joined with |, which rules out any complex searches:

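```ruby
# Split on whitespace, keeping "double-quoted phrases" together, and escape each term
search_terms = query.scan(/"([^"]+)"|(\S+)/).flatten.compact.map {|term| Regexp.escape(term)}
# Join the escaped terms into a single alternation (logical OR)
search_terms_regex = search_terms.join('|')
```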

Is there any particular reason to allow only OR "advanced" searches?

Gollum Version: 5.3.0 on Ubuntu 20.04
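To make the current behaviour concrete, here is the quoted snippet wrapped in a standalone method. The name `current_search_regex` is hypothetical; in gollum-lib these two lines sit inside a larger search method. It shows that a bare `red fish` becomes the alternation `red|fish`, a quoted phrase becomes one escaped term, and any regex syntax the user types is neutralised.

```ruby
# Hypothetical wrapper around the two quoted gollum-lib lines.
def current_search_regex(query)
  # Group 1 captures "quoted phrases", group 2 bare words; flatten + compact
  # drops the nil left by whichever group did not match.
  search_terms = query.scan(/"([^"]+)"|(\S+)/).flatten.compact.map { |term| Regexp.escape(term) }
  # Every term becomes one alternative of a single OR pattern.
  search_terms.join('|')
end

puts current_search_regex('red fish')   # prints red|fish
puts current_search_regex('"red fish"') # prints red\ fish
puts current_search_regex('foo.* bar')  # prints foo\.\*|bar
```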

sirtoobii changed the title from "Search: Advanced search" to "Advanced search" on Jun 15, 2022
@dometto
Member

dometto commented Jun 15, 2022

The reason is just complexity: since we don't want the user to have to write "foo|bar" for a disjunctive search term, we can't just treat the user-inputted search string as a regex. If we want to allow more advanced searches, we would therefore have to perform more varied transformations from the user-inputted string to a valid regex. There may be libraries that do this, i.e. construct regexes out of intuitively formatted user-inputted search strings: if anyone knows any and wants to try to hook it into gollum-lib in the method @sirtoobii refers to, we'd be happy to help out with a PR for that!

An easier way of adding some advanced search functionality might be to have a "Regular expression" checkbox under/next to the search bar, and treat the user-inputted search string as a regex if and only if that's set to true. Again, happy to accept and help out with a PR for this!
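A minimal sketch of the checkbox idea, assuming the checkbox state reaches the search code as a boolean. `build_search_regex` and its `regex_mode:` keyword are hypothetical names, not an existing gollum option. When the flag is clear, the current escape-and-OR behaviour is kept; when it is set, the input is compiled verbatim, and an invalid pattern yields `nil` so the caller can show an error instead of failing.

```ruby
# Hypothetical helper: `regex_mode` stands in for the proposed
# "Regular expression" checkbox and is not an existing gollum option.
def build_search_regex(query, regex_mode: false)
  source =
    if regex_mode
      query # checkbox ticked: use the input verbatim as a pattern
    else
      # checkbox clear: current behaviour, escape each term and OR them
      query.scan(/"([^"]+)"|(\S+)/).flatten.compact
           .map { |term| Regexp.escape(term) }
           .join('|')
    end
  Regexp.new(source)
rescue RegexpError
  nil # invalid user pattern; the caller can report it instead of searching
end

p build_search_regex('foo|bar', regex_mode: true).match?('foo') # true
p build_search_regex('foo|bar').match?('foo')                   # false
p build_search_regex('a[', regex_mode: true)                    # nil
```

Since a ticked checkbox runs arbitrary user-supplied patterns, a pathological expression could make a search slow; on Ruby 3.2 and later, `Regexp.timeout=` can put an upper bound on matching time.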

@oetiker

oetiker commented Jun 15, 2022

If I am searching for a needle in a haystack, it does not help me to get more hay back when I provide more detailed search information.

If I enter red fish, I would expect to get documents with both the words red and fish in them. Instead, I get documents containing either word. Is there a use case for that?

Would you be interested in a patch that fixes this?

@dometto
Member

dometto commented Jun 15, 2022

@oetiker as it stands, you can search for "red fish" (with quotes) and get lines that match both, or for red fish (without quotes) and get lines that match either. I believe this is fairly standard search practice, though it would certainly be nice to have a little help function for the search bar that makes this explicit.

I took @sirtoobii's question to be about the possibility of more advanced pattern matching, which would require implementing one of the two solutions I outlined above.
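The quoted-versus-unquoted distinction can be illustrated with a simplified sketch that assumes each line of a page is tested against the combined pattern. The page text and the `matching_lines` helper are made up for illustration; gollum's actual search code is more involved and is not reproduced here.

```ruby
# Made-up page content for illustration.
PAGE = <<~TEXT
  a red fish swam by
  the red car
  one fish, two fish
  nothing here
TEXT

# Hypothetical helper: build the OR pattern the same way as the current
# code, then keep every line that matches it.
def matching_lines(text, query)
  terms = query.scan(/"([^"]+)"|(\S+)/).flatten.compact.map { |t| Regexp.escape(t) }
  regex = Regexp.new(terms.join('|'))
  text.lines.map(&:chomp).select { |line| line.match?(regex) }
end

p matching_lines(PAGE, '"red fish"') # ["a red fish swam by"]
p matching_lines(PAGE, 'red fish')   # ["a red fish swam by", "the red car", "one fish, two fish"]
```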

@dometto
Member

dometto commented Jun 15, 2022

I just realized I may have misread @oetiker, and that the suggestion is to return only pages (not lines) that contain both red and fish. This would certainly be a nice feature, but I would hesitate to make it the default: if I search for two IP addresses, say, I might still be interested in pages that contain only one of the two. Or I might expect to be given pages that include lines that match both expressions.
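The page-level conjunction under discussion could be sketched as follows, using the IP-address example. `search_pages`, its `mode:` keyword and the in-memory `PAGES` hash are hypothetical. Terms are tokenized and escaped as in the current code, but each term is kept as its own pattern so the per-page results can be combined as AND (`:all`) or OR (`:any`).

```ruby
# Hypothetical page store: page name => content (illustrative data only).
PAGES = {
  'hosts'  => "web: 10.0.0.1\ndb: 10.0.0.2\n",
  'web'    => "frontend runs on 10.0.0.1\n",
  'unused' => "nothing to see\n"
}.freeze

# Return the names of pages matching the query, at page (not line) level.
def search_pages(pages, query, mode: :any)
  terms = query.scan(/"([^"]+)"|(\S+)/).flatten.compact
  regexes = terms.map { |t| Regexp.new(Regexp.escape(t)) } # one pattern per term
  hits = pages.select do |_name, content|
    found = regexes.map { |re| content.match?(re) }
    mode == :all ? found.all? : found.any? # conjunction vs disjunction
  end
  hits.keys
end

p search_pages(PAGES, '10.0.0.1 10.0.0.2', mode: :all) # ["hosts"]
p search_pages(PAGES, '10.0.0.1 10.0.0.2', mode: :any) # ["hosts", "web"]
```

Because this is plain substring matching, `10.0.0.1` would also hit a page mentioning `10.0.0.10`; whether to anchor terms with word boundaries (`\b`) would be one more default to decide on.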

We're certainly interested in PRs that improve the existing search functionality, but at least these three questions need to be taken into consideration:

  1. Should the user-supplied string by default be matched at the page level or at the line level (current behavior)?
  2. Should A B by default be taken as a disjunction or as a conjunction?
  3. What ways should the user be given to override these defaults, and/or to supply regular expressions themselves?

@oetiker

oetiker commented Jun 15, 2022

How about behaving somewhat like Google does?
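One way to read this suggestion is a small query language loosely modelled on familiar web-search conventions: terms are ANDed by default, "quoted phrases" stay together, an uppercase `OR` between two terms makes them alternatives, and a leading `-` excludes a term. The sketch below is an assumption-laden illustration, not a specification of Google's actual behaviour; `Token`, `tokenize`, `parse_query` and `page_matches?` are hypothetical names, matching is page-level and case-insensitive by assumption, and negated phrases such as `-"foo bar"` are not handled.

```ruby
# Hypothetical sketch of Google-style query syntax; not gollum code.
Token = Struct.new(:text, :quoted)

# Split into "quoted phrases" and bare words, as the current gollum code does,
# but remember which tokens were quoted so they are never read as operators.
def tokenize(query)
  query.scan(/"([^"]+)"|(\S+)/).map do |phrase, word|
    phrase ? Token.new(phrase, true) : Token.new(word, false)
  end
end

# Returns { required: [[alt, ...], ...], excluded: [term, ...] }.
# Every group in :required must match (implicit AND); within a group,
# any alternative may match (explicit OR).
def parse_query(query)
  required = []
  excluded = []
  pending_or = false
  tokenize(query).each do |tok|
    if !tok.quoted && tok.text == 'OR'
      pending_or = !required.empty? # a leading OR has nothing to attach to
      next
    end
    if !tok.quoted && tok.text.start_with?('-') && tok.text.length > 1
      excluded << tok.text[1..-1] # leading minus excludes the term
    elsif pending_or
      required.last << tok.text   # join the previous group as an alternative
    else
      required << [tok.text]      # start a new AND group
    end
    pending_or = false
  end
  { required: required, excluded: excluded }
end

# Page-level, case-insensitive substring matching (an assumption for this sketch).
def page_matches?(content, parsed)
  text = content.downcase
  parsed[:required].all? { |group| group.any? { |term| text.include?(term.downcase) } } &&
    parsed[:excluded].none? { |term| text.include?(term.downcase) }
end

query = parse_query('red OR blue fish -shark')
p page_matches?('a blue fish', query)            # true
p page_matches?('a red fish and a shark', query) # false
p page_matches?('a red car', query)              # false
```

Keeping track of which tokens were quoted lets a user still search literally for `"OR"` or `"-x"`.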

Labels: None yet
Projects: None yet
Development: No branches or pull requests
Participants: 3