Scalability of filtering operations #30

Open
thclark opened this issue Jan 17, 2020 · 1 comment

Comments

@thclark

thclark commented Jan 17, 2020

Description

Say we filter the queryset as shown in the docs:

Restaurant.objects.filter(
    name__startswith='Pizza'
).query_string_search(
    'name:Hut'
)

Is that a scalable solution? I suppose I'm asking whether:

  • zombodb in effect evaluates that filtered queryset, then sends a list of ids to ElasticSearch in order to filter the potential results (not scalable, as the size of the search query then grows with the number of Pizza*s in the database), or
  • it does some magic (which in my mind is like creating an effective additional column of relevance, which the normal filtered query is then ordered by) such that neither the search query size nor the index complexity increases with the number of things that you filter/exclude?

I'm a bit more used to django-haystack, where in order to achieve this kind of filtering scalably you'd have to have a thing you want to filter against in the search index itself. Excited by the potential of zombodb but needed to check this!

Suggestion

Please could the documentation here include a slightly more in-depth note about how those filters are applied and whether the approach is scalable?

@fjsj
Member

fjsj commented Jan 21, 2020

Hi @thclark, thanks for the issue.

zombodb in effect evaluates that filtered queryset, then sends a list of ids to ElasticSearch in order to filter the potential results

In fact, it's the opposite. The 'name:Hut' search is executed on the ElasticSearch side. Then the results are filtered with the additional SQL filters (WHERE). Note that results can be limited to avoid heavy ES searches.
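To make that order of operations concrete, here is a toy model in plain Python. It is only a sketch of the sequence described above (search first, SQL filters second), not how ZomboDB or Postgres implement it; `Row`, `es_search`, and `sql_where` are hypothetical stand-ins invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    name: str


ROWS = [
    Row(1, "Pizza Hut"),
    Row(2, "Pizza Palace"),
    Row(3, "Burger Hut"),
]


def es_search(rows, term, limit=None):
    """Hypothetical stand-in for the ES step: match a whole word in the name."""
    hits = [r for r in rows if term.lower() in r.name.lower().split()]
    # Optional cap, mirroring the note that results can be limited.
    return hits if limit is None else hits[:limit]


def sql_where(rows, prefix):
    """Hypothetical stand-in for the SQL step: name LIKE 'prefix%'."""
    return [r for r in rows if r.name.startswith(prefix)]


hits = es_search(ROWS, "Hut")        # step 1: the search runs first
result = sql_where(hits, "Pizza")    # step 2: SQL filters narrow the hits
print([r.name for r in result])      # ['Pizza Hut']
```

In this model the search step never receives a list of ids from the SQL side; the SQL filter only sees what the search returned.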

Haystack and basically every other search tool I've checked suffer from a similar problem: you need a list of ids to combine SQL filtering with searching (on a separate search engine).

But be aware that nothing prevents you from using only ElasticSearch for searches and completely avoiding SQL WHERE / .filter. In fact, that's recommended in the docs:

It’s fine to call filter/exclude/etc. before and after search. If possible, the best would be using only a Elasticsearch query.

For that, you just need to filter everything with the ES syntax. Try the dsl_search method.
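As a rough sketch of what "filter everything with the ES syntax" could look like for the example in the issue, the snippet below folds the prefix filter into the query string itself. `build_query_string` is a hypothetical helper, not part of django-zombodb. Note also that `name:Pizza*` is only an approximation of `name__startswith='Pizza'`, because ES matches wildcards against indexed terms rather than the whole column value.

```python
def build_query_string(clauses):
    """Join (field, term) pairs into one ES query string with AND.

    Hypothetical helper for illustration; it does no escaping of
    special characters in the terms.
    """
    return " AND ".join(f"{field}:{term}" for field, term in clauses)


query = build_query_string([("name", "Pizza*"), ("name", "Hut")])
print(query)  # name:Pizza* AND name:Hut

# Assumed usage, reusing the method shown in the issue description,
# with no SQL .filter() call involved:
# Restaurant.objects.query_string_search(query)
```

With both conditions in the query string, the whole match is evaluated on the ES side and no SQL WHERE clause is needed for this case.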

However, I agree that's not clear enough, so I think we should move it into a new warning that better explains what's going on behind the scenes. I'll leave this issue open for that reason.

I'd personally suggest trying django-zombodb if you already have the infrastructure to support it. Be aware of zombodb's limitations though: https://github.com/zombodb/zombodb/blob/master/THINGS-TO-KNOW.md
