Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have tags default to being in creation order #871

Closed
wants to merge 15 commits into from

Conversation

steverecio
Copy link
Contributor

Fixes #870 to ensure that tag order is preserved by default

Copy link
Contributor

@rtpg rtpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading the original issue I was a bit pessimistic, but looking at this I think this makes more sense than I was originally expecting.

The test failures seem to be mostly linked to tag_kwargs not being passed into get_or_create, so we should do that IMO.

The point about allowing self.ordering = [] to be explicitly set is also important IMO: sometimes people don't want stuff ordered implicitly (can be the source of nasty DB plans)

@@ -196,58 +198,49 @@ def add(self, *tags, through_defaults=None, tag_kwargs=None, **kwargs):
def _to_tag_model_instances(self, tags, tag_kwargs):
"""
Takes an iterable containing either strings, tag objects, or a mixture
of both and returns set of tag objects.
of both and returns a list of tag objects while preserving order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here this preserves the order of creation, right? But if someone has ordering set to name should this actually order by name instead?

Basically shouldn't we try to keep the ordering consistent across the board? This might be pretty straightforward to do in one query.

I'm not a super fan of this method already (kinda hard to get what it's trying to accomplish) so any sort of simplification would definitely be appreciated.

Copy link
Contributor Author

@steverecio steverecio Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here this preserves the order of creation, right? But if someone has ordering set to name should this actually order by name instead?

This processes the tags arg in the order its passed in. It attempts to get or create the Tag object associated with each t in tags while ignoring duplicates (and handling case insensitivity).

I would argue that name (and tag creation order) is much easier to order by on the retrieval end. By preserving the order of the original list in this function, we can order by any key we want when retrieving the model (tag name, tag pk, or through table entry pk).

sample_obj = TestModel.objects.create()

sample_obj.tags.set(str_tags)
for idx, tag in enumerate(sample_obj.tags.all()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be replaced with self.assertEqual(str_tags, [tag.name for tag in sample_obj.tags.all()])

tag_objs.append(tag)
except self.through.tag_model().DoesNotExist:
tag, created = manager.get_or_create(
name=t, defaults={"name": t}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're missing **tag_kwargs here

@@ -69,7 +69,9 @@ def __init__(self, through, model, instance, prefetch_cache_name, ordering=None)
if ordering:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're going to set a default for self.ordering, let's make it so that we can still pass in ordering=[], by changing this if ordering to if ordering is not None.

That way in the changelog we can offer people to have the old behavior by passing that in explicitly

from tests.models import TestModel


class TestTaggableManager(TestCase):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this suite of tests should be in this file or tests.py

@steverecio
Copy link
Contributor Author

steverecio commented Oct 26, 2023

So as of now the primary tests that are failing are the assertNumQueries and here which seems to occur because the ordering query causes duplicates but I tried adding distinct() everywhere I could but that doesn't seem to do it. I think its somewhere in the caching logic but can't quite figure it out.

@rtpg Are we able to modify the tests to change the num_queries from 32 to 34 and 10 to 11?

We might be able to get the num queries back down by first iterating through the tags arg in _to_tag_model_instances to parse out the str tags and doing one retrieval call and then creating any that are missing from that result and reassembling but honestly that will make this code even more unreadable and wouldn't really affect the worst case O(n) runtime of the function (list of all new tags). Feel like over-engineering to me at the cost of readability (already an issue in this codebase) but I'm new here so.... 😎

@steverecio
Copy link
Contributor Author

steverecio commented Oct 26, 2023

Continuing to dig into this failure...

The issue appears to come from get_queryset. If I print the query getting executed, there is already a distinct clause but because of the through table entry on the multi-inheritance model the distinct clause is ineffective in deduplicating the query. However, we can't pass in the tag pk for deduplicating because we would have to pass in the tag pk to the order_by query for distinct() to accept that param.

Here is the query:

SELECT DISTINCT "taggit_tag"."id", "taggit_tag"."name", "taggit_tag"."slug", "taggit_taggeditem"."id" FROM "taggit_tag" INNER JOIN "taggit_taggeditem" ON ("taggit_tag"."id" = "taggit_taggeditem"."tag_id") INNER JOIN "django_content_type" ON ("taggit_taggeditem"."content_type_id" = "django_content_type"."id") WHERE ("django_content_type"."app_label" = tests AND "django_content_type"."model" = food AND "taggit_taggeditem"."object_id" = 1) ORDER BY "taggit_taggeditem"."id" ASC

UPDATE: I was able to get this passing by adding this deduplication snippet. The only remaining tests that fail are related to the asserting the number of queries executed. Are we able to loosen those test restrictions? @rtpg

@rtpg
Copy link
Contributor

rtpg commented Oct 30, 2023

SELECT DISTINCT "taggit_tag"."id", "tests_taggedcustompk"."id" FROM "taggit_tag" INNER JOIN "tests_taggedcustompk" ON ("taggit_tag"."id" = "tests_taggedcustompk"."tag_id") INNER JOIN "django_content_type" ON ("tests_taggedcustompk"."content_type_id" = "django_content_type"."id") WHERE ("django_content_type"."app_label" = 'tests' AND "django_content_type"."model" = 'custompkfood') ORDER BY "tests_taggedcustompk"."id" ASC

On test_most_common_lazy, this query is getting generated. This looks like something pretty hefty (my first read is that this is going to read all of the tags of a certain type in from the database). I think we will want to figure out a way to avoid this (and somehow make this extra query lazy again, as I bet in that case the ORM will do the right thing).

I scrolled through the diff on the web just now and can't figure out exactly what line introduced this query though.

@steverecio
Copy link
Contributor Author

steverecio commented Oct 31, 2023

SELECT DISTINCT "taggit_tag"."id", "tests_taggedcustompk"."id" FROM "taggit_tag" INNER JOIN "tests_taggedcustompk" ON ("taggit_tag"."id" = "tests_taggedcustompk"."tag_id") INNER JOIN "django_content_type" ON ("tests_taggedcustompk"."content_type_id" = "django_content_type"."id") WHERE ("django_content_type"."app_label" = 'tests' AND "django_content_type"."model" = 'custompkfood') ORDER BY "tests_taggedcustompk"."id" ASC

On test_most_common_lazy, this query is getting generated. This looks like something pretty hefty (my first read is that this is going to read all of the tags of a certain type in from the database). I think we will want to figure out a way to avoid this (and somehow make this extra query lazy again, as I bet in that case the ORM will do the right thing).

I scrolled through the diff on the web just now and can't figure out exactly what line introduced this query though.

This is related to the new get_queryset function: https://github.com/jazzband/django-taggit/pull/871/files#diff-2f4b79363da4c4d9048dbf680784ffbdee2400138f9672c19e2e7e73e59093d0R89

I have to execute the queryset (via values_list) in order to dedupe it because the distinct() function doesn't work with the ordering query attached when you query the tags on the model itself.

We are ensuring tags are a unique set when we tag an individual entry in the table (via _to_tag_model_instances). So if we order by the through table pk, we won't get any duplicates when querying a model entry. However, if we retrieve the tags for the entire table, the distinct clause sees the through table values from the order by clause and doesn't effectively deduplicate. (The issue lies in using the same get_queryset function for both obj.tags.all() and MyModel.tags.all())

For example, without the deduplication logic, the following will be true:

apple = Food.objects.create(name='apple')
apple.tags.set(['green', 'green'])
assertTagsEqual(apple.tags.all(), ['green'])
assertTagsEqual(Food.tags.all(), ['green'])

grape = Food.objects.create(name='grape')
grape.tags.add('green')
assertTagsEqual(Food.tags.all(), ['green', 'green'])

Does that make sense? One possible solution is that when the manager is attached to a model entry, we enforce tag ordering by the through table pk and when its attached to the model itself we don't enforce any ordering. Honestly, I'm not sure what the best solution for this is but I think I understand the underlying issue at the very least.

UPDATE: I was able to fix most of the tests by only adding the ordering if the instance is not None. Only remaining issue is the 34 vs 32 queries.

@steverecio
Copy link
Contributor Author

@rtpg friendly ping :)

I was able to fix most of the tests by only applying the ordering if the instance is not None. This way, the ordering is not applied automatically when querying for tags attached to a model, only for tags attached to a model instance. Only remaining issue is the 34 vs 32 queries.

@stefanivic
Copy link

@steverecio @rtpg any updates on this, we ran into the same issue so was wondering if this feature was abandoned or will not be merged due to some other reason ? I can take it over if you moved on.

@rtpg
Copy link
Contributor

rtpg commented May 29, 2024

@stefanivic could you specifically describe the issue you had, so we can design some test cases and properly say what issue we're trying to fix?

Basically, would like to fix any issues, but do not want to add any extra queries, so I think we might need to start a new branch. But would like to have a specific ask here

@stefanivic
Copy link

@stefanivic could you specifically describe the issue you had, so we can design some test cases and properly say what issue we're trying to fix?

Basically, would like to fix any issues, but do not want to add any extra queries, so I think we might need to start a new branch. But would like to have a specific ask here

Pretty much what the referenced issue and PR talk about. When adding tags, their order is not respected, instead the order seems to change during the save. For example if I add the tags 1, 3, 5, 2, 4, after the save these will be listed as 1, 2, 3, 4, 5. Based on what I have seen it seems to order them by name ( alphabetical asc ). If it's required I can make a small demo to show.

Our clients tend to define tags in the order of importance to them, so the ones they want people to click the most go on the first two positions. But after the reorder they can end up last.

I have tried adding a sort_order (int) field to the ItemBase and then returning them in the specific order, but the Set() seems to reorder them regardless.

@steverecio
Copy link
Contributor Author

@stefanivic could you specifically describe the issue you had, so we can design some test cases and properly say what issue we're trying to fix?

Basically, would like to fix any issues, but do not want to add any extra queries, so I think we might need to start a new branch. But would like to have a specific ask here

@rtpg There are some new test cases in this PR that highlight the issue.

@rtpg rtpg changed the title Fix 870 Default to tags existing in creation order Jun 2, 2024
@rtpg rtpg changed the title Default to tags existing in creation order Have tags default to being in creation order Jun 2, 2024
@rtpg
Copy link
Contributor

rtpg commented Jun 23, 2024

This one was implemented in #892

@rtpg rtpg closed this Jun 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tag Ordering Inconsistent
3 participants