Chunked package lists #1035
Conversation
Force-pushed from 5fccf30 to 6c13282
Around 70% reviewed, current comments so far
django/thunderstore/repository/migrations/0053_schedule_chunked_package_caching.py
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage   92.95%   92.96%   +0.01%
==========================================
  Files         314      315       +1
  Lines        9070     9172     +102
  Branches      806      818      +12
==========================================
+ Hits         8431     8527      +96
- Misses        530      533       +3
- Partials      109      112       +3

☔ View full report in Codecov by Sentry.
Looks good enough to me, nice work
try:
    return cls.objects.filter(community=community.pk).latest()
except APIV1ChunkedPackageCache.DoesNotExist:
    return None
return cls.objects.filter(community=community.pk).order_by("-created_at").first()
would've avoided the trouble of having to try/catch + model meta fields, but this works too.
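For readers outside the thread, a minimal sketch of the two approaches side by side; the field names, the Meta options, and the FK target path are assumptions for illustration, not code from this PR:

from django.db import models


class APIV1ChunkedPackageCache(models.Model):
    community = models.ForeignKey(
        "community.Community",  # assumed app label / model path
        on_delete=models.CASCADE,
    )
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        get_latest_by = "created_at"  # .latest() needs this (or an argument)

    @classmethod
    def get_latest_for_community(cls, community):
        # .latest() raises DoesNotExist on an empty queryset, hence try/except:
        try:
            return cls.objects.filter(community=community.pk).latest()
        except cls.DoesNotExist:
            return None

    @classmethod
    def get_latest_for_community_alt(cls, community):
        # .first() returns None on an empty queryset, so no exception handling
        # or Meta.get_latest_by is needed:
        return (
            cls.objects.filter(community=community.pk)
            .order_by("-created_at")
            .first()
        )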
listing_ids = get_package_listing_queryset(community.identifier).values_list(
    "id",
    flat=True,
)
# Correlated reference to the outer row, used to count ratings per listing
# in a subquery rather than a join on the outer queryset.
listing_ref = PackageListing.objects.filter(pk=models.OuterRef("pk"))

return order_package_listing_queryset(
    PackageListing.objects.filter(id__in=listing_ids)
    .select_related("community", "package", "package__owner")
    .prefetch_related("categories", "community__sites", "package__versions")
    .annotate(
        _rating_score=models.Subquery(
            listing_ref.annotate(
                ratings=models.Count("package__package_ratings"),
            ).values("ratings"),
        ),
    ),
)
This could be restructured as an enumerator that iterates over the packages in chunks.
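A sketch of what such an enumerator could look like, assuming a plain generator over id slices; chunk_size, the function name, and the import path are illustrative assumptions, not code from this PR:

from typing import Iterator

from django.db.models import QuerySet

from thunderstore.repository.models import PackageListing  # assumed path


def iterate_listing_chunks(
    listing_ids: list[int],
    chunk_size: int = 1000,  # illustrative default
) -> Iterator[QuerySet]:
    # Yield consecutive QuerySets so the ORM materializes at most
    # chunk_size PackageListing rows at a time.
    for start in range(0, len(listing_ids), chunk_size):
        ids = listing_ids[start : start + chunk_size]
        yield PackageListing.objects.filter(id__in=ids)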
@MythicManiac Added a new commit 8bc950e for this.
Welp, it seems that reusing some methods means we're doing some unnecessary joins and stuff; I'll clear those out. Done.
Rebased and fixed the endpoint URL definition after #1042 was merged.
Force-pushed from b97e83d to 8bc950e
The PackageListing list for Lethal Company has grown so large that downloads time out for people with slow connections. Furthermore, sooner or later we will go over the limit on the amount of data we can pass to the mod manager's plugin layer in one go. To solve these issues, the package list will be downloaded and processed in chunks: the API will provide an index file containing URLs to the chunks of package info. All data is served as gzip-compressed blob files so the CDN can cache it efficiently.
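For illustration, a hedged sketch of the consumption side; the index schema (a flat list of chunk URLs) and the requests-based client are assumptions, not the actual mod manager code:

import gzip
import json

import requests


def iter_packages(index_url: str):
    # Both the index and the chunks are served as gzip-compressed JSON blobs.
    index = json.loads(gzip.decompress(requests.get(index_url).content))
    for chunk_url in index:  # assumed schema: a list of chunk URLs
        chunk = json.loads(gzip.decompress(requests.get(chunk_url).content))
        yield from chunk  # assumed: each chunk is a list of package objects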
New cache files are created at the top of the hour and the stale cache is dropped. Note that the latter is not yet implemented: we still have to figure out how to safely delete blob files from the main and mirror storages without ending up in a situation where, due to some error, a file has been deleted from one but still exists in the other.
- Use ForeignKey instead of OneToOneField for the index, since the same blob will be shared between different versions of the cache if the contents haven't changed
- Add CACHE_CUTOFF_HOURS and UNCOMPRESSED_CHUNK_LIMIT as class constants and use the latter as a default parameter for update_for_community(). This makes it easier to write unit tests
- Fix update_for_community() to not create an empty first chunk if the data from a single PackageListing would exceed the chunk size limit
- Fix update_for_community() to create a chunk (containing an empty list) if the community has no packages at all
- Fix get_latest_for_community() to return None instead of throwing if the community has no cache
- Use a fixed mtime parameter when calling gzip.compress() to ensure that the same input bytes result in the same output bytes, which is important since the hash of the bytes is used as the cache key (see the standalone example below)
- Add get_blob_content() helper to avoid repetition in unit tests
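The fixed-mtime point from the list above is easy to demonstrate in isolation: gzip embeds a timestamp in its stream header, so identical inputs only produce identical bytes when mtime is pinned (a standalone example, not code from this PR):

import gzip

data = b'[{"name": "example-package"}]'  # illustrative payload

# Without a fixed mtime the gzip header embeds the current time, so two
# compressions of the same input can differ byte for byte.
deterministic_a = gzip.compress(data, mtime=0)
deterministic_b = gzip.compress(data, mtime=0)
assert deterministic_a == deterministic_b  # stable bytes -> stable cache key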
Reduce the amount of data Django's ORM keeps in memory at the same time by processing the PackageListings in smaller, consecutive QuerySets.