Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked package lists #1035

Merged
merged 6 commits into from
Sep 19, 2024
Merged

Chunked package lists #1035

merged 6 commits into from
Sep 19, 2024

Conversation

anttimaki
Copy link
Collaborator

No description provided.

@anttimaki anttimaki force-pushed the chunky branch 2 times, most recently from 5fccf30 to 6c13282 Compare May 21, 2024 13:31
Copy link
Member

@MythicManiac MythicManiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Around 70% reviewed, current comments so far

Copy link

codecov bot commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 95.50562% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.96%. Comparing base (142abd8) to head (c212b8b).
Report is 14 commits behind head on master.

Current head c212b8b differs from pull request most recent head c8675d1

Please upload reports for the commit c8675d1 to get more accurate results.

Files Patch % Lines
django/thunderstore/repository/models/cache.py 95.23% 1 Missing and 2 partials ⚠️
django/thunderstore/repository/tasks/caches.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage   92.95%   92.96%   +0.01%     
==========================================
  Files         314      315       +1     
  Lines        9070     9172     +102     
  Branches      806      818      +12     
==========================================
+ Hits         8431     8527      +96     
- Misses        530      533       +3     
- Partials      109      112       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@MythicManiac MythicManiac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good enough to me, nice work

Comment on lines +151 to +155
try:
return cls.objects.filter(community=community.pk).latest()
except APIV1ChunkedPackageCache.DoesNotExist:
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return cls.objects.filter(community=community.pk).order_by("-created_at").first()

would've avoided the trouble of having to try/catch + model meta fields but this works too

Comment on lines 234 to 271
listing_ids = get_package_listing_queryset(community.identifier).values_list(
"id",
flat=True,
)
listing_ref = PackageListing.objects.filter(pk=models.OuterRef("pk"))

return order_package_listing_queryset(
PackageListing.objects.filter(id__in=listing_ids)
.select_related("community", "package", "package__owner")
.prefetch_related("categories", "community__sites", "package__versions")
.annotate(
_rating_score=models.Subquery(
listing_ref.annotate(
ratings=models.Count("package__package_ratings"),
).values("ratings"),
),
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be structured to be an enumerator that iteraters over the packages in chunks

Copy link
Collaborator Author

@anttimaki anttimaki Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MythicManiac Added a new commit 8bc950e for this.

Copy link
Collaborator Author

@anttimaki anttimaki Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp it seems that reusing some methods means we're doing some unnecessary joins and stuff, I'll clear those out. Done.

@anttimaki
Copy link
Collaborator Author

Rebased and fixed endpoint URL definition after #1042 was merged.

@anttimaki anttimaki force-pushed the chunky branch 2 times, most recently from b97e83d to 8bc950e Compare August 20, 2024 11:56
PackageListing list for Lethal Company has grown so large, that the
download timeouts for people with slow connections. Furthermore sooner
or later we will go over the limit of data we can pass to mod manager's
plugin layer in one go.

To solve these issues, the package list will be downloaded and
processed in chunks. API will provide an index file which contains URLs
to the chunks of package info. All data is served in gzip compressed
blob files so CDN can cache it efficiently.
New cache files are created at the top of the hour and the stale cache
is dropped. Note that currently the latter is not implemented, since we
have to figure out how to safely delete blob files from main and mirror
storages without ending up in situations where the file is deleted from
one but still exists in the other due to some error.
- Use ForeignKey instead of OneToOneField for index, since the same
  blob will be shared between different versions of the cache if the
  contents haven't changed
- Add CACHE_CUTOFF_HOURS and UNCOMPRESSED_CHUNK_LIMIT as class
  constants and use the latter as a default parameter for
  update_for_community(). This makes it easier to write unit tests
- Fix update_for_community() to not create empty first chunk if the
  data from a single PackageListing would exceed the chunk size limit
- Fix update_for_community() to create a chunk (containing an empty
  list) if the community has no packages at all
- Fix get_latest_for_community() to return None instead of throwing if
  community has no cache
- Use fixed mtime parameter when calling gzip.compress() to ensure that
  same input bytes result in same output bytes, which is important
  since the hash of the bytes is used as the cache key
- Add get_blob_content() helper to avoid repetition in unit tests
Reduce the amount of data Django's ORM keeps in memory at the same time
by processing the PackageListings in smaller, consecutive QuerySets.
@anttimaki anttimaki changed the title WIP Draft of caching for chunked package lists Chunked package lists Sep 3, 2024
@anttimaki anttimaki marked this pull request as ready for review September 3, 2024 07:14
@MythicManiac MythicManiac merged commit 73385c9 into master Sep 19, 2024
24 checks passed
@MythicManiac MythicManiac deleted the chunky branch September 19, 2024 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants