Chunked package lists #1035
Conversation
Force-pushed from 5fccf30 to 6c13282
Around 70% reviewed, current comments so far
django/thunderstore/repository/migrations/0053_schedule_chunked_package_caching.py
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
+ Coverage   92.95%   92.96%   +0.01%
==========================================
  Files         314      315       +1
  Lines        9070     9172     +102
  Branches      806      818      +12
==========================================
+ Hits         8431     8527      +96
- Misses        530      533       +3
- Partials      109      112       +3

☔ View full report in Codecov by Sentry.
Looks good enough to me, nice work
try:
    return cls.objects.filter(community=community.pk).latest()
except APIV1ChunkedPackageCache.DoesNotExist:
    return None
return cls.objects.filter(community=community.pk).order_by("-created_at").first()
would've avoided the trouble of having to try/catch + model meta fields, but this works too.
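For readers outside the thread, a minimal sketch of the two approaches side by side; the field names, the Meta options, and the FK target path are assumptions for illustration, not code from this PR:

from django.db import models


class APIV1ChunkedPackageCache(models.Model):
    community = models.ForeignKey(
        "community.Community",  # assumed app label / model path
        on_delete=models.CASCADE,
    )
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        get_latest_by = "created_at"  # .latest() needs this (or an argument)

    @classmethod
    def get_latest_for_community(cls, community):
        # .latest() raises DoesNotExist on an empty queryset, hence try/except:
        try:
            return cls.objects.filter(community=community.pk).latest()
        except cls.DoesNotExist:
            return None

    @classmethod
    def get_latest_for_community_alt(cls, community):
        # .first() returns None on an empty queryset, so no exception handling
        # or Meta.get_latest_by is needed:
        return (
            cls.objects.filter(community=community.pk)
            .order_by("-created_at")
            .first()
        )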
listing_ids = get_package_listing_queryset(community.identifier).values_list(
    "id",
    flat=True,
)
# Correlated reference to the outer row, used to count ratings per listing
# in a subquery rather than a join on the outer queryset.
listing_ref = PackageListing.objects.filter(pk=models.OuterRef("pk"))

return order_package_listing_queryset(
    PackageListing.objects.filter(id__in=listing_ids)
    .select_related("community", "package", "package__owner")
    .prefetch_related("categories", "community__sites", "package__versions")
    .annotate(
        _rating_score=models.Subquery(
            listing_ref.annotate(
                ratings=models.Count("package__package_ratings"),
            ).values("ratings"),
        ),
    ),
)
This could be restructured as an enumerator that iterates over the packages in chunks.
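A sketch of what such an enumerator could look like, assuming a plain generator over id slices; chunk_size, the function name, and the import path are illustrative assumptions, not code from this PR:

from typing import Iterator

from django.db.models import QuerySet

from thunderstore.repository.models import PackageListing  # assumed path


def iterate_listing_chunks(
    listing_ids: list[int],
    chunk_size: int = 1000,  # illustrative default
) -> Iterator[QuerySet]:
    # Yield consecutive QuerySets so the ORM materializes at most
    # chunk_size PackageListing rows at a time.
    for start in range(0, len(listing_ids), chunk_size):
        ids = listing_ids[start : start + chunk_size]
        yield PackageListing.objects.filter(id__in=ids)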
@MythicManiac Added a new commit 8bc950e for this.
Welp, it seems that reusing some methods means we're doing some unnecessary joins and stuff; I'll clear those out. Done.
Rebased and fixed the endpoint URL definition after #1042 was merged.
Force-pushed from b97e83d to 8bc950e
The PackageListing list for Lethal Company has grown so large that downloads time out for people with slow connections. Furthermore, sooner or later we will go over the limit on the amount of data we can pass to the mod manager's plugin layer in one go. To solve these issues, the package list will be downloaded and processed in chunks: the API will provide an index file containing URLs to the chunks of package info. All data is served as gzip-compressed blob files so the CDN can cache it efficiently.
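For illustration, a hedged sketch of the consumption side; the index schema (a flat list of chunk URLs) and the requests-based client are assumptions, not the actual mod manager code:

import gzip
import json

import requests


def iter_packages(index_url: str):
    # Both the index and the chunks are served as gzip-compressed JSON blobs.
    index = json.loads(gzip.decompress(requests.get(index_url).content))
    for chunk_url in index:  # assumed schema: a list of chunk URLs
        chunk = json.loads(gzip.decompress(requests.get(chunk_url).content))
        yield from chunk  # assumed: each chunk is a list of package objects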
New cache files are created at the top of the hour and the stale cache is dropped. Note that the latter is not yet implemented: we still have to figure out how to safely delete blob files from the main and mirror storages without ending up in a situation where, due to some error, a file has been deleted from one but still exists in the other.
- Use ForeignKey instead of OneToOneField for the index, since the same blob will be shared between different versions of the cache if the contents haven't changed
- Add CACHE_CUTOFF_HOURS and UNCOMPRESSED_CHUNK_LIMIT as class constants and use the latter as a default parameter for update_for_community(). This makes it easier to write unit tests
- Fix update_for_community() to not create an empty first chunk if the data from a single PackageListing would exceed the chunk size limit
- Fix update_for_community() to create a chunk (containing an empty list) if the community has no packages at all
- Fix get_latest_for_community() to return None instead of throwing if the community has no cache
- Use a fixed mtime parameter when calling gzip.compress() to ensure that the same input bytes result in the same output bytes, which is important since the hash of the bytes is used as the cache key (see the standalone example below)
- Add get_blob_content() helper to avoid repetition in unit tests
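The fixed-mtime point from the list above is easy to demonstrate in isolation: gzip embeds a timestamp in its stream header, so identical inputs only produce identical bytes when mtime is pinned (a standalone example, not code from this PR):

import gzip

data = b'[{"name": "example-package"}]'  # illustrative payload

# Without a fixed mtime the gzip header embeds the current time, so two
# compressions of the same input can differ byte for byte.
deterministic_a = gzip.compress(data, mtime=0)
deterministic_b = gzip.compress(data, mtime=0)
assert deterministic_a == deterministic_b  # stable bytes -> stable cache key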
Reduce the amount of data Django's ORM keeps in memory at the same time by processing the PackageListings in smaller, consecutive QuerySets.