Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend,prepro): use etag to reduce database load #2768

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

corneliusroemer
Copy link
Contributor

@corneliusroemer corneliusroemer commented Sep 11, 2024

relates to #2092

preview URL: https://etag-prepro.loculus.org

Summary

Send etag to /extract-unprocessed-sequences to reduce db/backend load

Screenshot

Big reduction in db and backend load, approximately to 1/4 to 1/3 of the previous base load

Brave Browser 2024-09-12 03 18 16

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.

Copy link
Contributor

@anna-parker anna-parker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only two places I am a bit unsure about.

Also does this mean that the database query in fetchUnprocessedEntriesAndUpdateToInProcessing was just really inefficient? Because the main change that this PR introduces is that if there has not been a change in the database we do not perform this complicated select entries in sequence_entries where ... not in preprocessing and do not stream an empty response.

Base automatically changed from table-update-tracker to main September 17, 2024 18:24
@anna-parker anna-parker removed the preview Triggers a deployment to argocd label Sep 18, 2024
@@ -135,6 +134,7 @@ open class SubmissionController(
),
],
)
@ApiResponse(responseCode = "304", description = "Not Modified")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets maybe add a test for this - we can basically just copy the test we had for get-released-data?

@anna-parker anna-parker self-requested a review September 19, 2024 11:17
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Sep 19, 2024
co-authored with @anna-parker and Claude Sonnet

try

now it might work?

Remove views

fixup

Change to a conditional update for useNewerProcessingPipelineIfPossible, add last_updated_db as a view. (#2741)

Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com>

Still return current newest version

Return version from correct place

Don't issue delete statement unless there's something to delete

fix ktlint

Adjust useNewerProcessingPipelineIfPossible to only update table when there are changes

Table update tracker anya (#2764)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

---------

Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>

CR edits to table-update-tracker (#2767)

* Add option to get-released-data to check last update time.

* make name better

* Make or short-circuiting

* Only work with table_update_tracker - return lastTime in LAST_MODIFIED header

* Remove unused functions

* add back missing imports - somehow format removed this???

* Update bash scripts

* Attempt to fix bash

* 2nd little fix

* echo value of last_modified not name

* Do not update SILO if loculus responds with 304.

* Always echo $lastSnapshot unless new data existed in loculus AND this was successfully sent to LAPIS - then echo $last_modified

* save last snapshot to file

* Update with loculus data anyways if longer than 1hour since last update

* fix

* Improve silo import scripts etc

* add back BACKEND_BASE_URL

* Log headers

* Regularize

* Better logs

* Try to fix error

* return 0 and never null in header

* Fix read non-existent file bug

* Better wrapper logging

* fix delete after update snapshot

* Make grep case insensitive

* add back removed exit 0

* Improvement to logging

* Add test

* Add full process to test

* Update documentation

* Use etag instead of modified-since for spec compliance

* ktlin fix

* Move logic into model

* Fix etag formatting

* Fix etag yet again

---------

Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com>

refactor(backend): streamline control flow

refactor(backend): extract function

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <92720311+fengelniederhammer@users.noreply.github.com>

Update backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt

Co-authored-by: Fabian Engelniederhammer <92720311+fengelniederhammer@users.noreply.github.com>

Update backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt

Co-authored-by: Fabian Engelniederhammer <92720311+fengelniederhammer@users.noreply.github.com>

Enable etag for /extract-unprocessed-data

Use etag in prepro to skip preprocessing

match case refactor

Try fix

fix logic

fix! i'm stupid

Force refresh every hour

Add response code annotations

Update preprocessing/nextclade/src/loculus_preprocessing/prepro.py

Co-authored-by: Anna (Anya) Parker <50943381+anna-parker@users.noreply.github.com>

Add etag to dummy preprocessing

# Conflicts:
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
#	backend/src/main/kotlin/org/loculus/backend/controller/SubmissionControllerDescriptions.kt
#	backend/src/main/kotlin/org/loculus/backend/model/ReleasedDataModel.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt
#	backend/src/main/kotlin/org/loculus/backend/service/submission/UpdateTrackerTable.kt
#	backend/src/main/resources/db/migration/V1.2__add_table_update_tracker.sql
#	backend/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt
#	kubernetes/loculus/silo_import_job.sh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants