[don't merge without prior discussion] fix: don't sort author lists, except for segment grouping purposes #2653
Very good work - looks good to me. I don't think we should merge this until tomorrow, when we're ready to start bringing this to prod, because once we merge we are blocked from bringing in other new things (in case something even more urgent comes up in the meantime).
Thanks for the review @theosanderson, I've addressed your points - I won't merge then, let's coordinate tomorrow on how to phase this into production!
Thanks so much for doing this @corneliusroemer!
One suggestion
ROLLBACK;
DO $$
DECLARE
    acc_exists BOOLEAN;
BEGIN
    -- Check if any accession exists with the given conditions
    SELECT EXISTS (
        SELECT 1
        FROM sequence_entries
        WHERE submitter = 'insdc_ingest_user' AND version > 2
    ) INTO acc_exists;

    -- If such an accession exists, rollback the transaction
    IF acc_exists THEN
        RAISE NOTICE 'Condition met: Rolling back transaction.';
        -- Rollback the transaction; this will end the transaction block
        RAISE EXCEPTION 'Transaction rolled back due to condition.';
    ELSE
        -- Create a temporary table for use in the following operations
        CREATE TEMPORARY TABLE accessions_to_downgrade AS
        SELECT accession
        FROM sequence_entries
        WHERE submitter = 'insdc_ingest_user' AND version = 2;

        -- Delete entries with version 1 that have accessions to downgrade
        DELETE FROM sequence_entries
        WHERE accession IN (SELECT accession FROM accessions_to_downgrade) AND version = 1;

        -- Update version 2 to version 1 for the relevant accessions
        UPDATE sequence_entries
        SET version = 1
        WHERE submitter = 'insdc_ingest_user' AND version = 2;

        -- Delete preprocessed data entries with version 1 for the relevant accessions
        DELETE FROM sequence_entries_preprocessed_data
        WHERE accession IN (SELECT accession FROM accessions_to_downgrade) AND version = 1;

        -- Update version 2 to version 1 in preprocessed data
        UPDATE sequence_entries_preprocessed_data
        SET version = 1
        WHERE accession IN (SELECT accession FROM accessions_to_downgrade) AND version = 2;

        -- Drop the temporary table
        DROP TABLE accessions_to_downgrade;

        RAISE NOTICE 'There are only v1 and v2 - can continue.';
        -- No explicit COMMIT needed as the DO block will commit if no exceptions are raised
    END IF;
END $$;
Yes, that is mysterious
Yeah it's odd, but I can't see why author order would have this effect?
uggh... #2837 (which has the same code changes as this PR) has the exact same duplication in west nile (and only in west nile) ... so I do think it is this PR...
I rebased to see if that had an impact - now https://ing-upd.loculus.org/ has a duplication in cchf, but no longer in west nile
But maybe this is more linked to our ingest deployment? Is there any chance ingest pods can run concurrently? I'm uncertain why this would only show up in this PR... maybe now ingest takes longer?
Let's maybe not sort on keys? e.g. 1d833f9 - adding the sort makes ingest slower and we want to minimize the time between the get-original-metadata call and the submit calls.
@anna-parker have you measured how much this changes? I'd be very surprised if the sort takes more than a second. Sorting is fast, and this is a small dataset.
Yeah, it's not a big difference, e.g. for west nile
But I found it suspicious that we saw this for the organisms with the most data
resolves #2650
preview URL: https://author-order.loculus.org
Summary
We started sorting author lists when it turned out that different segments of the same isolate could have different author orders. We should have done the sorting only for grouping purposes and not when submitting to Loculus.
This PR fixes that by only sorting for grouping purposes.
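A minimal sketch of the idea, not the actual ingest code: the record fields (`isolate`, `segment`, `authors`) and the helpers `grouping_key` and `group_segments` are hypothetical names chosen for illustration. The sorted author tuple is used only as part of the grouping key, while each record keeps its original author order for submission.

```python
from collections import defaultdict


def grouping_key(record: dict) -> tuple:
    """Key used only for grouping; author order is ignored here (hypothetical helper)."""
    return (record["isolate"], tuple(sorted(record["authors"])))


def group_segments(records: list[dict]) -> dict[tuple, list[dict]]:
    """Group segment records of the same isolate without modifying the records."""
    groups = defaultdict(list)
    for record in records:
        groups[grouping_key(record)].append(record)
    return dict(groups)


# Two segments of the same isolate whose author lists differ only in order.
records = [
    {"isolate": "iso1", "segment": "L", "authors": ["Smith, J.", "Adams, B."]},
    {"isolate": "iso1", "segment": "S", "authors": ["Adams, B.", "Smith, J."]},
]
groups = group_segments(records)
```

Both segments land in one group, and `records[0]["authors"]` is still in its original, unsorted order when it is later submitted.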
Note: this will cause ingest to submit revisions for existing deployments as the ingest hash changes.
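To illustrate why the hash changes, here is a sketch that assumes the ingest hash is computed over the serialized metadata. The `metadata_hash` function and the field name are hypothetical, not the actual ingest implementation. The same authors in a different order serialize differently, so the hash of an entry submitted with sorted authors no longer matches the hash computed from the original order.

```python
import hashlib
import json


def metadata_hash(metadata: dict) -> str:
    """Hypothetical hash over metadata: dict keys are sorted, list order is preserved."""
    serialized = json.dumps(metadata, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Same authors, different order: previously submitted (sorted) vs. new (original order).
sorted_authors = {"authors": ["Adams, B.", "Smith, J."]}
original_order = {"authors": ["Smith, J.", "Adams, B."]}

hash_changed = metadata_hash(sorted_authors) != metadata_hash(original_order)
```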