Parse any deleted resource IDs from bulk exports and act on it #344

Merged

mikix merged 3 commits into main from mikix/deleted-ids on Sep 5, 2024

@mikix mikix (Contributor) commented Aug 30, 2024

  • Updates Loader classes to return any information found about deleted resources
  • Updates the Delta Lake formatter to use that info to delete rows
  • Updates the ndjson formatter to preserve the deleted metadata on disk
  • Updates the convert logic to read that preserved metadata and carry the deletions into Delta Lake
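
A minimal sketch of the formatter side of this flow. The commit notes below name a `delete_records(ids)` call on formatters; everything else here (the `Formatter` base class, `write_records()`, and `InMemoryTableFormatter`) is hypothetical and only illustrates the idea. An in-memory dict keyed by resource ID stands in for a Delta Lake table so the example runs without Spark.

```python
import abc


class Formatter(abc.ABC):
    """Hypothetical base class: formatters write rows and may be asked to delete rows by ID."""

    @abc.abstractmethod
    def write_records(self, rows: list[dict]) -> None: ...

    def delete_records(self, ids: set[str]) -> None:
        """Default: do nothing. Formats that can remove rows in place override this."""


class InMemoryTableFormatter(Formatter):
    """Stand-in for a Delta Lake table: rows are keyed by resource ID."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def write_records(self, rows: list[dict]) -> None:
        for row in rows:
            self.rows[row["id"]] = row  # upsert by resource ID

    def delete_records(self, ids: set[str]) -> None:
        for resource_id in ids:
            self.rows.pop(resource_id, None)  # silently skip IDs that were never written


table = InMemoryTableFormatter()
table.write_records([{"id": "a"}, {"id": "b"}])
table.delete_records({"b", "never-written"})
print(sorted(table.rows))  # ['a']
```

Making the base `delete_records()` a no-op keeps formats that cannot delete in place working unchanged, while a table-backed format opts in by overriding it.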

As a reminder, the spec's commentary on how to handle deleted resources is in the bulk export section.
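
For context, a sketch of pulling deleted IDs out of one such Bundle. It assumes the shape we understand the FHIR Bulk Data Access spec to describe: a `transaction` Bundle whose entries each carry a `request` with `method` set to `DELETE` and a `url` like `Patient/123`. The helper name `parse_deleted_bundle` is hypothetical, not this project's API.

```python
import json


def parse_deleted_bundle(bundle: dict) -> list[tuple[str, str]]:
    """Return (resource_type, resource_id) pairs for each DELETE entry in a Bundle."""
    deleted = []
    for entry in bundle.get("entry", []):
        request = entry.get("request", {})
        if request.get("method") != "DELETE":
            continue  # only DELETE requests describe removed resources
        # request.url is expected to look like "Patient/123"; anything else is skipped
        parts = request.get("url", "").split("/")
        if len(parts) == 2 and all(parts):
            deleted.append((parts[0], parts[1]))
    return deleted


example = json.loads("""
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"request": {"method": "DELETE", "url": "Patient/123"}},
    {"request": {"method": "DELETE", "url": "Condition/abc"}}
  ]
}
""")
print(parse_deleted_bundle(example))  # [('Patient', '123'), ('Condition', 'abc')]
```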

Fixes #167

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

@mikix mikix changed the title WIP: Have loaders return a results object with bundled data WIP: Parse any deleted resource IDs from bulk exports and act on it Aug 30, 2024
- Loaders used to return a common.Directory (which could be a TempDir)
- They now return a results object holding a common.Directory plus
  completion-tracking info like group name and export timestamp
- In the future, it will also include metadata like a list of deleted IDs
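
A sketch of what such a loader results object might look like, assuming a dataclass. The class name `LoaderResults` and its field names are hypothetical, and a plain `str` stands in for `common.Directory` so the example runs on its own.

```python
import dataclasses
import datetime
from typing import Optional


@dataclasses.dataclass
class LoaderResults:
    """Hypothetical container for everything a loader hands back to the ETL."""

    directory: str  # stand-in for common.Directory (which may be a TempDir)
    group_name: Optional[str] = None  # completion tracking: export group
    export_datetime: Optional[datetime.datetime] = None  # completion tracking: export timestamp
    # planned metadata: deleted IDs grouped by resource type, e.g. {"Patient": {"123"}}
    deleted_ids: dict[str, set[str]] = dataclasses.field(default_factory=dict)


results = LoaderResults(
    directory="/tmp/export",
    group_name="my-group",
    export_datetime=datetime.datetime(2024, 8, 30, tzinfo=datetime.timezone.utc),
)
results.deleted_ids.setdefault("Patient", set()).add("123")
print(results.deleted_ids)  # {'Patient': {'123'}}
```

Bundling these values into one object means later additions (like deleted IDs) don't change every loader's return signature.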

github-actions bot commented Aug 30, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines   Covered   Coverage   Threshold   Status
3448    3386      98%        98%         🟢

New Files

No new covered files...

Modified Files

File                                        Coverage   Status
cumulus_etl/etl/cli.py                      100%       🟢
cumulus_etl/etl/config.py                   100%       🟢
cumulus_etl/etl/convert/cli.py              100%       🟢
cumulus_etl/etl/tasks/base.py               100%       🟢
cumulus_etl/formats/base.py                 100%       🟢
cumulus_etl/formats/batched_files.py        100%       🟢
cumulus_etl/formats/deltalake.py            100%       🟢
cumulus_etl/formats/ndjson.py               100%       🟢
cumulus_etl/loaders/__init__.py             100%       🟢
cumulus_etl/loaders/base.py                 100%       🟢
cumulus_etl/loaders/fhir/ndjson_loader.py   100%       🟢
cumulus_etl/loaders/i2b2/loader.py          100%       🟢
TOTAL                                       100%       🟢

updated for commit: 9a6ab95 by action🐍

- When loading with the default ndjson loader, we look for a deleted/
  folder and read any Bundle files there for deleted IDs
- Those IDs are then passed along to tasks and their matching formatters
- Formatters now have a delete_records(ids) method
- If the output format is deltalake, rows with those IDs are deleted
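
A rough sketch of the deleted/ lookup described above, assuming the deletion Bundles sit as `*.ndjson` files (one Bundle per line) in a `deleted/` subfolder of the export directory. The helper name `read_deleted_ids` and the file naming are hypothetical. IDs are grouped by resource type so each task could hand its own set to its formatter's `delete_records(ids)`.

```python
import collections
import json
import pathlib
import tempfile


def read_deleted_ids(export_dir: str) -> dict[str, set[str]]:
    """Collect deleted IDs, keyed by resource type, from export_dir/deleted/*.ndjson."""
    deleted_dir = pathlib.Path(export_dir, "deleted")
    if not deleted_dir.is_dir():
        return {}  # this export reported no deletions
    deleted: dict[str, set[str]] = collections.defaultdict(set)
    for path in sorted(deleted_dir.glob("*.ndjson")):
        with path.open(encoding="utf8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines
                bundle = json.loads(line)
                if bundle.get("resourceType") != "Bundle":
                    continue
                for entry in bundle.get("entry", []):
                    request = entry.get("request", {})
                    if request.get("method") != "DELETE":
                        continue
                    parts = request.get("url", "").split("/")  # e.g. "Patient/123"
                    if len(parts) == 2 and all(parts):
                        deleted[parts[0]].add(parts[1])
    return dict(deleted)


# Usage: build a tiny fake export with one deletion Bundle and read it back.
with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "deleted").mkdir()
    bundle = {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [{"request": {"method": "DELETE", "url": "Patient/123"}}],
    }
    pathlib.Path(tmp, "deleted", "deleted.ndjson").write_text(json.dumps(bundle) + "\n", encoding="utf8")
    print(read_deleted_ids(tmp))  # {'Patient': {'123'}}
```

Returning an empty dict when deleted/ is missing lets exports without deletions flow through the same code path.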
@mikix mikix force-pushed the mikix/deleted-ids branch 2 times, most recently from 5ccd475 to 724ea10 Compare September 4, 2024 17:49
@mikix mikix changed the title WIP: Parse any deleted resource IDs from bulk exports and act on it Parse any deleted resource IDs from bulk exports and act on it Sep 4, 2024
@mikix mikix marked this pull request as ready for review September 4, 2024 17:49
Review thread on cumulus_etl/etl/tasks/base.py (outdated, resolved)
Review thread on cumulus_etl/loaders/fhir/ndjson_loader.py (resolved)
@mikix mikix merged commit 94bdf30 into main Sep 5, 2024
3 checks passed
@mikix mikix deleted the mikix/deleted-ids branch September 5, 2024 14:05
Linked issue: Have the bulk exporter handle delete requests
2 participants