-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support reingestion of a published dataset #7443
base: feature/schema-5-3
Are you sure you want to change the base?
Conversation
Deployment Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main thing is that there are a couple methods which are not used anywhere, other than that minor naming/ error message things.
backend/layers/business/business.py
Outdated
def is_public_uri(self, uri): | ||
return str(uri).startswith(CorporaConfig().dataset_assets_base_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially confused by the naming of this function. I don't know if "public" is the correct term here, since we are really just checking if it's an asset we already ingested.
E.g. I think a dataset could be somewhere totally different that is publicly accessible.
Maybe is_already_ingested
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can go for that.
def check_artifact_is_part_of_dataset(self, dataset_version_id: DatasetVersionId, artifact_id: DatasetArtifactId): | ||
dataset_version = self.datasets_versions[dataset_version_id.id] | ||
return any(artifact.id == artifact_id for artifact in dataset_version.artifacts) | ||
|
||
def get_artifact_by_uri_suffix(self, uri_suffix: str) -> Optional[DatasetArtifact]: | ||
for artifact in self.dataset_artifacts.values(): | ||
if artifact.uri.endswith(uri_suffix): | ||
return artifact | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think either of these methods are called anywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for the other definitions of these methods too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are called when we run the backend tests not in integration mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a few places where the unit tests fork and either use DatabaseProviderMock
, or DatabaseProvider
, for example here
if self.run_as_integration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry for the confusion.
My point here is not about the class. It is that I do not see these methods (the implementation here, or from any other class) being called anywhere. Like, if I do a search of the full codebase for the string "check_artifact_is_part_of_dataset"
only the definitions show up. So these methods are currently dead code branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right, they are part of my next PR
with self.assertRaises(InvalidIngestionManifestException): | ||
self.business_logic.ingest_dataset(revision.version_id, url, None, dataset_version.version_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is optional, but I think two meaningful improvements that could be made to these tests are:
- Checking text of the error to make sure the correct helpful message is given
- Ideally I would check that this message is actually returned in the response, e.g. make sure the curator can actually see the helpful error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main thing is the methods which I believe are not being called.
I think what happened here is that these were meant to go with another batch of changes, and were accidentally included in this PR.
Co-authored-by: Isaac Virshup <ivirshup@gmail.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## feature/schema-5-3 #7443 +/- ##
=====================================================
Coverage ? 93.07%
=====================================================
Files ? 196
Lines ? 16791
Branches ? 0
=====================================================
Hits ? 15628
Misses ? 1163
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Reason for Change
Changes
Testing steps
Checklist 🛎️
Notes for Reviewer