Index AnVIL dataset description from DUOS #5547
Comments
@hannes-ucsc: "This is about the dataset, not the project. For the latter see #4827. This is an unusual request because we will typically obtain metadata from a Big Query table, but here we will obtain it from TDR's REST API. The REST API has given us some grief performance-wise in the past so I expect some complications involving retries and time-boxing of requests."
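To make the "retries and time-boxing" concern concrete, here is a minimal sketch of one way to bound a flaky REST call. It is not taken from the codebase: `call_with_retries`, `DeadlineExceeded`, and all parameter names and defaults are hypothetical, and the actual HTTP request is passed in as a callable so the sketch stays independent of any particular client library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar('T')


class DeadlineExceeded(Exception):
    """Raised when no attempt succeeds within the budget or attempt limit."""


def call_with_retries(request: Callable[[float], T],
                      *,
                      budget: float = 30.0,
                      max_attempts: int = 4,
                      base_delay: float = 0.5,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """
    Call `request(timeout)` until it succeeds, the attempt limit is reached,
    or the overall time budget (in seconds) is used up. All names and
    defaults here are assumptions made for illustration.
    """
    deadline = clock() + budget
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            # Each attempt is told how much of the budget is left so that it
            # can use that value as its own request timeout.
            return request(remaining)
        except (TimeoutError, ConnectionError) as e:
            last_error = e
        if attempt + 1 < max_attempts:
            # Exponential backoff, never sleeping past the deadline.
            delay = min(base_delay * 2 ** attempt, max(0.0, deadline - clock()))
            sleep(delay)
    raise DeadlineExceeded(
        f'No success within {budget}s or {max_attempts} attempts'
    ) from last_error
```

The caller would wrap whatever client function performs the request, passing the `timeout` argument through to it. Which exceptions count as retryable is also an assumption and would depend on the HTTP client in use.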
Spike for design and estimate.
My first stab at this was much more complicated but I realized that we can handle this with a special bundle type, similar to how we handle supplementary files. We'll assume that the description is only needed in outer entities of the In the Assuming the endpoint ends up being the one tentatively given to us by the Broad (see previous paragraph), there is the underlying assumption that there is only one dataset row per snapshot. This means that when fetching the dataset description, the
We need the exact specification of the endpoint that we should use and with what arguments. If it is the endpoint tentatively mentioned on Slack: I believe we're already hitting that endpoint for a different purpose, and we're currently experiencing degraded performance, so I would like some assurance from the Broad that the performance issue has been addressed before we start implementing this.
Here is a spreadsheet that identifies the available information. @NoopDog Which of these fields are the priority for the Data Browser to display?
Assigning to Dave to identify the high priority fields.
@hannes-ucsc @NoopDog Please hold on this... I found out today that they are looking at an alternate path for handing off this data. The long-term plan is to hand off this data via DUOS and they think that may be available in a few weeks. They will be getting us documentation on the DUOS interface (which is under development). Hannes, let's discuss this. I'm going to move this back to
Nate provided the following information on Friday, Oct 13. Please review and follow up as needed.
@hannes-ucsc to figure out next steps.
For demo, show new
We currently don't have a method to receive dataset descriptions from Terra.
There is a brief discussion of this topic with Michael Baumann in Slack's #ucsc-anvil-explorer-collab channel.
The proposed TDR API change is a new endpoint /api/repository/v1/snapshots/{id} that would return JSON that included the dataset information. They have committed to supplying the JSON structure by early next week (preferably by the end of this week).
We will ingest the description at the time of indexing.
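As a rough illustration of what ingesting the description at indexing time might involve, the sketch below builds the URL for the proposed endpoint and pulls a description out of the response body. The JSON structure has not been supplied yet, so the `dataset` and `description` keys are pure assumptions, as are the function names and the example base URL.

```python
import json
from typing import Optional
from urllib.parse import quote


def snapshot_url(base_url: str, snapshot_id: str) -> str:
    # The path is the one proposed in this issue; the base URL is up to the caller.
    return (f"{base_url.rstrip('/')}"
            f"/api/repository/v1/snapshots/{quote(snapshot_id, safe='')}")


def dataset_description(snapshot_json: str) -> Optional[str]:
    """
    Return the dataset description from a snapshot response body, or None
    if it is absent or blank. The key names are hypothetical placeholders
    until the real JSON structure is known.
    """
    snapshot = json.loads(snapshot_json)
    if not isinstance(snapshot, dict):
        return None
    dataset = snapshot.get('dataset')  # hypothetical key
    if not isinstance(dataset, dict):
        return None
    description = dataset.get('description')  # hypothetical key
    if isinstance(description, str) and description.strip():
        return description
    return None
```

Returning `None` for a missing or blank description lets the indexer treat the description as optional rather than failing the whole snapshot.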
I'm looking for an estimate of the effort to accomplish this.
[Edit: A refined plan was shared in the Sep 19 Broad/UCSC standup. Updating the description to reflect that plan.]