-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Harvesting Client name to the Metadata Source facet #10464
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Asked for a release note. The tests haven't completed yet so I'm not sure if there's any Harvesting/search related test that needs to be updated. (Can one be added - do we already have harvested test datasets where we could search on this field?)
@@ -897,7 +897,8 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long | |||
|
|||
if (dataset.isHarvested()) { | |||
solrInputDocument.addField(SearchFields.IS_HARVESTED, true); | |||
solrInputDocument.addField(SearchFields.METADATA_SOURCE, HARVESTED); | |||
solrInputDocument.addField(SearchFields.METADATA_SOURCE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Probably need a release note about this feature that also notes that (async/background) reindexing is needed to populate the facet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @qqmyers 👋🏼
Thanks! I will add the release note in a few just looking at some other PR right now but I will add it ASAP. Regarding the tests, I am not sure I will also look at this.
Best,
Juan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note added and test condition added to the harvesting test to search by the new collection name. 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting what I said during standup in writing:
Rather than using the HarvestingClient's nickname for this facet, we should probably use the name of the local collection into which the client is harvesting.
The clear advantages of doing that:
- This mirrors what we are doing for the local datasets:
solrInputDocument.addField(SearchFields.METADATA_SOURCE, rootDataverse.getName());
- The name of the collection is likely to be more descriptive/better-looking to a human user
- While both the client nickname and the name of the local collection are chosen by the local admin, it is far easier to change the latter. The former is not editable at all. With the current implementation, if a prod. instance admin realizes that they named the harvesting client
oai_3
and that's what's going to show up in the facet, the only way for them to address it would be to delete the client (and all the content associated with it), and re-create it with a better-looking nickname, then re-harvest. - This will make it unnecessary to add an extra field with a descriptive label to the client class (as was mentioned during standup).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if someone has multiple clients harvested into the same collection? 🤔 should we consider this scenario! Also @DS-INRA made some comments a few minutes ago on the issue confirming that the name would be what they need which is as I understand is the other PR associated with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to harvest into the same collection, yes. But then one could make an argument that if the local admin wants their users to see these datasets from different OAI archives (or sets/clients) show as the same collection to their users, they may actually prefer to have them under the same facet too...
All that said, I would agree that it makes sense to implement it the way the original requestor wants it to work. But let's make sure everybody is on the same page. Let me ask some followup questions in the issue.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
(I'm generally ready to merge this) |
I just chatted with @jp-tosca about it. We may have talked earlier about having a setting to revert the old behavior but I think it's fine that it's absent. Onwards and upwards. One downside is that you won't be able to just look at a number of how many harvested dataset (unless there's only one client). Oh well. Maybe it will encourage more harvesting, once you can easily tell where stuff is being harvested from. Overall, seems like a good change, and @DS-INRA approves, which is the most important thing. 😄 |
@pdurbin Since we're still going to be showing the total number of local datasets, it should still be fairly clear how many harvested datasets total there are - because, math ("datasets" facets - total local); even if we are not showing the exact total number. |
We are having a meeting today at 11:00 EST with @DS-INRA, I will post if there are any updates. |
@@ -272,6 +270,12 @@ private void harvestingClientRun(boolean allowHarvestingMissingCVV) throws Inte | |||
} while (i<maxWait); | |||
|
|||
System.out.println("Waited " + i + " seconds for the harvest to complete."); | |||
|
|||
Response searchHarvestedDatasets = UtilIT.search("metadataSource:" + nickName, normalUserAPIKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep an eye on it - will not surprise me if it needs an extra second here, occasionally, for the indexing of all the harvested datasets to complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jp-tosca @stevenwinship
Hah! This appears to be exactly what's happening now, since I merged this PR yesterday.
The new, weird failures - data.total_count doesn't match. Expected: <8> Actual: <5>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JP, this was a bad call on my part - I should've asked you to run Jenkins 3 times in a row on the PR before merging it.
📦 Pushed preview images as
🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name. |
👏 Thanks for the PR ! |
What this PR does / why we need it:
This PR will change the "Metadata Source" facet. It will stop showing all the Harvested datasets inside the same category and group them by the nickname of the client. ›
Which issue(s) this PR closes: 10298
Closes #10298
Special notes for your reviewer:
We may need to update this when #10217 gets done with that value.
Suggestions on how to test this:
Create multiple harvesting clients and you should be able to test the change on the facet.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Before:
After:
![image](https://private-user-images.githubusercontent.com/142103991/319707029-48314656-ac14-4a8f-92d5-e8d41488917f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1OTg4ODUsIm5iZiI6MTczOTU5ODU4NSwicGF0aCI6Ii8xNDIxMDM5OTEvMzE5NzA3MDI5LTQ4MzE0NjU2LWFjMTQtNGE4Zi05MmQ1LWU4ZDQxNDg4OTE3Zi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE1JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNVQwNTQ5NDVaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00MGFiYjhiN2Q0NDJjNDM0YzFkOTQwMTQxNWEyYzBhNjRlNjM3MDg0YmI3ZWJjZTE1ODg4YjY2Zjk1MzA2MTBlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.32_62b6X7IpadaG4ZcRVcnS5OhEeEsahHKh1C38eq7o)