Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remediate fileset duplication in items list #371

Merged
merged 5 commits into from
Sep 25, 2024
Merged

Conversation

laritakr
Copy link
Contributor

@laritakr laritakr commented Sep 25, 2024

Story

Refs scientist-softserv/adventist_knapsack#817

IiifPrint previously updated member_ids to include the child work's filesets. After Valkyrization, this resulted in the inclusion of both the pages & child works in the items list. By removing the override of member_ids and indexing descendent_member_ids_ssim, we now separated the inclusion of the descendent's filesets to only those situations where the behavior is desired.

Additionally, searching from the parent work was broken because the PDF files do not have text extraction and were returning invalid results.

Expected Behavior Before Changes

  • Child works & child filesets both show in items list.
  • Universal Viewer text search did not work at the parent work level.

Expected Behavior After Changes

  • Only child works show in items list.
  • Universal viewer shows each page only once.
  • Universal viewer search works from either parent or child works.

Screenshots / Video

Before

Items List

Screenshot 2024-09-25 at 1 16 27 PM

After

Items List

Screenshot 2024-09-25 at 1 22 31 PM

Search for word that exists

Screenshot 2024-09-25 at 1 18 50 PM

Search for word that doesn't exist

Screenshot 2024-09-25 at 1 17 39 PM

Notes

Fix requires a reindex of works to redefine the indexed value of member_ids_ssim and to index descendent_member_ids_ssim instead. Prior to a reindex, the items list will continue to contain both child works and their filesets.

Member_ids will no longer include the lineage_service method.
Search was broken in cases where there is no json file to search (i.e.
the pdf file didn't have the json file). There may have been a change in
a blacklight_iiif_search version update, which made the INVALID text
that we used in this situation lock up the search.

This change returns the normal search behavior.
@laritakr laritakr changed the title I817 fileset duplication Remediate fileset duplication in items list Sep 25, 2024
We added descendent_member_ids_ssim to the index, but want
member_ids_ssim as a fallback. This use was missed previously.
end
# We must convert these to strings as Valkyrie's identifiers will be cast to hashes when we
# attempt to write the SolrDocument.
file_set_ids.flatten.uniq.compact.map(&:to_s)
child_ids.flatten.compact.map(&:to_s).uniq
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.uniq was moved, because this can contain a combination of valkyrie ids and string ids before the mapping to_s and some duplications could have been missed.

Array.wrap(all_member_ids).each do |id|
return true if file_type_and_permissions_valid?(member_presenters_for([id]).first)
end
false
end

def load_file_set_ids(solr_doc)
Copy link
Contributor

@ShanaLMoore ShanaLMoore Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking, but we could extract this method into a module since I think it's the same one declared in app/presenters/iiif_print/iiif_manifest_presenter_factory_decorator.rb#L25

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed... I changed an existing method which was also duplicated. To save time waiting for another build now, maybe we can do it when we do the other ticket about going directly to the first page found?

@laritakr laritakr merged commit 02f68fc into main Sep 25, 2024
9 checks passed
@laritakr laritakr deleted the i817-fileset-duplication branch September 25, 2024 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants