Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation Plan: Additional search views #2676

Merged
merged 7 commits into from
Aug 2, 2023

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Jul 19, 2023

Fixes

Resolves #2560 by @zackkrida

Assigned reviewers

Note

A very rough draft implementation of the additional search view pages is in branch draft/collection_pages.

Current round

Note
This discussion is following the Openverse decision-making process. Information about this process can be found
on the Openverse documentation site. Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process and follow it.

The discussion is currently in the Decision round.

The deadline for review of this round is 2023-07-25.

@obulat obulat added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 🕹 aspect: interface Concerns end-users' experience with the software 🧱 stack: frontend Related to the Nuxt frontend skip-changelog 🧭 project: implementation plan An implementation plan for a project labels Jul 19, 2023
@obulat obulat self-assigned this Jul 19, 2023
@obulat obulat requested a review from a team as a code owner July 19, 2023 17:37
@obulat obulat requested review from krysal and stacimc July 19, 2023 17:37
@obulat obulat changed the title Additional search views implementation plan Implementation Plan: Additional search views Jul 19, 2023
@obulat obulat requested review from zackkrida and removed request for stacimc July 19, 2023 17:37
@github-actions github-actions bot added the 🧱 stack: documentation Related to Sphinx documentation label Jul 19, 2023
@obulat obulat requested a review from sarayourfriend July 19, 2023 17:39
@obulat obulat force-pushed the project/additional_search_views_ip branch from 73637ba to c0e7f53 Compare July 19, 2023 17:55
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent, I'm only requesting changes based on the volume of cleanup suggestions I've made.

Actually, one potential blocker. I believe we need to include the necessary API changes in this plan, rather than have a separate plan for the API. Since the API changes are so small.

Copy link
Collaborator

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Zack, but will suggest that either the API changes should be included in this IP or they should be described in a separate IP first.

Without having a deep knowledge of how the frontend stores work, and because it isn't described in this implementation plan, it's very hard to understand how the searches for these collection pages is meant to be implemented. And in the interest of keeping the API as simple as possible, I'd prefer to have the API's approach guide the frontend implementation rather than the other way around.

Comment on lines 50 to 163
Nuxt allows creating nested dynamic routes like
`/pages/_collection/_mediaType/_term`.

Here, the collection can be `tag`, `creator` or `source`. `mediaType` can be one
of the supported media types (currently, `image` and `audio`). `term` refers to
the actual value of the tag, creator or source name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fantastic! It could be useful for SEO as well. Speaking of which, what considerations do we need to apply to robots.txt for this change, if any? I don't really understand our current approach to robots.txt, is it leftover from when we were iframed? @zackkrida I believe you may know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our current robots.txt is blank, which allows all pages of the site to be crawled and indexed. That works exactly the same as:

User-agent: *
Disallow:

The staging site is blocked from indexing.

I don't think we need to do anything here. These pages can be crawled and indexed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new pages are essentially search views with static URLs though (not query params). Should that kind of thing be indexed?

Sorry if this is already discussed somewhere. Might be helpful to document the rationale of our robots.txt in the docs site.

Comment on lines 104 to 226
The header should also display the number of results, "251 audio files with the
selected tag", "604 images provided by this source", "37 images by this creator
in Hirshhorn Museum and Sculpture Garden".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it at all relevant for us to consider sources that have a single creator? Boston Public Library, as a source, for example, only has Boston Public Library as the "creator" of the work. It's weird to think that there would be two separate pages for them, but then I'm not sure how we would communicate this to the user if we automatically redirected the BPL creator search to the BPL source search 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note to the plan. By the way, we don't have a Boston Public Library as a source filter on the frontend. What is the link to their API or collection?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I've just realised that BPL is just a creator we've picked up through Flickr, not an actual source. Maybe an easy source to add then!

Comment on lines 158 to 171
#### Update the `searchBy` filter in the `search` store

Currently, this filter is shaped as other filters: it is a list of objects with
`code`, `name` and `checked` properties. This is a possible value now: `{
searchBy: { code: "creator", name: "filters.searchBy.creator", checked: true }}.

Since this filter is mutually exclusive (you cannot search by both the creator
and the tag, for example), this shape is very difficult to update.

It is better to set the `searchBy` filter to have one of the possible values:
`{ searchBy: <null|creator|source|tag> }`.

We should also create a new method, `setSearchBy`, in the `search` store, that
would allow directly set the `searchBy` filter value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify whether this is just a DevEx/code-quality refactor or if it's necessary for the new features? I don't fully understand why this particular change is relevant to the creator/tag/source pages. Are they going to use the same search store and filters? If so, that would be helpful background to have somewhere as well (may have missed it though?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I actually re-used the same search store in the draft implementation that I linked in the issue description. But to get the store to handle different searchBy values, we should add this change.
Thank you for asking this question! I realized that I didn't write anything about the data side: what API calls will be made, how the stores will be used and how they need to be updated. I'll add this during the revision round.

Comment on lines 175 to 179
I am not sure what events we want to add, if any. Should we add `<VIEW>_CLICKED`
events for the new views, or would page views be sufficient?

Should we add a new `SELECT_COLLECTION_ITEM` event similar to
`SELECT_SEARCH_RESULT`?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. SELECT_SEARCH_RESULT can be narrowed to /search or /tag* (for example) to determine where the event occurred'

@obulat obulat force-pushed the project/additional_search_views_ip branch 4 times, most recently from 146dcec to 12f655a Compare July 21, 2023 07:22
@github-actions
Copy link

github-actions bot commented Jul 21, 2023

Full-stack documentation: https://docs.openverse.org/_preview/2676

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Copy link
Collaborator

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only three truly blocking issues, the rest are just additional questions or suggestions. The blockers are:

  1. The various issues I've described with using Postgres to query for the creator view must be addressed. I've suggested we continue to use ES by adding new .raw sub-fields for creator and source.
  2. The frontend routes need to disambiguate creators by source. I've suggested alternative paths that mirror my suggested path (rather than query param) based approach for the API and would solve this problem.
  3. For me to fully understand how the plan would be implemented, I need an ordered, high-level step-by-step implementation of all the tasks that would be converted into individual issues. I think we should clarify this in the implementation plan template to have something like ordered steps and step descriptions so that it's possible to get both a high-level view of the plan separated from the detailed descriptions and nuances of each step.

Overall the plan looks very good though. The first two issues are critical blockers for the plan because the plan will not succeed without addressing them. The last is just a personal blocker for me to get a proper understanding of the plan.

I'm starting AFK tomorrow and am confident that you and Zack will be able to resolve the blockers without me. The plan otherwise is great. Please consider this plan approved by me once those blockers are resolved 🚀


<!-- Describe the implementation step necessary for completion. -->

### API changes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this section. It helps a lot to understand how this will all fit together.

I like the approach to add new routes on the API rather than trying to stuff everything into the existing search endpoint. That makes things a lot nicer for the creator and provider especially.

However, there are some problems with using the database rather than ES to access anything:

  1. The database does not cache queries in the same way that ES does. Repeated queries will not necessarily be as efficient as from ES.
  2. The database does not score documents at all, so the order will different dramatically to the way that ES would order the documents. That's an issue with respect to popularity data today already, but will become even more of an issue if we start to score documents based on other metrics as theorised by our search relevancy discussions.
  3. creator is not indexed in the API database, so a query against it will be very slow.

Only one of these can be "fixed", the third, but it's much easier to add a creator.raw field to the index instead, like we've done for title, description, and tag name. Adding a new field to the index is far easier than adding a new index to the database (which would require careful application to avoid a long-running migration) and which could significantly increase the time it takes to do a data refresh.

I'd also strongly recommend against the route described as it is harder to cache in a sensible way (on the Cloudflare side of things) or to perform cache invalidation that might be required by #1969. Instead of using query strings, we can describe the resource via the path: /<media type>/source/<source>/creator/<creator> is very clean, easy to read and understand, and very easy to manage the cache for because it is a static path. The source page can use the same route by leaving off the creator. This removes the need to manage specific query params as well and would allow us to add querying within these routes more easily in the future behind the regular q parameter if we wanted.

For the tag route, I'd recommend the singular tag rather than plural tags, for legibility if we are presenting a single tag. Again, using a route-based approach instead of query parameter approach makes it easier to manage the cache if needed and could allow us to more easily feature specific tags with high quality results as example pages if we wanted, like /images/tag/dog or /images/tag/flying%20birds.

Path elements still need to be URL encoded to preserve special characters and spaces, but for most cases like /images/source/nasa, /images/sources/nypl, /images/sources/inaturalist, etc we've suddenly got a very good-looking URL to point people to.

Comment on lines 129 to 247
const prepareSearchQuery = (
searchParams: Record<string, string>,
mediaType: SupportedMediaType
) => {
if (searchParams.searchBy) {
if (searchBy === "creator") {
// the name of the source is stored in the `<mediaType>Provider` filter
return {
creator: searchTerm,
source: searchParams[`${mediaType}Provider`],
}
} else if (searchBy === "source") {
return { source: searchParams[`${mediaType}Provider`] }
} else if (searchBy === "tag") {
// The parameter is plural in the API
return { tags: searchTerm }
}
} else {
// Current search query transform
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that regardless of whether we use my fully route based approach above, we'll still need to determine the path pattern to use. Should this function also handle that, maybe returning either an array of path segments (including holes for parameters and a final object for query parameters if still needed) or the query parameter object to send to the default search route?

Comment on lines 156 to 161
Nuxt allows creating nested dynamic routes like
`/pages/_collection/_mediaType/_term`.

Here, the collection can be `tag`, `creator` or `source`. `mediaType` can be one
of the supported media types (currently, `image` and `audio`). `term` refers to
the actual value of the tag, creator or source name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realising now that for these to work, creator needs to have source included. It could make sense to make these match the API routes: /_mediaType/source/_source/creator/_creator, /_mediaType/tag/_tag etc.

Comment on lines 104 to 226
The header should also display the number of results, "251 audio files with the
selected tag", "604 images provided by this source", "37 images by this creator
in Hirshhorn Museum and Sculpture Garden".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I've just realised that BPL is just a creator we've picked up through Flickr, not an actual source. Maybe an easy source to add then!

@obulat obulat force-pushed the project/additional_search_views_ip branch from 1af2cce to a886fd7 Compare July 26, 2023 19:19
@obulat
Copy link
Contributor Author

obulat commented Jul 26, 2023

I've updated the plan to add the API and Elasticsearch changes. In particular, I updated the paths to match @sarayourfriend's suggestions. I also rewrote the plan to make the steps descriptions and steps sequence clearer.
@zackkrida, this is ready for re-review.

@zackkrida
Copy link
Member

I'll review this today!


<!-- What hard blockers exist which might prevent further work on this project? -->

The main blocker could be the maintainer capacity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always a potential blocker, so I don't think we need to explicitly mention it.

Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in a good enough place to merge. You've done a really excellent job of outlining the steps necessary for the project and coming up with good frontend patterns for the feature.

obulat and others added 2 commits July 29, 2023 18:56
@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 9 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@zackkrida zackkrida dismissed sarayourfriend’s stale review August 2, 2023 15:37

Requested changes were made

@zackkrida
Copy link
Member

We're going to merge this on the basis of @sarayourfriend's blockers being resolved 🥳

@zackkrida zackkrida merged commit 4c99418 into main Aug 2, 2023
@zackkrida zackkrida deleted the project/additional_search_views_ip branch August 2, 2023 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🕹 aspect: interface Concerns end-users' experience with the software 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧭 project: implementation plan An implementation plan for a project skip-changelog 🧱 stack: documentation Related to Sphinx documentation 🧱 stack: frontend Related to the Nuxt frontend
Projects
Status: Accepted
Archived in project
Development

Successfully merging this pull request may close these issues.

Implementation Plan: Additional search views for the frontend
4 participants