Refactor HREF normalization and models #387

Merged
merged 29 commits into from
Sep 21, 2023

mickael-menu

@mickael-menu mickael-menu commented Aug 31, 2023

Changes in the HREF normalization strategy

The goal is to improve the interoperability of stored HREFs with other platforms.

  • The Manifest parser doesn't normalize HREFs to the base URL anymore.
  • Container HREFs (e.g. ZIP entries) don't start with a / anymore. They are relative to the root of the archive.
  • Backward compatibility is kept for persisted locators whose HREF starts with a /.
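
As a minimal sketch of the backward-compatibility rule above for packaged publications, a legacy HREF can be mapped to the new form by dropping its leading `/`. The helper name `stripLegacyRootPrefix` is hypothetical and works on plain strings; it is not part of the toolkit's API.

```kotlin
// Hypothetical helper for illustration only; not part of the Readium toolkit.
// Maps a legacy container HREF (rooted with "/") to the new form, which is
// relative to the root of the archive. HREFs already in the new form are
// returned unchanged, so the function is safe to apply more than once.
fun stripLegacyRootPrefix(href: String): String =
    href.removePrefix("/")
```

For example, `stripLegacyRootPrefix("/OEBPS/chapter1.xhtml")` yields `OEBPS/chapter1.xhtml`, and applying it to that result again leaves it unchanged.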

Refactoring of Href and Url

Various refactoring to improve the safety and reliability of URLs and Link HREFs.

Rationale

  • The legacy handling of HREFs as percent-decoded paths was the source of many issues (including the most recent one) and relied on a fragile normalization strategy.
  • Using strings where a URL is expected meant we could not rely on static checks to verify that we pass a valid URL, or that the URL is absolute when an absolute URL is required.
  • Templated Link HREFs were too easy to ignore.

Changes

  • A new Url sealed class with two implementations AbsoluteUrl and RelativeUrl.
    • AbsoluteUrl offers helpers to check the URL scheme and convert to/from a File.
    • There are helpers to resolve a relative URL to a base URL, or to make an absolute URL relative. These replace the old Href() normalization strategy.
    • To be valid, a Url is always percent-encoded. We can encode a relative path (e.g. a ZIP entry name) with Url.fromDecodedPath().
  • The old util.Href doesn't exist anymore. There's a publication.Href which can be either a URI template, or a valid Url. It is used in Link objects.
  • Link and Locator expose Url and Href instead of raw strings.
  • To avoid breaking existing implementations, the OPDS 2 parser still normalizes all the HREFs to the self link or the feed URL. I added a kind of Visitor (ManifestTransformer) that modifies the Manifest to normalize the HREFs.
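
To illustrate the shape of the model described in the list above, here is a rough, self-contained sketch. The names `SketchUrl`, `SketchAbsoluteUrl` and `SketchRelativeUrl` are hypothetical, and building on `java.net.URI` is an assumption made for this example; the actual classes in the PR may be implemented quite differently.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical stand-ins for the Url / AbsoluteUrl / RelativeUrl types.
sealed class SketchUrl {
    abstract val uri: URI

    override fun toString(): String = uri.toString()

    companion object {
        // Parses an already percent-encoded string.
        // Returns null when the string is not a valid URL (e.g. it contains a space).
        fun parse(encoded: String): SketchUrl? =
            try {
                val uri = URI(encoded)
                if (uri.isAbsolute) SketchAbsoluteUrl(uri) else SketchRelativeUrl(uri)
            } catch (e: URISyntaxException) {
                null
            }

        // Percent-encodes a decoded relative path such as a ZIP entry name.
        // Returns null when the result would not be a relative URL
        // (e.g. a colon in the first segment is read as a scheme).
        fun fromDecodedPath(path: String): SketchRelativeUrl? =
            try {
                // The multi-argument constructor quotes illegal characters.
                val uri = URI(null, null, path, null)
                if (uri.isAbsolute) null else SketchRelativeUrl(URI(uri.toASCIIString()))
            } catch (e: URISyntaxException) {
                null
            }
    }
}

class SketchAbsoluteUrl(override val uri: URI) : SketchUrl() {
    // Example of a scheme helper.
    val isHttp: Boolean
        get() = uri.scheme.equals("http", ignoreCase = true) ||
            uri.scheme.equals("https", ignoreCase = true)

    // Resolves `url` against this URL; an absolute `url` is returned as is.
    fun resolve(url: SketchUrl): SketchAbsoluteUrl =
        SketchAbsoluteUrl(uri.resolve(url.uri))

    // Makes `url` relative to the directory containing this URL.
    // Returns `url` unchanged when it is not located under that directory.
    fun relativize(url: SketchUrl): SketchUrl {
        val directory = uri.resolve(".")
        val result = directory.relativize(url.uri)
        return if (result.isAbsolute) SketchAbsoluteUrl(result) else SketchRelativeUrl(result)
    }
}

class SketchRelativeUrl(override val uri: URI) : SketchUrl()
```

With this sketch, `SketchUrl.fromDecodedPath("OEBPS/chapter 1.xhtml")` gives `OEBPS/chapter%201.xhtml`, while `SketchUrl.parse("chapter 1.html")` returns null because the input is not percent-encoded. Because callers receive a `SketchAbsoluteUrl` or a `SketchRelativeUrl` rather than a string, a function that needs an absolute URL can say so in its signature. Edge cases such as a base URL with an empty path are not handled here.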

@mickael-menu mickael-menu requested a review from qnga August 31, 2023 08:48

@qnga qnga left a comment

For backward compatibility, I would introduce a normalization from previous absolute URL-based locators to the new ones, based on hrefs directly provided by manifests.

@mickael-menu

For backward compatibility, I would introduce a normalization from previous absolute URL-based locators to the new ones, based on hrefs directly provided by manifests.

Yes, this is done in #388

/**
 * Historically, we used to have "absolute" HREFs in the manifest:
 *  - starting with a `/` for packaged publications.
 *  - resolved to the `self` link for remote publications.
 *
 * We removed the normalization and now use relative HREFs everywhere, but we still need to support
 * the locators created with the old absolute HREFs.
 */
@DelicateReadiumApi
public fun Publication.normalizeLocator(locator: Locator): Locator {
    val self = (linkWithRel("self")?.href as? UrlHref)?.url as? AbsoluteUrl
    return if (self == null) { // Packaged publication
        locator.copy(
            href = Url(locator.href.toString().removePrefix("/"))
                ?: return locator
        )
    } else { // Remote publication
        // Check that the locator HREF relative to `self` exists in the manifest.
        val relativeHref = self.relativize(locator.href)
        if (linkWithHref(relativeHref) != null) {
            locator.copy(href = relativeHref)
        } else {
            locator
        }
    }
}
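
For readers who want to see the remote-publication branch on concrete values, here is a simplified, self-contained sketch. It replaces the Publication with a plain set of manifest HREFs and uses `java.net.URI` for relativization; both are assumptions for illustration, and `normalizeRemoteHref` is a hypothetical name.

```kotlin
import java.net.URI

// Hypothetical, string-based mirror of the "remote publication" branch.
// `manifestHrefs` stands in for the relative HREFs declared in the manifest.
fun normalizeRemoteHref(self: URI, manifestHrefs: Set<String>, locatorHref: String): String {
    // Relativize against the directory containing the `self` link.
    val relative = runCatching {
        self.resolve(".").relativize(URI(locatorHref)).toString()
    }.getOrNull()

    // Only rewrite the HREF when the relative form really exists in the manifest;
    // otherwise keep the locator HREF untouched.
    return if (relative != null && relative in manifestHrefs) relative else locatorHref
}
```

With a `self` link of `https://example.com/pub/manifest.json` and a manifest containing `chapter1.html`, the legacy HREF `https://example.com/pub/chapter1.html` becomes `chapter1.html`, while an HREF pointing to another host is returned unchanged.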

@CodiumAI-Agent

PR Analysis

  • 🎯 Main theme: This PR focuses on updating the HREF normalization strategy in the code. The changes involve removing the leading '/' from HREFs and making them relative to the root of the archive.
  • 📝 PR summary: The PR modifies the HREF normalization strategy in the code. It changes the HREFs to be relative to the root of the archive instead of starting with a '/'. The changes are reflected across multiple test files and the main code. The PR also ensures backward compatibility for persisted locators starting with a '/'.
  • 📌 Type of PR: Refactoring
  • 🧪 Relevant tests added: Yes
  • 🔒 Security concerns: No security concerns found

PR Feedback

  • 💡 General suggestions: The PR is well-structured and the changes are consistent across all the files. The removal of the leading '/' from HREFs and making them relative to the root of the archive is a good approach. It's also commendable that the PR takes care of backward compatibility for persisted locators starting with a '/'. However, it would be beneficial to ensure that all edge cases are handled and tested.

  • 🤖 Code feedback:

    • relevant file: readium/shared/src/test/java/org/readium/r2/shared/publication/ManifestTest.kt
      suggestion: It seems like the tests have been updated to reflect the changes in the HREF normalization strategy. However, it would be beneficial to add more tests to cover edge cases, if not already done. For instance, testing how the code behaves when the HREF is empty or contains special characters. [medium]
      relevant line: links = listOf(Link(href = "manifest.json", rels = setOf("self"))),

    • relevant file: readium/shared/src/androidTest/java/org/readium/r2/shared/util/HrefTest.kt
      suggestion: The tests in this file have been updated to reflect the changes in the HREF normalization strategy. However, it would be beneficial to add more tests to cover edge cases, if not already done. For instance, testing how the code behaves when the HREF is empty or contains special characters. [medium]
      relevant line: assertEquals("foo/bar.txt", Href("foo/bar.txt", "").string)

# Conflicts:
#	readium/lcp/src/main/java/org/readium/r2/lcp/license/container/ContainerLicenseContainer.kt
#	readium/lcp/src/main/java/org/readium/r2/lcp/license/container/FileZipLicenseContainer.kt
#	readium/lcp/src/main/java/org/readium/r2/lcp/license/container/LcplLicenseContainer.kt
#	readium/lcp/src/main/java/org/readium/r2/lcp/license/container/LcplResourceLicenseContainer.kt
#	readium/lcp/src/main/java/org/readium/r2/lcp/license/container/LicenseContainer.kt
#	readium/shared/src/main/java/org/readium/r2/shared/publication/services/content/iterators/HtmlResourceContentIterator.kt
#	test-app/src/main/java/org/readium/r2/testapp/bookshelf/BookRepository.kt
#	test-app/src/main/java/org/readium/r2/testapp/catalogs/CatalogViewModel.kt
@mickael-menu mickael-menu changed the title from "Update HREF normalization strategy" to "Refactor HREF normalization and models" Sep 21, 2023
@mickael-menu mickael-menu merged commit 482ab0c into v3 Sep 21, 2023
3 checks passed
@mickael-menu mickael-menu deleted the fix/remove-root-prefix branch September 21, 2023 09:44