Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should compiled data constructors be on the owned or borrowed type? #5440

Open
sffc opened this issue Aug 22, 2024 · 24 comments
Open

Should compiled data constructors be on the owned or borrowed type? #5440

sffc opened this issue Aug 22, 2024 · 24 comments
Labels
2.0-breaking Changes that are breaking API changes C-meta Component: Relating to ICU4X as a whole

Comments

@sffc
Copy link
Member

sffc commented Aug 22, 2024

This came up in #5413 (comment).

On types that have an owned and a borrowed variant, should the compiled data constructors build the owned or the borrowed variant? For example, should it be:

// Option 1: all constructors on owned type
Foo::new() -> FooBorrowed<'static>
Foo::default() -> Foo
Foo::try_new_unstable(...) -> Result<Foo>

or

// Option 2: constructor on borrowed type; Default impl available
FooBorrowed::new() -> FooBorrowed<'static>
Foo::default() -> Foo
Foo::try_new_unstable(...) -> Result<Foo>

or

// Option 3: constructors on both types
FooBorrowed::new() -> FooBorrowed<'static>
Foo::new() -> Foo
Foo::default() -> Foo
Foo::try_new_unstable(...) -> Result<Foo>

I think Foo::default() -> Foo should always be available because that's the way to get an actual Foo from compiled data, and clients might need an actual Foo if the object could be created multiple different ways.

I don't have a super strong opinion between options 1, 2, and 3.

@Manishearth @zbraniecki @hsivonen @robertbastian @nekevss

@sffc sffc added C-meta Component: Relating to ICU4X as a whole 2.0-breaking Changes that are breaking API changes labels Aug 22, 2024
@sffc sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Aug 22, 2024
@sffc sffc added the needs-approval One or more stakeholders need to approve proposal label Aug 22, 2024
@Manishearth
Copy link
Member

Manishearth commented Aug 22, 2024

I have previously been unhappy that the properties constructors only let you get static borrowed types, and would prefer that we have cheap ways to get owned types compatible with ones loaded from data structs. It's kinda annoying to have this split in types.

But it also seems to work well. It's unclear where we should be drawing the line.

@sffc
Copy link
Member Author

sffc commented Aug 22, 2024

For the properties types, my Default solution won't work, because there is not a single constructor for those types. So we should probably come up with a general solution that doesn't depend on Default.

Here's another option

FooBorrowed::new() -> FooBorrowed<'static>
FooBorrowed::static_to_owned(self) -> Foo

Foo::try_new_unstable(...) -> Result<Foo>
Foo::as_borrowed(&self) -> FooBorrowed

@sffc
Copy link
Member Author

sffc commented Aug 27, 2024

Here's another question. In 2.0, should the types be named Foo and FooBorrowed, or should they be named FooOwned (or FooData) and Foo?

@Manishearth
Copy link
Member

Hmm, if the borrowed types are the main ones it does make sense to have them be the primary ones.

But the borrowed types are the main ones only for singleton types. Are we okay with there being Foo<'static>/FooOwned (singleton) and Foo (and potentially FooBorrowed<'a>) for non-singletons? I think that's fine, we would rarely be doing FooBorrowed anyway for non-singletons.

@sffc
Copy link
Member Author

sffc commented Aug 29, 2024

Collator is the main example of a type that would be locale-dependent but might benefit from a Borrowed version.

We can make databake capable of returning the right thing even with a fallible try_new on the borrowed type. It's more work, though, and I don't really want to block 2.0 on that.

It's also more work, but since it seems like we probably want to generally move in this direction, we can just consistently apply this pattern everywhere:

pub struct FooFormatterData {
    payload: DataPayload<DataStructV1Marker>
}

pub struct FooFormatter<'a> {
    payload: &'a DataStructV1<'a>
}

impl FooFormatterData {
    pub fn get(&self) -> FooFormatter { ... }
}

impl<'a> FooFormatter<'a> {
    pub fn clone_with_data(&self) -> FooFormatterData { ... }
}

impl FooFormatter<'static> {
    pub fn static_to_owned(&self) -> FooFormatterData { ... }
}

Maybe for 2.0 we stick with the current naming, and in 3.0 we can consider switching things around, only because this wasn't on the table initially for 2.0 and we can't keep adding things to that milestone.

I'm pretty sure that, names aside, we can do this as an incremental migration. We can create FixedDecimalFormatterBorrowed and with whatever functions it needs, and the owned type continues to have direct formatting functions. Then in 3.0 we can remove the format functions from the owned type and do the project-wide renaming.

In hindsight this design seems pretty obvious and I don't know why we didn't consider it sooner.

@sffc
Copy link
Member Author

sffc commented Sep 3, 2024

@robertbastian prefers keeping the four constructors together in the docs. @robertbastian therefore prefers:

// Always:
Foo::new() -> FooBorrowed<'static>
Foo::try_new_unstable(...) -> Result<Foo>
Foo::as_borrowed() -> FooBorrowed

@sffc points out:

  1. It might be necessary for FooBorrowed to have its own constructor if the type Foo is feature-gated
  2. I think it is slightly strange/unconventional for a constructor on T to return something that is not T

@robertbastian says we could add FooBorrowed::new() if Foo is feature-gated.

@Manishearth
Copy link
Member

You never want to add a function that only shows up when a feature is disabled, then global feature resolution will break code relying on the disabled crate.

@sffc
Copy link
Member Author

sffc commented Sep 3, 2024

One thing @robertbastian said which I agree with is that Foo::new() -> Foo is a bad function because it unnecessarily wraps the static ref in the enum (DataPayload) which leads to less efficient code.

I think we need FooBorrowed::new() since Foo could be feature-gated in some cases.

As noted previously, I think we should stick with the Foo/FooBorrowed names in 2.0, and possibly revisit them in 3.0.

So I think the choices are:

// Option 1 (2024-09-03): Foo::new() and FooBorrowed::new()
Foo::new() -> FooBorrowed
Foo::try_new_unstable() -> Foo
// Always have this function available:
FooBorrowed::new() -> FooBorrowed

// Option 2 (2024-09-03): Foo::new() and optional FooBorrowed::new()
Foo::new() -> FooBorrowed
Foo::try_new_unstable() -> Foo
// Add this function only when Foo actually gets feature-gated:
FooBorrowed::new() -> FooBorrowed

// Option 3 (2024-09-03): FooBorrowed::new() only
Foo::try_new_unstable() -> Foo
FooBorrowed::new() -> FooBorrowed

Either way, I think we should always have FooBorrowed::static_to_owned().

I can see pros and cons of all these choices so I won't weigh in yet. I'll see what others think.

@Manishearth
Copy link
Member

Not opposed to 1 or 3. Opposed to 2, we shouldn't have any APIs that disappear when you enable a feature gate, that gets very brittle in larger deptrees. I guess we could do some fancy doc(hidden) stuff but most people are unlikely to browse ICU4X docs without --all-features..

No strong opinion between 1 and 3, provided 3 has sufficient documentation pointing things to each other.

@sffc
Copy link
Member Author

sffc commented Sep 3, 2024

I think the intent with 2 is that we create the function in code at the same time as we create the feature that gates the owned type, and otherwise we don't have the function available.

@Manishearth
Copy link
Member

Ah! Okay, that's acceptable to me. No strong opinion.

@hsivonen
Copy link
Member

hsivonen commented Sep 4, 2024

My preference would be to give the primary name to the borrowed version, because it's the version that baked-data code will use exclusively e.g. with the normalizer, and the version that will have more usage lines of code even in apps that don't use baked data.

I prefer the constructors for the borrowed types to be on the borrowed types themselves, because e.g. with the normalizer, putting the constructors on the owned types would create a strange reason to have to refer to the owned type in baked-data code when there would be no need to even refer to the owned type in baked-data code if the constructor was on the borrowed type.

I'm currently working on making the collator have a borrowed variant, and the collator is a bit of a special case, because it both has obviously statically-known data (the normalizer data and the root collation) as well as data that gets decided at run time (even if choosing from various baked options).

@robertbastian
Copy link
Member

I prefer the constructors for the borrowed types to be on the borrowed types themselves, because e.g. with the normalizer, putting the constructors on the owned types would create a strange reason to have to refer to the owned type in baked-data code when there would be no need to even refer to the owned type in baked-data code if the constructor was on the borrowed type.

This is why we have discussed having the constructor on both. Having it only on the borrowed type means the rustdoc for the owned type (which I consider the main entry type right now) would only show 3/4 constructors. If it is needed on the borrowed type, e.g. because the owned type is cfg-ed, then we can have it on both.

@Manishearth
Copy link
Member

(which I consider the main entry type right now)

A discussion I had with @sffc the other day was whether we should be changing this. For singletons I definitely feel like we should be doing FooOwned and Foo, with the borrowed version being the main entry type.

For non-singletons the discussion gets more interesting: do we think most users are going to be using compiled data? if so, the borrowed version still could be the primary entry point with the nicer name, and FooOwned becomes "nice helper type if you need Yoke behavior"

@robertbastian
Copy link
Member

What would be cool is if there was only a single type (borrowed), the compiled constructor would return Foo<'static>, and the owned constructors would return something like Owned<Foo<'static>>, which would deref to Foo<'static>.

@Manishearth
Copy link
Member

That's kind of what Yoke can do.

It wouldn't be able to deref to Foo<'static> though.

@sffc
Copy link
Member Author

sffc commented Sep 5, 2024

I don't think there's generally a way to make Owned<Foo<'static>> because

  1. There could be multiple DataPayload fields and some fields that are not DataPayload
  2. Might require map_project because the borrowed type might smaller than the data struct

My position is that I think we should move toward having FooData (or FooOwned) and Foo<'a> everywhere, even on non-singleton types, but not in 2.0 because that is a really big change.

A middle ground would be to use this naming scheme only for types that currently have owned and borrowed variants, but I think this middle ground is bad because:

  1. Creates two highly arbitrary classes of types, when really all ICU4X types have generally the same behavior, just some have the Borrowed optimization applied to them
  2. We might migrate more things to be owned/borrowed in 2.x, and now we have a patchwork of things with different names and different meanings

To resolve this thread, I am happy with "Option 2 (2024-09-03)"

@sffc
Copy link
Member Author

sffc commented Sep 5, 2024

Observation: if we have Foo::new() -> Foo and then we migrate Foo to have a borrowed variant, we won't be able to change the function signature in 2.x. We'd need to deprecate that function and create Foo::new_borrowed() or something.

@hsivonen
Copy link
Member

hsivonen commented Sep 6, 2024

I think it's unfortunate if we don't end up switching from FooBorrowed and Foo to Foo and FooOwned (or FooData) before 2.0, but also I'm not volunteering to make the change.

@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

I don't see a clear consensus on this thread and it seems we are not on the same page so I tagged it with discuss-priority

@sffc
Copy link
Member Author

sffc commented Sep 17, 2024

@zbraniecki should weigh in here, too.

@sffc
Copy link
Member Author

sffc commented Sep 26, 2024

Some discussion:

  • @zbraniecki As I understand it, Option 3 is just sacrificing a bit of discoverability. I think discoverability is solved with documentation. Having less interplotted constructors makes it easier to handle the evolution of the types separately.
  • @sffc The main technical advantage of Option 3 is that it avoids conflicts and inconsistency if we split a type to have a borrowed variant in the future.
  • @Manishearth So, basically, which matters more: discoverability or cleanliness? I'm not convinced discoverability is the biggest question, but I'm not convinced cleanliness is a big deal either.
  • @nekevss I don't see how discoverability is an issue in rustdoc.
  • @zbraniecki Since there is some hunger to revisit the design/naming, a simpler solution is probably better; we shouldn't try to make it perfect without deciding on the naming overall. So it seems doing the cleanest thing in 2.0 is better.

@robertbastian
Copy link
Member

Discussion with @sffc @zbraniecki @robertbastian

In the short term:

  • Foo and FooBorrowed<'a>
    • Foo::new returns FooBorrowed<'static>
    • FooBorrowed::new also returns FooBorrowed<'static>
    • FooBorrowed<'a>s can be added incrementally after 2.0; if so, the corresponding Foo::new would become #[deprecated] because returning FooBorrowed<'a> would be breaking

LGTM: (@sffc) @robertbastian (@zbraniecki)

Could consider for later:

  • FooOwned and Foo<'a>
    • All constructors are on Foo<'a>
    • We will adopt this naming scheme if there is a justified client need to adopt this for all types. There is not currently such a need, so we will not do it in 2.0.
      • This cannot be done incrementally, because changing Foo to Foo<'a> is breaking

LGTM: @sffc @robertbastian

@Manishearth
Copy link
Member

LGTM on both. Would really like to make compiled static data be the happy path

@sffc sffc removed discuss-priority Discuss at the next ICU4X meeting needs-approval One or more stakeholders need to approve proposal labels Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.0-breaking Changes that are breaking API changes C-meta Component: Relating to ICU4X as a whole
Projects
Status: Unclaimed for sprint
Development

No branches or pull requests

4 participants