Should compiled data constructors be on the owned or borrowed type? #5440

sffc · 2024-08-22T22:33:38Z

On types that have an owned and a borrowed variant, should the compiled data constructors build the owned or the borrowed variant? For example, should it be:

// Option 1: all constructors on owned type
Foo::new() -> FooBorrowed<'static>
Foo::default() -> Foo
Foo::try_new_unstable(...) -> Result<Foo>

or

// Option 2: constructor on borrowed type; Default impl available
FooBorrowed::new() -> FooBorrowed<'static>
Foo::default() -> Foo
Foo::try_new_unstable(...) -> Result<Foo>

or

// Option 3: constructors on both types
FooBorrowed::new() -> FooBorrowed<'static>
Foo::new() -> Foo
Foo::default() -> Foo
Foo::try_new_unstable(...) -> Result<Foo>

I think Foo::default() -> Foo should always be available because that's the way to get an actual Foo from compiled data, and clients might need an actual Foo if the object could be created multiple different ways.

I don't have a super strong opinion between options 1, 2, and 3.

@Manishearth @zbraniecki @hsivonen @robertbastian @nekevss

The text was updated successfully, but these errors were encountered:

Manishearth · 2024-08-22T22:36:44Z

I have previously been unhappy that the properties constructors only let you get static borrowed types, and would prefer that we have cheap ways to get owned types compatible with ones loaded from data structs. It's kinda annoying to have this split in types.

But it also seems to work well. It's unclear where we should be drawing the line.

sffc · 2024-08-22T23:05:21Z

For the properties types, my Default solution won't work, because there is not a single constructor for those types. So we should probably come up with a general solution that doesn't depend on Default.

Here's another option

FooBorrowed::new() -> FooBorrowed<'static>
FooBorrowed::static_to_owned(self) -> Foo

Foo::try_new_unstable(...) -> Result<Foo>
Foo::as_borrowed(&self) -> FooBorrowed

sffc · 2024-08-27T01:34:55Z

Here's another question. In 2.0, should the types be named Foo and FooBorrowed, or should they be named FooOwned (or FooData) and Foo?

Manishearth · 2024-08-28T20:48:45Z

Hmm, if the borrowed types are the main ones it does make sense to have them be the primary ones.

But the borrowed types are the main ones only for singleton types. Are we okay with there being Foo<'static>/FooOwned (singleton) and Foo (and potentially FooBorrowed<'a>) for non-singletons? I think that's fine, we would rarely be doing FooBorrowed anyway for non-singletons.

sffc · 2024-08-29T03:32:55Z

Collator is the main example of a type that would be locale-dependent but might benefit from a Borrowed version.

We can make databake capable of returning the right thing even with a fallible try_new on the borrowed type. It's more work, though, and I don't really want to block 2.0 on that.

It's also more work, but since it seems like we probably want to generally move in this direction, we can just consistently apply this pattern everywhere:

pub struct FooFormatterData {
    payload: DataPayload<DataStructV1Marker>
}

pub struct FooFormatter<'a> {
    payload: &'a DataStructV1<'a>
}

impl FooFormatterData {
    pub fn get(&self) -> FooFormatter { ... }
}

impl<'a> FooFormatter<'a> {
    pub fn clone_with_data(&self) -> FooFormatterData { ... }
}

impl FooFormatter<'static> {
    pub fn static_to_owned(&self) -> FooFormatterData { ... }
}

Maybe for 2.0 we stick with the current naming, and in 3.0 we can consider switching things around, only because this wasn't on the table initially for 2.0 and we can't keep adding things to that milestone.

I'm pretty sure that, names aside, we can do this as an incremental migration. We can create FixedDecimalFormatterBorrowed and with whatever functions it needs, and the owned type continues to have direct formatting functions. Then in 3.0 we can remove the format functions from the owned type and do the project-wide renaming.

In hindsight this design seems pretty obvious and I don't know why we didn't consider it sooner.

sffc · 2024-09-03T15:33:56Z

@robertbastian prefers keeping the four constructors together in the docs. @robertbastian therefore prefers:

// Always:
Foo::new() -> FooBorrowed<'static>
Foo::try_new_unstable(...) -> Result<Foo>
Foo::as_borrowed() -> FooBorrowed

@sffc points out:

It might be necessary for FooBorrowed to have its own constructor if the type Foo is feature-gated
I think it is slightly strange/unconventional for a constructor on T to return something that is not T

@robertbastian says we could add FooBorrowed::new() if Foo is feature-gated.

Manishearth · 2024-09-03T15:40:18Z

You never want to add a function that only shows up when a feature is disabled, then global feature resolution will break code relying on the disabled crate.

sffc · 2024-09-03T17:45:53Z

One thing @robertbastian said which I agree with is that Foo::new() -> Foo is a bad function because it unnecessarily wraps the static ref in the enum (DataPayload) which leads to less efficient code.

I think we need FooBorrowed::new() since Foo could be feature-gated in some cases.

As noted previously, I think we should stick with the Foo/FooBorrowed names in 2.0, and possibly revisit them in 3.0.

So I think the choices are:

// Option 1 (2024-09-03): Foo::new() and FooBorrowed::new()
Foo::new() -> FooBorrowed
Foo::try_new_unstable() -> Foo
// Always have this function available:
FooBorrowed::new() -> FooBorrowed

// Option 2 (2024-09-03): Foo::new() and optional FooBorrowed::new()
Foo::new() -> FooBorrowed
Foo::try_new_unstable() -> Foo
// Add this function only when Foo actually gets feature-gated:
FooBorrowed::new() -> FooBorrowed

// Option 3 (2024-09-03): FooBorrowed::new() only
Foo::try_new_unstable() -> Foo
FooBorrowed::new() -> FooBorrowed

Either way, I think we should always have FooBorrowed::static_to_owned().

I can see pros and cons of all these choices so I won't weigh in yet. I'll see what others think.

Manishearth · 2024-09-03T22:51:57Z

Not opposed to 1 or 3. Opposed to 2, we shouldn't have any APIs that disappear when you enable a feature gate, that gets very brittle in larger deptrees. I guess we could do some fancy doc(hidden) stuff but most people are unlikely to browse ICU4X docs without --all-features..

No strong opinion between 1 and 3, provided 3 has sufficient documentation pointing things to each other.

sffc · 2024-09-03T23:54:34Z

I think the intent with 2 is that we create the function in code at the same time as we create the feature that gates the owned type, and otherwise we don't have the function available.

Manishearth · 2024-09-04T07:07:52Z

Ah! Okay, that's acceptable to me. No strong opinion.

hsivonen · 2024-09-04T10:28:18Z

My preference would be to give the primary name to the borrowed version, because it's the version that baked-data code will use exclusively e.g. with the normalizer, and the version that will have more usage lines of code even in apps that don't use baked data.

I prefer the constructors for the borrowed types to be on the borrowed types themselves, because e.g. with the normalizer, putting the constructors on the owned types would create a strange reason to have to refer to the owned type in baked-data code when there would be no need to even refer to the owned type in baked-data code if the constructor was on the borrowed type.

I'm currently working on making the collator have a borrowed variant, and the collator is a bit of a special case, because it both has obviously statically-known data (the normalizer data and the root collation) as well as data that gets decided at run time (even if choosing from various baked options).

robertbastian · 2024-09-04T10:39:09Z

I prefer the constructors for the borrowed types to be on the borrowed types themselves, because e.g. with the normalizer, putting the constructors on the owned types would create a strange reason to have to refer to the owned type in baked-data code when there would be no need to even refer to the owned type in baked-data code if the constructor was on the borrowed type.

This is why we have discussed having the constructor on both. Having it only on the borrowed type means the rustdoc for the owned type (which I consider the main entry type right now) would only show 3/4 constructors. If it is needed on the borrowed type, e.g. because the owned type is cfg-ed, then we can have it on both.

Manishearth · 2024-09-04T19:44:40Z

(which I consider the main entry type right now)

A discussion I had with @sffc the other day was whether we should be changing this. For singletons I definitely feel like we should be doing FooOwned and Foo, with the borrowed version being the main entry type.

For non-singletons the discussion gets more interesting: do we think most users are going to be using compiled data? if so, the borrowed version still could be the primary entry point with the nicer name, and FooOwned becomes "nice helper type if you need Yoke behavior"

robertbastian · 2024-09-05T08:54:15Z

What would be cool is if there was only a single type (borrowed), the compiled constructor would return Foo<'static>, and the owned constructors would return something like Owned<Foo<'static>>, which would deref to Foo<'static>.

Manishearth · 2024-09-05T16:14:23Z

That's kind of what Yoke can do.

It wouldn't be able to deref to Foo<'static> though.

sffc · 2024-09-05T23:32:40Z

I don't think there's generally a way to make Owned<Foo<'static>> because

There could be multiple DataPayload fields and some fields that are not DataPayload
Might require map_project because the borrowed type might smaller than the data struct

My position is that I think we should move toward having FooData (or FooOwned) and Foo<'a> everywhere, even on non-singleton types, but not in 2.0 because that is a really big change.

A middle ground would be to use this naming scheme only for types that currently have owned and borrowed variants, but I think this middle ground is bad because:

Creates two highly arbitrary classes of types, when really all ICU4X types have generally the same behavior, just some have the Borrowed optimization applied to them
We might migrate more things to be owned/borrowed in 2.x, and now we have a patchwork of things with different names and different meanings

To resolve this thread, I am happy with "Option 2 (2024-09-03)"

sffc · 2024-09-05T23:34:27Z

Observation: if we have Foo::new() -> Foo and then we migrate Foo to have a borrowed variant, we won't be able to change the function signature in 2.x. We'd need to deprecate that function and create Foo::new_borrowed() or something.

hsivonen · 2024-09-06T11:08:05Z

I think it's unfortunate if we don't end up switching from FooBorrowed and Foo to Foo and FooOwned (or FooData) before 2.0, but also I'm not volunteering to make the change.

sffc · 2024-09-17T09:16:53Z

I don't see a clear consensus on this thread and it seems we are not on the same page so I tagged it with discuss-priority

sffc · 2024-09-17T18:20:55Z

@zbraniecki should weigh in here, too.

sffc · 2024-09-26T16:27:58Z

Some discussion:

@zbraniecki As I understand it, Option 3 is just sacrificing a bit of discoverability. I think discoverability is solved with documentation. Having less interplotted constructors makes it easier to handle the evolution of the types separately.
@sffc The main technical advantage of Option 3 is that it avoids conflicts and inconsistency if we split a type to have a borrowed variant in the future.
@Manishearth So, basically, which matters more: discoverability or cleanliness? I'm not convinced discoverability is the biggest question, but I'm not convinced cleanliness is a big deal either.
@nekevss I don't see how discoverability is an issue in rustdoc.
@zbraniecki Since there is some hunger to revisit the design/naming, a simpler solution is probably better; we shouldn't try to make it perfect without deciding on the naming overall. So it seems doing the cleanest thing in 2.0 is better.

robertbastian · 2024-10-23T22:54:02Z

Discussion with @sffc @zbraniecki @robertbastian

In the short term:

Foo and FooBorrowed<'a>
- Foo::new returns FooBorrowed<'static>
- FooBorrowed::new also returns FooBorrowed<'static>
- FooBorrowed<'a>s can be added incrementally after 2.0; if so, the corresponding Foo::new would become #[deprecated] because returning FooBorrowed<'a> would be breaking

LGTM: (@sffc) @robertbastian (@zbraniecki)

Could consider for later:

FooOwned and Foo<'a>
- All constructors are on Foo<'a>
- We will adopt this naming scheme if there is a justified client need to adopt this for all types. There is not currently such a need, so we will not do it in 2.0.
  - This cannot be done incrementally, because changing Foo to Foo<'a> is breaking

LGTM: @sffc @robertbastian

Manishearth · 2024-10-23T23:11:04Z

LGTM on both. Would really like to make compiled static data be the happy path

sffc added C-meta Component: Relating to ICU4X as a whole 2.0-breaking Changes that are breaking API changes labels Aug 22, 2024

sffc added this to the ICU4X 2.0 ⟨P1⟩ milestone Aug 22, 2024

sffc added the needs-approval One or more stakeholders need to approve proposal label Aug 22, 2024

This was referenced Aug 22, 2024

Introduce borrowed variants of normalizer structs #5413

Merged

Change TimeZoneIdMapper::new() to return the borrowed variant #5441

Draft

sffc mentioned this issue Sep 11, 2024

Add WindowsTimeZoneMapper and WindowsZonesToBcp47Map provider #5524

Merged

sffc added the discuss-priority Discuss at the next ICU4X meeting label Sep 17, 2024

Manishearth mentioned this issue Sep 21, 2024

Remove Default impls that load data #5554

Open

sffc removed discuss-priority Discuss at the next ICU4X meeting needs-approval One or more stakeholders need to approve proposal labels Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should compiled data constructors be on the owned or borrowed type? #5440

Should compiled data constructors be on the owned or borrowed type? #5440

sffc commented Aug 22, 2024

Manishearth commented Aug 22, 2024 •

edited

Loading

sffc commented Aug 22, 2024 •

edited

Loading

sffc commented Aug 27, 2024

Manishearth commented Aug 28, 2024

sffc commented Aug 29, 2024 •

edited

Loading

sffc commented Sep 3, 2024

Manishearth commented Sep 3, 2024

sffc commented Sep 3, 2024

Manishearth commented Sep 3, 2024

sffc commented Sep 3, 2024

Manishearth commented Sep 4, 2024

hsivonen commented Sep 4, 2024

robertbastian commented Sep 4, 2024

Manishearth commented Sep 4, 2024

robertbastian commented Sep 5, 2024

Manishearth commented Sep 5, 2024

sffc commented Sep 5, 2024

sffc commented Sep 5, 2024

hsivonen commented Sep 6, 2024

sffc commented Sep 17, 2024

sffc commented Sep 17, 2024

sffc commented Sep 26, 2024

robertbastian commented Oct 23, 2024

Manishearth commented Oct 23, 2024

Should compiled data constructors be on the owned or borrowed type? #5440

Should compiled data constructors be on the owned or borrowed type? #5440

Comments

sffc commented Aug 22, 2024

Manishearth commented Aug 22, 2024 • edited Loading

sffc commented Aug 22, 2024 • edited Loading

sffc commented Aug 27, 2024

Manishearth commented Aug 28, 2024

sffc commented Aug 29, 2024 • edited Loading

sffc commented Sep 3, 2024

Manishearth commented Sep 3, 2024

sffc commented Sep 3, 2024

Manishearth commented Sep 3, 2024

sffc commented Sep 3, 2024

Manishearth commented Sep 4, 2024

hsivonen commented Sep 4, 2024

robertbastian commented Sep 4, 2024

Manishearth commented Sep 4, 2024

robertbastian commented Sep 5, 2024

Manishearth commented Sep 5, 2024

sffc commented Sep 5, 2024

sffc commented Sep 5, 2024

hsivonen commented Sep 6, 2024

sffc commented Sep 17, 2024

sffc commented Sep 17, 2024

sffc commented Sep 26, 2024

robertbastian commented Oct 23, 2024

Manishearth commented Oct 23, 2024

Manishearth commented Aug 22, 2024 •

edited

Loading

sffc commented Aug 22, 2024 •

edited

Loading

sffc commented Aug 29, 2024 •

edited

Loading