-
From the core MF2 spec point of view, From the plural selector point of view, the question then becomes whether to ever match
Alternatively, we could make the
What if the source language is Polish, which uses
Agreed. This is why the definition of the function registry is important, as it needs to be able to provide the data needed by said translation software.
-
A second thread of related conversation from #320:
Originally posted by @aphillips in #320 (comment):
This is better normative language, but leaves open what happens if (for example) there is no catch-all. We should probably have a name for the catch-all, such as "default". I'm not sure I agree with the intent, since it does not allow the selector to do anything except select
There's an argument to be made that this selector is toast since you can't be sure that the selection will do the right thing. Perhaps what we should do (at least for now) is say something like:
Side note: we should start to be careful with our RFC 2119 keyword usage, because these become conformance criteria.
Originally posted by @eemeli in #320 (comment):
It sounds like this is really about the dual role played by the
In most cases, that's probably overkill, but we do allow for it in the spec by falling back to I would presume that that message should raise a validation error during some build step, but during formatting/runtime it would not be considered an error for any numeric
Originally posted by @aphillips in #320 (comment):
My first reaction is that it would be a parse error, but actually, that would require MessageFormatter to understand what the selector expects/produces. I'm not wild about having both
Originally posted by @mihnita in #320 (comment):
I don't think this is allowed. The keys are either strings that are matched "as is" (like So I would expect that the registration of
Originally posted by @stasm in #320 (comment):
In some Slavic languages
Originally posted by @aphillips in #320 (comment):
We could allow separate generic (
Originally posted by @macchiati in #320 (comment):
Yes, the 'other' category serves two roles, and it is not the most neutral The main purpose for 'other' as fallback is when a new category is added. It would be possible to use it to reduce the data size, but that was a side With MF2.0 the selection logic is a bit different, with the selections all When you do have the case where all the selection values are already Then the last message would never be reached.
-
I think it should be possible for CAT tools to retrieve meta information about functions and recognize that Separately, there's a concern I have about the examples we use in this conversation.
The above example (taken from Eemeli's comment above) uses three different sentence structures: one for zero, one for a countable number, and one for uncountable. In other words, "You have a lot of wildebeest" is not a variant of "You have { $count } many wildebeest"; it is a separate, different sentence. The same goes for "You have no wildebeest". This pattern of selector use is something we noticed engineers try a lot with Fluent, and we worked hard to separate the plural selector from the parametrized-meta-message (a term I coined just now to label the flawed concept).
This is an extreme, but we encountered cases like this. What happens here is that the developer is tempted to cluster two separate messages under a single localization id contract and parametrize it based on a variable. The concept feels very similar to the plural case, so it's easy to understand why people would be confused, especially since developers often want a different message for plural cases when the value of a variable is zero. But in principle, from the architectural perspective, I want to make a strong case that those two scenarios are very different, and I would be excited if we managed to make this distinction more noticeable by encouraging proper use and discouraging improper use. To help myself evaluate which scenario I have in mind, I use a litmus test: in a language without plurals, would I still want to have those two variants?
In a language without the grammatical feature of plural rules, I would still want to have a selector here, just for:
That's a warning sign! I am not really using the selector to correctly build grammatical forms of a single message, I am clustering two messages - one for Instead, I believe we should have two messages here:
and then in a language without plurals, I'd write
I think it is even more visible when we consider a different grammatical feature, one that doesn't exist in English. The reason developers abuse selectors is that, from their perspective, it is convenient to create a single localization id contract for the UI element, bind it, and then control which message is displayed by the same variable that the message itself uses to select its variant. This brings me back to #118, although in a slightly different formulation. Instead of focusing on multi-message UI elements, we can discuss multi-variant UI elements (not to be confused with multi-variant messages), which are elements that may have multiple messages depending on a variable. A common example is a button that, when it abuses l10n selectors, has the following message:
We saw it in Firefox many times for the reasons I laid out above. My conclusion is that we should try to provide support for good developer experience for such scenarios and make it architecturally separate from selectors.
-
Now, with my previous separation in mind, I think it'll be easier to discuss We have two concepts here:
In previous conversations we concluded that in many cases those two use cases are effectively the same and the same variant is the correct one for both. I am not yet confident, but I am starting to suspect that the only cases where those two scenarios do not use the same variant are those where the message conflates the two concepts I described above. @aphillips also brought up this example:
I consider this to be an anti-pattern. The Variant From the perspective of end user Therefore I suggest that this should be:
-
I rather like the none/some distinction that @zbraniecki makes above, esp. for cases like
where it's clear that this ought to be two different messages. I'm not convinced, however, that categorically splitting off a "none" message always makes sense. For instance, I would keep something like this together as one message:
Here, the shape and role of the formatted message can be considered independent of the One interesting question raised by this sort of none/some distinction or explicit Arabic plural categories includes Latvian includes Then, considering the slightly more generic case, could/should we allow for conditions like " And finally, that then brings back the question of allowing for some separation of specifically wanting to format the message in the
Here, |
-
I firmly agree with Addison on the semantic and usage difference between 1 and one, etc. I agree with Zibi that overloading plurals for completely different messages is a complete misuse. But "you have no books" and "you have 0 books" are not a misuse. They are simply alternative expressions with essentially the same semantics. In any event, the way plural categories work is really out of scope for this group. As to Addison's other note:
I agree that it would be better for users to not require [when *] in that case, because that is a purely redundant message variant: 'other' is defined by the selector to be a 'catch all'. If a selector doesn't have such a catch-all, then * should be required. For example, take a basic unbounded string selector for match {$someString} I think we could handle Addison's scenario with something like the following rules (terminology can of course be changed)
Condition #2 is satisfied by a condition with all asterisks, such as [when * ... *], or a condition where any of those * values are replaced by the catch-all value for its selector. Example: suppose we have two selectors
match {$foo :number} {$fii :string}
A sufficient catch-all selection condition would be either when other * {...} or when * * {...}.
-
👍
…On Wed, Jan 25, 2023, 12:14 Eemeli Aro ***@***.***> wrote:
Example: suppose we have two selectors
match {$foo :number} {$fii :string}
A sufficient catch-all selection condition would be either when other *
{...} *or* when * * {...}
The current syntax spec requires
<https://github.com/unicode-org/message-format-wg/blob/main/spec/syntax.md#variants>
that "At least one variant's keys must all be equal to the catch-all key (
*)."
Do we want to consider relaxing this to allow for when other * to act as
a similar fallback guarantor? As we've rather studiously avoided any
special-casing at this level for numbers/plurals, the form of the spec
language would probably be something like allowing for a function registry
entry to define a list of keys that exhaust the possible options, i.e. [
'other' ] for plurals or [ 'true', 'false' ] for booleans.
To give another example, I would think that allowing for the above with
plurals would also make it possible for something like this to be valid:
match {$open :boolean}
when true {The door is open.}
when false {The door is closed.}
I think I'd be ok with this?
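To make the proposal above concrete, here is a minimal Python sketch of how a validator might check for a guaranteed fallback if the function registry could declare, per selector, a list of keys that exhaust the possible options. Everything named here is an assumption for illustration only: the `exhaustive_keys` shape, the `OPEN` marker, and `has_guaranteed_fallback` are hypothetical and not taken from the spec or any implementation.

```python
from itertools import product

# Hypothetical marker for "some input that no listed key covers".
# Only the catch-all key '*' can match it.
OPEN = object()


def has_guaranteed_fallback(variants, exhaustive_keys):
    """Return True if every possible selector outcome hits some variant.

    variants:        list of key tuples, one key per selector.
    exhaustive_keys: per selector, a list of keys that (by assumption)
                     the registry declares as covering every input,
                     or None when the selector is unbounded.
    """
    # Each selector contributes either its declared key list or OPEN.
    domains = [keys if keys else [OPEN] for keys in exhaustive_keys]
    return all(
        any(
            all(key == "*" or key == value for key, value in zip(variant, combo))
            for variant in variants
        )
        for combo in product(*domains)
    )


# 'other' declared as exhaustive for plurals; the string selector is unbounded.
plural_and_string = [["other"], None]
ok_other_star = has_guaranteed_fallback([("other", "*")], plural_and_string)
ok_star_star = has_guaranteed_fallback([("*", "*")], plural_and_string)
missing = has_guaranteed_fallback([("one", "*")], plural_and_string)

# Booleans: 'true' and 'false' together exhaust the options.
door_ok = has_guaranteed_fallback([("true",), ("false",)], [["true", "false"]])
door_bad = has_guaranteed_fallback([("true",)], [["true", "false"]])
```

Under these assumptions, both `when other *` and `when * *` count as fallback guarantors, and the `true`/`false` door message is valid without a `*` variant.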
-
This thread of conversation between @aphillips, @macchiati and myself was started in #320, and it went a bit off-topic, so splitting it off to here.
Originally posted by @aphillips in #320 (comment):
@macchiati I agree with @eemeli here in that MF can only specify MF-level requirements. What you're talking about is internal to a specific (if important) selector, namely the plural one. At this level, I think we're handling it correctly by requiring the fallback be present (missing fallback error) and allowing for selector-internal errors. What you're describing is not an error.
`PluralFormat` has always resolved keywords using plural rules and then fallen back to `other` (`*` in our case) when no matching key is found. I don't know what others teach, but I always teach developers to write `=0`, `one`, and `other` in English (with special care that `one` doesn't mean `=1`) and let the translators handle target languages.
The potential instability of plural rules between versions is super annoying, if understandable. Such changes should be as absolutely rare as possible because of how localization automation works. Even if translation systems update their CLDR data (and indeed assuming they consume this data) with alacrity, existing resources and translation memories make for long term quality control issues for everyone involved, plus the need to retrain linguists, engineers, and others.
Note that the elimination of a rule doesn't hurt anything (except storage) but addition of a rule (as was done recently with French, with the addition of `many`) is a huge pain in the ass [I am not arguing that it shouldn't have been done].
Originally posted by @macchiati in #320 (comment):
I can see the reasoning. We'll want to make sure to educate people that *
(for plurals, ordinals) is the same as 'other' in MF1.0, and that 'other'
must never occur.
As to additional categories in plural rules, I fully understand — we avoid
it unless the preponderance of evidence is compelling. It is a very painful
process to update, although if the new value is split off from 'other',
then it is far less painful.
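For readers less familiar with the MF1 behaviour discussed earlier in this thread (explicit `=N` keys first, then the plural category, then `other`), a rough Python sketch may help. The category functions are toy stand-ins for CLDR rules, integers only, and the French one deliberately ignores `many`; `mf1_plural_select` is a hypothetical name, not an ICU API.

```python
def english_category(n: int) -> str:
    # Toy stand-in for the CLDR English cardinal rule (integers only).
    return "one" if n == 1 else "other"


def french_category(n: int) -> str:
    # Toy stand-in for French: 0 and 1 are both 'one' ('many' is ignored here).
    return "one" if n in (0, 1) else "other"


def mf1_plural_select(n, variants, categorize):
    """Pick a variant the way MF1-style plural selection does.

    1. An explicit '=N' key wins if present.
    2. Otherwise the locale's plural category is looked up.
    3. If that category has no variant, fall back to 'other'.
    """
    exact = f"={n}"
    if exact in variants:
        return variants[exact]
    return variants.get(categorize(n), variants["other"])


variants = {"=0": "no messages", "one": "{n} message", "other": "{n} messages"}
without_zero = {"one": "{n} message", "other": "{n} messages"}

en_zero = mf1_plural_select(0, variants, english_category)
en_zero_no_exact = mf1_plural_select(0, without_zero, english_category)
fr_zero_no_exact = mf1_plural_select(0, without_zero, french_category)
```

The last two calls show why `one` should not be read as `=1`: with the toy French rule, a count of 0 selects the `one` variant, while in English it falls through to `other`.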
Originally posted by @eemeli in #320 (comment):
This isn't actually quite so clear-cut. In MF1, `other` fulfills two different roles for plurals:
In MF2, it's possible to mirror this structure by using `*` instead of `other`, but it's also possible to separate the two by using both `other` and `*`. For example, I could imagine something like this being occasionally useful:
Here, the `other` case is in no way special from the MF2 point of view, and it's only up to the `:plural` function to consider it anything special, and to figure out matching for e.g. a non-numeric `$count`. Those details are currently considered to be an implementation decision.
Originally posted by @aphillips in #320 (comment):
@eemeli I can't find the discussion the group had previously about `other` vs. `*` or I'd refer this conversation to that issue. I suspect we should take this to an issue for discussion rather than here (as it's off topic here).
I happen to think that we should choose between `other` and `*` and not have both, if for no other reason than that it adds a bunch of permutation cases to plurals (and other selectors) and that it impacts the ability of tooling to generate all of the necessary cases from plural rules.
For error cases (and this might be on topic here??) perhaps we should add an optional `else` to the syntax (to allow the developer to override the default error handling):
Originally posted by @macchiati in #320 (comment):
I don't find the use case you present very motivating. That is, in
production software we would never want to expose your 3rd message to
users.
match {$count :plural}
when one {You have {$count} new message}
when other {You have {$count} new messages}
when * {Message count not available}
The 'other' case is special in the plural rules, in that it encompasses all
numbers that don't match the other categories, and always occurs in every
locale. So the problem lies in that if a programmer ever writes the
following, m3 will never appear.
when one male {m1}
when other * {m2}
when * * {m3}
Moreover, our translation software has to know what the plural categories
are and how they work, because otherwise it can't flesh out the
appropriate cases for each locale, for them to be translated. So it will
know that Arabic, for example, will need 6 plural categories, not just the
2 for English. And some other languages will only have 'other'.
The same would be true for other selectors, like grammatical case, that
have to be expanded (or contracted) depending on the software. So while the
MF2.0 spec in theory doesn't need to treat these specially, any non-trivial
translation pipeline will need to.
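The reachability point about `m3` can be checked with a small Python sketch. It assumes first-match selection in source order and a two-category plural selector, both simplifications, and it compares two readings of `other`: as the selector's built-in fallback (the MF1 view) versus as an ordinary key (the view in which `other` and `*` are separate). The `fallback_keys` parameter and all function names are hypothetical.

```python
from itertools import product


def key_matches(key, value, fallback_key):
    # A key matches on '*', on equality, or when it is the selector's
    # own fallback key (if the selector declares one).
    return key == "*" or key == value or (fallback_key is not None and key == fallback_key)


def select(variants, values, fallback_keys):
    # First variant whose keys all match wins (an assumed selection order).
    for keys, pattern in variants:
        if all(key_matches(k, v, f) for k, v, f in zip(keys, values, fallback_keys)):
            return pattern
    return None


variants = [
    (("one", "male"), "m1"),
    (("other", "*"), "m2"),
    (("*", "*"), "m3"),
]
categories = ["one", "other"]            # toy English plural categories
genders = ["male", "female", "unknown"]  # illustrative second selector values


def reachable(fallback_keys):
    # Every pattern that some input combination can select.
    return {
        select(variants, combo, fallback_keys)
        for combo in product(categories, genders)
    }


other_is_fallback = reachable(("other", None))  # 'other' catches every number
other_is_plain_key = reachable((None, None))    # 'other' matches only itself
```

With `other` as the fallback, `m3` is never selected; with `other` as a plain key, an input such as `one` plus `female` falls through to `m3`. Which reading applies is exactly the open question in this thread.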