
libicu v78 formatting changes #7047

@ShortDevelopment


The following Intl test started failing in macOS CI, likely due to the new libicu v78.1 (see #7038 (comment)):

// the "literal" tested here is the first of two literals, the second of which is a space between "12" and "AM"
test({ hour: "numeric", weekday: "long" }, ["weekday", "literal", "hour", "dayPeriod"], ["Saturday", ", ", "12", "AM"]);

ci-linux.txt
ci-macos.txt
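
To see which separator a given engine/ICU combination produces, the check can be reproduced outside the test harness with the standard `Intl.DateTimeFormat.prototype.formatToParts` API. This is a minimal sketch, not the actual test: the date (2000-01-01T00:00:00Z, a Saturday), the `timeZone: "UTC"` option, and the helper `firstLiteral` are assumptions chosen to match the expected values "Saturday", "12", "AM". The printed literal depends on the ICU version linked into the engine, so no output is shown.

```javascript
// Hypothetical helper: returns the value of the first "literal" part,
// or undefined if the parts list has none.
function firstLiteral(parts) {
  const part = parts.find((p) => p.type === "literal");
  return part === undefined ? undefined : part.value;
}

// Same options as the failing test; timeZone is pinned so the result
// does not depend on the machine's local time zone (an assumption, the
// original test may handle this differently).
const dtf = new Intl.DateTimeFormat("en-US", {
  hour: "numeric",
  weekday: "long",
  timeZone: "UTC",
});

// 2000-01-01T00:00:00Z fell on a Saturday.
const parts = dtf.formatToParts(new Date(Date.UTC(2000, 0, 1, 0, 0, 0)));

// Print the part types and the first literal, JSON-encoded so that
// whitespace differences are visible.
console.log(parts.map((p) => p.type));
console.log(JSON.stringify(firstLiteral(parts)));
```

With libicu v77.1 the first literal would be expected to be ", "; a bare space here would match the v78.1 behavior described in this issue.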

ChakraCore

ChakraCore (CC) generates a skeleton in JavaScript

const formatToParts = createPublicMethod("Intl.DateTimeFormat.prototype.formatToParts", function formatToParts(date) {
/**
* Given a user-provided options object, getPatternForOptions generates a LDML/ICU pattern and then
* sets the pattern and all of the relevant options implemented by the pattern on the provided dtf before returning.
*
* @param {Object} dtf the DateTimeFormat internal object
* @param {Object} options the options object originally given by the user
*/
const getPatternForOptions = (function () {
// symbols come from the Unicode LDML: http://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table
const symbolForOption = {
weekday: "E",
era: "G",
year: "y",
month: "M",
day: "d",
// for hour, we have some special handling
hour: "j", hour12: "h", hour24: "H",
minute: "m",
second: "s",
timeZoneName: "z",
};
// NOTE - keep this up to date with the map in PlatformAgnostic::Intl::GetDateTimePartKind and the UDateFormatField enum
const optionForSymbol = {
E: "weekday", c: "weekday", e: "weekday",
G: "era",
y: "year", u: "year", U: "year",
M: "month", L: "month",
d: "day",
h: "hour", H: "hour", K: "hour", k: "hour",
m: "minute",
s: "second",
z: "timeZoneName", Z: "timeZoneName", v: "timeZoneName", V: "timeZoneName", O: "timeZoneName", X: "timeZoneName", x: "timeZoneName",
};
// lengths here are how many times a symbol is repeated in a skeleton for a given option
// the Intl spec recommends that Intl "short" -> CLDR "abbreviated" and Intl "long" -> CLDR "wide"
const symbolLengthForOption = {
numeric: 1,
"2-digit": 2,
short: 3,
long: 4,
narrow: 5,
};
const optionForSymbolLength = {
1: "numeric",
2: "2-digit",
3: "short",
4: "long",
5: "narrow",
};
// for fixing up the hour pattern later
const patternForHourCycle = {
h12: "h",
h23: "H",
h11: "K",
h24: "k",
};
const hourCycleForPattern = {
h: "h12",
H: "h23",
K: "h11",
k: "h24",
};
// take the hour12 option by name so that we dont call the getter for options.hour12 twice
return function (dtf, options, hour12) {
const resolvedOptions = _.reduce(dateTimeComponents, function (resolved, component) {
const prop = component[0];
const value = GetOption(options, prop, "string", component[1], undefined);
if (value !== undefined) {
resolved[prop] = value;
}
return resolved;
}, _.create());
const hc = dtf.hourCycle;
// Build up a skeleton by repeating skeleton keys (like "G", "y", etc) for a count corresponding to the intl option value.
const skeleton = _.reduce(_.keys(resolvedOptions), function (skeleton, optionKey) {
let optionValue = resolvedOptions[optionKey];
if (optionKey === "hour") {
// hour12/hourCycle resolution in the spec has multiple issues:
// hourCycle and -hc can be out of sync: https://github.com/tc39/ecma402/issues/195
// hour12 has precedence over a more specific option in hourCycle/hc
// hour12 can force a locale that prefers h23 and h12 to use h11 or h24, according to the spec
// We temporarily work around these similarly to firefox and implement custom hourCycle/hour12 resolution.
// TODO(jahorto): follow up with Intl spec about these issues
if (hour12 === true || (hour12 === undefined && (hc === "h11" || hc === "h12"))) {
optionKey = "hour12";
} else if (hour12 === false || (hour12 === undefined && (hc === "h23" || hc === "h24"))) {
optionKey = "hour24";
}
}
return skeleton + _.repeat(symbolForOption[optionKey], symbolLengthForOption[optionValue]);
}, "");
let pattern = platform.getPatternForSkeleton(dtf.locale, skeleton);
// getPatternForSkeleton (udatpg_getBestPattern) can ignore, add, and modify fields compared to the markers we gave in the skeleton.
// Most importantly, udatpg_getBestPattern will determine the most-preferred hour field for a locale and time type (12 or 24).
// Scan the generated pattern to extract the resolved fields, and fix up the hour field if the user requested an explicit hour cycle
let inLiteral = false;
let i = 0;
while (i < pattern.length) {
let cur = pattern[i];
const isQuote = cur === "'";
if (inLiteral) {
if (isQuote) {
inLiteral = false;
}
++i;
continue;
} else if (isQuote) {
inLiteral = true;
++i;
continue;
} else if (cur === " ") {
++i;
continue;
}
// we are not in a format literal, so we are in a symbolic section of the pattern
// now, we can force the correct hour pattern and set the internal slots correctly
if (cur === "h" || cur === "H" || cur === "K" || cur === "k") {
if (hc && hour12 === undefined) {
// if we have found an hour-like symbol and the user wanted a specific hour cycle,
// replace it and all such proceding contiguous symbols with the symbol corresponding
// to the user-requested hour cycle, if they are different
const replacement = patternForHourCycle[hc];
if (replacement !== cur) {
if (pattern[i + 1] === cur) {
// 2-digit hour
pattern = _.substring(pattern, 0, i) + replacement + replacement + _.substring(pattern, i + 2);
} else {
// numeric hour
pattern = _.substring(pattern, 0, i) + replacement + _.substring(pattern, i + 1);
}
// we have modified pattern[i] so we need to update cur
cur = pattern[i];
}
} else {
// if we have found an hour-like symbol and the user didnt request an hour cycle,
// set the internal hourCycle property from the resolved pattern
dtf.hourCycle = hourCycleForPattern[cur];
}
}
let k = i + 1;
while (k < pattern.length && pattern[k] === cur) {
++k;
}
const resolvedKey = optionForSymbol[cur];
const resolvedValue = optionForSymbolLength[k - i];
dtf[resolvedKey] = resolvedValue;
i = k;
}
dtf.pattern = pattern;
};
})();

and uses udatpg_getBestPattern to let libicu generate an appropriate pattern.

Var IntlEngineInterfaceExtensionObject::EntryIntl_GetPatternForSkeleton(RecyclableObject *function, CallInfo callInfo, ...)
return udatpg_getBestPatternWithOptions(
dtpg,
reinterpret_cast<const UChar *>(skeleton->GetSz()),
skeleton->GetLength(),
UDATPG_MATCH_ALL_FIELDS_LENGTH,
buf,
bufLen,
status
);
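
The skeleton-building step from the JavaScript excerpt above can be sketched in isolation. This is a simplified stand-in, not the ChakraCore code: `buildSkeleton` is a hypothetical name, it uses plain `Object.entries` and `String.prototype.repeat` instead of the internal `_` helpers, and it takes the already-resolved options and hour cycle as arguments.

```javascript
// Symbol and length tables copied from the excerpt above.
const symbolForOption = {
  weekday: "E", era: "G", year: "y", month: "M", day: "d",
  hour: "j", hour12: "h", hour24: "H",
  minute: "m", second: "s", timeZoneName: "z",
};
const symbolLengthForOption = {
  numeric: 1, "2-digit": 2, short: 3, long: 4, narrow: 5,
};

// Hypothetical standalone version of the skeleton reduction.
function buildSkeleton(resolvedOptions, hour12, hourCycle) {
  let skeleton = "";
  for (const [key, value] of Object.entries(resolvedOptions)) {
    let optionKey = key;
    if (key === "hour") {
      // Same precedence as the excerpt: hour12 wins over hourCycle.
      const is12 = hourCycle === "h11" || hourCycle === "h12";
      const is24 = hourCycle === "h23" || hourCycle === "h24";
      if (hour12 === true || (hour12 === undefined && is12)) {
        optionKey = "hour12";
      } else if (hour12 === false || (hour12 === undefined && is24)) {
        optionKey = "hour24";
      }
    }
    // Repeat the symbol once per unit of "length" for the option value.
    skeleton += symbolForOption[optionKey].repeat(symbolLengthForOption[value]);
  }
  return skeleton;
}

console.log(buildSkeleton({ weekday: "long", hour: "numeric" }, undefined, "h12")); // "EEEEh"
console.log(buildSkeleton({ weekday: "long", hour: "numeric" }, undefined, undefined)); // "EEEEj"
console.log(buildSkeleton({ year: "numeric", month: "2-digit", hour: "2-digit" }, false, undefined)); // "yMMHH"
```

The resulting string is what gets passed to `platform.getPatternForSkeleton`, so any punctuation in the final pattern comes from libicu rather than from this step.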

libicu

The table below shows how the behavior of udatpg_getBestPattern changed between versions (de-DE is included only for additional context).
Note the missing comma after the weekday in the en-US patterns under v78.1.

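| version | locale | skeleton | pattern |
|---------|--------|----------|---------|
| v77.1 | en-US | EEEEha | cccc, h/a |
| v77.1 | en-US | ccccha | cccc, h/a |
| v77.1 | de-DE | EEEEha | cccc, h 'Uhr' a |
| v77.1 | de-DE | ccccha | cccc, h 'Uhr' a |
| v78.1 | en-US | EEEEha | EEEE h/a |
| v78.1 | en-US | ccccha | EEEE h/a |
| v78.1 | de-DE | EEEEha | EEEE, h/a |
| v78.1 | de-DE | ccccha | EEEE, h/a |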
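
To make the effect on formatToParts concrete, the patterns in the table can be split into fields and literals. The sketch below is a simplified LDML pattern tokenizer written for illustration; `splitPattern` is a hypothetical helper and not how ChakraCore or libicu derive parts. The pattern strings are copied as printed in the table.

```javascript
// Splits an LDML date pattern into runs of pattern letters ("field")
// and everything else ("literal"). Quoted text ('...') is literal and
// '' stands for a single quote character.
function splitPattern(pattern) {
  const tokens = [];
  let literal = "";
  let inQuote = false;
  let i = 0;
  const flushLiteral = () => {
    if (literal !== "") {
      tokens.push({ kind: "literal", text: literal });
      literal = "";
    }
  };
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === "'") {
      if (pattern[i + 1] === "'") {
        // Escaped quote, valid both inside and outside quoted text.
        literal += "'";
        i += 2;
      } else {
        inQuote = !inQuote;
        i += 1;
      }
    } else if (!inQuote && /[A-Za-z]/.test(ch)) {
      flushLiteral();
      // Consume the whole run of the same pattern letter.
      let k = i + 1;
      while (k < pattern.length && pattern[k] === ch) k += 1;
      tokens.push({ kind: "field", text: pattern.slice(i, k) });
      i = k;
    } else {
      literal += ch;
      i += 1;
    }
  }
  flushLiteral();
  return tokens;
}

// Returns the text of the first literal token, if any.
const firstLiteralOf = (pattern) =>
  (splitPattern(pattern).find((t) => t.kind === "literal") || {}).text;

console.log(JSON.stringify(firstLiteralOf("cccc, h/a"))); // ", "  (v77.1 en-US)
console.log(JSON.stringify(firstLiteralOf("EEEE h/a")));  // " "   (v78.1 en-US)
console.log(JSON.stringify(firstLiteralOf("EEEE, h/a"))); // ", "  (v78.1 de-DE)
```

Under this reading, the first literal for en-US changes from ", " to " ", which is consistent with the failing expectation in the test at the top of the issue.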
