Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(intl): ignore script subtags when canonicalizing locale strings #948

Merged
merged 1 commit into from
Mar 11, 2025

Conversation

kszczek
Copy link
Contributor

@kszczek kszczek commented Feb 20, 2025

Locale strings with a script such as zh-Hans-CN are incorrectly canonicalized as zh instead of zh-CN. Update the canonicalizedLocale function to look for a script subtag and ignore it if found.

Closes: #945


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:

Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.

Copy link
Member

@mosuem mosuem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a hacky solution, but the existing code is also quite hacky, so that's fine :)

Copy link

github-actions bot commented Feb 21, 2025

PR Health

Breaking changes ✔️
Package Change Current Version New Version Needed Version Looking good?
intl None 0.20.2 0.20.3-wip 0.20.2 ✔️
Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

Coverage ✔️
File Coverage
pkgs/intl/lib/intl.dart 💚 98 %
pkgs/intl/lib/src/intl_helpers.dart 💚 54 % ⬆️ 12 %

This check for test coverage is informational (issues shown here will not fail the PR).

API leaks ⚠️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbols
intl StringStack
DateBuilder

This check can be disabled by tagging the PR with skip-leaking-check.

License Headers ✔️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
no missing headers

All source files should start with a license header.

@kszczek kszczek force-pushed the ignore-script-subtags branch from 45d5ff8 to 6ea1b12 Compare February 21, 2025 11:10
@mosuem
Copy link
Member

mosuem commented Feb 24, 2025

This fails internal Google testing, which looks like this:

test('Recover locale from string', () {
      expect(
        Locale.tryParse(Intl.canonicalizedLocale('zh_Hant_TW')),
        Locale.fromSubtags(
          languageCode: 'zh',
          scriptCode: 'Hant',
          countryCode: 'TW',
        ),
      );
    });

Which made me think about what canonicalizedLocale promises vs does. The documentation states it turns a locale string into the form xx_YY, but also ... where it might possibly be in the wrong case or with a hyphen instead of an underscore..
So it seems it was interpreted as trying to do canonicalization, more precisely level 1 canonicalization.

It doesn't help that NumberFormat.currency can't handle scripts... I think we should remedy that instead. This would mean that at

canonicalizedLocale,
, we insert another fallback after canonicalizedLocale but before shortLocale (which should be renamed to something like languageOnly), which returns what you implemented. This way, we would first try zh_Hant_CN, then zh_CN, then zh, etc. when running NumberFormat.currency.

What do you think?

Copy link
Member

@mosuem mosuem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As detailed in comment

@mosuem
Copy link
Member

mosuem commented Feb 24, 2025

I would also add something like

  test('Use script', () {
    expect(
      NumberFormat.currency(locale: 'zh_Hant_TW').format(12),
      'TWD12.00',
    );
    expect(
      NumberFormat.currency(locale: 'zh_Hant_CN').format(12),
      'CNY12.00',
    );
  });

to the tests.

Locale strings with a script such as `zh-Hans-CN` are incorrectly
canonicalized as `zh` instead of `zh-CN`. Update the
`verifiedLocale` function to look for a script subtag and ignore it
if found.
@kszczek kszczek force-pushed the ignore-script-subtags branch from 6ea1b12 to 88ab7a8 Compare March 7, 2025 10:21
@kszczek
Copy link
Contributor Author

kszczek commented Mar 7, 2025

Thank you for your patience, I couldn't get back to this sooner.

we insert another fallback after canonicalizedLocale but before shortLocale (which should be renamed to something like languageOnly), which returns what you implemented. This way, we would first try zh_Hant_CN, then zh_CN, then zh, etc. when running NumberFormat.currency.

Sounds good to me, even though there is no script specific data for numbers nor dates as far as I know, we should still try looking them up in case they appear in the future.

Implemented your suggestions and modified tests, let me know if I should change anything else.

@copybara-service copybara-service bot merged commit 6d680ad into dart-lang:main Mar 11, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Strip script information when parsing a locale string
2 participants