Update lang docs #51
|
||
Protomaps follows OpenStreetMaps's convention where a features's primary name value is is the most common name in the local language(s). | ||
|
||
In practice, this is most often a single name value like: | ||
|
||
- `London` the locality is represented as a simple key, value pair: `name` = `London` | ||
|
||
However, many places have more than one common local languages and Protomaps passes thru OpenStreetMap's convention of concatenating multiple names with a `/` deliminator into a single name value, like: | ||
However, many places have more than one common local languages and Protomaps passes thru OpenStreetMap's convention of concatenating multiple names with a `/` or `-` deliminator into a single name value, like: |
Is this true/relevant anymore now that we require a language to be passed to the style?
It is a bit less relevant than before. You will only see these when the local language and the target language use different scripts. Example: a map localized to Greek where you are looking at Bozen - Bolzano...
basemaps/localization.md
Outdated
|
||
Protomaps structures localized names using the same `name:{language_code}` formatting as OpenStreetMap. | ||
If a name from OpenStreetMap contains text in more than one script, then Protomaps breaks up the name into segments. There can be up to 3 segments: `name`, `name2`, and `name3`. Each segment should have a unique script. |
maybe clarify this is the `name` tag "de-facto primary local name" from OSM
How are the `name2` and `name3` synthetic properties used in the style?
Is there a downside to overwriting the value of `name` with (implied) `name1`?
Does the name splitting only happen when an allow-listed delim (`/` or `-`) is observed in the upstream OSM `name` tag?
> maybe clarify this is the `name` tag "de-facto primary local name" from OSM
OK
> How are the `name2` and `name3` synthetic properties used in the style?
Depends on the target language and the scripts in `name2` and `name3`. For example, if the target language uses a different script from `name2` and `name3`, then those appear as the second and third lines. Another example: if the target language uses the same script as `name3`, then `name3` appears as the first line when no name in the target language is available.
> Is there a downside to overwriting the value of `name` with (implied) `name1`?
Not that I know of.
> Does the name splitting only happen when an allow-listed delim (`/` or `-`) is observed in the upstream OSM `name` tag?
No, it always happens when an OSM name contains more than one script. There are exceptions, for example when you have 5 Arabic Unicode codepoints, then the Latin letters "AB", and then again 5 Arabic Unicode codepoints. That could be, for example, a Latin street letter in an Arabic street label. In that case the text is not segmented. Here is a link to some tests that cover special cases: https://github.com/protomaps/basemaps/blob/60e7d485c7fc6a4b28be525ebc03f6bdd4f20837/tiles/src/test/java/com/protomaps/basemap/names/ScriptSegmenterTest.java#L97-L127
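To make the segmentation behaviour in this thread easier to picture, here is a rough Python sketch. It is not the Java `ScriptSegmenter` from the linked tests: `script_of_char`, `segment_name`, and `to_properties` are hypothetical helpers, script detection is approximated from Unicode character names rather than the real Script property, and the fallback to `Mixed` is an assumption. The special cases mentioned above (such as a short Latin run inside an Arabic label) are not handled.

```python
import unicodedata

# Assumption: approximate the script from the first word of the Unicode
# character name. A real segmenter would use the Unicode Script property.
_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin", "CJK": "Han", "ARABIC": "Arabic", "CYRILLIC": "Cyrillic",
    "GREEK": "Greek", "HEBREW": "Hebrew", "DEVANAGARI": "Devanagari",
    "HIRAGANA": "Hiragana", "KATAKANA": "Katakana", "HANGUL": "Hangul",
}


def script_of_char(ch):
    """Return a script label, or None for spaces, digits, and punctuation."""
    if not ch.isalpha():
        return None
    prefix = unicodedata.name(ch, "").split(" ")[0]
    return _PREFIX_TO_SCRIPT.get(prefix)


def segment_name(name):
    """Split a name into runs of one script; neutral characters join the current run."""
    segments = []  # each entry is [script, text]
    for ch in name:
        script = script_of_char(ch)
        if segments and (script is None or script == segments[-1][0]):
            segments[-1][1] += ch
        elif script is None:
            segments.append([None, ch])  # leading neutral characters
        elif segments and segments[-1][0] is None:
            segments[-1][0] = script  # first real script adopts the neutral run
            segments[-1][1] += ch
        else:
            segments.append([script, ch])
    # Trim separators left at the edges of each run.
    return [(script, text.strip(" /-")) for script, text in segments]


def to_properties(name):
    """Map segments to name, name2, name3 plus pmap:script tags."""
    segments = segment_name(name)
    scripts = [script for script, _ in segments]
    if len(segments) > 3 or len(set(scripts)) != len(scripts):
        # Assumption: treat this as a failed segmentation and keep the name whole.
        return {"name": name, "pmap:script": "Mixed"}
    props = {}
    for index, (script, text) in enumerate(segments, start=1):
        suffix = "" if index == 1 else str(index)
        props["name" + suffix] = text
        if script not in (None, "Latin"):
            props["pmap:script" + suffix] = script  # Latin is implied when absent
    return props
```

With this sketch, a name written entirely in Latin script such as `Bozen - Bolzano` stays a single `name`, because the delimiter alone does not trigger a split.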
basemaps/localization.md
Outdated
|
||
## Positioned glyph font `pmap:pgf:name:*` values | ||
|
||
Protomaps adds additional names for a small set of language scripts, currently just the `Devanagari` script used for Hindi (`name:hi` and `pmap:pgf:name:hi`) and related languages. | ||
|
||
Rendering text in web browsers works for almost all languages and scripts and feels like magic. However, specialized map renderers like MapLibre have to reimplement text rendering and text layout which is complicated when text needs to be curved along linear map features instead of placed only horizontally or vertically. MapLibre normally assumes a one-to-one mapping between glyphs and Unicode codepoints that also exist in MapLibre font files (aka "font stacks") to accomplish the layout for a large but limited number of scripts. Plugins have been developed to extend MapLibre for **right-to-left** scripts like Arabic and Hebrew, and MapLibre has built-in support for **CJK scripts** like Chinese, Japanese, and Korean. | ||
|
||
To facilitate Protomap's support of additional, non-supported scripts in MapLibre (like the Devanagari script used by the Hindi language), Protomaps exports names with "positioned glphys" so MapLibre can use codepoints as indices of positioned glyphs in an additional custom "font stack". While the raw `pmap:pgf:name:*` values look like giberish when inspecting the raw values, they render correctly in MapLibre to the end user. | ||
To facilitate Protomap's support of additional, non-supported scripts in MapLibre (like the Devanagari script used by the Hindi language), Protomaps exports names with "positioned glphys" so MapLibre can use codepoints as indices of positioned glyphs in an additional custom "font stack". While the raw `pmap:pgf:name:*` values look like gibberish when inspecting the raw values, they render correctly in MapLibre to the end user. |
glphys -> glyphs
OK
I worry we're starting to mix up Protomaps the tile schema with Protomaps the map style a bit in these docs.
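On the style side, the positioned glyph naming discussed in this thread could be consumed roughly as sketched below. `PGF_LANGUAGES`, `label_property`, and `text_field` are hypothetical names, and the set contains only `hi` because that is the only language the diff mentions. The returned list mirrors a MapLibre `["get", ...]` expression; how the actual Protomaps style wires this up is not shown here.

```python
# Assumption: only Hindi currently has a positioned glyph variant.
PGF_LANGUAGES = {"hi"}


def label_property(language):
    """Pick the tile property that holds the label text for a language."""
    if language in PGF_LANGUAGES:
        return "pmap:pgf:name:" + language
    return "name:" + language


def text_field(language):
    """Build a MapLibre-style 'get' expression for the chosen property."""
    return ["get", label_property(language)]
```

The positioned glyph values only make sense together with the matching custom font stack, so this sketch deliberately does not fall back from `pmap:pgf:name:hi` to a plain `name` value.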
@@ -12,31 +12,81 @@ Protomaps has several localization options for names used in text labels. | |||
|
|||
<MaplibreMap/> | |||
|
|||
## Default `name` value | |||
## Local Names |
@bdon what's the `Title Case` versus `Sentence case` convention elsewhere in docs that we should be following?
No rules right now! I feel `Sentence case` is more natural, but let's just fix them when we see them.
|
||
## Localized `name:*` values |
Consider restoring this section heading?
Moved further down and replaced with "Translated Names"
basemaps/localization.md
Outdated
|
||
If `pmap:script*` is not present on a name, then it means that the name uses the `Latin` script. | ||
|
||
Sometimes names might contain text in multiple scripts. In that case `pmap:script` is set to `Mixed`. |
Sometimes a name might contain text in multiple scripts. In that case `pmap:script` is set to `Mixed`.
Does this happen only if the delim is not present? As a future todo, would we want to split the string until it's no longer `Mixed`? As it stands, it's a little confusing that above we say we split the various names, yet a name could still be `Mixed`.
Updated to:
Sometimes segmentation into single scripts fails, for example due to inconsistent usage of alphabets. In that case `pmap:script` is set to `Mixed`.
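For readers of the raw tiles, the rule that a missing `pmap:script*` tag implies Latin could be applied as in the sketch below. `script_for` is a hypothetical helper, and the feature properties are assumed to arrive as a plain dictionary.

```python
def script_for(props, index=1):
    """Return the script of name (index 1), name2, or name3.

    Returns None when that name segment is absent. A present segment without a
    pmap:script tag is taken to be Latin, as the docs under review describe.
    """
    suffix = "" if index == 1 else str(index)
    if "name" + suffix not in props:
        return None
    return props.get("pmap:script" + suffix, "Latin")
```

A caller can treat a returned `Mixed` as a signal that the name could not be split into single-script segments.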
basemaps/localization.md
Outdated
(pmap:script2 absent) | ||
``` | ||
|
||
The OSM name for "Zürich" only uses the Latin script and therefore we use in Protomaps only `name` and leave `pmap:script` away which implies that the script of the `name` is `Latin`. |
The OSM name for "Zürich" only uses the Latin script, so we export `name` but omit `pmap:script` (implying the script of the `name` is `Latin`).
OK
basemaps/localization.md
Outdated
(pmap:script3 absent) | ||
``` | ||
|
||
The OSM name for Hong Kong is "Hong Kong 香港". We break this up into `name` and `name2` in Protomaps. Since the script of `name` is `Latin`, the `pmap:script` tag is omitted. The script of `name2` is `Han` which is encoded in `pmap:script2`. |
Should there have been a delim here in "Hong Kong 香港" (either a `/` or a `-`)?
Always spell out OpenStreetMap.
> Should there have been a delim here in "Hong Kong 香港" (either a `/` or a `-`)?
No, it does not have a delimiter. "香港 Hong Kong" is in https://www.openstreetmap.org/node/7414774650
> Always spell out OpenStreetMap.
OK
- `name:zh-Hans` = `瑞士` | ||
- `name:zh-Hant` = `瑞士` | ||
- _... many other localized values..._ | ||
|
||
_NOTE: The Chinese (`zh`) examples above demonstrates how a single language can have multiple writing systems, in this case both simplified Chinese (`zh-Hans`) used in mainland China and tranditional Chinese (`zh-Hant`) used in Taiwan. The value stored in `zh` could be either of those._ |
Restore the `zh` note, please.
If we always normalize `zh` in OSM to the two explicit variants (as is indicated in #51 (comment)), then this can be dropped.
At the moment we only export `name:zh-Hant` and `name:zh-Hans` from OpenStreetMap to the tiles. If one or both of these are missing on the OSM feature, but `name:zh` is available, then we backfill `name:zh` into `name:zh-Hans` or `name:zh-Hant`.
If I remember correctly, you had a technique to say if a `name:zh` string was written in `name:zh-Hans` or `name:zh-Hant` in tilezen. Is that correct? If yes, how did you do it?
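The backfill described in this comment could look roughly like the following. `backfill_chinese` is a hypothetical helper operating on a dictionary of OSM tags. It makes no attempt to detect whether a `name:zh` value is simplified or traditional, which is the open question above.

```python
def backfill_chinese(osm_tags):
    """Return the Chinese name properties to export for one OSM feature.

    Only name:zh-Hans and name:zh-Hant are exported. When either is missing,
    name:zh (if present) is copied into the gap. name:zh itself is dropped.
    """
    fallback = osm_tags.get("name:zh")
    exported = {}
    for key in ("name:zh-Hans", "name:zh-Hant"):
        value = osm_tags.get(key, fallback)
        if value is not None:
            exported[key] = value
    return exported
```

For example, a feature tagged only with `name:zh` = `瑞士` would end up with both `name:zh-Hans` and `name:zh-Hant` set to `瑞士`.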
|
||
To help solve this, Protomaps characterizes the scipt used in the default `name` value by adding a `pmap:script` tag. | ||
|
||
Values in `pmap:script` follow the [ISO 15924](https://unicode.org/iso15924/iso15924-codes.html) standard codes for the representation of names of scripts and are summarized in the table below. |
Please restore this explanation of the ISO language codes as a note below the table.
Added a note that the script names are from Unicode Standard Annex #24: Script Names
| Language | Native name | `name:*` property | [ISO 639-2 code](https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes) | [ISO_639-1 code](https://en.wikipedia.org/wiki/ISO_639-1) | [ISO_15924 script(s)](https://unicode.org/iso15924/iso15924-codes.html) | | ||
|--------|-----------------|-----------|-----|----|----| | ||
| Arabic | اَلْعَرَبِيَّةُ | `name:ar` | ara | ar | `Arabic` | | ||
| Bengali | বাংলা | `name:bn` | ben | bn | `Bengali` | |
What's our longer term plan to support Bengali and Farsi (I think the only ones dropped from this list)?
Bengali is to be added in the future as a positioned glyph font.
Farsi is still there; we call it "Persian".
| ----- | ----- | ----- | ----- | | ||
| Arabic | اَلْعَرَبِيَّةُ | `name:ar` | `Arabic` | | ||
| Bulgarian | български | `name:bg` | `Cyrillic` | | ||
| Chinese (Simplified) | 中文 汉语 | `name:zh-Hans` | `Han` | |
Are we exporting `zh` or differentiating `name:zh-Hans` and `name:zh-Hant`?
We are not exporting `zh` in the tileset, because that is not a useful developer-facing choice of locale. Ideally we normalize `zh` in the raw data into both `zh-Hans` and `zh-Hant` if they are the same.
basemaps/localization.md
Outdated
| Urdu | اردو | `name:ur` | `Arabic` | | ||
| Vietnamese | Tiếng Việt | `name:vi` | `Latin` | | ||
|
||
_*) `Mixed-Japanese` is a custom `pmap:script` value used for labels that contain Hiragana or Katakana mixed with a second or third script. In Japanese, these two scripts often appear in combination with others._ |
* NOTE: `Mixed-Japanese` is a custom `pmap:script` value used for labels that contain Hiragana or Katakana mixed with a second or third script. In Japanese, these two scripts often appear in combination with others.
OK
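As a rough illustration of how the `Mixed` and `Mixed-Japanese` values relate, the sketch below classifies a label from the set of scripts found in it. `classify_scripts` is a hypothetical helper, and it assumes the scripts have already been detected for a label that could not be split into single-script segments.

```python
JAPANESE_SYLLABARIES = {"Hiragana", "Katakana"}


def classify_scripts(scripts):
    """Return a pmap:script value for the scripts found in one label."""
    scripts = set(scripts)
    if not scripts:
        raise ValueError("expected at least one script")
    if len(scripts) == 1:
        return next(iter(scripts))  # a single script is reported as-is
    if scripts & JAPANESE_SYLLABARIES:
        # Hiragana or Katakana combined with other scripts.
        return "Mixed-Japanese"
    return "Mixed"
```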
|
||
See more: | ||
|
||
- [Traditional MapLibre Text Rendering](https://oliverwipfli.ch/about-text-rendering-in-maplibre-2023-10-17/) | ||
- [Devanagari Positioned Glyph Fonts](https://oliverwipfli.ch/devanagari-in-the-protomaps-basemap-with-a-positioned-glyph-font-for-maplibre-2024-06-30/) | ||
|
||
## Styling localized name |
While it's ok to take out this placeholder section... we're tilting towards using Protomaps as a platform solution instead of a modular system of tile schema, styles, and data archives. Ideally we offer some tips on how to work with the raw tile data, too?
You're right, I think this section under the `basemaps` directory concerns the combination of style + tileset.
I think `Localization` concerns the style, because that is the API surface that developers interact with - generating a GL style for a given language.
I do think there ought to be one section in the Basemaps directory for the schema with no style opinions - probably fleshed out https://docs.protomaps.com/basemaps/layers (issue #1)? That can mention localized name tags, and link out to this Localization page that has more prose?
Happy for feedback @nvkelso!