USFM Extension to Scripture Text #301

jonathanrobie · 2023-03-07T14:50:00Z

The USFM / USX Technical Committee would like us to give guidance for defining extensions for Scripture Text.

The Technical Committee wants to define a way to declare conventions for converting visible characters to invisible ones, something that translation teams frequently do. For instance, ZWSP, bidi controls, soft hyphens, hard spaces, and other kinds of spaces are frequently encoded with visible characters such as ~ or /. The Technical Committee would like to know the best way to define, declare, and publish an extension that allows translation projects to explicitly declare the conventions they use for such purposes.
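To make the request concrete, here is a minimal sketch of what such a declaration could carry: a table from each visible marker to the Unicode code point it stands for. Everything in it is an assumption for illustration only. The name `CONVENTIONS`, the `U+XXXX` label style, the helper functions, and the specific marker choices are hypothetical and are not part of USFM, USX, or Scripture Burrito.

```python
import unicodedata

# Hypothetical project conventions: visible marker -> code point label.
# These particular pairings are illustrative assumptions, not a standard.
CONVENTIONS = {
    "~": "U+00A0",  # no-break space
    "/": "U+200B",  # zero width space
    "=": "U+00AD",  # soft hyphen
}


def parse_codepoint(label: str) -> str:
    """Turn a 'U+XXXX' label into the single character it names."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a code point label: {label!r}")
    return chr(int(label[2:], 16))


def describe(conventions: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (marker, label, Unicode character name) rows for review."""
    return [
        (marker, label, unicodedata.name(parse_codepoint(label)))
        for marker, label in conventions.items()
    ]


if __name__ == "__main__":
    for marker, label, name in describe(CONVENTIONS):
        print(f"{marker!r} stands for {label} ({name})")
```

Listing the Unicode character name next to each marker lets a project team check the table by eye, since the target characters themselves are invisible.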

jag3773 · 2023-03-07T16:17:28Z

One idea here would be to use an x-role to define an ingredient's file that contains this information, see https://docs.burrito.bible/en/latest/schema_docs/role.html?highlight=x-role .

jonathanrobie · 2023-03-07T16:19:18Z

> One idea here would be to use an x-role to define an ingredient's file that contains this information, see https://docs.burrito.bible/en/latest/schema_docs/role.html?highlight=x-role .

I like that. But the question then becomes how USFM/USX should define, declare, and publish the format for this file. I assume that USFM/USX should do that, but we need to know how.

jonathanrobie · 2023-03-07T16:36:38Z

The first step for USFM is to specify the file format that declares these conventions. We will then discuss whether to support this using role or x-role in Scripture Burrito.

FoolRunning · 2023-03-07T16:49:33Z

I would like to throw out there that this probably shouldn't be done at all. It would make more sense if the USFM files put inside a SB already had the non-USFM (non-Unicode?) data removed/replaced. Adding in a file that describes how users worked around limitations in the software they were using seems wrong. I would expect the USFM files to be Unicode-ready (i.e. there shouldn't be a need for other software consuming the SB to deal with the limitations of other software). To me, this feels akin to hacked fonts.

Some things like ~ and // I think are actually defined by USFM and should be valid as-is.

jonathanrobie · 2023-03-07T17:14:27Z

> I would like to throw out there that this probably shouldn't be done at all. It would make more sense if the USFM files put inside a SB already had the non-USFM (non-Unicode?) data removed/replaced. Adding in a file that describes how users worked around limitations in the software they were using seems wrong. I would expect the USFM files to be Unicode-ready (i.e. there shouldn't be a need for other software consuming the SB to deal with the limitations of other software). To me, this feels akin to hacked fonts.
>
> Some things like ~ and // I think are actually defined by USFM and should be valid as-is.

I agree in theory.

In practice, that means that editors like Paratext would have to:

1. Provide visible characters and ways to type them in, and
2. Convert them to the appropriate invisible characters when saving (or at the very latest, when creating a Burrito)

Does that seem like something that is likely to happen if we ask? If not, I think users will keep using these workarounds and they should be defined somewhere.

FoolRunning · 2023-03-07T18:23:25Z

Well, assuming the application (Paratext in this case) needs to create the file that goes into a SB, that application is going to need a way to know the substitution information. Right now, it's "understood" by a project team and is probably fixed at publishing time; Paratext currently has no knowledge of these substitutions. This means that during import/export to/from a SB, it could very easily make the substitutions. There isn't a need to provide ways to "see them" in the UI, nor does the text have to exist on disk in that format outside of a SB.

Basically, if the application needs to generate the file and thus must have the substitution information stored in some form, then it could just as easily make the substitutions with that information when creating the USFM files for the SB.
Just my 2¢.
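The export-time substitution described in this comment could look roughly like the sketch below. It assumes the application already holds a marker-to-character table in some form. The names `make_exporter`, `substitutions`, and `protected` are hypothetical, and this is not how Paratext or any Scripture Burrito tool actually works. The `protected` tuple is an assumption added so that `//`, which is mentioned above as already being defined by USFM, is left untouched even when a project uses a single `/` as a marker.

```python
import re
from typing import Callable


def make_exporter(
    substitutions: dict[str, str],
    protected: tuple[str, ...] = ("//",),
) -> Callable[[str], str]:
    """Build a function that applies project substitutions to USFM text.

    substitutions: visible marker -> replacement character (hypothetical table).
    protected: sequences that must pass through unchanged.
    """
    # Try longer tokens first so that "//" is matched before a lone "/".
    tokens = sorted(set(substitutions) | set(protected), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(token) for token in tokens))

    def export(text: str) -> str:
        # Protected tokens are not in the table, so they map to themselves.
        return pattern.sub(
            lambda match: substitutions.get(match.group(0), match.group(0)),
            text,
        )

    return export


if __name__ == "__main__":
    export = make_exporter({"/": "\u200b", "=": "\u00ad"})
    converted = export("\\v 1 ab/cd // ef=gh")
    # repr() makes the inserted invisible characters visible as escapes.
    print(repr(converted))
```

Run at export time, this leaves the project's working files as they are while the USFM placed in the SB carries the real Unicode characters.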
