USFM Extension to Scripture Text #301

Open
jonathanrobie opened this issue Mar 7, 2023 · 6 comments

@jonathanrobie
Collaborator

jonathanrobie commented Mar 7, 2023

The USFM / USX Technical Committee would like us to give guidance for defining extensions for Scripture Text.

The Technical Committee wants to define a way to declare the conventions that translation teams use for typing visible characters that stand in for invisible ones, something teams do frequently. For instance, ZWSP, bidi controls, soft hyphens, hard spaces, and various other kinds of spaces are frequently encoded with visible characters such as ~ or /. The Technical Committee would like to know the best way to define, declare, and publish an extension that lets translation projects explicitly declare the conventions they use for such purposes.
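
To make the idea concrete, here is a minimal sketch of what such a declaration might look like and how a consumer could load it. Everything in it is an assumption for illustration: the JSON layout, the field names `substitutions`, `visible`, and `codepoint`, the helper `load_substitutions`, and the choice of `/` and `^` as stand-ins are all hypothetical, and none of it is defined by USFM, USX, or Scripture Burrito.

```python
import json

# Hypothetical declaration format: the field names below are assumptions,
# not part of USFM, USX, or Scripture Burrito.
DECLARATION = """
{
  "substitutions": [
    {"visible": "/", "codepoint": "U+200B"},
    {"visible": "^", "codepoint": "U+00AD"}
  ]
}
"""


def load_substitutions(text):
    """Parse a declaration and return a {visible: invisible} mapping."""
    table = {}
    for entry in json.loads(text)["substitutions"]:
        visible = entry["visible"]
        codepoint = entry["codepoint"]
        if len(visible) != 1:
            raise ValueError(f"stand-in must be a single character: {visible!r}")
        if not codepoint.startswith("U+"):
            raise ValueError(f"code point must look like U+200B: {codepoint!r}")
        if visible in table:
            raise ValueError(f"stand-in declared twice: {visible!r}")
        # Convert the U+XXXX notation into the actual Unicode character.
        table[visible] = chr(int(codepoint[2:], 16))
    return table


table = load_substitutions(DECLARATION)
```

Here `/` is declared as ZERO WIDTH SPACE (U+200B) and `^` as SOFT HYPHEN (U+00AD). Rejecting duplicate stand-ins keeps the mapping unambiguous, which seems like the minimum any published format would need to guarantee.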

@jag3773
Collaborator

jag3773 commented Mar 7, 2023

One idea here would be to use an x-role to define an ingredient's file that contains this information, see https://docs.burrito.bible/en/latest/schema_docs/role.html?highlight=x-role .

@jonathanrobie
Collaborator Author

> One idea here would be to use an x-role to define an ingredient's file that contains this information, see https://docs.burrito.bible/en/latest/schema_docs/role.html?highlight=x-role .

I like that. But the question then becomes how USFM/USX should define, declare, and publish the format for this file. I assume that USFM/USX should do that, but we need to know how.

@jonathanrobie
Collaborator Author

The first step for USFM is to define the file format for these declarations. We will then discuss whether to support it using role or x-role in Scripture Burrito.

@FoolRunning
Collaborator

I would like to throw out there that this probably shouldn't be done at all. It would make more sense if the USFM files put inside a SB already had the non-USFM (non-Unicode?) data removed/replaced. Adding in a file that describes how users worked around limitations in the software they were using seems wrong. I would expect the USFM files to be Unicode-ready (i.e. there shouldn't be a need for other software consuming the SB to deal with the limitations of other software). To me, this feels akin to hacked fonts.

Some things like ~ and // I think are actually defined by USFM and should be valid as-is.

@jonathanrobie
Collaborator Author

> I would like to throw out there that this probably shouldn't be done at all. It would make more sense if the USFM files put inside a SB already had the non-USFM (non-Unicode?) data removed/replaced. Adding in a file that describes how users worked around limitations in the software they were using seems wrong. I would expect the USFM files to be Unicode-ready (i.e. there shouldn't be a need for other software consuming the SB to deal with the limitations of other software). To me, this feels akin to hacked fonts.
>
> Some things like ~ and // I think are actually defined by USFM and should be valid as-is.

I agree in theory.

In practice, that means that editors like Paratext would have to:

  1. Provide visible characters and ways to type them in, and
  2. Convert them to the appropriate invisible characters when saving (or at the very latest, when creating a Burrito)

Does that seem like something that is likely to happen if we ask? If not, I think users will keep using these workarounds and they should be defined somewhere.

@FoolRunning
Collaborator

FoolRunning commented Mar 7, 2023

Well, assuming the application (Paratext in this case) has to create the file that goes into a SB, the application must have some way of knowing the substitution information. (Right now it's "understood" by a project team and is probably fixed at publishing time; Paratext currently has no knowledge of these substitutions.) This means that during import/export to/from a SB, it could very easily make the substitutions itself. There isn't a need to provide ways to "see them" in the UI, nor does it have to exist on disk outside of a SB in that format.

Basically, if the application needs to generate the file and thus must have the substitution information stored in some form, then it could just as easily make the substitutions with that information when creating the USFM files for the SB.
Just my 2¢.
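
As a rough illustration of making the substitutions at export time, here is a minimal sketch. It assumes a hypothetical project convention (`/` for ZWSP, `^` for soft hyphen) held in a table named `SUBSTITUTIONS`, and a hypothetical function `export_text`; none of this is Paratext's or Scripture Burrito's actual behaviour. It leaves backslash markers and the USFM-defined `~` and `//` untouched, and it does not try to handle marker attributes or other USFM syntax.

```python
import re

# Hypothetical project convention, for illustration only.
SUBSTITUTIONS = {"/": "\u200b", "^": "\u00ad"}

# Tokens copied through unchanged: USFM markers such as \v or \nd*,
# plus the USFM-defined // and ~.
PROTECTED = re.compile(r"\\[A-Za-z0-9+-]+\*?|//|~")


def _substitute(text, substitutions):
    """Replace each single-character stand-in with its target character."""
    return text.translate(str.maketrans(substitutions))


def export_text(usfm, substitutions=SUBSTITUTIONS):
    """Return USFM text with project stand-ins replaced by real characters."""
    out = []
    pos = 0
    for match in PROTECTED.finditer(usfm):
        # Substitute only in the plain text between protected tokens.
        out.append(_substitute(usfm[pos:match.start()], substitutions))
        out.append(match.group(0))
        pos = match.end()
    out.append(_substitute(usfm[pos:], substitutions))
    return "".join(out)


exported = export_text("\\v 1 in/the be^gin//ning~x")
```

Matching `//` before looking at single `/` characters is what keeps the USFM-defined sequence intact even when a project also uses a lone `/` as a stand-in.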
