Commit

Merge pull request #6 from jquast/jq/1.0.6
documentation fixes, javanese fix, v1.0.6
jquast authored Dec 15, 2023
2 parents b2487f1 + 5b832e0 commit eaff4ba
Showing 34 changed files with 1,790 additions and 1,084 deletions.
2,102 changes: 1,427 additions & 675 deletions data/macos-Alacritty-0.12.3_1.yaml

Large diffs are not rendered by default.

8 changes: 6 additions & 2 deletions docs/conf.py
@@ -9,12 +9,16 @@
project = "ucs-detect"
copyright = "2023, Jeff Quast"
author = "Jeff Quast"
release = "1.0.5"
release = "1.0.6"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ["sphinx_rtd_dark_mode"]

# export DARK=1 for old, tired eyes!
import os
if os.environ.get('DARK'):
    extensions = ["sphinx_rtd_dark_mode"]

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
1 change: 0 additions & 1 deletion docs/index.rst
@@ -8,6 +8,5 @@ Welcome to ucs-detect's documentation!

intro
results
sw_results/*

* :ref:`search`
148 changes: 79 additions & 69 deletions docs/intro.rst
@@ -7,34 +7,18 @@ Without any arguments,

$ ucs-detect

The tool, ``ucs-detect``, tests the Unicode support level of the terminal
emulator for Wide character, Emoji sequence, and Language support and displays a
brief summary of the results. Add --save-yaml argument to store a detailed
report.
``ucs-detect`` automatically tests the Unicode version and support level of a
terminal emulator for Wide character, Emoji Zero Width Joiner (ZWJ) sequences,
Emoji Variation Selector-16 (VS-16) sequences, and Zero-Width or combining
characters by supported Language. A brief report is then printed to stdout.

.. figure:: https://dxtz6bzwq9sxx.cloudfront.net/ucs-detect.gif
:alt: video demonstration of executing ucs-detect

Versions of *ucs-detect* prior to 1.0 served only a single purpose, to export an
sh_-compatible line for export of ``UNICODE_VERSION``. To continue this purpose,
use ``--shell --quick``, for example:

::

$ ucs-detect --shell --quick
UNICODE_VERSION=15.0.0; export UNICODE_VERSION


Test Results
------------

Popular terminals were tested using this program and their results were collated
at https://ucs-detect.readthedocs.io/results.html
   :alt: video demonstration of running ucs-detect

Installation & Usage
--------------------

To install:
To install or upgrade:

::

@@ -46,85 +30,112 @@ To use::
$ ucs-detect


To create a yaml data file result::
To run a detailed test and store a yaml report to disk::

$ ucs-detect --save-yaml my-terminal.yaml --limit-code points=5000 --limit-words=5000 --limit-errors=500
$ ucs-detect --save-yaml=data/my-terminal.yaml --limit-codepoints=5000 --limit-words=5000 --limit-errors=500

Test Results
------------

UNICODE_VERSION
---------------
More than twenty modern terminals for Windows, Linux, and Mac were tested. Their
results have been collected into this repository, and a detailed summary is
published at https://ucs-detect.readthedocs.io/results.html

The environment variable, ``UNICODE_VERSION`` is used by the python wcwidth_
library, which contains every past unicode table version, to determine how
dependent python programs, such as IPython_ render wide and zero-width
characters.
An article describing the development of ucs-detect and summarizing the results
for the 1.0.4 release of ucs-detect (November 2023) is published at
https://www.jeffquast.com/post/ucs-detect-test-results/

Create sh_-compatible line for export::
Individual yaml data file reports for these terminals may also be inspected at
the repository folder ``data``,
https://github.com/jquast/ucs-detect/tree/master/data

$ ucs-detect --shell --quick
UNICODE_VERSION=15.0.0; export UNICODE_VERSION

$ eval "$(ucs-detect --quick --shell)"
$ echo $UNICODE_VERSION
15.0.0
Please note that results will be shared with Terminal Emulator projects and this
information may become out of date as they improve their support for Unicode.
Please do not expect the maintainers of ucs-detect to update these data files. If
you wish for this report to be corrected for any given Terminal, please feel free
to submit a pull request with an update to the yaml data files.

Problem
-------

Chinese, Japanese, Korean, and Emoticon characters are "double-wide", occupying
2 cells, instead of 1, and some other special characters are "zero-width", which
do not occupy any cells at all, or modify the previous cell as a "combining"
character.
Many East Asian languages contain Wide (W) or Fullwidth (F) characters, meaning
that each character occupies 2 cells instead of 1. Further, many languages
contain special combining characters that are "zero width", meaning they do not
occupy any cells, only modifying the previous one as a "combining" character.
Finally, there are "Zero Width Joiner" and "Variation Selector-16" characters
that are used in sequence for Emoji characters.
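
The three categories above can be illustrated with a minimal sketch that uses
only Python's standard ``unicodedata`` module. The helper name
``approx_cell_width`` is hypothetical, and this is a rough approximation for
illustration, not the algorithm of the wcwidth_ library:

```python
import unicodedata


def approx_cell_width(char):
    """Approximate cell count of one character (hypothetical helper)."""
    # Zero Width Joiner and Variation Selector-16 occupy no cells of their own.
    if char in ("\u200d", "\ufe0f"):
        return 0
    # Combining marks modify the previous cell instead of occupying one.
    if unicodedata.combining(char):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters occupy two cells.
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2
    return 1


print(approx_cell_width("a"))       # narrow Latin letter
print(approx_cell_width("\u30b3"))  # KATAKANA LETTER KO, East Asian Wide
print(approx_cell_width("\uff21"))  # FULLWIDTH LATIN CAPITAL LETTER A
print(approx_cell_width("\u0301"))  # COMBINING ACUTE ACCENT
```

The answers depend on the Unicode tables bundled with the Python interpreter,
which is the same kind of version dependency described in this section.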

A terminal application that displays these characters may have trouble
determining how it will be displayed to the end-user. This problem
happens often, because the Unicode Consortium releases new versions
of the Unicode Standard periodically, but the source code of libraries
and applications are not updated at the same time, or at all!

Finally, a terminal emulator may have varying levels of support. For example, at
the time of this writing, Microsoft's Terminal.exe supports up to Unicode 15.0
for Wide characters, is missing support for 27 characters of Unicode 13.0, has no
support for Emoji ZWJ, fully supports all VS-16 sequences, but fails to
correctly categorize many Zero-Width characters for 88 or more of the world's
languages.


Solution
--------

The most important factor is to determine: **What version of unicode is the
Terminal using?**
The most important factor is to determine whether the Terminal Emulator complies
with the Specification_ published by the python wcwidth_ library.

This program, ``ucs-detect``, is able to **automatically detect** the version of
unicode that the connecting Terminal supports for WIDE characters. The python
wcwidth_ library supports **all** Unicode versions, 4.1.0 through 12.1.0 at time
of this writing, and so it is able to select and match the correct width, by
selecting for the given value of the ``UNICODE_VERSION`` environment variable.
This program, ``ucs-detect``, is able to **automatically detect** the Unicode
version and feature-level support of the connecting Terminal for WIDE, ZERO,
ZWJ, and VS-16 characters.

How it works
------------

The unicode version is determined using the `Query Cursor Position`_ terminal
sequence, which asks, *"where is the cursor?"* using a special sequence, and
conforming terminals reply.
The solution in this program is the use of the `Query Cursor Position`_ terminal
sequence, which asks, *"where is the cursor?"*. This is a hidden sequence that a
Terminal Emulator automatically responds to.

By displaying a series of Wide Unicode characters for each Unicode version
expected to advance the cursor by 2 cells, the very last version that
successfully advances 2 cells determines the version of Unicode supported by the
Terminal.
By use of this sequence, and the data tables of the wcwidth_ library,
we can test for compliance with the python wcwidth_ library Specification_.
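
As a rough sketch of the idea, and not the actual implementation in
``ucs-detect``, the cursor position can be requested with the standard
``ESC [ 6 n`` sequence, to which a terminal replies ``ESC [ row ; column R``.
Comparing the reply before and after writing a character gives the number of
cells the terminal advanced. The function names and the sample replies below
are hypothetical:

```python
import re

# Device Status Report 6: the terminal replies with ESC [ row ; column R.
QUERY_CURSOR_POSITION = "\x1b[6n"
_CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_reply(reply):
    """Return (row, column) from a cursor position reply, or None."""
    match = _CPR_REPLY.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def measured_width(reply_before, reply_after):
    """Cells the cursor advanced between two replies on the same row."""
    before = parse_cursor_reply(reply_before)
    after = parse_cursor_reply(reply_after)
    if before is None or after is None or before[0] != after[0]:
        return None
    return after[1] - before[1]


# Hypothetical replies captured before and after writing one Wide character.
print(measured_width("\x1b[5;1R", "\x1b[5;3R"))
```

The measured advance can then be compared against the width that the wcwidth_
data tables expect for the same character.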

This solution of using `Query Cursor Position`_ and exporting an sh_ variable is
precisely the same solution used by the `resize(1)`_ program distributed with
X11, which determines the terminal size over transports that are not capable of
communicating or forwarding it (such as over a serial line).
The use of `Query Cursor Position`_ is inspired by the `resize(1)`_ program
distributed with X11, which determines the terminal size over transports that
are not capable of communicating by signal or forwarding by environment value,
such as over a serial line. `resize(1)`_ simply moves the cursor to (999, 999),
then asks, *"where is my cursor?"*, and the response is understood to be the
terminal size.

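The terminal-size technique used by `resize(1)`_ can be sketched in a few
lines. This is an illustration under stated assumptions, not the code of
`resize(1)`_ or of ``ucs-detect``: the function names are hypothetical, and
the reply string would in practice be read from the terminal's input stream.

```python
import re


def size_query_sequence():
    """Move the cursor far past any real screen size, then ask where it is."""
    move_far = "\x1b[999;999H"  # cursor position: clamps at the bottom-right cell
    ask = "\x1b[6n"             # request a cursor position report
    return move_far + ask


def size_from_reply(reply):
    """Interpret the reply ESC [ rows ; columns R as the terminal size."""
    match = re.search(r"\x1b\[(\d+);(\d+)R", reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical reply from a terminal of 24 rows and 80 columns.
print(size_from_reply("\x1b[24;80R"))
```
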
Further
-------
UNICODE_VERSION (legacy)
------------------------

I hope that this CLI tool is provisional! I'd like to see all Terminals
automatically export the environment variable, ``UNICODE_VERSION`` and that this
tool would not be required.
.. note:: This feature is planned for deprecation, see https://github.com/jquast/wcwidth/issues/104

If you would like to read more about this tool and the related problems I hope to
address with the ``UNICODE_VERSION`` environment variable, have a look at this
companion article, https://www.jeffquast.com/post/terminal_wcwidth_solution/
Versions of *ucs-detect* prior to 1.0 served only a single purpose, to export an
sh_-compatible line for export of ``UNICODE_VERSION``. To continue this purpose,
use ``--shell --quick``, for example::

$ ucs-detect --shell --quick
UNICODE_VERSION=15.0.0; export UNICODE_VERSION

It is designed to be used interactively::

$ eval "$(ucs-detect --quick --shell)"
$ echo $UNICODE_VERSION
15.0.0

The environment variable ``UNICODE_VERSION`` is currently used by the python
wcwidth_ library, which contains every past unicode table version, to determine
how dependent python programs, such as IPython_, render wide and zero-width
characters.
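
A program that wants to honor this variable itself could read it as in the
following minimal sketch. The helper ``unicode_version_from_env`` and its
``"latest"`` fallback are assumptions made for illustration; the wcwidth_
library performs its own resolution of this value.

```python
import os


def unicode_version_from_env(environ=None, default="latest"):
    """Return UNICODE_VERSION as a tuple of ints, or `default` if unset or invalid."""
    environ = os.environ if environ is None else environ
    value = environ.get("UNICODE_VERSION", "")
    try:
        return tuple(int(part) for part in value.split("."))
    except ValueError:
        # An empty or malformed value falls back to the default.
        return default


print(unicode_version_from_env({"UNICODE_VERSION": "15.0.0"}))
```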

History
=======

- 1.0.6 (2023-12-15): Distribution fix for UDHR data and bugfix for python 3.8
  through 3.11. *ucs-detect* welcomes `@GalaxySnail
<https://github.com/GalaxySnail/>`_ as a new project contributor.

- 1.0.5 (2023-11-13): Set minimum wcwidth_ release version requirement.

- 1.0.4 (2023-11-13): Add support for Emoji with VS-16 and more complete testing.
Published test results.

@@ -137,8 +148,7 @@ History
.. _IPython: https://ipython.org/
.. _python-prompt-toolkit: https://github.com/prompt-toolkit/python-prompt-toolkit/blob/master/PROJECTS.rst#projects-using-prompt_toolkit
.. _sh: https://en.wikipedia.org/wiki/Bourne_shell
.. _vercel/hyper: https://github.com/vercel/hyper
.. _wcwidth.c: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
.. _wcwidth: https://github.com/jquast/wcwidth
.. _`Query Cursor Position`: https://blessed.readthedocs.io/en/latest/location.html#finding-the-cursor
.. _`resize(1)`: https://github.com/joejulian/xterm/blob/master/resize.c
.. _Specification: https://wcwidth.readthedocs.io/en/latest/specs.html