Add cli & logging; fix prefix-handling in str-transformer

dalito · Jan 15, 2024 · d692580
1 parent 3559d29

Showing 9 changed files with 323 additions and 67 deletions.
diff --git a/README.md b/README.md
@@ -1,18 +1,19 @@
 # Easier access to UCUM from Python
 
-> **This is almost done. Feedback welcome!**
-> The lark grammar to parse UCUM codes and the transformer that converts UCUM units to pint are implemented.
-> For some UCUM units we still have to define pint units or aliases and for some also name mappings.
+> **Feedback welcome!**
+> Currently, only the conversion direction from UCUM to pint is supported. Supporting conversion from pint to UCUM is not a high priority.
+> Please carefully review definitions before you trust them.
+> While we have a lot of tests in place and have reviewed the mappings carefully, bugs may still be present.
 
 [UCUM](https://ucum.org/) (Unified Code for Units of Measure) is a code system intended to cover all units of measures.
 It provides a formalism to express units in an unambiguous way suitable for electronic communication.
 Note that UCUM does not provide a canonical representation, e.g. `m/s` and `m.s-1` are expressing the same unit in two ways.
 
 **ucumvert** is a pip-installable Python package. Features:
 
-- Parser for UCUM unit strings that implements the full grammar
-- Converter for creating [pint](https://pypi.org/project/pint/) units from UCUM unit strings
-- A pint unit definition file [pint_ucum_defs.txt](https://github.com/dalito/ucumvert/blob/main/src/ucumvert/pint_ucum_defs.txt) that extends pint´s default units with UCUM units
+- Parser for UCUM unit strings that implements the full grammar.
+- Converter for creating [pint](https://pypi.org/project/pint/) units from UCUM unit strings.
+- A pint unit definition file [pint_ucum_defs.txt](https://github.com/dalito/ucumvert/blob/main/src/ucumvert/pint_ucum_defs.txt) that extends pint's default units with UCUM units. All UCUM units from version 2.1 of the specification are included.
 
 **ucumvert** generates the UCUM grammar by filling a template with unit codes, prefixes etc. from the official [ucum-essence.xml](https://github.com/ucum-org/ucum/blob/main/ucum-essence.xml) file (a copy is included in this repo).
 So updating the parser for new UCUM releases is straightforward.
@@ -50,10 +51,16 @@ Optionally you can visualize the parse trees with [Graphviz](https://www.graphvi
 
 ## Demo
 
-This is just a demo command line interface to show that the code does something...
+We provide a basic command line interface.
 
 ```cmd
 (.venv) $ ucumvert
+```
+
+It has an interactive mode to test parsing UCUM codes:
+
+```cmd
+(.venv) $ ucumvert -i
 Enter UCUM unit code to parse, or 'q' to quit.
 > m/s2.kg
 Created visualization of parse tree (parse_tree.png).
@@ -73,7 +80,7 @@ main_term
 > q
 ```
 
-So the intermediate result is a tree which is then traversed to convert the elements to pint:
+So the intermediate result is a tree, which is then traversed to convert its elements to pint quantities (or, with `UcumToPintStrTransformer`, to pint-compatible strings):
 
 ![parse tree](parse_tree.png)
 
@@ -90,9 +97,26 @@ You may use the package in your code for converting UCUM codes to pint like this
 >>>
 ```
 
+We also experimented with creating a UCUM-aware pint UnitRegistry.
+We did this by registering a preprocessor that intercepts the entered unit string and converts it from UCUM to pint.
+Due to the way preprocessors work, pint then no longer accepts standard pint unit expressions, only UCUM codes (see the example below).
+This is inconvenient, so we suggest converting UCUM units as shown above until a less disruptive approach is found.
+
+```python
+>>> from ucumvert import get_pint_registry
+>>> ureg = get_pint_registry()
+>>> ureg("m/s2.kg")
+<Quantity(1.0, 'kilogram * meter / second ** 2')>
+>>> ureg("Cel")
+<Quantity(1, 'degree_Celsius')>
+>>> ureg("degC")   # a standard pint unit code
+... (traceback cut out)
+lark.exceptions.UnexpectedCharacters: No terminal matches 'C' in the current parser context
+```
+
 ## Tests
 
-The unit tests include a test to parse all common UCUM unit codes from the official repo. To see this run
+The unit tests include parsing and converting all common UCUM unit codes from the official repo. Run the test suite with:
 
 ```bash
 pytest

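The README's point about preprocessors can be illustrated without pint or ucumvert. In the sketch below, `ToyRegistry` and `toy_ucum_to_pint` are hypothetical stand-ins, not pint or ucumvert API. The toy registry passes every input string through each registered preprocessor, much like the callables handed to pint's `UnitRegistry(preprocessors=...)`, so a strict UCUM translator ends up rejecting anything that is not UCUM:

```python
from collections.abc import Callable

# Hypothetical mapping from a few UCUM codes to pint unit names.
UCUM_TO_PINT = {"Cel": "degree_Celsius", "m": "meter", "s": "second"}


def toy_ucum_to_pint(expr: str) -> str:
    """Translate a single-atom UCUM code or raise, like a strict UCUM parser."""
    try:
        return UCUM_TO_PINT[expr]
    except KeyError:
        raise ValueError(f"not a UCUM code: {expr!r}") from None


class ToyRegistry:
    """Hypothetical stand-in for a registry that preprocesses every input."""

    def __init__(self, preprocessors: list[Callable[[str], str]]) -> None:
        self.preprocessors = preprocessors

    def __call__(self, expr: str) -> str:
        for preprocess in self.preprocessors:
            expr = preprocess(expr)  # applied unconditionally, no fallback
        return expr


ureg = ToyRegistry(preprocessors=[toy_ucum_to_pint])
print(ureg("Cel"))  # degree_Celsius
try:
    ureg("degC")  # a standard pint code, but not UCUM
except ValueError as err:
    print(err)  # not a UCUM code: 'degC'
```

Because the preprocessor has no way to signal "not UCUM, pass through unchanged", every non-UCUM string fails, which mirrors the `degC` traceback in the README.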
diff --git a/pyproject.toml b/pyproject.toml
@@ -245,8 +245,6 @@ max-complexity = 10
 # Allow Pydantic's `@validator` decorator to trigger class method treatment.
 classmethod-decorators = [
   "classmethod",
-  # for pydantic 1.x
-  "pydantic.validator", "pydantic.class_validators.root_validator"
 ]
 
 [tool.ruff.format]
@@ -255,7 +253,7 @@ docstring-code-format = true
 
 [tool.codespell]
 
-skip = "pyproject.toml,src/ucumvert/vendor/ucum-essence.xml,src/ucumvert/vendor/ucum_examples.tsv,src/ucumvert/ucum_grammar.lark"
+skip = "pyproject.toml,src/ucumvert/vendor/ucum-essence.xml,src/ucumvert/vendor/ucum_examples.tsv,src/ucumvert/ucum_grammar.lark,src/ucumvert/pint_ucum_defs_mapping_report.txt"
 
 # Note: words have to be lowercased for the ignore-words-list
 ignore-words-list = "linke,tne,sie,smoot"

diff --git a/src/ucumvert/__init__.py b/src/ucumvert/__init__.py
@@ -1,9 +1,16 @@
+from __future__ import annotations
+
+import logging.handlers  # also binds "logging"; needed for RotatingFileHandler
+import os
+from pathlib import Path
+
 from ucumvert.parser import (
     get_ucum_parser,
     make_parse_tree_png,
     update_lark_ucum_grammar_file,
 )
 from ucumvert.ucum_pint import (
+    UcumToPintStrTransformer,
     UcumToPintTransformer,
     get_pint_registry,
     ucum_preprocessor,
@@ -22,4 +29,38 @@
     "ucum_preprocessor",
     "update_lark_ucum_grammar_file",
     "UcumToPintTransformer",
+    "UcumToPintStrTransformer",
 ]
+
+# Note that nothing is passed to getLogger to set the "root" logger
+logger = logging.getLogger()
+
+
+def setup_logging(loglevel: int = logging.INFO, logfile: Path | None = None) -> None:
+    """
+    Set up logging to the console and, optionally, to a rotating log file.
+
+    The default loglevel is INFO; the LOGLEVEL environment variable overrides it.
+    """
+    loglevel_name = os.getenv("LOGLEVEL", "").strip().upper()
+    if loglevel_name in ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]:
+        loglevel = getattr(logging, loglevel_name, logging.INFO)
+
+    # Apply constraints. CRITICAL=FATAL=50 is the maximum, NOTSET=0 the minimum.
+    loglevel = min(logging.FATAL, max(loglevel, logging.NOTSET))
+
+    # Setup handler for logging to console
+    logging.basicConfig(level=loglevel, format="%(levelname)-8s|%(message)s")
+
+    if logfile is not None:
+        # Setup handler for logging to file
+        fh = logging.handlers.RotatingFileHandler(
+            logfile, maxBytes=100000, backupCount=5
+        )
+        fh.setLevel(loglevel)
+        fh_formatter = logging.Formatter(
+            fmt="%(asctime)s|%(name)-20s|%(levelname)-8s|%(message)s",
+            datefmt="%Y-%m-%d %H:%M:%S",
+        )
+        fh.setFormatter(fh_formatter)
+        logger.addHandler(fh)
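
The LOGLEVEL handling in `setup_logging` can be checked in isolation. The sketch below copies that logic into a hypothetical helper, `resolve_loglevel` (not part of ucumvert): a recognized level name in the environment overrides the argument, and the result is clamped to the range NOTSET (0) to CRITICAL/FATAL (50).

```python
import logging
import os


def resolve_loglevel(default: int = logging.INFO) -> int:
    """Hypothetical helper mirroring the LOGLEVEL logic of setup_logging."""
    name = os.getenv("LOGLEVEL", "").strip().upper()
    level = default
    if name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
        level = getattr(logging, name)
    # Clamp to NOTSET (0) .. CRITICAL/FATAL (50)
    return min(logging.FATAL, max(level, logging.NOTSET))


os.environ["LOGLEVEL"] = " debug "  # whitespace and case are normalized
print(resolve_loglevel())  # 10 (logging.DEBUG)
os.environ["LOGLEVEL"] = "verbose"  # unknown names are ignored
print(resolve_loglevel())  # 20 (the INFO default)
print(resolve_loglevel(default=99))  # 50, clamped
```

Since `main_cli` calls `setup_logging(loglevel=logging.INFO)`, running something like `LOGLEVEL=DEBUG ucumvert -i` should, under this logic, switch the CLI to debug output.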
diff --git a/src/ucumvert/cli.py b/src/ucumvert/cli.py
@@ -1,22 +1,48 @@
-from lark.exceptions import UnexpectedInput, VisitError
+import argparse
+import logging
+import sys
+import textwrap
+from pathlib import Path
 
-from ucumvert.parser import get_ucum_parser, make_parse_tree_png
-from ucumvert.ucum_pint import UcumToPintTransformer
+from lark.exceptions import LarkError, UnexpectedInput, VisitError
 
+from ucumvert import __version__, setup_logging
+from ucumvert.parser import (
+    get_ucum_parser,
+    make_parse_tree_png,
+    update_lark_ucum_grammar_file,
+)
+from ucumvert.ucum_pint import UcumToPintTransformer, find_matching_pint_definitions
 
-def main():
+try:
+    import pydot  # noqa: F401
+
+    has_pydot = True
+except ImportError:
+    has_pydot = False
+
+logger = logging.getLogger(__name__)
+
+
+def interactive():
     print("Enter UCUM unit code to parse, or 'q' to quit.")
+    if not has_pydot:
+        print("Package pydot not installed, skipping parse-tree image generation.")
+
     ucum_parser = get_ucum_parser()
 
     while True:
         ucum_code = input("> ")
         if ucum_code in ("q", "Q"):
             break
         try:
-            parsed_data = make_parse_tree_png(
-                ucum_code, filename="parse_tree.png", parser=ucum_parser
-            )
-            print("Created visualization of parse tree (parse_tree.png).")
+            if has_pydot:
+                parsed_data = make_parse_tree_png(
+                    ucum_code, filename="parse_tree.png", parser=ucum_parser
+                )
+                print("Created visualization of parse tree (parse_tree.png).")
+            else:
+                parsed_data = ucum_parser.parse(ucum_code)
             print(parsed_data.pretty())
         except UnexpectedInput as e:
             print(e)
@@ -30,9 +56,126 @@ def main():
             continue
 
 
-def run_cli_app():
-    main()
+# ===  argparse-cli-related code  ===
+
+
+class DecentFormatter(argparse.HelpFormatter):
+    """
+    An argparse formatter that preserves newlines & keeps indentation.
+    """
+
+    def _fill_text(self, text, width, indent):
+        """
+        Reformat text while keeping newlines for lines shorter than width.
+        """
+        lines = []
+        for line in textwrap.indent(textwrap.dedent(text), indent).splitlines():
+            lines.append(  # noqa: PERF401
+                textwrap.fill(line, width, subsequent_indent=indent)
+            )
+        return "\n".join(lines)
+
+    def _split_lines(self, text, width):
+        """
+        Conserve indentation in help/description lines when splitting long lines.
+        """
+        lines = []
+        for line in textwrap.dedent(text).splitlines():
+            if not line.strip():
+                continue
+            indent = " " * (len(line) - len(line.lstrip()))
+            lines.extend(
+                textwrap.fill(line, width, subsequent_indent=indent).splitlines()
+            )
+        return lines
+
+
+def root_cmds(args):
+    if args.version:  # pragma: no cover
+        print(f"ucumvert {__version__}")
+    if args.interactive:
+        interactive()
+    if args.mapping_report:
+        find_matching_pint_definitions(report_file=args.mapping_report)
+    if args.grammar_update:
+        grammar_file = Path(__file__).resolve().parent / "ucum_grammar.lark"
+        update_lark_ucum_grammar_file(grammar_file=grammar_file)
+
+
+def create_root_parser():
+    parser = argparse.ArgumentParser(
+        prog="ucumvert",
+        description=("Simple CLI for ucumvert."),
+        allow_abbrev=False,
+        formatter_class=DecentFormatter,
+    )
+    parser.add_argument(
+        "-V",
+        "--version",
+        help="The version of ucumvert.",
+        action="store_true",
+    )
+    parser.add_argument(
+        "-i",
+        "--interactive",
+        help="Interactive mode to explore parsing of UCUM unit codes.",
+        action="store_true",
+    )
+    parser.add_argument(
+        "-g",
+        "--grammar_update",
+        help=(
+            "Recreate grammar file 'ucum_grammar.lark' with UCUM atoms "
+            "extracted from ucum-essence.xml."
+        ),
+        action="store_true",
+    )
+    parser.add_argument(
+        "-m",
+        "--mapping_report",
+        help=(
+            "Write a report of mappings between UCUM unit atoms and pint "
+            "definitions to the given file. Default is to write to "
+            "'pint_ucum_defs_mapping_report.txt' in the current directory."
+        ),
+        type=Path,
+        metavar=("FILE"),
+        nargs="?",  # make file an optional argument
+        const=Path("pint_ucum_defs_mapping_report.txt"),  # used if FILE is omitted
+    )
+    parser.set_defaults(func=root_cmds)
+    return parser
+
+
+def main_cli(raw_args=None):
+    """Setup CLI app and run commands based on arguments."""
+    # Create root parser for cli app
+    parser = create_root_parser()
+
+    if not raw_args:
+        parser.print_help()
+        return
+
+    # Parse the command-line arguments
+    #   parse_args will call sys.exit(2) if invalid commands are given.
+    args = parser.parse_args(raw_args)
+    setup_logging(loglevel=logging.INFO)
+    args.func(args)
+
+
+def run_cli_app(raw_args=None):
+    """Entry point for running the cli app."""
+    if raw_args is None:
+        raw_args = sys.argv[1:]
+    try:
+        main_cli(raw_args)
+    except LarkError:
+        logger.exception("Terminating with ucumvert error.")
+        sys.exit(1)
+    except Exception:
+        logger.exception("Unexpected error.")
+        sys.exit(3)  # value 2 is used by argparse for invalid args.
 
 
 if __name__ == "__main__":
-    main()
+    run_cli_app(sys.argv[1:])
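
The `-m/--mapping_report` option combines `nargs="?"` with `const`, which yields three distinct outcomes. The standalone sketch below reproduces only that option, copied from `create_root_parser` (the rest of the ucumvert CLI is omitted), so its behavior can be checked in isolation:

```python
import argparse
from pathlib import Path

# Standalone parser with just the -m/--mapping_report option
parser = argparse.ArgumentParser(prog="ucumvert")
parser.add_argument(
    "-m",
    "--mapping_report",
    type=Path,  # applied to FILE when it is given on the command line
    metavar="FILE",
    nargs="?",  # FILE may be omitted
    const=Path("pint_ucum_defs_mapping_report.txt"),  # used for a bare -m
)

print(parser.parse_args([]).mapping_report)  # None (option absent)
print(parser.parse_args(["-m"]).mapping_report)  # pint_ucum_defs_mapping_report.txt
print(parser.parse_args(["-m", "out.txt"]).mapping_report)  # out.txt
```

Because `root_cmds` checks `if args.mapping_report:`, the report is only written in the last two cases; when the option is absent, the value stays at argparse's default of `None`.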
diff --git a/src/ucumvert/parser.py b/src/ucumvert/parser.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import logging
 import textwrap
 from pathlib import Path
 
@@ -13,6 +14,9 @@
     get_prefixes,
 )
 
+logger = logging.getLogger(__name__)
+
+
 # UCUM syntax in the Backus-Naur Form, copied from https://ucum.org/ucum#section-Syntax-Rules
 # <sign>  : "+" | "-"
 # <digit> : "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
@@ -162,6 +166,7 @@ def update_lark_ucum_grammar_file(
     with grammar_file.open("w") as f:
         f.write("\n".join(wrapped))
         f.write("\n")  # newline at end of file
+    logger.info("Updated grammar written to '%s'.", grammar_file)
 
 
 def get_ucum_parser(grammar_file=None):
@@ -179,5 +184,5 @@ def make_parse_tree_png(data, filename="parse_tree_unit.png", parser=None):
     try:
         tree.pydot__tree_to_png(parsed_data, filename)
     except ImportError:
-        print("pydot not installed, skipping png generation")
+        logger.warning("pydot not installed, skipping png generation")
     return parsed_data
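
Replacing `print` with `logger.warning` means the message now goes through whatever handlers are installed on the root logger, e.g. by `setup_logging`. The stdlib-only sketch below illustrates that propagation. It assumes the same console format as `setup_logging`; the logger name `ucumvert.parser` is what `getLogger(__name__)` yields for this module, and capturing output in a `StringIO` is only for demonstration.

```python
import io
import logging

stream = io.StringIO()
# Same console format as setup_logging; force=True (Python 3.8+) replaces
# any handlers installed by an earlier basicConfig call.
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)-8s|%(message)s",
    stream=stream,
    force=True,
)

# A module-level logger as in parser.py: it has no handlers of its own and
# propagates records to the root logger's handler.
logger = logging.getLogger("ucumvert.parser")
logger.warning("pydot not installed, skipping png generation")
logger.debug("dropped: below the INFO threshold")

print(stream.getvalue(), end="")
# WARNING |pydot not installed, skipping png generation
```

The DEBUG record is discarded because the module logger inherits the root logger's INFO level.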
diff --git a/src/ucumvert/pint_ucum_defs_mapping_report.txt b/src/ucumvert/pint_ucum_defs_mapping_report.txt
@@ -331,4 +331,4 @@
 #   [car_Au] --> carat_of_gold_alloys (ucumvert registry)   # [car_Au] = 1/24  # NON_METRIC, carat of gold alloys, mass fraction (misc)
 #    [smoot] --> smoot (ucumvert registry)                  # [smoot] = 67 * [in_i]  # NON_METRIC, Smoot, length (misc)
 # [m/s2/Hz^(1/2)] --> meter_per_square_second_per_square_root_of_hertz (ucumvert registry) # [m/s2/Hz^(1/2)] = 1 * sqrt(1 m2/s4/Hz)  # NON_METRIC, meter per square seconds per square root of hertz, amplitude spectral density (misc)
-#      bit_s --> bit (ucumvert registry)                    # bit_s = 1 * ld(1 1)  # NON_METRIC, bit, amount of information (infotech)
+#      bit_s --> bit (ucumvert registry)                    # bit_s = 1 * ld(1 1)  # NON_METRIC, bit, amount of information (infotech)