Add cli & logging; fix prefix-handling in str-transformer
dalito committed Jan 15, 2024
1 parent 3559d29 commit d692580
Showing 9 changed files with 323 additions and 67 deletions.
42 changes: 33 additions & 9 deletions README.md
@@ -1,18 +1,19 @@
# Easier access to UCUM from Python

> **This is almost done. Feedback welcome!**
> The lark grammar to parse UCUM codes and the transformer that converts UCUM units to pint are implemented.
> For some UCUM units we still have to define pint units or aliases and for some also name mappings.
> **Feedback welcome!**
> Currently only the conversion direction from UCUM to pint is supported. Supporting pint to UCUM is not a high priority.
> Please carefully review definitions before you trust them.
> While we have a lot of tests in place and have reviewed the mappings carefully, bugs may still be present.
[UCUM](https://ucum.org/) (Unified Code for Units of Measure) is a code system intended to cover all units of measure.
It provides a formalism to express units in an unambiguous way suitable for electronic communication.
Note that UCUM does not provide a canonical representation, e.g. `m/s` and `m.s-1` express the same unit in two ways.

**ucumvert** is a pip-installable Python package. Features:

- Parser for UCUM unit strings that implements the full grammar
- Converter for creating [pint](https://pypi.org/project/pint/) units from UCUM unit strings
- A pint unit definition file [pint_ucum_defs.txt](https://github.com/dalito/ucumvert/blob/main/src/ucumvert/pint_ucum_defs.txt) that extends pint's default units with UCUM units
- Parser for UCUM unit strings that implements the full grammar.
- Converter for creating [pint](https://pypi.org/project/pint/) units from UCUM unit strings.
- A pint unit definition file [pint_ucum_defs.txt](https://github.com/dalito/ucumvert/blob/main/src/ucumvert/pint_ucum_defs.txt) that extends pint's default units with UCUM units. All UCUM units from Version 2.1 of the specification are included.

**ucumvert** generates the UCUM grammar by filling a template with unit codes, prefixes etc. from the official [ucum-essence.xml](https://github.com/ucum-org/ucum/blob/main/ucum-essence.xml) file (a copy is included in this repo).
So updating the parser for new UCUM releases is straightforward.
@@ -50,10 +51,16 @@ Optionally you can visualize the parse trees with [Graphviz](https://www.graphvi

## Demo

This is just a demo command line interface to show that the code does something...
We provide a basic command line interface.

```cmd
(.venv) $ ucumvert
```

It has an interactive mode to test parsing UCUM codes:

```cmd
(.venv) $ ucumvert -i
Enter UCUM unit code to parse, or 'q' to quit.
> m/s2.kg
Created visualization of parse tree (parse_tree.png).
@@ -73,7 +80,7 @@ main_term
> q
```

So the intermediate result is a tree which is then traversed to convert the elements to pint:
So the intermediate result is a tree which is then traversed to convert the elements to pint quantities (or pint-compatible strings with another transformer):

![parse tree](parse_tree.png)

@@ -90,9 +97,26 @@ You may use the package in your code for converting UCUM codes to pint like this
>>>
```

We also experimented with creating a UCUM-aware pint UnitRegistry.
This has been tried by registering a preprocessor that intercepts the entered unit string and converts it from UCUM to pint.
Due to the way preprocessors work, pint will then no longer accept standard pint unit expressions but only UCUM (see below).
This is inconvenient, so we suggest converting UCUM units as shown above until a less disruptive way is found.

```python
>>> from ucumvert import get_pint_registry
>>> ureg = get_pint_registry()
>>> ureg("m/s2.kg")
<Quantity(1.0, 'kilogram * meter / second ** 2')>
>>> ureg("Cel")
<Quantity(1, 'degree_Celsius')>
>>> ureg("degC") # a standard pint unit code
... (traceback cut out)
lark.exceptions.UnexpectedCharacters: No terminal matches 'C' in the current parser context
```

## Tests

The unit tests include a test to parse all common UCUM unit codes from the official repo. To see this run
The unit tests include parsing and converting all common UCUM unit codes from the official repo. Run the test suite with:

```bash
pytest
4 changes: 1 addition & 3 deletions pyproject.toml
@@ -245,8 +245,6 @@ max-complexity = 10
# Allow Pydantic's `@validator` decorator to trigger class method treatment.
classmethod-decorators = [
"classmethod",
# for pydantic 1.x
"pydantic.validator", "pydantic.class_validators.root_validator"
]

[tool.ruff.format]
@@ -255,7 +253,7 @@ docstring-code-format = true

[tool.codespell]

skip = "pyproject.toml,src/ucumvert/vendor/ucum-essence.xml,src/ucumvert/vendor/ucum_examples.tsv,src/ucumvert/ucum_grammar.lark"
skip = "pyproject.toml,src/ucumvert/vendor/ucum-essence.xml,src/ucumvert/vendor/ucum_examples.tsv,src/ucumvert/ucum_grammar.lark,src/ucumvert/pint_ucum_defs_mapping_report.txt"

# Note: words have to be lowercased for the ignore-words-list
ignore-words-list = "linke,tne,sie,smoot"
41 changes: 41 additions & 0 deletions src/ucumvert/__init__.py
@@ -1,9 +1,16 @@
from __future__ import annotations

import logging
import logging.handlers  # needed for RotatingFileHandler; not loaded by "import logging" alone
import os
from pathlib import Path

from ucumvert.parser import (
get_ucum_parser,
make_parse_tree_png,
update_lark_ucum_grammar_file,
)
from ucumvert.ucum_pint import (
UcumToPintStrTransformer,
UcumToPintTransformer,
get_pint_registry,
ucum_preprocessor,
@@ -22,4 +29,38 @@
"ucum_preprocessor",
"update_lark_ucum_grammar_file",
"UcumToPintTransformer",
"UcumToPintStrTransformer",
]

# Note that nothing is passed to getLogger to set the "root" logger
logger = logging.getLogger()


def setup_logging(loglevel: int = logging.INFO, logfile: Path | None = None) -> None:
    """
    Setup logging to console and optionally a file.
    The default loglevel is INFO.
    """
    loglevel_name = os.getenv("LOGLEVEL", "").strip().upper()
    if loglevel_name in ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]:
        loglevel = getattr(logging, loglevel_name, logging.INFO)

    # Apply constraints. CRITICAL=FATAL=50 is the maximum, NOTSET=0 the minimum.
    loglevel = min(logging.FATAL, max(loglevel, logging.NOTSET))

    # Setup handler for logging to console
    logging.basicConfig(level=loglevel, format="%(levelname)-8s|%(message)s")

    if logfile is not None:
        # Setup handler for logging to file
        fh = logging.handlers.RotatingFileHandler(
            logfile, maxBytes=100000, backupCount=5
        )
        fh.setLevel(loglevel)
        fh_formatter = logging.Formatter(
            fmt="%(asctime)s|%(name)-20s|%(levelname)-8s|%(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
        fh.setFormatter(fh_formatter)
        logger.addHandler(fh)
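
Review note: the level-selection logic in `setup_logging` (a valid `LOGLEVEL` environment variable overrides the argument, then the value is clamped) can be looked at in isolation. The following is only a minimal sketch of that idea using the standard library; the helper name `resolve_loglevel` and its `env_var` parameter are hypothetical and not part of ucumvert.

```python
import logging
import os


def resolve_loglevel(default: int = logging.INFO, env_var: str = "LOGLEVEL") -> int:
    """Return the effective log level (hypothetical helper, not ucumvert API).

    A valid level name in the environment variable wins over ``default``.
    """
    name = os.getenv(env_var, "").strip().upper()
    if name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
        level = getattr(logging, name)
    else:
        level = default
    # Clamp to the range used by the logging module: NOTSET=0 .. CRITICAL=50.
    return min(logging.CRITICAL, max(level, logging.NOTSET))


# The printed name depends on whether LOGLEVEL is set in the environment.
print(logging.getLevelName(resolve_loglevel()))
```

Keeping this step as a pure function would make the override and clamping rules easy to unit-test without touching handlers.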
165 changes: 154 additions & 11 deletions src/ucumvert/cli.py
@@ -1,22 +1,48 @@
from lark.exceptions import UnexpectedInput, VisitError
import argparse
import logging
import sys
import textwrap
from pathlib import Path

from ucumvert.parser import get_ucum_parser, make_parse_tree_png
from ucumvert.ucum_pint import UcumToPintTransformer
from lark.exceptions import LarkError, UnexpectedInput, VisitError

from ucumvert import __version__, setup_logging
from ucumvert.parser import (
get_ucum_parser,
make_parse_tree_png,
update_lark_ucum_grammar_file,
)
from ucumvert.ucum_pint import UcumToPintTransformer, find_matching_pint_definitions

def main():
try:
    import pydot  # noqa: F401

    has_pydot = True
except ImportError:
    has_pydot = False

logger = logging.getLogger(__name__)


def interactive():
print("Enter UCUM unit code to parse, or 'q' to quit.")
if not has_pydot:
print("Package pydot not installed, skipping parse-tree image generation.")

ucum_parser = get_ucum_parser()

while True:
ucum_code = input("> ")
if ucum_code.strip().lower() == "q":  # substring test `in "qQ"` also matched "" and "qQ"
break
try:
parsed_data = make_parse_tree_png(
ucum_code, filename="parse_tree.png", parser=ucum_parser
)
print("Created visualization of parse tree (parse_tree.png).")
if has_pydot:
parsed_data = make_parse_tree_png(
ucum_code, filename="parse_tree.png", parser=ucum_parser
)
print("Created visualization of parse tree (parse_tree.png).")
else:
parsed_data = ucum_parser.parse(ucum_code)
print(parsed_data.pretty())
except UnexpectedInput as e:
print(e)
@@ -30,9 +56,126 @@ def main():
continue


def run_cli_app():
main()
# === argparse-cli-related code ===


class DecentFormatter(argparse.HelpFormatter):
    """
    An argparse formatter that preserves newlines & keeps indentation.
    """

    def _fill_text(self, text, width, indent):
        """
        Reformat text while keeping newlines for lines shorter than width.
        """
        lines = []
        for line in textwrap.indent(textwrap.dedent(text), indent).splitlines():
            lines.append(  # noqa: PERF401
                textwrap.fill(line, width, subsequent_indent=indent)
            )
        return "\n".join(lines)

    def _split_lines(self, text, width):
        """
        Conserve indentation in help/description lines when splitting long lines.
        """
        lines = []
        for line in textwrap.dedent(text).splitlines():
            if not line.strip():
                continue
            indent = " " * (len(line) - len(line.lstrip()))
            lines.extend(
                textwrap.fill(line, width, subsequent_indent=indent).splitlines()
            )
        return lines
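
Review note: the core of `_fill_text` is wrapping each source line on its own, so that explicit newlines survive while long lines are still folded. A minimal standalone sketch of that technique follows; the function name `fill_keep_newlines` and the sample text are made up for illustration and are not part of ucumvert or argparse.

```python
import textwrap


def fill_keep_newlines(text: str, width: int, indent: str = "") -> str:
    """Wrap every source line separately so explicit newlines are kept."""
    wrapped = [
        textwrap.fill(line, width, subsequent_indent=indent)
        for line in textwrap.indent(textwrap.dedent(text), indent).splitlines()
    ]
    return "\n".join(wrapped)


sample = """\
    First line.
    Second line that is long enough to need wrapping.
"""
# The short first line stays on its own line; only the second one is folded.
print(fill_keep_newlines(sample, width=30))
```

By contrast, the default `argparse.HelpFormatter` collapses whitespace and refills the whole text as one paragraph, which is why a custom formatter is needed here.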


def root_cmds(args):
    if args.version:  # pragma: no cover
        print(f"ucumvert {__version__}")
    if args.interactive:
        interactive()
    if args.mapping_report:
        find_matching_pint_definitions(report_file=args.mapping_report)
    if args.grammar_update:
        grammar_file = Path(__file__).resolve().parent / "ucum_grammar.lark"
        update_lark_ucum_grammar_file(grammar_file=grammar_file)


def create_root_parser():
    parser = argparse.ArgumentParser(
        prog="ucumvert",
        description=("Simple CLI for ucumvert."),
        allow_abbrev=False,
        formatter_class=DecentFormatter,
    )
    parser.add_argument(
        "-V",
        "--version",
        help="The version of ucumvert.",
        action="store_true",
    )
    parser.add_argument(
        "-i",
        "--interactive",
        help="Interactive mode to explore parsing of UCUM unit codes.",
        action="store_true",
    )
    parser.add_argument(
        "-g",
        "--grammar_update",
        help=(
            "Recreate grammar file 'ucum_grammar.lark' with UCUM atoms "
            "extracted from ucum-essence.xml."
        ),
        action="store_true",
    )
    parser.add_argument(
        "-m",
        "--mapping_report",
        help=(
            "Write a report of mappings between UCUM unit atoms and pint "
            "definitions to the given file. Default is to write to "
            "'pint_ucum_defs_mapping_report.txt' in the current directory."
        ),
        type=Path,
        metavar=("FILE"),
        nargs="?",  # make file an optional argument
        const=Path("pint_ucum_defs_mapping_report.txt"),  # default value
    )
    parser.set_defaults(func=root_cmds)
    return parser
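
Review note: the `-m/--mapping_report` option relies on the interplay of `nargs="?"` and `const`, which is easy to misread. The sketch below reproduces only that argparse behavior with the standard library; the program name `demo` and the file names are placeholders, not values used by ucumvert.

```python
import argparse
from pathlib import Path

# Hypothetical stand-alone parser that mirrors only the -m option.
parser = argparse.ArgumentParser(prog="demo", allow_abbrev=False)
parser.add_argument(
    "-m",
    "--mapping_report",
    type=Path,
    metavar="FILE",
    nargs="?",  # the value after -m is optional
    const=Path("report.txt"),  # used when -m is given without a value
)

absent = parser.parse_args([])  # option not given at all
bare = parser.parse_args(["-m"])  # option given without a value
explicit = parser.parse_args(["-m", "out/custom.txt"])  # option with a value

print(absent.mapping_report)  # None, so a truthiness check skips the report
print(bare.mapping_report)  # the const path
print(explicit.mapping_report)  # the user-supplied path
```

Because no `default` is set, the attribute is `None` when the flag is absent, so a check like `if args.mapping_report:` runs the report only when `-m` was actually passed.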


def main_cli(raw_args=None):
    """Setup CLI app and run commands based on arguments."""
    # Create root parser for cli app
    parser = create_root_parser()

    if not raw_args:
        parser.print_help()
        return

    # Parse the command-line arguments
    # parse_args will call sys.exit(2) if invalid commands are given.
    args = parser.parse_args(raw_args)
    setup_logging(loglevel=logging.INFO)
    args.func(args)


def run_cli_app(raw_args=None):
    """Entry point for running the cli app."""
    if raw_args is None:
        raw_args = sys.argv[1:]
    try:
        main_cli(raw_args)
    except LarkError:
        logger.exception("Terminating with ucumvert error.")
        sys.exit(1)
    except Exception:
        logger.exception("Unexpected error.")
        sys.exit(3)  # value 2 is used by argparse for invalid args.


if __name__ == "__main__":
main()
run_cli_app(sys.argv[1:])
7 changes: 6 additions & 1 deletion src/ucumvert/parser.py
@@ -1,5 +1,6 @@
from __future__ import annotations

import logging
import textwrap
from pathlib import Path

@@ -13,6 +14,9 @@
get_prefixes,
)

logger = logging.getLogger(__name__)


# UCUM syntax in the Backus-Naur Form, copied from https://ucum.org/ucum#section-Syntax-Rules
# <sign> : "+" | "-"
# <digit> : "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
@@ -162,6 +166,7 @@ def update_lark_ucum_grammar_file(
with grammar_file.open("w") as f:
f.write("\n".join(wrapped))
f.write("\n") # newline at end of file
logger.info("Updated grammar written to '%s'.", grammar_file)


def get_ucum_parser(grammar_file=None):
@@ -179,5 +184,5 @@ def make_parse_tree_png(data, filename="parse_tree_unit.png", parser=None):
try:
tree.pydot__tree_to_png(parsed_data, filename)
except ImportError:
print("pydot not installed, skipping png generation")
logger.warning("pydot not installed, skipping png generation")
return parsed_data
2 changes: 1 addition & 1 deletion src/ucumvert/pint_ucum_defs_mapping_report.txt
@@ -331,4 +331,4 @@
# [car_Au] --> carat_of_gold_alloys (ucumvert registry) # [car_Au] = 1/24 # NON_METRIC, carat of gold alloys, mass fraction (misc)
# [smoot] --> smoot (ucumvert registry) # [smoot] = 67 * [in_i] # NON_METRIC, Smoot, length (misc)
# [m/s2/Hz^(1/2)] --> meter_per_square_second_per_square_root_of_hertz (ucumvert registry) # [m/s2/Hz^(1/2)] = 1 * sqrt(1 m2/s4/Hz) # NON_METRIC, meter per square seconds per square root of hertz, amplitude spectral density (misc)
# bit_s --> bit (ucumvert registry) # bit_s = 1 * ld(1 1) # NON_METRIC, bit, amount of information (infotech)
# bit_s --> bit (ucumvert registry) # bit_s = 1 * ld(1 1) # NON_METRIC, bit, amount of information (infotech)