Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify GlobusAuthRequiremenetsError validation #795

Conversation

sirosen
Copy link
Member

@sirosen sirosen commented Aug 5, 2023

My interest in making some changes to the GARE work has shifted since I started on this.
Originally, my goal was surface-level simplicity and brevity, regardless of how much sophisticated plumbing was needed.
It seemed problematic to me that we had a situation like

class Foo:
    code: str

    SUPPORTED_FIELDS = {
        "code": literal_validator("ConsentRequired"),
        ...,
    }

    def __init__(self, code: str | None, ...):
        ...

This would seem to declare a single field, code, three times. Each with subtly different apparent types -- str, str | None, and Literal[...].

I thought I could combine two of these together into something which looked like

class Foo:
    code: Annotated[Literal["ConsentRequired"], validators.AUTO]

    def __init__(self, code: str | None, ...):
        ...

and collapse away one of the redundancies.

A state approximating this was achieved in the course of this work: all tests should pass on 2ba3166 .

However, this was really too complex for our needs. After a conversation in which we questioned the wisdom of annotation inspection (and --keep-runtime-typing is a red flag for this), I reconsidered what the end goal of my refactor should be, and came up with the following target:

  • instance variables should be assigned explicitly, not via an implicit looping structure
  • those assignments should have values whose types are knowable to a type checker

The problem therefore was partly that the phrasing of the validator module as plain callables precluded passing the relevant and necessary type information around. Instead, converting all of the validators to implementers of a callable protocol allows us to express the idea that validators are generics which, upon invocation, return some "result type".
However we declare it, the correct type for the code validator above would be Validator[Literal["ConsentRequired"]]. (See the end of _variants.py for the real definition. It is quite simple.)

For the most part, the changes are a great deal of tedious copy-paste and not terribly interesting.
However, certain cases illuminate the value of a refactor which is more typing-friendly. Consider LegacyAuthorizationParameters, where it is now clear from the types that an input of str | list[str] | None is narrowed by the validator to list[str] | None.

The move from brevity towards explicit-ness reaches its apex in 768922f , which reintroduces some of the original redundancy I was seeking to remove. The issue I seek to solve in this changeset is not that we are declaring 3x that code is a field, but rather that we have declared it with multiple types and the interplay between those types is not clear. Therefore, there is an argument against seeking brevity as one of our criteria.
Even so, the changeset at that point did not actually increase our LOC -- it ever so barely manages to decrease it. This is in spite of a more verbose paradigm for the validators I attribute it primarily to the fact that looping over fields to assign them dynamically is not shorter or simpler than the "apparent verbosity" of explicit assignments if that logic is combined with class-level instance variable type declarations.

However, I've since tried to see if I could regain some of the terseness that I destroyed.
In the very last commits here, some slightly suspect use of inspect creeps back in. In particular, any critical feedback about the extraction of the f_locals of the grandparent frame of a helper would be very understandable and welcome.


All that said, was the mission, set forth in this PR title, accomplished? Is validation simpler now than it was before?
It is debatable, but I would say yes.
Consider the before and after of LegacyConsentRequiredTransferError -- this is definitely improved.
The non-legacy module is also now shorter -- by 30 lines -- and although there's now more magic involved in pulling out values (not just locals, but the parent frame's locals!), I would call it also more readable.

This is not necessarily our final iteration. I started to think about a descriptor for the validators with a clever fset, which might work well. But then I decided to open this PR and share what I've been working on.


📚 Documentation preview 📚: https://globus-sdk-python--795.org.readthedocs.build/en/795/

This is an experimental change to an experimental subpackage.
Rather than explicitly enumerating types 3x between
- class declaration
- SUPPORTED_FIELDS dict
- init declaration

and then auto-lifting `locals()` values into assignments, take a
slightly different tack of enumerating the types twice
- class declaration with Annotated
- init declaration

And then manually assigning fields (pylint will keep us honest about
missing parameters).

A new set of validator helpers cover our needs in terms of picking up
and using the validators.
If a 'DEFAULT' sentinel value is seen in the Annotated data, then the
type from the annotation can be mapped to a known validator. Bake this
logic into the validators.
Declare a well-typed protocol for validators, which defines the
`__call__` method to take a name (for the field) and a value to
validate.

Each field validation is now done with an explicit call to a
validator, and without any derivation of validators, types, or
meta-programming.

The tow remaining items which needed special handling were:
- checking for non-null values being passed for at least one argument
- loading a data object from a dict filtering out extra params

In these two cases, the desired arg names are found by examining
`__init__` and excluding `self` and `extra`, each of which has special
meaning.

In general, the paradigm shift offered here is to be more direct and
explicit, even if the requirement is that we are verbose in the
process. Partly, this was based on the realization that many of the
meta-programming techniques being used were not resulting in the
desired level of brevity and clarity.
An aborted implementation of annotation-based validators required that
we preserve runtime typing. That constraint is gone, meaning we don't
need the relevant flag for pre-commit.
Relegate the strategy of looking at `__init__` signatures to the
testsuite. Rather than doing this at runtime, reintroduce
`SUPPORTED_FIELDS` as a property of GARE-related classes, but as a set
of strings.

The testsuite now checks the set against `__init__` for the relevant
class.
The result is more verbose but simpler.
- Deriving a value from the `locals()` of a parent frame saves us some
  redundancy and is safe so long as the relevant helper remains purely
  internal
- Inspecting an `__init__` signature is simple and easy if it is done
  in the class declaration context when `__init__` is "just a
  function" and not yet bound to the class. No complex derivation of
  the `__init__` function is therefore needed

In order for this to work, `HasSupportedFields` needs to become an
explicit Protocol, or mypy incorrectly fails some of the variants.
Copy link
Member

@kurtmckee kurtmckee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this reduces the number of per-class type annotations that exist in the target branch. However, I dislike how much plumbing code that we have to write and expect others to read and deeply understand to support type annotations. Python type annotations offer far less value to me than the effort we are putting into their addition and upkeep.

I have a concern that type annotations are a stricture that wastes our time and taxes our minds, at least at the level that we are implementing them.

@sirosen sirosen marked this pull request as draft August 6, 2023 17:38
@sirosen
Copy link
Member Author

sirosen commented Aug 6, 2023

I'm converting this to a draft because I'm still very unhappy with the amount of plumbing as well. To me, the replication of type information and the lack of clarity around the differences between the validator, the type annotation, and a class-level annotation is a major problem, however, so I don't want to abandon this line of thinking. Only, perhaps, this implementation.

I did write a version which I'll share which goes all in, using Annotated and a metaclass to implement the logic by wrapping __init__:
https://github.com/sirosen/globus-sdk-python/tree/daft-punk-gare

I think the thing I really need to do is write the ultra-simplified version of this as an alternative so that we can look at a more realistic option.

@sirosen
Copy link
Member Author

sirosen commented Aug 6, 2023

I (re)learned some interesting things while implementing this and trying to build a micro-framework which would do the job for us.

The biggest lesson I relearned was: don't build micro-frameworks.

#796 encodes all of what I learned in terms of getting types aligned, using Protocols to avoid messy subclass relationships, and dodging the pitfalls of overly-reflective python. I hope it is to everyone's liking; I know it is to mine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants