First draft of an update to the Overloads chapter (DRAFT: DO NOT MERGE) #1839

Draft
wants to merge 38 commits into
base: main

erictraut
Collaborator

  • Attempts to clearly define the algorithm for overload matching.
  • Describes checks for overload consistency, overlapping overloads, and implementation consistency.

erictraut and others added 2 commits August 13, 2024 17:06
Collaborator

@hauntsaninja hauntsaninja left a comment

(two quick comments)

Member

@carljm carljm left a comment

Thanks for working on this tricky area! I haven't finished review yet, but may be called away soon, so I'm submitting the comments I have so far. (EDIT: I've now completed my review.)

@erictraut
Collaborator Author

erictraut commented Aug 28, 2024

We typically wait for a proposed spec change to be accepted by the TC prior to writing conformance tests. In this case, I think it's advisable to write the conformance tests prior to acceptance. This will help us validate the proposed spec changes and tell us if (and to what extent) these changes will be disruptive for existing stubs and current type checker implementations.

I would normally volunteer to write the conformance tests, but in this case I think it would be preferable for someone else to write the tests based on their reading of the spec update. If I write the tests, there's a real possibility that they will match what's in my head but not accurately reflect the letter of the spec. There's also a possibility that I'll miss some important cases in the tests. If someone else writes the tests, they can help identify holes and ambiguities in the spec language.

Is there anyone willing to volunteer to write a draft set of conformance tests for this overload functionality? I'm thinking that there should be four new test files:

  1. overloads_definitions: Tests the rules defined in the "Invalid overload definitions" section
  2. overloads_consistency: Tests the rules defined in the "Implementation consistency" section
  3. overloads_overlap: Tests the rules defined in the "Overlapping overloads" section
  4. overloads_evaluation: Tests the rules defined in the "Overload call evaluation" section

If this is more work than any one person wants to volunteer for, we could split it up.

@carljm
Member

carljm commented Aug 28, 2024

I am willing to work on conformance tests for this, but I probably can't get to it until the core dev sprint, Sept 23-27. I realize that implies a delay to moving forward with this PR. Happy for someone else to get to it first.

@carljm
Member

carljm commented Jan 10, 2025

I've completed the first set of tests (for the "Invalid overload definitions" section of the spec.) I just realized that I named it overloads_invalid.py, where you had suggested overloads_definitions.py -- let me know if you feel strongly about this naming, and I can change it.

It's slow going adding these tests, because running python main.py is slow, and I typically want to check the type checker behavior against each test, then update each of the four result toml files, then sometimes tweak the expectations of the test and run it again...

One other limitation of the conformance suite that I've observed is that sometimes the rules differ for stub files vs normal files, but as far as I can see the conformance suite tooling doesn't support a stub file as an actual test file, only as an importable resource for a non-stub file. Has there been prior discussion of lifting this limitation?

@erictraut
Collaborator Author

> the conformance suite tooling doesn't support a stub file as an actual test file

I don't think this has been a requirement yet. The test infrastructure can be updated if necessary.


When a type checker checks the implementation for consistency with overloads,
it should first apply any transforms that change the effective type of the
implementation including the presence of a ``yield`` statement in the
Member

Is there a circumstance in which the presence of a yield in the body of a function changes the meaning of its return type annotation? My understanding is that it does not: a generator must be manually annotated with a Generator or Iterator return type. So I'm not sure how to write a test for this mention of "presence of a yield statement." Should this be removed from the text?

Collaborator Author

You're correct, the presence of a yield statement doesn't change the effective type of the return type annotation. It does affect the inferred return type if the type checker (like pyright) implements return type inference. I think I meant to say "async keyword" rather than "yield statement". This should be changed in the spec.
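
To make the `async` point concrete, here is a minimal sketch. The function name `load` is hypothetical and does not come from the spec or the conformance tests. Both overloads and the implementation use `async def`, so the annotations say `int` or `str` while a call actually produces a coroutine that resolves to one of those; a consistency check would presumably need to account for that transform on both sides.

```python
import asyncio
import inspect
from typing import Union, overload


@overload
async def load(key: int) -> int: ...
@overload
async def load(key: str) -> str: ...


# The annotation says Union[int, str], but because of `async def` a call
# returns a coroutine that resolves to that type when awaited.
async def load(key: Union[int, str]) -> Union[int, str]:
    return key


# Calling the function yields a coroutine object, not the annotated value.
pending = load(3)
is_coroutine = inspect.iscoroutine(pending)
value = asyncio.run(pending)
```

A plain `yield` in the body has no comparable effect on the declared type, which matches the correction above: a generator still has to be annotated explicitly.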

Comment on lines +254 to +274
Step 1: Examine the argument list to determine the number of
positional and keyword arguments. Use this information to eliminate any
overload candidates that are not plausible based on their
input signatures.

- If no candidate overloads remain, generate an error and stop.
- If only one candidate overload remains, it is the winning match. Evaluate
it as if it were a non-overloaded function call and stop.
- If two or more candidate overloads remain, proceed to step 2.


Step 2: Evaluate each remaining overload as a regular (non-overloaded)
call to determine whether it is compatible with the supplied
argument list. Unlike step 1, this step considers the types of the parameters
and arguments. During this step, do not generate any user-visible errors.
Simply record which of the overloads result in evaluation errors.

- If all overloads result in errors, proceed to step 3.
- If only one overload evaluates without error, it is the winning match.
Evaluate it as if it were a non-overloaded function call and stop.
- If two or more candidate overloads remain, proceed to step 4.
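
The split between steps 1 and 2 can be illustrated with a small runtime toy. This is only a sketch under loud assumptions: it uses `inspect.signature(...).bind` as a stand-in for arity matching and `isinstance` against plain class annotations as a stand-in for type evaluation. The helper names (`passes_arity`, `passes_types`, `select`) and the candidate functions are hypothetical, and no type checker is claimed to work this way.

```python
import inspect


def passes_arity(fn, args, kwargs):
    """Step 1 analogue: can the call bind to this signature? Types are ignored."""
    try:
        inspect.signature(fn).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


def passes_types(fn, args, kwargs):
    """Step 2 analogue: compare bound values with plain class annotations."""
    sig = inspect.signature(fn)
    bound = sig.bind(*args, **kwargs)
    for name, value in bound.arguments.items():
        param = sig.parameters[name]
        if param.annotation is param.empty:
            continue  # unannotated parameters accept anything in this toy
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # variadic parameters are out of scope for this toy
        if isinstance(param.annotation, type) and not isinstance(value, param.annotation):
            return False
    return True


def select(candidates, args, kwargs):
    """Return (outcome, remaining overloads) for steps 1 and 2 only."""
    plausible = [fn for fn in candidates if passes_arity(fn, args, kwargs)]
    if not plausible:
        return "error", []
    if len(plausible) == 1:
        # Sole survivor of step 1: it wins and is then evaluated as an
        # ordinary call, which may still report argument type errors.
        return "match", plausible
    compatible = [fn for fn in plausible if passes_types(fn, args, kwargs)]
    if not compatible:
        return "step 3", []
    if len(compatible) == 1:
        return "match", compatible
    return "step 4", compatible


# Hypothetical overload signatures used only for this illustration.
def one_int(x: int) -> int: ...
def two_strs(x: str, y: str) -> str: ...
def one_str(x: str) -> str: ...


by_arity = select([one_int, two_strs], ("a",), {})
by_type = select([one_int, one_str], ("a",), {})
no_match = select([one_int, two_strs], (), {})
```

In the first call, `one_int` is selected on arity alone even though `"a"` is not an `int`; that is the case where the two-step description and a single "first overload that binds without error" pass could appear to differ.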
Member

@carljm carljm Jan 11, 2025

The separation of step 1 and step 2 here results in a more complex algorithm than if they were combined into "iterate the overloads looking for the first one that binds to the call without error, if none, issue a no-matching-overload error." Because in your typing-meetup presentation, you emphasized that we have a complex algorithm for overload matching due to legacy, I assumed that this complexity must originate from long-time mypy behavior. So I was surprised to find that while this algorithm matches pyright behavior, mypy appears to use the simpler one-iteration combined algorithm. Pyre agrees with pyright.

I don't have strong feelings here (I could see arguments in favor of either behavior), but it seems that if we don't have agreement on this algorithm between existing type checkers (and thus don't have a clear backwards-compatibility argument in favor of specifying one behavior), perhaps we should have some discussion of the pros and cons of specifying this more-complex algorithm? Or some input from mypy developers (@hauntsaninja?) on whether mypy would be willing to switch to the algorithm described here?

Collaborator Author

I think mypy is using this same algorithm. Why do you think it isn't? In the mypy playground example that you provided, there are two overload signatures. Step one eliminates one, which leaves only one remaining. When it is evaluated "as if it were a non-overloaded function call", it also fails, which leaves no valid overloads. Mypy reports the error as such. Perhaps the wording in mypy's error is leading you to believe that it's using some different process.

I purposely separated steps 1 and 2 to try to simplify both the description of the algorithm and the potential implementation. It's much cheaper to filter based on arity. Evaluating types of argument expressions is orders of magnitude more expensive, especially if they need to be re-evaluated for every overload signature due to bidirectional type inference. It therefore doesn't make sense for a type checker to combine steps 1 and 2 if it cares about performance.

Member

@carljm carljm Jan 11, 2025

As far as I can see, the only way to observe a semantic difference between the two algorithms is through the selected return type (and secondarily, the emitted diagnostics). Mypy selects a return type of Any, and does not emit argument type errors relative to any one overload. Both of these behaviors suggest that it is simply checking all overloads and concluding that no overload matches (consistent with the single-step algorithm), not picking the sole arity-matching overload as the winner and then checking the call normally relative to that overload, as described here.

Pyright and pyre, in contrast, both infer the return type of the overload whose arity matches, not Any, and both emit errors about mismatched argument types, relative to only that overload. This behavior is consistent with the algorithm described here.

I agree with you about performance, but I think the spec should concern itself with the simplest possible description of the semantics, not with performance. I don't agree that separating steps 1 and 2 makes the description simpler ("check each call in order and discard any that result in errors" is a simple single step and doesn't require any separate discussion of arity vs types), and I'm not sure whether the return type difference above should be specified.

It's possible that internally mypy is doing a separate first arity pass, for performance reasons. But its observable behavior in the return type selected is as if it does not; it doesn't match what is specified here.

The tests I've written currently specify the return type behavior of pyright and pyre in this case, and mark mypy as out of compliance because it says the return type is Any. If instead the linked behaviors of mypy and pyright should both be permitted by the conformance suite, I'm fine with that outcome, but in that case I don't believe the conformance suite can observe any distinction between the algorithm as specified vs an implementation that combines step 1 and step 2.
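
For readers without the linked playground examples, the scenario under discussion looks roughly like the sketch below. It is a hypothetical stand-in, not the linked code: the name `lookup` and its signatures are assumptions chosen so that exactly one overload matches on arity while failing on argument type.

```python
from typing import overload


@overload
def lookup(key: int) -> int: ...
@overload
def lookup(key: str, default: str) -> str: ...


def lookup(key, default=None):
    return key if default is None else default


# One positional argument: only the first overload has matching arity, so it
# is the sole survivor of step 1. The argument "a" is not an int, so that
# candidate then fails when evaluated as a regular call.
# Per this thread, pyright and pyre report the mismatch against the first
# overload and infer its return type, while mypy reports that no overload
# matches and infers Any.
result = lookup("a")  # a type checker is expected to report an error here
```

At runtime the call succeeds and simply returns its argument; the disagreement is only about which diagnostic and which inferred type a checker produces for the erroneous call.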

Collaborator Author

You can't draw any conclusions from the type that is evaluated for an errant call expression. The spec is (and should be) silent on what type should be evaluated in the face of an error. That's true of all expression types that result in errors of various types (syntactical and semantic). Once an error is reported, all bets are off in terms of what type you should expect for type evaluation. Mypy's evaluation of Any is reasonable for a type checker not associated with a language server. Pyright's behavior (where it attempts to "guess" the most likely intended return type) is more appropriate for a language server. I would object if someone were to propose that the spec should specify the evaluated type in the face of an error.

I think the description of the algorithm is simpler and clearer if steps 1 and 2 are separated. It sounds like you see it differently. I guess it's subjective.

Member

Hmm. All type checkers, including mypy, currently always evaluate a call expression to the annotated return type of the called function, even if there are argument type or arity errors in the call. So it still seems that mypy is not really following the algorithm as described here, in that it is not evaluating the sole arity-matching overload as if that were the signature of the function.

I guess there is some ambiguity in what the overload spec means here, because we are attempting to specify overloads without having specified regular call evaluation yet. But if we assume that call evaluation would be specified to not mandate any return type when the call errors, then I think that there is no observable semantic difference between the one-step and two-step version of this part of the algorithm, so it's just a matter of which description is easier to understand. Since that's subjective, and there is some value in describing a more performant algorithm, I'm good with the current text.

I'll update the conformance suite to not expect any particular return type in case no overload fully matches.

Member

> I'll update the conformance suite to not expect any particular return type in case no overload fully matches.

Done.

Comment on lines +306 to +315
Step 4: If the argument list is compatible with two or more overloads,
determine whether one or more of the overloads has a variadic parameter
(either ``*args`` or ``**kwargs``) that maps to a corresponding argument
that supplies an indeterminate number of positional or keyword arguments.
If so, eliminate overloads that do not have a variadic parameter.

- If this results in only one remaining candidate overload, it is
the winning match. Evaluate it as if it were a non-overloaded function
call and stop.
- If two or more candidate overloads remain, proceed to step 5.
Member

@carljm carljm Jan 11, 2025

This step seems to assume that unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error, even though this is unsound.

This does appear to describe the actual behavior of pyright (even in strict mode), mypy, and pyre.

As far as I can find, this behavior is not specified anywhere.

Should we acknowledge in the text that this step assumes an unsound call binding strategy, which is not (yet?) specified?

Collaborator Author

> This step seems to assume that unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error, even though this is unsound.

That is not what was intended. What I was trying to say is that *args and **kwargs parameters (which are of indeterminate length) should be matched against unpacked sequences or mappings of indeterminate length.

The text does not say anything about matching a fixed number of parameters. It talks only about *args or **kwargs.

I guess that *args and **kwargs parameters can be of determinate length if they use an unpacked tuple or unpacked TypedDict, respectively, so maybe the text needs to be clearer to indicate that these cases are exempt from this rule.

Member

@carljm carljm Jan 13, 2025

> What I was trying to say is that *args and **kwargs parameters (which are of indeterminate length) should be matched against unpacked sequences or mappings of indeterminate length.

Yes, this is also what I understood.

The reason I took this to imply that "unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error", is that if the latter is not the case, then I don't see how step 4 could ever eliminate an overload that wasn't already eliminated by step 2. The overloads that could pass step 2 and then be eliminated by step 4, would be overloads where an unpacked argument of indeterminate length successfully ("without error", in order to pass step 2) matched against an overload without a corresponding variadic parameter.

So without the implication I mentioned, step 4 would be redundant. (And this is immediately relevant to the conformance suite, because the only tests I could write for step 4 have to rely on "unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error", which means we are now at least implicitly specifying that. Which may be the right thing to do, since type checkers already appear to agree on it.)

Member

I think this thread is a second case of "overload handling has to be built on top of some specification for binding a call to a single (non-overloaded) signature, and we don't have an explicit specification for that yet, which makes some things less clear in the overload spec"

Member

> And this is immediately relevant to the conformance suite, because the only tests I could write for step 4 have to rely on "unpacked arguments of indeterminate length should be matched to fixed numbers of parameters without error", which means we are now at least implicitly specifying that.

Oh, actually, this is not true. The test I wrote would pass even if we required strict handling of indeterminate-length unpacked arguments. It would just pass because of step 2 instead, and step 4 would be redundant.

Collaborator Author

> So without the implication I mentioned, step 4 would be redundant.

I see what you mean. Yes, both mypy and pyright allow an unpacked argument of indeterminate (unknown) length to match against a fixed number of parameters. That can produce false negatives, but flagging this as an error would produce false positives. Since it's a common use case in Python, I think it would be pretty annoying to emit an error for this case. Neither mypy nor pyright does.

from typing import Literal, overload

x1 = [1]
x4 = [1, 2, 3, 4]

def func1(p1: int, /) -> Literal[1]: ...

reveal_type(func1(*x1))  # No type checker error, no runtime error
reveal_type(func1(*x4))  # No type checker error, runtime error

@overload
def func2(p1: int, /) -> Literal[1]: ...
@overload
def func2(p1: int, p2: int, /, *args: int) -> Literal[3]: ...
def func2(*args: int) -> int: ...


reveal_type(func2(*x4))  # Literal[1] (pyright), Literal[3] (mypy)
reveal_type(func2(1, 2, *x4))  # Literal[3]

And yes, you're correct that without this behavior specified, it's still possible for two type checkers to be conformant with the existing spec but still differ in how they interpret overloaded calls.

Member

@carljm carljm Jan 13, 2025

Yeah, it seems like the lenient behavior shared by all type checkers is the one we'd need to specify here. So I don't really think any change to the spec text is needed.

Thanks for the example! It showed that my test for step 4 wasn't adequate, as it wasn't catching the fact that pyright doesn't seem to prefer the variadic (second) overload of func2 for the call func2(*x4), where I believe this spec says that it should? I've updated the test to catch this.

for all remaining overloads are :term:`equivalent`, proceed to step 6.

If the return types are not equivalent, overload matching is ambiguous. In
this case, assume a return type of ``Any`` and stop.
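
A short sketch of the ambiguous case, assuming (as the surrounding step of the draft appears to) that the ambiguity comes from an argument whose type is `Any`. The names `parse` and `caller` are hypothetical and chosen only for illustration.

```python
from typing import Any, overload


@overload
def parse(value: int) -> int: ...
@overload
def parse(value: str) -> str: ...


def parse(value):
    return value


def caller(untyped: Any) -> None:
    # An argument of type Any is compatible with both overloads, and their
    # return types (int and str) are not equivalent, so under the quoted rule
    # the call is ambiguous and `out` is treated as Any rather than int or
    # int | str.
    out = parse(untyped)
    # Under that rule, attribute access on `out` would not be flagged; with a
    # union result, a checker would report that `int` has no `upper`.
    if isinstance(out, str):
        out.upper()


caller(5)
caller("text")
```

This also hints at the stub-compatibility concern raised in the thread: if the result were a union instead, code that uses the result as one specific member type would start producing errors.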
Member

@carljm carljm Jan 11, 2025

Is there a correctness requirement for the return type here to be assumed to be Any? It seems to me that it would also be valid for a type-checker to use the union of all the ambiguous matching overloads. I would prefer for the specification not to prevent that option. (No type checker currently does this.)

Collaborator Author

Existing stubs (including typeshed, numpy, pandas, and others) assume that the result will be Any in this case, so I don't think this is something we can change at this point. An earlier version of pyright generated a union, and it resulted in many false positive errors and lots of unhappy users. I think it's important for the spec to specify Any here so stub authors can rely on the behavior.

Member

Hmm, ok, that's useful info, thank you. If you happen to know of any old pyright issues where unhappy users surfaced these problems with the union behavior, I would be curious to take a look at some real world cases relying on this.

@carljm
Member

carljm commented Jan 11, 2025

I believe I've completed the test suite, with reasonably good coverage of everything specified as a "should". I intentionally avoided adding tests either way for behaviors specified as a "may".

I also added the capability to have stub test files in the conformance suite, and added overloads_definitions_stub.pyi, since the rules for valid overload definition are significantly different in a stub file.

I aimed to write tests that reflect the specification as it currently exists in this PR, to help illuminate where type checkers currently do and don't conform to this spec. I commented inline on some points where I wonder if we should adjust the spec.

@erictraut
Collaborator Author

@carljm, thanks for doing this! I'll try to find time next week to review your test code and update the draft spec if your test uncovered any areas of ambiguity or lack of clarity.
