Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

πŸ—οΈ merge types-scipy-sparse into scipy-stubs #129

Closed
jorenham opened this issue Oct 21, 2024 · 12 comments
Closed

πŸ—οΈ merge types-scipy-sparse into scipy-stubs #129

jorenham opened this issue Oct 21, 2024 · 12 comments
Labels
meta: codegen Automatically generated code scipy.sparse

Comments

@jorenham
Copy link
Collaborator

jorenham commented Oct 21, 2024

@BarakKatzir has been developing the types-scipy-sparse stub package for scipy.sparse. A large portion appears to more complete that the scipy.sparse stubbs in scipy-stubs.

I have talked with @BarakKatzir about joining forces, and we both agreed that gradually merging types-scipy-sparse into scipy-stubs is not only a good idea, but also a feasible one.

@BarakKatzir, how about we could use this issue as central planning on how we can go about this? So e.g. how we can "chop it up" into PR's, and in which order we do so.

related:

@BarakKatzir
Copy link

Good idea. Currently we talked about the first PR including:
1.Moving a single sparse array type (csc / csc etc.) from types-scipy-sparse package to scipy-stubs. Add inheritance from _spbase class (since in my package _spbase isn't type-annotated and you mentioned that despite being private it is used by users).
2. I probably have to add generics to _spbase as well, since there are some methods which need it (e.g., real). This might also be in the form of generated code.
2. Regarding the code generation - I'll start by copying my tool (just for the relevant sparse type in the PR). It's a rough tool, but it's a good starting point IMO.

After the first PR we'll have a better idea of issues in the merger and we can plan ahead.

@jorenham
Copy link
Collaborator Author

jorenham commented Oct 21, 2024

  1. I probably have to add generics to _spbase as well, since there are some methods which need it (e.g., real). This might also be in the form of generated code.

Do you think that it would help if we defer the Generic bits, and re-introduce them once we have the fundamentals in place? You used __ceil__ as example to show how generics lead to unavoidable code duplication. From that perspective, omitting the generics would be like compressing a file before you transfer it, and decompressing it when all files are received.


  1. Regarding the code generation - I'll start by copying my tool (just for the relevant sparse type in the PR). It's a rough tool, but it's a good starting point IMO.

The codegen/ directory seems like the right place for that. But I'm not really sure on what the best approach for integrating this in the CI, and perhaps also pre-commit, would be.
The current codegen stuff is mostly meant for one-time only use, and is therefore not part of the CI.

But if we decide to defer the generics, do you still think it would be needed?
Because as I mentioned in #47, I plan to integrate unpy in scipy-stubs soon, which also does code-generation. So I'm sure you can imagine that having two codegens together could become a bit messy, if not done right.

I think that the ideal scenario would be to achieve the same DRY result as your template-based approach through unpy, because unpy code is (and probably always will be) syntactically valid Python. This means that IDE's still understand it, and all of our linters and type-checkers can also be used in the same way.


After the first PR we'll have a better idea of issues in the merger and we can plan ahead.

Yea that makes sense. Sometimes I forget that premature optimization is the root of all evil.

@BarakKatzir
Copy link

Okay, you're right. Since this is a big change we should aim at doing it as small as possible, but still in a way we can see how it will scale up. How about I add generic to the child class but only use it on a single method that isn't in _spbase? that way the changes will be contained to a single pyi file?

I don't think unpy is the correct code generation tool for the job, but maybe I'm misunderstanding it. The code generation here is very specific to scipy-stubs, and is not directly related to downgrading stubs to previous python versions. Adding overrides that depend on generics and the specific method at hand doesn't sound like something unpy is intended to do, no?

I'll put a suggestion here and let me know what you think:
We can have some meta stub file _csr_meta.pyi that contains a compressed type annotations. This file will not be distributed in scipy-stubs and sit elsewhere in the repo, maybe under codegen/compressed_stubs/. We can write a tool that decompresses it to _csr.pyi which will be distributed under scipy-stubs. Regarding CI we can add a check that compares the generated _csr.pyi with the one currently present in the commit.

In this approach, having the _csr.pyi in the repo is redundant and can be annoying since we have this _csr.pyi as a source file which we don't actually want to edit by hand, but this keeps the build simple.
Later on, when we're ready, we can apply unpy to the generated _csr.pyi stub and downgrade it to older python versions.

@jorenham
Copy link
Collaborator Author

Okay, you're right. Since this is a big change we should aim at doing it as small as possible, but still in a way we can see how it will scale up. How about I add generic to the child class but only use it on a single method that isn't in _spbase? that way the changes will be contained to a single pyi file?

Yea that makes sense πŸ‘ŒπŸ»

I don't think unpy is the correct code generation tool for the job, but maybe I'm misunderstanding it. The code generation here is very specific to scipy-stubs, and is not directly related to downgrading stubs to previous python versions. Adding overrides that depend on generics and the specific method at hand doesn't sound like something unpy is intended to do, no?

You're right in saying that at the moment unpy won't be able to help much with this. But I can imagine that once jorenham/unpy#97 and jorenham/unpy#98 are implemented, that we'd be able to achieve similar DRY results as by using a template-based approach in the case of methods.

I'll put a suggestion here and let me know what you think: We can have some meta stub file _csr_meta.pyi that contains a compressed type annotations. This file will not be distributed in scipy-stubs and sit elsewhere in the repo, maybe under codegen/compressed_stubs/. We can write a tool that decompresses it to _csr.pyi which will be distributed under scipy-stubs. Regarding CI we can add a check that compares the generated _csr.pyi with the one currently present in the commit.

NumPy does something similar with template-based code generation. They use a {module}.py.in naming scheme for templates that generate {module}.py. See e.g. numpy/__config__.py.in. This way IDE's, linters, and type-checkers won't try to parse the template file, which would inevitably cause them to report errors. For the most part, keeping the template in a separate directory could avoid this, you'd still have the IDE that wouldn't be able to parse it (e.g. VSCode only considers filename extensions when selecting a file type I believe).

As I also mentioned in jorenham/unpy#61, I think that for any code-generation approach it is important to make it clear to the user that the code they're looking at is fully generated. Otherwise there's the risk of them submitting PR's that (only) change this generated code, instead of its source template.

I also like numpy's approach of having the template in the same place as the source, so that it's easier to find it if you're looking for it, but forgot it's template-based (which, given my scatterbrain, is bound to happen to me, lol).

In this approach, having the _csr.pyi in the repo is redundant and can be annoying since we have this _csr.pyi as a source file which we don't actually want to edit by hand, but this keeps the build simple. Later on, when we're ready, we can apply unpy to the generated _csr.pyi stub and downgrade it to older python versions.

That seems like a good plan actually; kickstarting it in a dry manner with a (no offense) "quick-and-dirty" approach, and then gradually introducing unpy once we're ready to do so.

@jorenham
Copy link
Collaborator Author

jorenham commented Oct 22, 2024

I have some experience with jinja as a templating language. It has great python-like syntax that you can use to create macros and such, and it has very decent IDE integration, which includes support for the the jinja + python combo. So perhaps we could use that? Syntactically, it's actually rather similar to what you've already been using.

See https://github.com/koxudaxi/datamodel-code-generator/ for an example of how it can be used in a python codegen context.

@BarakKatzir
Copy link

Wow, I didn't know of jinja. Looks like adapting my tool to a more comprehensive tool will not be such a hassle. The IDE integration is a great feature!

I'll start working on a PR and we can continue refining the framework over there.

@jorenham jorenham added the meta: codegen Automatically generated code label Oct 24, 2024
@jorenham
Copy link
Collaborator Author

jorenham commented Nov 4, 2024

How's it going @BarakKatzir ?

@BarakKatzir
Copy link

Hi, @jorenham. I'm progressing a bit slowly due to deadlines at work (and a family vacation and a sick kid).
It took a bit of trial and error to learn how to work with poethepoet and jinja2, but I think I got the basics of them.
Also, I'm trying not to downgrade some improvements that scipy-stubs has over types-scipy-sparse such as using @override - but maybe I should just open a PR and we can refine it over there.
I'll hopefully open a PR in a couple of days.

@jorenham
Copy link
Collaborator Author

jorenham commented Nov 7, 2024

I'll hopefully open a PR in a couple of days.

πŸŽ‰

Also, I'm trying not to downgrade some improvements that scipy-stubs has over types-scipy-sparse such as using @OverRide - but maybe I should just open a PR and we can refine it over there.

I believe that basedpyright is configured to require @override at the moment.

If there are a lot of missing @overloads, then maybe we could also consider writing a libcst codemod for it. But I suspect it'll probably be faster to manually do this when there are <100 or so cases.

@jorenham
Copy link
Collaborator Author

jorenham commented Nov 7, 2024

It took a bit of trial and error to learn how to work with poethepoet and jinja2, but I think I got the basics of them.

Ah yea there's quite a bit of tooling involved. But in the long run those should all save time. If there's anything I can help, then don't hesitate to ask :)

@jorenham jorenham unpinned this issue Nov 24, 2024
@jorenham jorenham changed the title The great types-scipy-sparse merger πŸŽ‰ πŸ—οΈ merge types-scipy-sparse into scipy-stubs Dec 6, 2024
@jorenham
Copy link
Collaborator Author

jorenham commented Dec 8, 2024

@BarakKatzir Any updates on this @BarakKatzir?

@jorenham
Copy link
Collaborator Author

superseded by #307, #309, #310, #311, #313, #314, #315, #315, #316, #317, #318, #319, and #320

@jorenham jorenham closed this as not planned Won't fix, can't repro, duplicate, stale Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: codegen Automatically generated code scipy.sparse
Projects
None yet
Development

No branches or pull requests

2 participants