Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include py.typed file & fix and/or add missing/wrong type hints in stub files #50

Merged
merged 8 commits into from
Aug 13, 2024

Conversation

jonded94
Copy link
Contributor

@jonded94 jonded94 commented Aug 9, 2024

Now that PR #44 is in, to make this package actually work in a mypy context, we'd need to include a py.typed marker file.

Otherwise you'd see this error message:

error: Skipping analyzing "fastwarc.warc": module is installed, but missing library stubs or py.typed marker  [import-untyped]

Including this file makes this package properly PEP 561 compliant.

I'm sorry that this is something I forgot when participating in #44.

@phoerious
Copy link
Member

Did you check whether the file is actually included in the wheel? You might need to add it to the package_data. Also please rebase the branch to the latest develop.

@jonded94
Copy link
Contributor Author

jonded94 commented Aug 9, 2024

Yes, when I checked, it was both in the sdist and the actual wheels. The .pyi files are too btw.

I'll rebase this as soon as I opened another PR, I noticed that unfortunately a bunch of the type hints that were introduced are slightly wrong and/or missing.

@jonded94 jonded94 force-pushed the include-py-typed-marker-file branch from f751ee0 to 04e3438 Compare August 9, 2024 18:24
@jonded94 jonded94 changed the title Include py.typed file to make fastwarc PEP 561 compliant Include py.typed file & fix and/or add missing/wrong type hints in stub files Aug 9, 2024
@jonded94
Copy link
Contributor Author

jonded94 commented Aug 9, 2024

Hey @phoerious ,

unfortunately, adding and/or fixing wrong or missing type hints took longer than initially thought.

Patching a py.typed file into this package and testing what the output mypy was on the test files, specifically tests/fastwarc/test_warc.py, revealed that there were some issues with the current stub files.

The most central change of this PR is that places that accepted purely an fastwarc.IOStream before now accept both a fastwarc.IOStream or a generic typing.BinaryIO.

Current state of type checking of tests:

In general, I have to say that the state of the current tests is really not ideal to debug these kinds of things. It's practically entirely untyped code which creates a lot of noise for mypy which you have to sieve through to find the actual typing mishaps. This is the state of mypy --strict tests/fastwarc/test_warc.py on this branch, i.e. after fixing the issues in the stub files:

Found 112 errors in 1 file (checked 1 source file)

Implementation changes that could be nice for the future:

Another, relatively central issue is that IOStream does not implement a proper typing.IO interface. For this, following methods are missing/and or not correctly implemented:

IOStream.mode(self) -> str:
IOStream.name(self) -> str:
IOStream.closed(self) -> bool:
IOStream.fileno(self) -> int:
IOStream.isatty(self) -> bool:
IOStream.read(self, n: int = -1) -> bytes:  # needs default 'n = -1', meaning read everything
IOStream.readable(self) -> bool:
IOStream.readline(self, limit: int = -1) -> bytes:
IOStream.readlines(self, hint: int = -1) -> List[bytes]:
IOStream.seek(self, offset: int, whence: int = 0) -> int:  # needs 'whence'
IOStream.seekable(self) -> bool:
IOStream.truncate(self, size: int = None) -> int:
IOStream.writeable(self) -> bool:
IOStream.writelines(self, lines: list[bytes]) -> None:

Note that this is about implementing just a method interface, the actual non-abstract classes can raise a io.UnsupportedOperation for example, if that specific operation really can't be supported.

Also, for actually implementing typing.MutableMapping, the WarcHeaderMap would need:

WarcHeaderMap.__delitem__(self, key: str) -> None:

TL;DR

But nevertheless, I think this current state should already fix most issues that users have with this library in the wild. Let me know what you think about it.

@jonded94
Copy link
Contributor Author

jonded94 commented Aug 9, 2024

Slight correction: Since gzip.GzipFile for example does not properly implement typing.BinaryIO (it's missing truncate()), but it definitely is allowed as a possible type here, I went for some generic Readable/WriteableStream Protocols.

All python objects that implement necessary methods are now allowed in especially ArchiveIterator & WarcRecord.write(...).

This maps nicely to what the Cython code actually does low-level, i.e. checking for methods read, write, close, ... on a generic Python object and then wrapping it with PythonIOStreamAdapter.

tests/fastwarc/test_warc.py Outdated Show resolved Hide resolved
@phoerious
Copy link
Member

You can add the missing methods if you want, but they would all raise exceptions, unless you implement all the logic needed. I'm not sure whether this is actually useful. The IOStream classes are not drop-in replacements for their Python equivalents.

the import was missing. How did this work previously?

I added explicit __all__ in a recent commit. Previously, this was a hanger-on of the from fastwarc.warc import *. Just rebase the branch and everything should be fine.

@phoerious
Copy link
Member

I'm still trying to figure out how to make the tokenless Codecov upload work on pull requests from forks. 🤔

@jonded94
Copy link
Contributor Author

I'm not sure whether this is actually useful. The IOStream classes are not drop-in replacements for their Python equivalents.

I don't see that it's that clear that they couldn't do that. Properly implementing some centrally, well-known interfaces is always a plus in my book.

But this was rather about removing that the stub files wrongly assume that these interfaces were properly implemented, leading to mypy errors wherever you use these classes. This is now done and I don't see a strict need for anything further in the short-term future.

I added explicit all in a recent commit. Previously, this was a hanger-on of the from fastwarc.warc import *. Just rebase the branch and everything should be fine.

Ah, nice! Will do.

@phoerious
Copy link
Member

I've updated the workflow file a bit. Please rebase again, so we can see whether the Codecov upload now works.

@jonded94 jonded94 force-pushed the include-py-typed-marker-file branch from 4bcbf72 to 622c166 Compare August 12, 2024 09:02
@jonded94
Copy link
Contributor Author

Rebased on d9f785ac40076260f9eb728a08e85d23b0d1d87f, but still failed? :/

tests/fastwarc/test_warc.py Outdated Show resolved Hide resolved
fastwarc/fastwarc/__init__.py Show resolved Hide resolved
@phoerious
Copy link
Member

Rebased on d9f785ac40076260f9eb728a08e85d23b0d1d87f, but still failed? :/

Damn. I pretty much did everything exactly as in the upstream GitHub action. Unfortunately, I cannot use the action directly, since it requires a newer Glibc version than the one in the manylinux image.

@jonded94 jonded94 force-pushed the include-py-typed-marker-file branch from 622c166 to 7f1e49f Compare August 12, 2024 09:12
@jonded94
Copy link
Contributor Author

Removed the unnecessary hashlib import.

@phoerious phoerious merged commit ea7dceb into chatnoir-eu:develop Aug 13, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants