-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the codespell pre-commit hook #403
Changes from 3 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
nD | ||
CACE | ||
compliers | ||
complier |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Copyright 2024 The PyMC Labs Developers | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
""" | ||
This is a simple script that converts the jupyter notebooks into markdown | ||
for easier (and cleaner) parsing for the codespell check. Whitelisted words | ||
are maintained within this directory in the `codespeel-whitelist.txt`. For | ||
more information on this pre-commit hook please visit the github homepage | ||
for the project: https://github.com/codespell-project/codespell. | ||
""" | ||
|
||
import argparse | ||
import os | ||
from glob import glob | ||
|
||
import nbformat | ||
from nbconvert import MarkdownExporter | ||
|
||
|
||
def notebook_to_markdown(pattern: str, output_dir: str) -> None: | ||
""" | ||
Utility to convert jupyter notebook to markdown files. | ||
|
||
:param pattern: | ||
str that is a glob appropriate pattern to search | ||
:param output_dir: | ||
str directory to save the markdown files to | ||
""" | ||
for f_name in glob(pattern, recursive=True): | ||
with open(f_name, "r", encoding="utf-8") as f: | ||
nb = nbformat.read(f, as_version=4) | ||
|
||
markdown_exporter = MarkdownExporter() | ||
(body, _) = markdown_exporter.from_notebook_node(nb) | ||
|
||
os.makedirs(output_dir, exist_ok=True) | ||
|
||
output_file = os.path.join( | ||
output_dir, os.path.splitext(os.path.basename(f_name))[0] + ".md" | ||
) | ||
|
||
with open(output_file, "w", encoding="utf-8") as f: | ||
f.write(body) | ||
|
||
|
||
if __name__ == "__main__": | ||
parser = argparse.ArgumentParser() | ||
parser.add_argument( | ||
"-p", | ||
"--pattern", | ||
help="the glob appropriate pattern to search for jupyter notebooks", | ||
default="docs/**/*.ipynb", | ||
) | ||
parser.add_argument( | ||
"-t", | ||
"--tempdir", | ||
help="temporary directory to save the converted notebooks", | ||
default="tmp_markdown", | ||
) | ||
args = parser.parse_args() | ||
notebook_to_markdown(args.pattern, args.tempdir) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,7 +25,7 @@ repos: | |
exclude: &exclude_pattern 'iv_weak_instruments.ipynb' | ||
args: ["--maxkb=1500"] | ||
- repo: https://github.com/astral-sh/ruff-pre-commit | ||
rev: v0.6.2 | ||
rev: v0.6.3 | ||
hooks: | ||
# Run the linter | ||
- id: ruff | ||
|
@@ -41,3 +41,39 @@ repos: | |
# needed to make excludes in pyproject.toml work | ||
# see here https://github.com/econchick/interrogate/issues/60#issuecomment-735436566 | ||
pass_filenames: false | ||
- repo: local | ||
hooks: | ||
- id: convert-notebooks | ||
name: Convert Notebooks to Markdown | ||
entry: python ./.codespell/notebook_to_markdown.py | ||
language: python | ||
pass_filenames: false | ||
always_run: false | ||
additional_dependencies: ["nbconvert", "nbformat"] | ||
args: ["--tempdir", "tmp_markdown"] | ||
- repo: https://github.com/codespell-project/codespell | ||
rev: v2.3.0 | ||
hooks: | ||
- id: codespell | ||
args: [ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Noticing that codespell can be configured by pyproject.toml. I think it's worth standardizing on I'm mostly thinking of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree - There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yea that's an easy fix |
||
"-S", | ||
"*.csv", | ||
"-S", | ||
"pyproject.toml", | ||
"-S", | ||
"*.svg", | ||
"-S", | ||
"*.ipynb", | ||
"--ignore-words=./.codespell/codespell-whitelist.txt", | ||
] | ||
additional_dependencies: | ||
# Support pyproject.toml configuration | ||
- tomli | ||
- repo: local | ||
hooks: | ||
- id: remove-temp-directory-notebooks | ||
name: Remove temporary directory for codespell | ||
entry: bash -c 'rm -rf tmp_markdown && exit 0' | ||
language: system | ||
always_run: true | ||
pass_filenames: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, design choices! (i.e. no right or wrong here, but would love to double-check ideas)
May I ask, codespell appears to behave more like a software test than a linter with the setup (
convert-notebooks
) and teardown (remove-temp-directory-notebooks
), so would it be better for this to be implemented inside CI/CD instead of in pre-commit hooks?Not suggesting that we do so, but I just wanted to see whether there's a strong(er) rationale for leaving it in a pre-commit hook than within a GitHub action independently. Is the intent for it to be run locally? Also, might there be a more compact way of configuring this?
To be clear, definitely not suggesting that we move away from what's implemented. Just asking these questions to make sure the rationale is strong.
The only ask I'd have here is to document this design choice in the documentation directory. (My criteria for documentation is that if a topic has been asked and the answers are not in the docs already, then it should be documented.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably more of a philosophical question for you (@ericmjl ) and @drbenvincent
initially this was just checking spelling outside of the jupyter notebooks, which to me definitely feels like a
pre-commit
check, but now the check is including the notebook to markdown conversion to find spelling mistakes in these notebooks. maybe this highlights a bit of scope creep for this one PR / issue because purely as apre-commit
check I think it makes sense to just look at the.py
and.md
files and then the notebook spelling check is more of a CI checkso, it's up to you two but i'm happy to keep plugging away at this PR with the updated doc changes and rationale changes but maybe it makes sense to prune back this PR to just the base
codespell
checks then cherry-pick out the commits to a new PR for an actual CI check for jupyter notebooks since those should be more static and don't need to be checked on every commit -- thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @westford14. Yeah, I was thinking the same thing - happy for you to do this.