Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Implement Vale #3065

Closed
wants to merge 5 commits into from
Closed

Conversation

ngken0995
Copy link
Collaborator

Fixes

Potentially Fixes #3566 by @dhruvkb

Description

Add pre-commit hook for vale, create .vale.ini to determine what style should be implemented. Codespell should ignore editorial style guide.

I believe there is a bug in my code or I'm still missing some information while running pre-commit because vale will state there is no files to check. vale.................................................(no files to check)Skipped. Once I run just lint, all the error will be populated about the documentation folder.

Example:

 documentation/frontend/guides/test.md
 3:48  error  Use 'it's' instead of 'it is'.  Microsoft.Contractions 
                         

Testing Instructions

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • [N/A] I added or updated tests for the changes I made (if applicable).
  • [N/A] I added or updated documentation (if applicable).
  • [N/A] I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@ngken0995 ngken0995 requested a review from a team as a code owner September 26, 2023 16:10
@openverse-bot openverse-bot added 🟩 priority: low Low priority and doesn't need to be rushed ✨ goal: improvement Improvement to an existing user-facing feature 📄 aspect: text Concerns the textual material in the repository 🧱 stack: documentation Related to Sphinx documentation labels Sep 26, 2023
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works well! I checked out the branch and ran just lint.

✖ 74 errors, 258 warnings and 315 suggestions in 20 files.

That is a lot of violations, especially considering it's only applied to the documentation/ folder for now. I think the following two changes are needed so that the PR passes CI and can be merged.

  • Some of the most violated rules need to be disabled, if they're not very serious. For example, Microsoft.ComplexWords and Microsoft.We don't really apply in our case.
  • Violations of more serious rules like Microsoft.Accessibility should be addressed and resolved.

@ngken0995 would you like to work on these changes? If yes, that's great, thank you! If not, we can keep it open and make it the base branch for subsequent PRs that adddress these changes and we'll merge it once it passes CI.

In either case, I'll convert it to WIP/draft for now.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
.vale.ini Outdated Show resolved Hide resolved
@dhruvkb dhruvkb marked this pull request as draft October 4, 2023 05:45
@ngken0995
Copy link
Collaborator Author

@dhruvkb I'll work on it. I'll change some of the violation rule to suggestions and edit Microsoft.Accessibility statements.

@ngken0995 ngken0995 requested a review from dhruvkb October 5, 2023 15:50
@ngken0995
Copy link
Collaborator Author

@dhruvkb I have changed all style with error level to suggestion level except Microsoft.Accessibility. Let me know which other style guide should be set to error level.

@dhruvkb
Copy link
Member

dhruvkb commented Oct 5, 2023

That's a massive effort @ngken0995, thank you so much! I'll try to review this asap. Since it's a pretty big PR, it can take a while so feel free to work on something else in the meantime.

@dhruvkb dhruvkb marked this pull request as ready for review October 6, 2023 11:23
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me except for some very aggressive rules related to words like 'disabled' and 'normal', which may be acceptable as long as they're not referring to people.

@WordPress/openverse-maintainers should we consider adding those words to an allowlist? Or should we update our documentation to not use these words as all and opt for alternatives (like 'deactivated' and 'standard' as this PR does)?

Once the service has determined the new task to be healthy, it will redirect all
Once the service has determined the new task to be active, it will redirect all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here 'healthy' is the technical term, cannot change to 'active'. Maybe we can add an exception for this?

- affected by
- an epileptic
- crippled
- disabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this in the list adds a lot of changes when it refers to a toggle or a feature and not a person.

- maimed
- missing a limb
- mute
- normal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this in the list is a bit excessive. Afaik 'normal' has no negative connotations in regular speech.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with other things, my only perception of "normal" as a negative or derogatory thing is in reference to people (culture, ability, etc). The behaviour of an application is something I personally feel comfortable classifying as "normal" and "abnormal". However, that's my personal sense of it, I've asked another person and they also felt that so long as it wasn't in reference to a person it didn't seem bad, and avoiding it would complicate the language (with the adverse effect of accidentally making the written text less accessible, from a general linguistic perspective).

"Normal" also doesn't immediately evoke the same kinds of historical violence when not used to describe a person. Whereas other terms like "master/slave", "nuke", etc, directly and unambiguously evoke tremendous violence, regardless of the context. It makes perfect sense to avoid those, especially because there are actually clearer terms to use. Same with blacklist/whitelist (deny/allow is much clearer).

The rule message at the top of this file is:

message: "Don't use language (such as '%s') that defines people by their disability."

We have only a handful of documents that this would ever be relevant to (where we talk about people and their behaviour), namely the decision-making process and the code of conduct. I get the sense that Microsoft created this particular rule for non-technical documentation, like management or HR documents? There are probably some software domains this is relevant to, not sure Openverse fits into that domain. Maybe HR software, education software, hospital software, etc?

That's just my 2-cents. I don't mind changing the words we use, especially because "inactive" and "deactivated" are just as clear as "disabled" if used in the right contexts. "Standard" feels like less of an unambiguous alternative to "normal", especially because in software it evokes standards, but doesn't feel sufficiently different for me to object to changing it if someone feels strongly about it not being used ever. Though my sense of the disability activism space is that no one would care about using "normal" to talk about the behaviour of a computer program. I also have a hard time imagining that a derogatory/negative usage of the word would get past our code review process, considering the concern and care Openverse maintainers have historically had for a wide range of other issues.

So anyway, I agree that we could remove this rule altogether. Its description of itself shows that it's not really for software documentation. On the other hand, if someone feels strongly that we should add it, fine by me as well. Maybe we can remove it for now to minimise the changes in this PR and get input from other maintainers afterwards, and make appropriate adjustments as needed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, of course, the other option is to leave the rule, but remove the words you've noted are common in software without a negative connotation: normal, disabled, healthy, unhealthy. Can't hurt to have an automatic check for the other words here.

@ngken0995
Copy link
Collaborator Author

@WordPress/openverse-maintainers Microsoft isn't the only style guide. Here is a link for more styles: https://vale.sh/types/style/

@sarayourfriend
Copy link
Collaborator

sarayourfriend commented Oct 24, 2023

@ngken0995 Can we try alex and proselint? We have another issue to use alex here, so that would allow us to close that one here as well. It has, I think, much better automatic suggestions for the words that will make it easier to transition our prose. It does include the terms disabled, but has much broader suggested alternatives that, at least to me, make it easier to find phrases that are indeed clearer. Would that be okay? Thanks very much for your patience so far on this issue. We've been dealing with some complex internal issues with our production services alongside folks being away from directly working on Openverse here and there, and it's been hard to keep up with reviews for all of us recently.

@ngken0995
Copy link
Collaborator Author

@ngken0995 Can we try alex and proselint? We have another issue to use alex here, so that would allow us to close that one here as well. It has, I think, much better automatic suggestions for the words that will make it easier to transition our prose. It does include the terms disabled, but has much broader suggested alternatives that, at least to me, make it easier to find phrases that are indeed clearer. Would that be okay? Thanks very much for your patience so far on this issue. We've been dealing with some complex internal issues with our production services alongside folks being away from directly working on Openverse here and there, and it's been hard to keep up with reviews for all of us recently.

@sarayourfriend I have removed Microsoft and write-good and add alex and proselint. I have edit all file with level: error to level: suggestion. Let me know which file should be set to error or warning then I can work on editing the documentations.

@sarayourfriend
Copy link
Collaborator

Thanks for the update. I'm reading through these lists to give feedback on what to remove, upgrade, etc, and I'm struck by two things.

First, the project has gone to great extents to avoid exposing contributors to many of these terms, especially violent slurs. We keep our own list of such terms for filtering indexed works that have potentially sensitive textual content in an entirely separate repository, with only a subset of maintainers asked to monitor and contribute to it. The goal has been to avoid keeping any list like this in our main repository, with the understanding that it's actually inaccessible to make it so easy to accidentally come across them. Some of these lists (rightly!) include really horrific terms, so we'd be going directly against previous diligence to directly include them.

Second, I started to think it would be nice to just switch to alex and prose-lint directly. However, it's clear that Vale is a much better tool, both in that it is much faster than either of those tools and in its deep customisability.

With that in mind, and with the interest of keeping going with Vale to supersede directly using those other linters, I'd like to suggest the following: rather than including our Vale configuration directly in the monorepo, let's move it to https://github.com/WordPress/openverse-sensitive-terms and then build a docker image, published to GHCR, that includes the configuration baked in.

The goal behind this suggestion is to be able to continue to use Vale to its greatest, customise extent, without breaking the previous agreement we made as a project to not include word lists like this directly in our main repository. Further, I think a lot of the term lists that we'll get from this could feed directly into the sensitive terms list we use for identifying potentially sensitive textual content in indexed works. In other words, these lists are actually directly relevant to the other work we do with sensitive terms lists, so it wouldn't be misplaced to put them there.

So @ngken0995, would you be interested in working on an approach based on this idea in the sensitive terms repository? I understand if not, especially given how sporadic feedback has been on this PR (apologies again for that, things have been difficult these past few weeks for the core maintainers!), just let me know either way. The way I see it working is:

  1. Move the styles into a branch on the other repository
  2. Create a Vale.dockerfile that extends https://hub.docker.com/r/jdkato/vale to bake in our styles and .vale.ini, creating an entrypoint script that automatically passes the --config option to the vale command.
  3. Create a GitHub workflow to publish that image to GHCR under openverse-vale (for this you will probably need help from a maintainer, or a maintainer will need to do it themselves, due to permissions issues)
  4. Add a new pre-commit hook in our .pre-commit-config.yaml using the docker_image language and the openverse-vale:latest image

@dhruvkb Can you give your thoughts whether that approach makes sense? I don't feel comfortable at all committing these lists to the monorepo, and this is one solution I can think of to avoid it. If either of y'all have alternative ideas, please say so 🙏

@dhruvkb
Copy link
Member

dhruvkb commented Oct 30, 2023

I agree with @sarayourfriend's idea to utilise the sensitive terms repo for this and use a Docker image to run the linting on content in this repo. It's very well thought out. It is however a bit more complex now, so @ngken0995 please let us know if we can help out in any way e.g. drafting issues or providing suggestions to your drafts.

@ngken0995
Copy link
Collaborator Author

ngken0995 commented Oct 30, 2023

@sarayourfriend Initially, I was using alex as one of the style because there was another PR which suggest using it. Then the really horrific terms showed up and it felt uncomfortable which I decided to switch to Microsoft style. Thank you for the explanation. The proposal looks amazing and intriguing to work.
@dhruvkb Thank you for the suggestion for drafting issues. I wouldn't know where to begin. Please ping me when the issues are created. I would love to work on them.

@openverse-bot
Copy link
Collaborator

Based on the low urgency of this PR, the following reviewers are being gently reminded to review this PR:

@AetherUnbound
@obulat
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 17 day(s) ago. PRs labelled with low urgency are expected to be reviewed within 5 weekday(s)2.

@ngken0995, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@sarayourfriend
Copy link
Collaborator

@ngken0995 Here is the tracking issue in the other repository: WordPress/openverse-sensitive-terms#6

I'll close this PR now in favour of completing the prerequisite work first. Thanks again for your patience and willingness to work through and think about this issue carefully. We really appreciate all your contributions and hard work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📄 aspect: text Concerns the textual material in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: documentation Related to Sphinx documentation
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate Vale for linting docs according to editorial style guide
4 participants