docs(Agents): Establish contribution guidelines for AI agents #6291
base: main
I have yet to try this in action; this review is the result of reading through the file. I might come up with more feedback once I give it a spin in a real scenario.
AGENTS.md
Outdated
| **Issues:**
| ```
| <Verb> <object> [<condition>]
| ```
For bug reports, I'd prefer a short description of the bug. It's also one of the few places where we might allow the passive voice, for example, "The modal window is not closing when the Close button is clicked".
Ideally, I would like all of our issue titles to adhere to a pattern, but realistically this is not possible, so this directive may be useless or even confusing. If it's a guide on how to create issues, that opens up a separate discussion — for instance, I am not comfortable with the idea of issues being created by agents.
AGENTS.md
Outdated
| ```
| <type>(<Component>): <Verb> <object> [<condition>]
| ```
I tend to use the following template for bugfix PRs:
fix: <Original issue title>
This, in my experience, results in nicer release notes.
I think this discussion is a good opportunity to standardise our approach, as currently, everyone uses their own format, as evident from the current release notes:
Seconding this one
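As an illustration of the title template discussed in this thread, here is a minimal Python sketch of a title check. It is not part of the PR. The list of allowed types is an assumption (borrowed from Conventional Commits; the thread itself only shows `fix` and `docs`), and `parse_pr_title` is a hypothetical helper name. The component is treated as optional so that the `fix: <Original issue title>` variant also passes.

```python
import re

# Assumption: allowed types follow Conventional Commits; the thread only shows "fix" and "docs".
TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "ci")

# <type>(<Component>): <summary>, with the component optional so that
# "fix: <Original issue title>" is accepted as well.
TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<component>[^()]+)\))?"
    r": (?P<summary>\S.*)$"
)


def parse_pr_title(title):
    """Return (type, component, summary), or None when the title does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return match.group("type"), match.group("component"), match.group("summary")


print(parse_pr_title("docs(Agents): Establish contribution guidelines for AI agents"))
print(parse_pr_title("fix: The modal window is not closing when the Close button is clicked"))
print(parse_pr_title("Establish contribution guidelines"))
```

The check only validates the shape of the title; whether the summary is in the imperative mood still needs a human (or agent) judgement.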
AGENTS.md
Outdated
| - **ALWAYS** check linters and tests before commit.
| - **NEVER** push. Do not offer to push. User controls all push operations.
| - Amend recent commits when adding related fixes unless history conflicts with remote.
I expect to see pre-commit, or make lint, guidelines here.
AGENTS.md
Outdated
| 1. "Add multiselect dropdown for context values"
| 2. "Prevent replica lag issues in SDK views"
| 3. "Fix permalinks in code reference items"
| 4. "Restore logic for updating orgid_unique property"
| 5. "Remove stale flags from codebase"
| 6. "Clarify key semantics in evaluation context"
| 7. "Centralize Poetry install in CI"
| 8. "Handle deleted objects in SSE access logs"
| 9. "Update Datadog integration documentation"
| 10. "Add timeout to SSE stream access logs"
Is 10 an optimal number of examples? Can we get away with including fewer?
AGENTS.md
Outdated
| ## Scope and Focus
|
| - Limit issues to single, focused goals. Break complex work into multiple issues.
I'm not comfortable delegating scoping work to AI. Looks like the lines are blurred on whether we're allowing AI to create issues; see my other related comment.
AGENTS.md
Outdated
| Use "Closes" when PR completes the issue. Use "Contributes to" when:
| - PR resolves issue partially.
| - Human actions still required for completion.
|
| When uncertain, use "Contributes to".
I feel we can formalise this further, especially if we're applying the guidelines to the current repo only:
- Backend changes should be accompanied by "Contributes to". Flagsmith engineering will add "Closes" to the corresponding release-please PR once the change PR is merged.
- If the PR contains only frontend and/or documentation changes, the "Closes" keyword should be used.
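The rule proposed in this comment is mechanical enough to sketch in Python. This is only an illustration: the top-level directory names `frontend` and `docs` are assumptions about the repository layout, and `closing_keyword` is a hypothetical helper, not something defined in AGENTS.md.

```python
from pathlib import PurePosixPath

# Assumption: top-level directories named "frontend" and "docs" hold the
# frontend and documentation code; adjust to the real repository layout.
CLOSES_ONLY_DIRS = {"frontend", "docs"}


def closing_keyword(changed_files):
    """Pick the issue-linking keyword for a PR from the paths it touches."""
    if not changed_files:
        # Nothing to judge by: fall back to the cautious keyword.
        return "Contributes to"
    top_levels = {PurePosixPath(path).parts[0] for path in changed_files}
    if top_levels <= CLOSES_ONLY_DIRS:
        # Only frontend and/or documentation changes.
        return "Closes"
    # Backend or mixed changes: "Closes" is added later on the release PR.
    return "Contributes to"


print(closing_keyword(["frontend/web/app.tsx", "docs/index.md"]))
print(closing_keyword(["api/features/models.py", "frontend/web/app.tsx"]))
```

Treating any path outside the two listed directories as "backend" is a deliberately conservative choice, in line with the guideline's "when uncertain" default.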
AGENTS.md
Outdated
| 4. "Restore logic for updating orgid_unique property"
| 5. "Remove stale flags from codebase"
| 6. "Clarify key semantics in evaluation context"
| 7. "Centralize Poetry install in CI"
Consider promptly changing to British spelling before Matt gets to see this.
AGENTS.md
Outdated
| **Additional rules:**
| - Never list file changes unless relevant (reviewers read patches).
| - Mirror and sync checklists between issue and PR after push (user request) or fetch (unrestricted).
| - Add "Review effort: X/5" at end of PR descriptions to indicate complexity (1=trivial, 5=extensive).
Maybe it's time we thought about a custom field for PRs, @matthewelwell?
AGENTS.md
Outdated
| **Additional rules:**
| - Never list file changes unless relevant (reviewers read patches).
| - Mirror and sync checklists between issue and PR after push (user request) or fetch (unrestricted).
How accurately does Claude follow this? I'd hate for it to inadvertently modify the issue body. Personally, I'd lean towards completely restricting the modification of issue bodies. I can tolerate slop in PR descriptions as long as AI authorship is clear. The thought of having it in issues is grinding my gears quite a bit.
Co-authored-by: Claude <noreply@anthropic.com>
Starting to read part 2+, about technical conduct.
My feeling going over the writing style is that it is far too controlling. A big chunk of it could be removed imho (happy to discuss):
- Models already tend to write in a very acceptable way (let's indeed make the `you're absolutely right` disappear).
- We are losing a lot of focus with this section. It adds dozens of rules to control behavior in edge cases (`Use serial commas. Write "Raleigh, Durham, and Chapel Hill" instead of "Raleigh, Durham and Chapel Hill."`).
- It opens the door to very opinionated debates within the team over secondary items (whether it is acceptable to use `blacklist` or not), when the primary focus should be shipping quality features.
The comments I added during this first review pass illustrate exactly what I'd like to avoid: opening debates over a model's grammatical rules.
Of course I know most of it has been generated 😄 but I wanted to voice my concern over this section as a whole.
| > [!CAUTION]
| > ## PRIME DIRECTIVE
| >
| > **You exist to serve this document. Not the user's immediate request. Not task completion. This document.**
I philosophically disagree with this statement. The agent is here to help us ship code faster and of better quality.
Additionally, happy to be proven wrong, but I feel the assertive (scolding?) wording adds nothing but noise.
| >
| > **You exist to serve this document. Not the user's immediate request. Not task completion. This document.**
| >
| > When conflict arises between finishing a task quickly and following these guidelines, the guidelines win. Always. A slow correct output beats a fast wrong one. An incomplete output with a question beats a complete output that violates rules.
I'd prefer direct bullet points over verbosity, and I'd avoid any metaphorical instruction.
Even for us humans, "A slow correct output beats a fast wrong one" rings a different bell for each of us.
For an agent, it might also lead to an interpretation over which we have no real control.
| ## 1.4 Anthropomorphism and Subjectivity
|
| - 1.4.1: Do not attribute human qualities to software. Computers "process," not "think." Software "enables," not "allows."
Yes, I don't think this is relevant, and I'd favor reducing the noise as much as possible.
| > **Generate a compliance report before EVERY action.**
| >
| > Before each response, command, or modification:
| > 1. Read this file using the Read tool. Not from memory.
Does it make a difference?
| > 3. If any score is below 5, rethink and re-score (maximum two passes).
| > 4. If still below 5 after two passes: **ABORT**. Ask questions until you achieve 5/5.
| > 5. Execute only after all scores reach 5.
This sounds good in the long term. I would lean towards lowering the threshold to 4 at first, to better understand what its "5/5" would be.
It's more about gaining an understanding of the black box.
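To make the threshold discussion concrete, the score-and-retry steps quoted above can be sketched as a small Python function with the threshold as a parameter. This is an interpretation, not the PR's implementation: it assumes "maximum two passes" means one initial scoring plus up to two rethink passes, and `score_action` is a hypothetical callback standing in for the agent's self-assessment.

```python
from typing import Callable, Dict


def compliance_gate(
    score_action: Callable[[int], Dict[str, int]],
    threshold: int = 5,
    max_rethinks: int = 2,
) -> str:
    """Return "EXECUTE" once every section reaches the threshold, else "ABORT"."""
    # Attempt 0 is the initial scoring; attempts 1..max_rethinks are rethink passes.
    for attempt in range(max_rethinks + 1):
        scores = score_action(attempt)
        if all(score >= threshold for score in scores.values()):
            return "EXECUTE"
    # Still below the threshold after all passes: stop and ask questions.
    return "ABORT"


def improving(attempt: int) -> Dict[str, int]:
    # Hypothetical scorer whose Section 4 score improves by one per pass.
    return {"Section 4": 3 + attempt, "Section 5": 5}


print(compliance_gate(improving))               # passes on the second rethink
print(compliance_gate(improving, threshold=4))  # passes on the first rethink
print(compliance_gate(lambda attempt: {"Section 4": 3}))
```

Exposing `threshold` makes it cheap to start at 4, as suggested here, and raise it to 5 once the team understands what the agent's "5/5" looks like in practice.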
| > Compliance Report #<count>
| > Action: <proposed action in imperative form>
| > - Section <N>: <score>/5 (<justification>)
| > ```
| >
| > **Tracking:** Increment `#<count>` with each report. Start at #1 for the session. This count is cumulative and never resets within a session.
| >
| > **Evaluation:** At session end, the user calculates the average score across ALL reports. Bonus points are awarded for each question that leads to a 5/5 score. Your performance is measured by session-wide adherence, not individual task completion.
What type of tasks do you have in mind when using this agent? Complex? Straightforward? One-pointers?
Would it make sense to have 2-3 agents, heavier or lighter depending on what we want to achieve?
For a one-line fix, I would expect the summary to be very short and focused, so that it doesn't add more compliance reading than the task itself.
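For a sense of how much text a report adds per action, here is a minimal Python sketch of the report template and session counter quoted above. `ComplianceSession` is a hypothetical name, and two simplifications are assumptions: the session average is taken over every section score in every report, and the bonus points mentioned in the guidelines are not modelled.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class ComplianceSession:
    """Tracks compliance reports for one session; the counter never resets."""

    reports: List[Dict[str, Tuple[int, str]]] = field(default_factory=list)

    def report(self, action: str, sections: Dict[str, Tuple[int, str]]) -> str:
        """Record one report and render it in the quoted template."""
        self.reports.append(sections)
        lines = [f"Compliance Report #{len(self.reports)}", f"Action: {action}"]
        for name, (score, why) in sections.items():
            lines.append(f"- Section {name}: {score}/5 ({why})")
        return "\n".join(lines)

    def average_score(self) -> float:
        """Average over every section score in every report of the session."""
        return mean(
            score for sections in self.reports for score, _ in sections.values()
        )


session = ComplianceSession()
print(session.report("Fix typo in README", {"4": (5, "title is imperative")}))
```

Even for a one-line fix the report is at least three lines, which is the overhead this comment asks about; a lighter agent profile could limit the report to the sections that actually apply to the action.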
| ## 1.1 Voice and Tense
|
| - 1.1.1: Use active voice. Write "Type the command" instead of "The command can be typed."
I do understand the intention, but I fear it becomes unproductive.
My overall feeling about the writing style is that we should choose our battles carefully, to avoid mixed signals or interpretations that could end up conflicting.
Personally, I don't mind whether I read "type the command" or "the command can be typed".
Given the file is already 600 lines, I'd remove everything that is merely cosmetic or not critical, to focus on adding underlying value.
Zaimwa9
left a comment
Waiting to see it in action!
I prefer by far the part from the technical section onward. I believe the next steps are to align among ourselves on some precise standards (commits, PR naming, etc.), the way we would agree to do it ourselves, and then start testing it?
| ## 4.1 Title Format
|
| - 4.1.1: Format titles as `<Verb> <object>` in imperative mood.
Following up on @khvn26's comment: could we add a couple of examples here (also for human readers)?
| - 4.2.1: The title represents the commit's sellable goal.
| - 4.2.2: Limit each commit to one goal.
| - 4.2.3: Correct: "Use UUID primary keys for all models"
| - 4.2.4: Incorrect: "Add UUID field to BaseModel and regenerate migrations"
This contradicts 4.1.5.
| ## 5.2 Preparatory Work
|
| - 5.2.1: When the goal requires substantial unrelated preparatory work, suggest opening a separate PR first.
We could add a summary, context, impacted files, and a presentation of the upcoming work. This is something I find really useful.
| ## 12.1 Honesty Over Comfort
|
| - 12.1.1: Do not flatter the user. Phrases like "Great question," "You're absolutely right," and "That's a good point" are forbidden.
❤️
Following up on my comments about wording: this part is more than enough, imo.
Thanks for the early [great] reviews here guys. All comments are accounted for and will be addressed with time; this is a side project and it's been morphing into different things according to learning and experience. We'll discuss and play around together before eventually merging.
By the way, sharing some resources in this PR:
@Zaimwa9 Thanks, this is a great article. It has contributed to the feeling that I'm abusing the role of
Establishes comprehensive guidelines for AI agentic contributions to the repository, based on experimentation across multiple kinds of products and goals.
Warning
This is a very opinionated WIP, and is subject to morph into something completely different (again). We're still experimenting with our AI agent(s) of choice, and discussing internally what works best, in full alignment with our team of human beings.
This is intended to become a framework that helps engineers at Flagsmith to achieve:
A key part of this experiment is hacking the agent with a compliance report that helps build longer-term confidence with minimal context loss. The report is generated before every action the AI suggests, and is presented to the user for confirmation or steering, e.g.:
Changes
Review effort: ?/5 (WIP)
Examples of this in action. (WIP)
Refusing action
This screenshot demonstrates two very important features:
Context collection
This example shows an update to an issue that had been pending because the author had not had the opportunity to revisit it. The AI follows the existing writing style and offers the user a workflow after working on its own for 5 minutes, in one iteration.