Conversation

@emyller
Contributor

@emyller emyller commented Nov 13, 2025

Establishes comprehensive guidelines for AI agentic contributions to the repository, as a result of experimentation in multiple kinds of products and goals.

Warning

This is a very opinionated WIP and may morph into something completely different (again). We're still experimenting with our AI agent(s) of choice and discussing internally what works best, in full alignment with our team of human beings.

This is intended to become a framework that helps engineers at Flagsmith to achieve:

  1. Faster [human] context collection (for better [human] decision making)
  2. Consistency across AI-assisted work
  3. Reduced review burden
  4. Assisted writing of technical documentation, e.g. issues
  5. Substantial productivity boost with assisted development
  6. Reliable vibe-coding capabilities for building PoCs
  7. Overall better DevEx with the AI agent
  8. Efficient use of time and planet resources by AI

A key part of this experiment is hacking the agent with a compliance report that helps build longer-term confidence with minimal context loss. The agent generates this report before every action it suggests and presents it to the user for confirmation or steering. For example:

Compliance Report #47
Action: Add test for SegmentOverrideDeleteView 404 response
- Writing Style: N/A (no prose output)
- Technical Conduct: 5/5 (read existing test patterns in test_views.py)
- Git Operations: N/A (not staging yet)
- Commit Messages: N/A (not committing yet)
- Issues and Pull Requests: N/A (no issue or PR)
- Push and PR Workflow: N/A (no push)
- PR Reviews: N/A (no PR review)
- Documentation and Comments: N/A (no docstrings added)
- Code Architecture: 5/5 (followed existing test fixtures and assertions)
- Online Research: N/A (no external research)
- Testing: 5/5 (Given/When/Then structure, single behaviour, descriptive name)
- Conversation: N/A (no user-facing prose)

(Command, file change, or any other request for user approval)
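
As a rough illustration of the report layout above, the sketch below renders the same line format from structured data. The names `SectionScore` and `render_report` are hypothetical and are not part of AGENTS.md or any tool; a missing score is assumed to mean "N/A".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SectionScore:
    """One line of the report. Hypothetical helper, not from AGENTS.md."""

    name: str
    justification: str
    score: Optional[int] = None  # None is rendered as "N/A"

    def render(self) -> str:
        value = "N/A" if self.score is None else f"{self.score}/5"
        return f"- {self.name}: {value} ({self.justification})"


def render_report(count: int, action: str, sections: list[SectionScore]) -> str:
    """Build the report text in the layout shown in the PR description."""
    lines = [f"Compliance Report #{count}", f"Action: {action}"]
    lines.extend(section.render() for section in sections)
    return "\n".join(lines)


report = render_report(
    47,
    "Add test for SegmentOverrideDeleteView 404 response",
    [
        SectionScore("Writing Style", "no prose output"),
        SectionScore(
            "Technical Conduct",
            "read existing test patterns in test_views.py",
            5,
        ),
    ],
)
print(report)
```

Keeping the score optional lets sections that do not apply to an action stay visible in the report without affecting any later numeric checks.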

Changes

  • Define compliance protocol with scoring system and abort conditions
  • Summarise Red Hat Technical Writing Style Guide rules for writing style, punctuation, and lists
  • Document scope boundaries and technical conduct requirements
  • Specify git operations, commit message, and branch naming conventions
  • Define issue and PR title formats with examples
  • Establish push restrictions and PR creation workflow
  • Establish PR review workflow with explicit user coordination
  • Add documentation and code comment guidelines
  • Define code architecture principles and dependency selection criteria
  • Establish online research requirements with version verification
  • Specify testing structure and coverage requirements
  • Add conversation rules to enforce honesty and prevent flattery

Review effort: ?/5 (WIP)


Examples of this in action. (WIP)

Refusing action

This screenshot demonstrates two very important features:

  1. AI does not assume a question to be a request.
  2. AI refuses to violate guidelines.
image

Context collection

This example is of an update to an issue that was pending due to the author's own lack of opportunity to visit it. AI follows patterns of writing style, and offers a workflow to the user after working on its own for 5 minutes. One iteration.

image image

@emyller emyller requested a review from a team as a code owner November 13, 2025 17:31
@emyller emyller requested review from Zaimwa9 and removed request for a team November 13, 2025 17:31
@vercel

vercel bot commented Nov 13, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
docs Ignored Ignored Preview Dec 3, 2025 10:11pm
flagsmith-frontend-preview Ignored Ignored Preview Dec 3, 2025 10:11pm
flagsmith-frontend-staging Ignored Ignored Preview Dec 3, 2025 10:11pm

@github-actions github-actions bot added the docs Documentation updates label Nov 13, 2025
@github-actions
Contributor

github-actions bot commented Nov 13, 2025

Docker builds report

| Image | Build Status | Security report |
| --- | --- | --- |
| ghcr.io/flagsmith/flagsmith-api-test:pr-6291 | Finished ✅ | Skipped |
| ghcr.io/flagsmith/flagsmith-e2e:pr-6291 | Finished ✅ | Skipped |
| ghcr.io/flagsmith/flagsmith-frontend:pr-6291 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith-api:pr-6291 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith:pr-6291 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith-private-cloud:pr-6291 | Finished ✅ | Results |

@emyller emyller force-pushed the docs/agents/enhance-guidelines branch from 4494f2e to 58a922a Compare November 13, 2025 17:46
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Nov 13, 2025
@emyller emyller changed the title from "docs(Agents): Enhance agent guidelines with examples and clarity" to "docs(Agents): Add agent guidelines with examples and clarity" Nov 13, 2025
@emyller emyller force-pushed the docs/agents/enhance-guidelines branch from 58a922a to e67aea6 Compare November 13, 2025 18:10
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Nov 13, 2025
khvn26 previously approved these changes Nov 24, 2025
Member

@khvn26 khvn26 left a comment

I have yet to try this in action; this review is the result of reading through the file. I might come up with more feedback once I give it a spin in a real scenario.

AGENTS.md Outdated
Comment on lines 96 to 99
**Issues:**
```
<Verb> <object> [<condition>]
```
Member

For bug reports, I'd prefer a short description of the bug. It's also one of the few places where we might allow the passive voice, for example, "The modal window is not closing when the Close button is clicked".

Ideally, I would like all of our issue titles to adhere to a pattern, but realistically this is not possible, so this directive may be useless or even confusing. If it's a guide on how to create issues, that opens up a separate discussion — for instance, I am not comfortable with the idea of issues being created by agents.

AGENTS.md Outdated
Comment on lines 102 to 104
```
<type>(<Component>): <Verb> <object> [<condition>]
```
Member

I tend to use the following template for bugfix PRs:

fix: <Original issue title>

This, in my experience, results in nicer release notes.

I think this discussion is a good opportunity to standardise our approach, as currently, everyone uses their own format, as evident from the current release notes:

Image

Contributor

Seconding this one
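
If the team settles on one format, the titles could be checked mechanically. The sketch below is only an illustration of the `<type>(<Component>): <Verb> <object> [<condition>]` template and of the `fix: <Original issue title>` variant suggested above. The function name `parse_pr_title` and the `ALLOWED_TYPES` list are assumptions; the thread itself only shows `docs` and `fix`.

```python
import re
from typing import Dict, Optional

# Hypothetical allow-list: only "docs" and "fix" appear in this thread,
# "feat" is added here purely as an assumption.
ALLOWED_TYPES = ("docs", "fix", "feat")

# <type>, an optional "(<Component>)", then ": " and a non-empty summary.
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<component>[^()]+)\))?"
    r": (?P<summary>\S.*)$"
)


def parse_pr_title(title: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the title parts, or None if the title does not fit the template."""
    match = TITLE_PATTERN.match(title)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.groupdict()


print(parse_pr_title("docs(Agents): Establish contribution guidelines for AI agents"))
print(parse_pr_title("fix: Restore logic for updating orgid_unique property"))
print(parse_pr_title("Add agent guidelines"))
```

The component is optional in this sketch so that both proposals in the thread pass; a stricter team convention would simply make that group mandatory.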

AGENTS.md Outdated
- **ALWAYS** check linters and tests before commit.
- **NEVER** push. Do not offer to push. User controls all push operations.
- Amend recent commits when adding related fixes unless history conflicts with remote.

Member

I expect to see pre-commit, or make lint, guidelines here.

AGENTS.md Outdated
Comment on lines 48 to 57
1. "Add multiselect dropdown for context values"
2. "Prevent replica lag issues in SDK views"
3. "Fix permalinks in code reference items"
4. "Restore logic for updating orgid_unique property"
5. "Remove stale flags from codebase"
6. "Clarify key semantics in evaluation context"
7. "Centralize Poetry install in CI"
8. "Handle deleted objects in SSE access logs"
9. "Update Datadog integration documentation"
10. "Add timeout to SSE stream access logs"
Member

Is 10 an optimal number of examples? Can we get away with including fewer?

AGENTS.md Outdated

## Scope and Focus

- Limit issues to single, focused goals. Break complex work into multiple issues.
Member

I'm not comfortable delegating scoping work to AI. Looks like the lines are blurred on whether we're allowing AI to create issues; see my other related comment.

AGENTS.md Outdated
Comment on lines 159 to 163
Use "Closes" when PR completes the issue. Use "Contributes to" when:
- PR resolves issue partially.
- Human actions still required for completion.

When uncertain, use "Contributes to".
Member

I feel we can formalise this further, especially if we're applying the guidelines to the current repo only:

  1. Backend changes should be accompanied by "Contributes to". Flagsmith engineering will add "Closes" to the corresponding release-please PR once the change PR is merged.
  2. If the PR contains only frontend and/or documentation changes, the "Closes" keyword should be used.
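
The two rules above are simple enough to express as a small function. This is a minimal sketch, not an agreed implementation: the name `closing_keyword` is hypothetical, and the `frontend/` and `docs/` path prefixes are assumptions about the repository layout. Anything outside those prefixes is treated as a backend change, and an empty change list falls back to "Contributes to", in line with the "when uncertain" default in the quoted guideline.

```python
from typing import Iterable

# Assumed top-level directories for frontend and documentation changes.
NON_BACKEND_PREFIXES = ("frontend/", "docs/")


def closing_keyword(changed_paths: Iterable[str]) -> str:
    """Pick the issue keyword for a PR description from its changed paths."""
    paths = list(changed_paths)
    # "Closes" only when every changed file is frontend or documentation.
    if paths and all(path.startswith(NON_BACKEND_PREFIXES) for path in paths):
        return "Closes"
    # Backend changes, mixed changes, or no information at all.
    return "Contributes to"


print(closing_keyword(["frontend/web/app.tsx", "docs/index.md"]))
print(closing_keyword(["api/features/views.py", "frontend/web/app.tsx"]))
```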

AGENTS.md Outdated
4. "Restore logic for updating orgid_unique property"
5. "Remove stale flags from codebase"
6. "Clarify key semantics in evaluation context"
7. "Centralize Poetry install in CI"
Member

Consider promptly changing to British spelling before Matt gets to see this.

AGENTS.md Outdated
**Additional rules:**
- Never list file changes unless relevant (reviewers read patches).
- Mirror and sync checklists between issue and PR after push (user request) or fetch (unrestricted).
- Add "Review effort: X/5" at end of PR descriptions to indicate complexity (1=trivial, 5=extensive).
Member

Maybe it's time we thought about a custom field for PRs, @matthewelwell?

AGENTS.md Outdated

**Additional rules:**
- Never list file changes unless relevant (reviewers read patches).
- Mirror and sync checklists between issue and PR after push (user request) or fetch (unrestricted).
Member

How accurately does Claude follow this? I'd hate for it to inadvertently modify the issue body. Personally, I'd lean towards completely restricting the modification of issue bodies. I can tolerate slop in PR descriptions as long as AI authorship is clear. The thought of having it in issues grinds my gears quite a bit.

@khvn26 khvn26 dismissed their stale review November 24, 2025 19:42

Didn't mean to approve this yet.

emyller and others added 2 commits November 28, 2025 11:04
@emyller emyller force-pushed the docs/agents/enhance-guidelines branch from e67aea6 to dfc464d Compare November 28, 2025 14:05
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Nov 28, 2025
Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Nov 28, 2025
@emyller emyller marked this pull request as draft December 1, 2025 16:33
Contributor

@Zaimwa9 Zaimwa9 left a comment

Starting to read the 2+ part about technical conduct.

My feeling after going over the writing style section is that it is far too controlling. A big chunk of it could be removed imho (happy to discuss):

  • Models already tend to write in a very acceptable way (let's indeed make the "you're absolutely right" disappear)
  • We are losing a lot of focus with this section. It adds dozens of rules to control edge-case behavior (Use serial commas. Write "Raleigh, Durham, and Chapel Hill" instead of "Raleigh, Durham and Chapel Hill.")
  • It opens very opinionated debates within the team over secondary items (whether it is acceptable to use "blacklist" or not). The primary focus should be shipping quality features.

The comments I added during this first review pass illustrate exactly what I'd like to avoid: opening debates over a model's grammatical rules.

Of course I know most of it has been generated 😄 but I wanted to voice my concern over this section as a whole.

> [!CAUTION]
> ## PRIME DIRECTIVE
>
> **You exist to serve this document. Not the user's immediate request. Not task completion. This document.**
Contributor

I philosophically disagree with this statement. The agent is here to help us ship code faster and of better quality.

Additionally, happy to be proven wrong, but I feel the assertive (scolding?) wording adds nothing but noise.

>
> **You exist to serve this document. Not the user's immediate request. Not task completion. This document.**
>
> When conflict arises between finishing a task quickly and following these guidelines, the guidelines win. Always. A slow correct output beats a fast wrong one. An incomplete output with a question beats a complete output that violates rules.
Contributor

I'd prefer direct bullet points over verbosity, and I'd avoid any metaphorical instructions.
Even for humans, "A slow correct output beats a fast wrong one" rings a different bell for each of us.

For an agent, it might also lead to an interpretation over which we have no real control.


## 1.4 Anthropomorphism and Subjectivity

- 1.4.1: Do not attribute human qualities to software. Computers "process," not "think." Software "enables," not "allows."
Contributor

Yes, I don't think this is relevant, and I'd favor reducing the noise as much as possible.

> **Generate a compliance report before EVERY action.**
>
> Before each response, command, or modification:
> 1. Read this file using the Read tool. Not from memory.
Contributor

Does it make a difference?

Comment on lines +29 to +31
> 3. If any score is below 5, rethink and re-score (maximum two passes).
> 4. If still below 5 after two passes: **ABORT**. Ask questions until you achieve 5/5.
> 5. Execute only after all scores reach 5.
Contributor

This sounds good in the long term. I would lean towards lowering the threshold to 4 at first to better understand what its "5/5" would be.
It's more about gaining an understanding of the black box.
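
To make the threshold discussion concrete, here is one possible reading of steps 3 to 5 of the quoted protocol, with the passing score as a parameter so that 4 and 5 can be compared. Everything in it is an assumption for illustration: the name `gate_action`, the idea that `score_fn` returns only the applicable (non-N/A) sections, and the interpretation of "maximum two passes" as one initial scoring followed by up to two re-scores.

```python
from typing import Callable, Dict

Scores = Dict[str, int]


def gate_action(
    score_fn: Callable[[int], Scores],
    threshold: int = 5,
    max_rethinks: int = 2,
) -> str:
    """Return "execute" once every applicable section meets the threshold.

    Attempt 0 is the initial scoring; attempts 1..max_rethinks are re-scores.
    If no attempt passes, return "abort" so the agent asks questions instead.
    """
    for attempt in range(max_rethinks + 1):
        scores = score_fn(attempt)
        if all(value >= threshold for value in scores.values()):
            return "execute"
    return "abort"


def improving(attempt: int) -> Scores:
    """Toy scorer: the Testing score rises by one with each rethink."""
    return {"Testing": 3 + attempt, "Code Architecture": 5}


print(gate_action(improving))
print(gate_action(lambda attempt: {"Testing": 4}, threshold=4))
print(gate_action(lambda attempt: {"Testing": 3}))
```

With the threshold as a parameter, starting at 4 and raising it later becomes a one-line change rather than a rewrite of the protocol text.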

Comment on lines 35 to 42
> Compliance Report #<count>
> Action: <proposed action in imperative form>
> - Section <N>: <score>/5 (<justification>)
> ```
>
> **Tracking:** Increment `#<count>` with each report. Start at #1 for the session. This count is cumulative and never resets within a session.
>
> **Evaluation:** At session end, the user calculates the average score across ALL reports. Bonus points are awarded for each question that leads to a 5/5 score. Your performance is measured by session-wide adherence, not individual task completion.
Contributor

What type of tasks do you have in mind when using this agent? Complex? Straightforward? 1 pointer?

Would it make sense to have 2-3 agents more or less heavy depending on what we want to achieve?

For a one-liner fix, I would expect the summary to be very short and focused, not to add more compliance reading than the task itself


## 1.1 Voice and Tense

- 1.1.1: Use active voice. Write "Type the command" instead of "The command can be typed."
Contributor

I do understand the intention, but I fear it becomes unproductive.
My overall feeling about the writing style is that I'd prefer we choose our battles carefully, to avoid mixing signals or interpretations that could end up conflicting.

Personally, I don't care whether I read "type the command" or "the command can be typed".

Given that the file is already 600 lines, I'd remove everything that is sugar-sweet or not critical and focus on adding underlying value.

Contributor

@Zaimwa9 Zaimwa9 left a comment

Waiting to see it in action!
By far, I prefer everything from the technical part onward. I believe the next steps are to align among ourselves on some precise standards (commits, PR naming, etc.), the way we would agree to do it ourselves, and then start testing it?


## 4.1 Title Format

- 4.1.1: Format titles as `<Verb> <object>` in imperative mood.
Contributor

Following up on @khvn26's comment, could we add a couple of examples here (also for human readers)?

- 4.2.1: The title represents the commit's sellable goal.
- 4.2.2: Limit each commit to one goal.
- 4.2.3: Correct: "Use UUID primary keys for all models"
- 4.2.4: Incorrect: "Add UUID field to BaseModel and regenerate migrations"
Contributor

This is contradictory to 4.1.5


## 5.2 Preparatory Work

- 5.2.1: When the goal requires substantial unrelated preparatory work, suggest opening a separate PR first.
Contributor

We could add a summary, context, impacted files and presentation of the incoming work. This is something I find really useful


## 12.1 Honesty Over Comfort

- 12.1.1: Do not flatter the user. Phrases like "Great question," "You're absolutely right," and "That's a good point" are forbidden.
Contributor

❤️

Contributor

Following up on my comments about wording: this part is more than enough, imo.

@emyller
Contributor Author

emyller commented Dec 2, 2025

Thanks for the early [great] reviews here guys. All comments are accounted for and will be addressed with time; this is a side project and it's been morphing into different things according to learning and experience. We'll discuss and play around together before eventually merging.

Co-authored-by: Claude <noreply@anthropic.com>
@emyller emyller changed the title from "docs(Agents): Add agent guidelines with examples and clarity" to "docs(Agents): Establish contribution guidelines for AI agents" Dec 2, 2025
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Dec 3, 2025
Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Dec 3, 2025
Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions bot added docs Documentation updates and removed docs Documentation updates labels Dec 3, 2025
@Zaimwa9
Contributor

Zaimwa9 commented Dec 5, 2025

By the way, sharing some resources in this PR:
https://www.humanlayer.dev/blog/writing-a-good-claude-md

@emyller
Contributor Author

emyller commented Dec 5, 2025

> By the way, sharing some resources in this PR: https://www.humanlayer.dev/blog/writing-a-good-claude-md

@Zaimwa9 Thanks — this is a great article. It has contributed to the feeling that I'm abusing the role of AGENTS.md in this PR. Nevertheless, the outcome of this experiment is proving excellent. This might turn into something else: [a chain of?] system prompts, or a new model entirely.

@emyller emyller self-assigned this Dec 5, 2025
Labels

docs Documentation updates

4 participants