From a624c53d6cead485d4b93d71f92e387fc00c27e5 Mon Sep 17 00:00:00 2001
From: Sampfluger88
Date: Fri, 6 Oct 2023 22:33:14 -0500
Subject: [PATCH] Boilerplate structure

---
 handbook/engineering/README.md               | 172 +++++++------------
 handbook/engineering/engineering.rituals.yml |  98 +++++++++++
 2 files changed, 159 insertions(+), 111 deletions(-)
 create mode 100644 handbook/engineering/engineering.rituals.yml

diff --git a/handbook/engineering/README.md b/handbook/engineering/README.md
index df351374302d..1b09944cc45b 100644
--- a/handbook/engineering/README.md
+++ b/handbook/engineering/README.md
@@ -1,36 +1,53 @@
 # Engineering
+This handbook page details processes specific to working [with](#team) and [within](#responsibilities) this department.

-## Scrum at Fleet
+## What we do
+The 🚀 Engineering department at Fleet is directly responsible for writing and maintaining the [code](https://github.com/fleetdm/fleet) and [documentation](https://fleetdm.com/docs/get-started/why-fleet) used in Fleet's core product, infrastructure, and website.

-- [Sprint ceremonies](#sprint-ceremonies)
-- [Scrum boards](#scrum-boards)
-- [Scrum items](#scrum-items)

-Fleet [product groups](https://fleetdm.com/handbook/company/development-groups#what-are-product-groups) employ scrum, an agile methodology, as a core practice in software development. This process is designed around sprints, which last three weeks to align with our release cadence.
+## Team
+| Role | Contributor(s) |
+|:--------------------------------|:-----------------------------------------------------------------------------------------------------------|
+| Chief Technical Officer | [Zach Wasserman](https://www.linkedin.com/in/zacharywasserman/) _([@zwass](https://github.com/zwass))_
+| Director of Product Development | [Luke Heath](https://www.linkedin.com/in/lukeheath/) _([@lukeheath](https://github.com/lukeheath))_
+| Infrastructure Engineer | [Robert Fairburn](https://www.linkedin.com/in/robert-fairburn/) _([@rfairburn](https://github.com/rfairburn))_
+| Engineering Manager (MDM) | [George Karr](https://www.linkedin.com/in/george-karr-4977b441/) _([@georgekarrv](https://github.com/georgekarrv))_
+| Engineering Manager (CX) | [Sharon Katz](https://www.linkedin.com/in/sharon-katz-45b1b3a/) _([@sharon-fdm](https://github.com/sharon-fdm))_
+| Product Quality Specialist | Reed Haynes _([@xpkoala](https://github.com/xpkoala))_, [Sabrina Coy](https://www.linkedin.com/in/bricoy/) _([@sabrinabuckets](https://github.com/sabrinabuckets))_
+| Software Engineer | [Rachel Perkins](https://www.linkedin.com/in/rachelelysia/) _([@rachelelysia](https://github.com/rachelelysia))_, [Lucas Rodriguez](https://www.linkedin.com/in/lukmr/) _([@lucasmrod](https://github.com/lucasmrod))_, [Jacob Shandling](https://www.linkedin.com/in/jacob-shandling/) _([@jacobshandling](https://github.com/jacobshandling))_, [Tim Lee](https://www.linkedin.com/in/mostlikelee/) _([@mostlikelee](https://github.com/mostlikelee))_, [Jahziel Villasana-Espinoza](https://www.linkedin.com/in/jahziel-v/) _([@jahzielv](https://github.com/jahzielv))_, [Victor Lyuboslavsky](https://www.linkedin.com/in/lyuboslavsky/) _([@getvictor](https://github.com/getvictor))_, Sarah Gillespie _([@gillespi314](https://github.com/gillespi314))_, [Martin Angers](https://www.linkedin.com/in/martin-angers-3210305/) _([@mna](https://github.com/mna))_, [Roberto Dip](https://www.linkedin.com/in/roperzh) _([@roperzh](https://github.com/roperzh))_, [Gabe Hernandez](https://www.linkedin.com/in/gabriel-hernandez-gh) _([@ghernandez345](https://github.com/ghernandez345))_, [Marcos Oviedo](https://www.linkedin.com/in/marcosoviedo/) _([@marcosd4h](https://github.com/marcosd4h))_

-### Sprint ceremonies
-Each sprint is marked by five essential ceremonies:
+## Contact us
+- Any Fleet team member can view the dedicated sprint boards for each team inside this department, including pending tasks and the status of new requests:
+  - 💻 [MDM (#g-mdm)](https://app.zenhub.com/workspaces/-g-mdm-current-sprint-63bc507f6558550011840298/board)
+  - 🌟 [CX](https://app.zenhub.com/workspaces/-g-cx-current-sprint-63bd7e0bf75dba002a2343ac/board)
+  - 🦢 [Website](https://app.zenhub.com/workspaces/-g-website-6451748b4eb15200131d4bab/board)
+  - ⚙️ [Infra](https://app.zenhub.com/workspaces/-g-infra-642c83a53e96760014c978bd/board)

-1. **Sprint kickoff**: On the first day of the sprint, the team, along with stakeholders, select items from the backlog to work on. The team then commits to completing these items within the sprint.
-2. **Daily standup**: Every day, the team convenes for updates. During this session, each team member shares what they accomplished since the last standup, their plans until the next meeting, and any blockers they are experiencing. Standups should last no longer than fifteen minutes. If additional discussion is necessary, it takes place after the standup with only the required partipants.
-3. **Weekly estimation sessions**: The team estimates backlog items once a week (three times per sprint). These sessions help to schedule work completion and align the roadmap with business needs. They also provide estimated work units for upcoming sprints. The EM is responsible for the point values assigned to each item and ensures they are as realistic as possible.
-4. **Sprint demo**: On the last day of each sprint, all engineering teams and stakeholders come together to review completed work. Engineers are allotted 3-10 minutes to present their accomplishments, as well as any pending tasks. (These meetings are recorded and posted publicly to YouTube or other platforms, so participants should avoid mentioning customer names. For example, instead of "Fastly", you can say "a publicly-traded hosting company", or use the [customer's codename](https://fleetdm.com/handbook/customers#customer-codenames).)
-5. **Sprint retrospective**: Also held on the last day of the sprint, this meeting encourages discussions among the team and stakeholders around three key areas: what went well, what could have been better, and what the team learned during the sprint.
+- Any community member can file a 🦟 ["Bug report"](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=bug%2C%3Areproduce&projects=&template=bug-report.md&title=)
+  - Any Fleet team member can view the 🦟 ["Bugs" kanban board](https://app.zenhub.com/workspaces/-bugs-647f6d382e171b003416f51a/board), including the status of all reported bugs.
+- If urgent, or if you need help submitting an issue, mention a [team member](#team) in the [#help-engineering](https://fleetdm.slack.com/archives/C019WG4GH0A) Slack channel.
+
+

-Each product group has a dedicated sprint board:
-- [MDM](https://app.zenhub.com/workspaces/-g-mdm-current-sprint-63bc507f6558550011840298/board)
-- [CX](https://app.zenhub.com/workspaces/-g-cx-current-sprint-63bd7e0bf75dba002a2343ac/board)
-- [Website](https://app.zenhub.com/workspaces/-g-website-6451748b4eb15200131d4bab/board)
-- [Infra](https://app.zenhub.com/workspaces/-g-infra-642c83a53e96760014c978bd/board)
+## Responsibilities
 New tickets are estimated, specified, and prioritized on the roadmap:
 - [Roadmap](https://app.zenhub.com/workspaces/-roadmap-ships-in-6-weeks-6192dd66ea2562000faea25c/board)

 ### Scrum items
-
 Our scrum boards are exclusively composed of four types of scrum items:

 1. **User stories**: These are simple and concise descriptions of features or requirements from the user's perspective, marked with the `story` label. They keep our focus on delivering value to our customers. Occasionally, due to ZenHub's ticket sub-task structure, the term 'epic' may be seen. However, we treat these as regular user stories.
@@ -43,8 +60,25 @@ Our scrum boards are exclusively composed of four types of scrum items:

 > Our sprint boards do not accommodate any other type of ticket. By strictly adhering to these four types of scrum items, we maintain an organized and focused workflow that consistently adds value for our users.

-## Meetings
+## Scrum at Fleet
+- [Sprint ceremonies](#sprint-ceremonies)
+- [Scrum boards](#scrum-boards)
+- [Scrum items](#scrum-items)
+
+Fleet [product groups](https://fleetdm.com/handbook/company/development-groups#what-are-product-groups) employ scrum, an agile methodology, as a core practice in software development. This process is designed around sprints, which last three weeks to align with our release cadence.
+
+### Sprint ceremonies
+Each sprint is marked by five essential ceremonies:
+
+1. **Sprint kickoff**: On the first day of the sprint, the team, along with stakeholders, selects items from the backlog to work on. The team then commits to completing these items within the sprint.
+2. **Daily standup**: Every day, the team convenes for updates. During this session, each team member shares what they accomplished since the last standup, their plans until the next meeting, and any blockers they are experiencing. Standups should last no longer than fifteen minutes. If additional discussion is necessary, it takes place after the standup with only the required participants.
+3. **Weekly estimation sessions**: The team estimates backlog items once a week (three times per sprint). These sessions help to schedule work completion and align the roadmap with business needs. They also provide estimated work units for upcoming sprints. The EM is responsible for the point values assigned to each item and ensures they are as realistic as possible.
+4. **Sprint demo**: On the last day of each sprint, all engineering teams and stakeholders come together to review completed work. Engineers are allotted 3-10 minutes to present their accomplishments, as well as any pending tasks. (These meetings are recorded and posted publicly to YouTube or other platforms, so participants should avoid mentioning customer names. For example, instead of "Fastly", you can say "a publicly-traded hosting company", or use the [customer's codename](https://fleetdm.com/handbook/customers#customer-codenames).)
+5. **Sprint retrospective**: Also held on the last day of the sprint, this meeting encourages discussions among the team and stakeholders around three key areas: what went well, what could have been better, and what the team learned during the sprint.
+
+
+## Meetings
 - [Goals](#goals)
 - [Principles](#principles)
 - [Sprint ceremonies](#sprint-ceremonies)
@@ -55,29 +89,19 @@ Our scrum boards are exclusively composed of four types of scrum items:
 - [Eng product bi-weekly](#eng-product-bi-weekly)
 - [Product development process review](#product-development-process-review)

-### Goals
-
-- Stay in alignment across the whole organization.
-- Build teams, not groups of people.
-- Provide substantial time for engineers to work on "focused work."
-
 ### Principles
-
 - Support the [Maker Schedule](http://www.paulgraham.com/makersschedule.html) by keeping meetings to a minimum.
 - Each individual must have a weekly or biweekly sync 1:1 meeting with their manager. This is key to making sure each individual has a voice within the organization.
 - Favor async communication when possible. This is very important to make sure every stakeholder on a project can have a clear understanding of what’s happening or what was decided, without needing to attend every meeting (i.e., if a person is sick or on vacation or just life happened.)
 - If an async conversation is not proving to be effective, never hesitate to hop on or schedule a call. Always document the decisions made in a ticket, document, or whatever makes sense for the conversation.

 ### Eng Together
-
 This meeting is to disseminate engineering-wide announcements, promote cohesion across groups within the engineering team, and connect with engineers (and the "engineering-curious") in other departments. Held monthly for one hour.

 #### Participants
-
 Everyone at the company is welcome to attend. All engineers are asked to attend. The subject matter is focused on engineering.

 #### Agenda
-
 - Announcements
 - Engineering KPIs review
 - “Tech talks”
@@ -87,13 +111,11 @@ Everyone at the company is welcome to attend. All engineers are asked to attend.
 - Structured and/or unstructured social activities

 ### User story discovery
-
 User story discovery meetings are scheduled as needed to align on large or complicated user stories. Before a discovery meeting is scheduled, the user story must be prioritized for product drafting and go through the design and specification process. When the user story is ready to be estimated, a user story discovery meeting may be scheduled to provide more dedicated, synchronous time for the team to discuss the user story than is available during weekly estimation sessions.

 All participants are expected to review the user story and associated designs and specifications before the discovery meeting.
 #### Participants
-
 - Product Manager
 - Product Designer
 - Engineering Manager
@@ -102,7 +124,6 @@ All participants are expected to review the user story and associated designs an
 - Product Quality Specialist

 #### Agenda
-
 - Product Manager: Why this story has been prioritized
 - Product Designer: Walk through user journey wireframes
 - Engineering Manager: Review specifications and any defined sub-tasks
@@ -110,47 +131,38 @@ All participants are expected to review the user story and associated designs an
 - Product Quality Specialist: Testing plan

 ### Group weeklies
-
 A chance for deeper, synchronous discussion on topics relevant across product groups like “Frontend weekly”, “Backend weekly”, etc.

 #### Participants
-
 Anyone who wishes to participate.

 #### Sample agenda (Frontend weekly)
-
 - Discuss common patterns and conventions in the codebase
 - Review difficult frontend bugs
 - Write engineering-initiated stories

 ### Eng leadership weekly
-
 Engineering leaders discuss topics of importance that week. Prepare agenda, announcements, and tech talks before the monthly [Eng Together](#eng-together) meeting.

 #### Participants
-
 - Engineering Managers
 - Director of Product Development
 - CTO

 #### Rituals
-
 1. Review Engineering KPIs.
 2. Review each product group's ZenHub board.
 3. Proceed to agenda.

 #### Sample agenda
-
 - Engineer hiring
 - Process discussion
 - New documentation needs

 ### Eng product bi-weekly
-
 Engineering and product bi-weekly sync to discuss process, roadmap, and scheduling.
 #### Participants
-
 - Head of Product
 - Product Managers (optional)
 - CTO
@@ -158,31 +170,26 @@ Engineering and product bi-weekly sync to discuss process, roadmap, and scheduli
 - Engineering Managers (optional)

 #### Sample agenda
-
 - Product to engineering handoff process
 - Q4 product roadmap
 - Optimizing development processes

 ### Product development process review
-
 A once-per-sprint review of the bugs, drafting, and sprint boards to make sure that the current state of the boards reflects the process as defined in the handbook, or if any changes are needed to the documented process.

 #### Participants
-
 - CEO
 - Head of Product
 - Product Operations
 - Director of Product Development

 #### Sample agenda
-
 - Review bugs board
 - Review drafting board
 - Review sprint boards
 - How is the process working? Are any changes needed?

 ## Engineering-initiated stories
-
 - [Creating an engineering-initiated story](#creating-an-engineering-initiated-story)

 Engineering-initiated stories are types of user stories created by engineers to make technical changes to Fleet. Technical changes should improve the user experience or contributor experience. For example, optimizing SQL that improves the response time of an API endpoint improves user experience by reducing latency. A script that generates common boilerplate, or automated tests to cover important business logic, improves the quality of life for contributors, making them happier and more productive, resulting in faster delivery of features to our customers.
@@ -194,7 +201,6 @@ Engineering-initiated stories follow the [user story drafting process](https://f
 > We prefer the term engineering-initiated stories over technical debt because the user story format helps keep us focused on our users.

 ### Creating an engineering-initiated story
-
 1. Create a [new feature request issue](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=~engineering-initiated&projects=&template=feature-request.md&title=) in GitHub.
 2. Ensure it is labeled with `~engineering-initiated` and the relevant product group. Remove any `~customer-request` label.
 3. Assign it to yourself. You will own this user story until it is either prioritized or closed.
@@ -204,29 +210,24 @@ Engineering-initiated stories follow the [user story drafting process](https://f
 > We aspire to dedicate 20% of each sprint to technical changes, but may allocate less based on customer needs and business priorities.

 ## Documentation for contributors
-
 Fleet's documentation for contributors can be found in the [Fleet GitHub repo](https://github.com/fleetdm/fleet/tree/main/docs/Contributing).

 ## Release process
-
 This section outlines the release process at Fleet. The current release cadence is once every three weeks and is concentrated around Wednesdays.

 ### Release freeze period
-
 To ensure release quality, Fleet has a freeze period for testing beginning the Tuesday before the release at 9:00 AM Pacific. Effective at the start of the freeze period, new feature work will not be merged into `main`. Bugs are exempt from the release freeze period.

 ### Freeze day
-
 To begin the freeze, [open the repo on Merge Freeze](https://www.mergefreeze.com/installations/3704/branches/6847) and click the "Freeze now" button. This will freeze the `main` branch and require any PRs to be manually unfrozen before merging. PRs can be manually unfrozen in Merge Freeze using the PR number.

 > Any Fleetie can [unfreeze PRs on Merge Freeze](https://www.mergefreeze.com/installations/3704/branches) if the PR contains documentation changes or bug fixes only. If the PR contains other changes, please confirm with your manager before unfreezing.

 #### Check dependencies
-
 Before kicking off release QA, confirm that we are using the latest versions of dependencies we want to keep up-to-date with each release. Currently, those dependencies are:

 1. **Go**: Latest minor release
@@ -250,11 +251,9 @@ Before kicking off release QA, confirm that we are using the latest versions of
 Our goal is to keep these dependencies up-to-date with each release of Fleet. If a release is going out with an old dependency version, it should be treated as a [critical bug](https://fleetdm.com/handbook/engineering#critical-bugs) to make sure it is updated before the release is published.

 #### Create release QA issue
-
 Next, create a new GitHub issue using the [Release QA template](https://github.com/fleetdm/fleet/issues/new?assignees=&labels=&projects=&template=smoke-tests.md&title=). Add the release version to the title, and assign the quality assurance members of the [MDM](https://fleetdm.com/handbook/company/development-groups#mdm-group) and [CX](https://fleetdm.com/handbook/company/development-groups#customer-experience-group) product groups.

 ### Merging during the freeze period
-
 We merge bug fixes and documentation changes during the freeze period, but we do not merge other code changes. This minimizes code churn and helps ensure a stable release. To merge a bug fix, you must first unfreeze the PR in [Merge Freeze](https://app.mergefreeze.com/installations/3704/branches), and click the "Unfreeze 1 pull request" text link.
@@ -265,15 +264,12 @@ It is sometimes necessary to delay the release to allow time to complete partial
 3. The Engineering Manager, QA lead, and [release ritual DRI](#rituals) must all approve the feature work PR before it is unfrozen and merged.

 ### Release readiness
-
 After each product group finishes their QA process during the freeze period, the EM @ mentions the release ritual DRI in the #help-qa Slack channel. When all EMs have certified that they are ready for release, the release ritual DRI begins the [release process](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md).
 ### Release day
-
 Documentation on completing the release process can be found [here](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md).

 ## Deploying to dogfood
-
 After each Fleet release, the new release is deployed to Fleet's dogfood (internal) instance.

 How to deploy a new release to dogfood:
@@ -289,7 +285,6 @@ How to deploy a new release to dogfood:
 > Note that "fleetdm/fleet:main" is not a image name, instead use the commit hash in place of "main".

 ## Milestone release ritual
-
 Immediately after publishing a new release, we close out the associated GitHub issues and milestones.

 ### Update milestone in GitHub
@@ -297,7 +292,6 @@ Immediately after publishing a new release, we close out the associated GitHub i
 1. **Rename current milestone**: In GitHub, [change the current milestone name](https://github.com/fleetdm/fleet/milestones) from `4.x.x-tentative` to `4.x.x`. `4.37.0-tentative` becomes `4.37.0`.

 ### ZenHub housekeeping
-
 2. **Update product group boards**: In ZenHub, go to each product group board tracking the current release. Usually, these are [#g-cx](https://app.zenhub.com/workspaces/-g-cx-current-sprint-63bd7e0bf75dba002a2343ac/board) and [#g-mdm](https://app.zenhub.com/workspaces/-g-mdm-current-sprint-63bc507f6558550011840298/board).
 3. **Remove milestone from unfinished items**: If you see any items in columns other than "Ready for release" tagged with the current milestone, remove that milestone tag. These items didn't make it into the release.
@@ -319,7 +313,6 @@ Immediately after publishing a new release, we close out the associated GitHub i
 12. Announce that `main` is unfrozen and the milestone has been closed in #help-engineering.
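Step 1 of the milestone release ritual is a purely mechanical rename. Below is a minimal Python sketch of that transformation. It assumes every tentative milestone title follows the `MAJOR.MINOR.PATCH-tentative` pattern shown above. `finalize_milestone_name` is a hypothetical helper and not part of Fleet's tooling; the handbook describes doing the rename by hand in the GitHub UI.

```python
import re

# Hypothetical helper for illustration only; the documented process is a manual
# rename in the GitHub milestones UI. Assumes titles such as "4.37.0-tentative".
TENTATIVE_MILESTONE = re.compile(r"(\d+\.\d+\.\d+)-tentative")


def finalize_milestone_name(name: str) -> str:
    """Map a tentative milestone title to its released title."""
    match = TENTATIVE_MILESTONE.fullmatch(name)
    if match is None:
        # Refuse anything that is not a tentative MAJOR.MINOR.PATCH title, so an
        # already-renamed milestone is never "renamed" a second time.
        raise ValueError(f"not a tentative milestone title: {name!r}")
    return match.group(1)


if __name__ == "__main__":
    print(finalize_milestone_name("4.37.0-tentative"))  # prints 4.37.0
```

Because the helper rejects titles that lack the `-tentative` suffix, it is safe to re-run after the rename has already happened.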
 ## Oncall rotation
-
 - [The rotation](#the-rotation)
 - [Responsibilities](#responsibilities)
 - [Clearing the plate](#clearing-the-plate)
@@ -328,7 +321,6 @@ Immediately after publishing a new release, we close out the associated GitHub i
 - [Handoff](#handoff)

 ### The rotation
-
 See [the internal Google Doc](https://docs.google.com/document/d/1FNQdu23wc1S9Yo6x5k04uxT2RwT77CIMzLLeEI2U7JA/edit#) for the engineers in the rotation.

 Fleet team members can also subscribe to the [shared calendar](https://calendar.google.com/calendar/u/0?cid=Y181MzVkYThiNzMxMGQwN2QzOWEwMzU0MWRkYzc5ZmVhYjk4MmU0NzQ1ZTFjNzkzNmIwMTAxOTllOWRmOTUxZWJhQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20) for calendar events.
@@ -338,14 +330,12 @@ New engineers are added to the oncall rotation by their manager after they have
 > The oncall rotation may be adjusted with approval from the EMs of any product groups affected. Any changes should be made before the start of the sprint so that capacity can be planned accordingly.

 ### Responsibilities
-
 - [Second-line response](#second-line-response)
 - [PR reviews](#pr-reviews)
 - [Customer success meetings](#customer-success-meetings)
 - [Improve documentation](#improve-documentation)

 #### Second-line response
-
 The oncall engineer is a second-line responder to questions raised by customers and community members.

 The community contact (Kathy) is responsible for the first response to GitHub issues, pull requests, and Slack messages in the [#fleet channel](https://osquery.slack.com/archives/C01DXJL16D8) of osquery Slack, and other public Slacks. Kathy and Zay are responsible for the first response to messages in private customer Slack channels.
@@ -355,7 +345,6 @@ We respond within 1-hour (during business hours) for interactions and ask the on
 > Response SLAs help us measure and guarantee the responsiveness that a customer [can expect](https://fleetdm.com/handbook/company#values) from Fleet. But SLAs aside, when a Fleet customer has an emergency or other time-sensitive situation ongoing, it is Fleet's priority to help them find them a solution quickly.

 #### PR reviews
-
 PRs from Fleeties are reviewed by auto-assignment of codeowners, or by selecting the group or reviewer manually.

 PRs should remain in draft until they are ready to be reviewed for final approval, this means the feature is complete with tests already added. This helps keep our active list of PRs relevant and focused. It is ok and encouraged to request feedback while a PR is in draft to engage the team.
@@ -363,17 +352,14 @@ PRs should remain in draft until they are ready to be reviewed for final approva
 All PRs from the community are routed through the oncall engineer. For documentation changes, the community contact ([Kathy](https://github.com/ksatter)) is assigned by the oncall engineer. For code changes, if the oncall engineer has the knowledge and confidence to review, they should do so. Otherwise, they should request a review from an engineer with the appropriate domain knowledge. It is the oncall engineer's responsibility to monitor community PRs and make sure that they are moved forward (either by review with feedback or merge).

 #### Customer success meetings
-
 The oncall engineer is encouraged to attend some of the customer success meetings during the week. Post a message to the #g-cx Slack channel requesting invitations to upcoming meetings. This has a dual purpose of providing more context for how our customers use Fleet. The engineer should actively participate and provide input where appropriate (if not sure, please ask your manager or organizer of the call).

 #### Improve documentation
-
 The oncall engineer is asked to read, understand, test, correct, and improve at least one doc page per week. Our goal is to 1, ensure accuracy and verify that our deployment guides and tutorials are up to date and work as expected. And 2, improve the readability, consistency, and simplicity of our documentation – with empathy towards first-time users. See [Writing documentation](https://fleetdm.com/handbook/marketing#writing-documentation) for writing guidelines, and don't hesitate to reach out to [#g-digital-experience](https://fleetdm.slack.com/archives/C01GQUZ91TN) on Slack for writing support. A backlog of documentation improvement needs is kept [here](https://github.com/orgs/fleetdm/projects/40/views/10).

 ### Clearing the plate
-
 Engineering managers are asked to be aware of the [oncall rotation](https://docs.google.com/document/d/1FNQdu23wc1S9Yo6x5k04uxT2RwT77CIMzLLeEI2U7JA/edit#) and schedule a light workload for engineers while they are oncall. While it varies week to week considerably, the oncall responsibilities can sometimes take up a substantial portion of the engineer's time.

 The remaining time after fulfilling the responsibilities of oncall is free for the engineer to choose their own path. Please choose something relevant to your work or Fleet's goals to focus on. If unsure, feel free to speak with your manager.
@@ -389,11 +375,9 @@ Some ideas:
 At the end of your oncall shift, you will be asked to share about how you spent your time.

 ### How to reach the oncall engineer
-
 Oncall engineers do not need to actively monitor Slack channels, except when called in by the Community or Customer teams. Members of those teams are instructed to `@oncall` in `#help-engineering` to get the attention of the oncall engineer to continue discussing any issues that come up. In some cases, the Community or Customer representative will continue to communicate with the requestor. In others, the oncall engineer will communicate directly (team members should use their judgment and discuss on a case-by-case basis how to best communicate with community members and customers).

 ### Escalations
-
 When the oncall engineer is unsure of the answer, they should follow this process for escalation.

 To achieve quick "first-response" times, you are encouraged to say something like "I don't know the answer and I'm taking it back to the team," or "I think X, but I'm confirming that with the team (or by looking in the code)."
@@ -405,7 +389,6 @@ How to escalate:
 2. Create a new thread in the [#help-engineering channel](https://fleetdm.slack.com/archives/C019WG4GH0A), tagging `@zwass` and provide the information turned up in your research. Please include possibly relevant links (even if you didn't find what you were looking for there). Zach will work with you to craft an appropriate answer or find another team member who can help.

 ### Handoff
-
 The oncall engineer changes each week on Wednesday.

 A Slack reminder should notify the oncall of the handoff. Please do the following:
@@ -430,7 +413,6 @@ In the Slack reminder thread, the oncall engineer includes their retrospective.
 3. How did you spend the rest of your oncall week? This is a chance to demo or share what you learned.

 ## Incident postmortems
-
 At Fleet, we take customer incidents very seriously. After working with customers to resolve issues, we will conduct an internal postmortem to determine any documentation or coding changes to prevent similar incidents from happening in the future.

 Why? We strive to make Fleet the best osquery management platform globally, and we sincerely believe that starts with sharing lessons learned with the community to become stronger together.

 At Fleet, we do postmortem meetings for every production incident, whether it's a customer's environment or on fleetdm.com.
@@ -440,11 +422,9 @@ At Fleet, we do postmortem meetings for every production incident, whether it's
 - [Postmortem action items](#postmortem-action-items)

 ### Postmortem document
-
 Before running the postmortem meeting, copy this [Postmortem Template](https://docs.google.com/document/d/1Ajp2LfIclWfr4Bm77lnUggkYNQyfjePiWSnBv1b1nwM/edit?usp=sharing) document and populate it with some initial data to enable a productive conversation.

 ### Postmortem meeting
-
 Invite all stakeholders, typically the team involved and QA representatives.

 Follow the document topic by topic. Keep the goal in mind which is to take action items for addressing the root cause and making sure a similar incident will not happen again.
@@ -454,11 +434,9 @@ Distinguish between the root cause of the bug, which by that time was solved and
 [Example Finished Document](https://docs.google.com/document/d/1YnETKhH9R7STAY-PaFnPy2qxhNht2EAFfkv-kyEwebQ/edit?usp=share_link)

 ### Postmortem action items
-
 Each action item will have an owner that will be responsible for creating a Github issue promptly after the meeting. This Github issue should be prioritized with the relevant PM/EM.

 ## Outages
-
 At Fleet, we consider an outage to be a situation where new features or previously stable features are broken or unusable.

 - Occurences of outages are tracked in the [Outages](https://docs.google.com/spreadsheets/d/1a8rUk0pGlCPpPHAV60kCEUBLvavHHXbk_L3BI0ybME4/edit#gid=0) spreadsheet.
@@ -466,15 +444,12 @@ At Fleet, we consider an outage to be a situation where new features or previous
 - Fleet stresses the critical importance of avoiding outages because they make customers' lives worse instead of better.

 ## Scaling Fleet
-
 Fleet, as a Go server, scales horizontally very well. It’s not very CPU or memory intensive. However, there are some specific gotchas to be aware of when implementing new features. Visit our [scaling Fleet page](https://fleetdm.com/handbook/engineering/scaling-fleet) for tips on scaling Fleet as efficiently and effectively as possible.

 ## Load testing
-
 The [load testing page](https://fleetdm.com/handbook/engineering/load-testing) outlines the process we use to load test Fleet, and contains the results of our semi-annual load test.

 ## Version support
-
 To provide the most accurate and efficient support, Fleet will only target fixes based on the latest released version. In the current version fixes, Fleet will not backport to older releases.

 Community version supported for bug fixes: **Latest version only**
@@ -486,7 +461,6 @@ Premium version supported for bug fixes: **Latest version only**
 Premium support for support/troubleshooting: **All versions**

 ## Reviewing PRs from the community
-
 If you're assigned a community pull request for review, it is important to keep things moving for the contributor. The goal is to not go more than one business day without following up with the contributor.

 A PR should be merged if:
@@ -514,7 +488,6 @@ For PRs that will not be merged:
 - Close the PR.

 ### Merging community PRs
-
 When merging a pull request from a community contributor:

 - Ensure that the checklist for the submitter is complete.
@@ -524,7 +497,6 @@ When merging a pull request from a community contributor:
 - Share the merged PR with the team in the #help-promote channel of Fleet Slack to be publicized on social media. Those who contribute to Fleet and are recognized for their contributions often become great champions for the project.

 ## Changes to tables' schema
-
 Whenever a PR is proposed for making changes to our [tables' schema](https://fleetdm.com/tables/screenlock)(e.g. to schema/tables/screenlock.yml), it also has to be reflected in our osquery_fleet_schema.json file.

 The website team will [periodically](https://fleetdm.com/handbook/marketing/website-handbook#rituals) update the json file with the latest changes. If the changes should be deployed sooner, you can generate the new json file yourself by running these commands:
@@ -538,13 +510,11 @@ cd website
 > If a table is added to our ChromeOS extension but it does not exist in osquery or if it is a table added by fleetd, add a note that mentions it. As in this [example](https://github.com/fleetdm/fleet/blob/e95e075e77b683167e86d50960e3dc17045e3c44/schema/tables/mdm.yml#L2).

 ## Quality
-
 - [Human-oriented QA](#human-oriented-qa)
 - [Finding bugs](#finding-bugs)
 - [Outages](#outages)

 ### Human-oriented QA
-
 Fleet uses a human-oriented quality assurance (QA) process to make sure the product meets the standards of users and organizations.

 Automated tests are important, but they can't catch everything. Many issues are hard to notice until a human looks empathetically at the user experience, whether in the user interface, the REST API, or the command line.
@@ -562,7 +532,6 @@ The goal of quality assurance is to identify corrections and improvements before
 - Perceived data freshness

 ### Finding bugs
-
 To try Fleet locally for QA purposes, run `fleetctl preview`, which defaults to running the latest stable release. To target a different version of Fleet, use the `--tag` flag to target any tag in [Docker Hub](https://hub.docker.com/r/fleetdm/fleet/tags?page=1&ordering=last_updated), including any git commit hash or branch name. For example, to QA the latest code on the `main` branch of fleetdm/fleet, you can run: `fleetctl preview --tag=main`.
@@ -574,11 +543,9 @@ For each bug found, please use the [bug report template](https://github.com/flee
 For unreleased bugs in an active sprint, a new bug is created with the `~unreleased bug` label. The `:release` label and associated product group label is added, and the engineer responsible for the feature is assigned. If QA is unsure who the bug should be assigned to, it is assigned to the EM. Fixing the bug becomes part of the story.
### Debugging - You can read our guide to diagnosing issues in Fleet on the [debugging page](https://fleetdm.com/handbook/engineering/debugging). ## Bug process - - [Bug states](#bug-states) - [Finding bugs](#finding-bugs) - [Outages](#outages) @@ -638,12 +605,10 @@ Fleet [always prioritizes bugs](https://fleetdm.com/handbook/product#prioritizin Bugs will be verified as fixed by QA when they are placed in the "Awaiting QA" column of the relevant product group's sprint board. If the bug is verified as fixed, it is moved to the "Ready for release" column of the sprint board. Otherwise, the remaining issues are noted in a comment, and it is moved back to the "In progress" column of the sprint board. ### All bugs - - [See on GitHub](https://github.com/fleetdm/fleet/issues?q=is%3Aissue+is%3Aopen+label%3Abug). - [See on ZenHub](https://app.zenhub.com/workspaces/-bugs-647f6d382e171b003416f51a/board). #### Bugs opened this week - This filter returns all "bug" issues opened after the specified date. Simply replace the date with a YYYY-MM-DD equal to one week ago. [See on GitHub](https://github.com/fleetdm/fleet/issues?q=is%3Aissue+archived%3Afalse+label%3Abug+created%3A%3E%3DREPLACE_ME_YYYY-MM-DD). #### Bugs closed this week @@ -651,7 +616,6 @@ This filter returns all "bug" issues opened after the specified date. Simply rep This filter returns all "bug" issues closed after the specified date. Simply replace the date with a YYYY-MM-DD equal to one week ago. [See on Github](https://github.com/fleetdm/fleet/issues?q=is%3Aissue+archived%3Afalse+is%3Aclosed+label%3Abug+closed%3A%3E%3DREPLACE_ME_YYYY-MM-DD). ## Release testing - - [Release blockers](#release-blockers) - [Critical bugs](#critical-bugs) @@ -662,11 +626,9 @@ When a critical bug is found, the Fleetie who labels the bug as critical is resp All unreleased bugs are addressed before publishing a release. 
Released bugs that are not critical may be addressed during the next release per the standard [bug process](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md#bug-process). ### Release blockers - Product may add the `~release blocker` label to user stories to indicate that the story must be completed to publish the next version of Fleet. Bugs are never labeled as release blockers. ### Critical bugs - A critical bug is a bug with the `~critical bug` label. A critical bug is defined as behavior that: * Blocks the normal use of a workflow * Prevents upgrades to Fleet @@ -674,7 +636,6 @@ A critical bug is a bug with the `~critical bug` label. A critical bug is define * Introduces a security vulnerability #### Critical bug notification process - We need to inform customers and the community about critical bugs immediately so they don’t trigger it themselves. When a bug meeting the definition of critical is found, the bug finder is responsible for raising an alarm. Raising an alarm means pinging @here in the #help-product channel with the filed bug. @@ -690,7 +651,6 @@ If a quick fix workaround exists, that should be communicated as well for those When a critical bug is identified, we will then follow the patch release process in [our documentation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md#patch-releases). ## Measurement - We track the success of this process by observing the throughput of issues through the system and identifying where buildups (and therefore bottlenecks) are occurring. 
The metrics are: * Number of bugs opened this week @@ -705,7 +665,6 @@ In the above process, any reference to "product" refers to: Mo Zhu, Head of Prod In the above process, any reference to "QA" refers to: Reed Haynes, Product Quality Specialist ## Infrastructure - - [Infrastructure links](#infrastructure-links) - [Best practices](#best-practices) - [24/7 on-call](#24-7-on-call) @@ -713,7 +672,6 @@ In the above process, any reference to "QA" refers to: Reed Haynes, Product Qual The [infrastructure product group](https://fleetdm.com/handbook/company/development-groups#infrastructure-group) is responsible for deploying, supporting, and maintaining all Fleet-managed cloud deployments. ### Infrastructure links - The following are quick links to infrastructure-related README files in both public and private repos that can be used as a quick reference for infrastructure-related code: - [Sandbox](https://github.com/fleetdm/fleet/blob/main/infrastructure/sandbox/readme.md) @@ -724,7 +682,6 @@ The following are quick links to infrastructure-related README files in both pub - [VPN](https://github.com/fleetdm/confidential/blob/main/vpn/README.md) ### Best practices - The infrastructure team follows industry best practices when designing and deploying infrastructure. For containerized infrastructure, Google has created a [reference document](https://cloud.google.com/architecture/best-practices-for-operating-containers) as an ideal reference for these practices. Many of these practices must be implemented in Fleet directly, and engineering will work to ensure that feature implementation follows these practices. The infrastructure team will make itself available to provide guidance as needed. If a feature is not compatible with these practices, an issue will be created with a request to correct the implementation. 
@@ -768,11 +725,9 @@ The information needed to evaluate and potentially fix any issues is documented When an infrastructure on-call engineer is out of the office, Zach Wasserman will serve as a backup to on-call in #help-p1. All absences must be communicated in advance to Luke Heath and Zach Wasserman. ## Accounts - Engineering is responsible for managing third-party accounts required to support engineering infrastructure. ### Apple developer account - We use the official Fleet Apple developer account to notarize installers we generate for Apple devices. Whenever Apple releases new terms of service, we are unable to notarize new packages until the new terms are accepted. When this occurs, we will begin receiving the following error message when attempting to notarize packages: "You must first sign the relevant contracts online." To resolve this error, follow the steps below. @@ -786,9 +741,8 @@ When this occurs, we will begin receiving the following error message when attem 4. Complete the 2FA process to log in. 5. Accept the new terms of service. 
- + -## Slack channels + -The following [Slack channels are maintained](https://fleetdm.com/handbook/company#group-slack-channels) by this group: -| Slack channel | [DRI](https://fleetdm.com/handbook/company#why-group-slack-channels) | -| :------------------- | :------------------------------------------------------------------- | -| `#help-engineering` | Zach Wasserman | -| `#g-mdm` | George Karr | -| `#g-customer-experience` | Sharon Katz | -| `#g-infra` | Luke Heath | -| `#help-qa` | Reed Haynes | -| `#_pov-environments` | Ben Edwards | +#### Stubs + +##### Scrum boards +Please see 📖[handbook/company/engineering#contact-us](https://fleetdm.com/handbook/company/engineering#contact-us) diff --git a/handbook/engineering/engineering.rituals.yml b/handbook/engineering/engineering.rituals.yml new file mode 100644 index 000000000000..8cf2a4cdef6f --- /dev/null +++ b/handbook/engineering/engineering.rituals.yml @@ -0,0 +1,98 @@ +- + task: "Pull request review" # Title that will actually show in rituals table + startedOn: "2023-08-09" # Needs to align with frequency, e.g. if frequency is every third Thursday, startedOn === any third Thursday + frequency: "Daily" # must be supported by + description: "Engineers go through pull requests for which their review has been requested." # example of a longer thing: description: "[Prioritizing next sprint](https://fleetdm.com/handbook/company/communication)" + moreInfoUrl: "https://fleetdm.com/handbook/company/why-this-way#why-make-work-visible" # URL used to highlight the "description:" text in the table + dri: "lukeheath" # DRI for ritual (assignee if autoIssue) (TODO display GitHub profile pic instead of name or title) + #autoIssue: # Enables automation of GitHub issues + #labels: [ "#g-cx" ] # label to be applied to issue +- + task: "Engineering group discussions" + startedOn: "2023-08-09" + frequency: "Daily" + description: "Engineers go through pull requests for which their review has been requested."
+ moreInfoUrl: + dri: "lukeheath" +- + task: "On-call handoff" + startedOn: "2023-08-09" + frequency: "Weekly" + description: "Hand off the on-call engineering responsibilities to the next on-call engineer." + moreInfoUrl: + dri: "lukeheath" +- + task: "Vulnerability alerts (fleetdm.com)" + startedOn: "2023-08-09" + frequency: "Weekly" + description: "Review and remediate or dismiss [vulnerability alerts](https://github.com/fleetdm/fleet/security) for the fleetdm.com codebase on GitHub." + moreInfoUrl: + dri: "eashaw" +- + task: "Vulnerability alerts (frontend)" + startedOn: "2023-08-09" + frequency: "Weekly" + description: "Review and remediate or dismiss [vulnerability alerts](https://github.com/fleetdm/fleet/security) for the Fleet frontend codebase (and related JS) on GitHub." + moreInfoUrl: + dri: "zwass" +- + task: "Vulnerability alerts (backend)" + startedOn: "2023-08-09" + frequency: "Weekly" + description: "Review and remediate or dismiss [vulnerability alerts](https://github.com/fleetdm/fleet/security) for the Fleet backend codebase (and all Go code) on GitHub." + moreInfoUrl: + dri: "zwass" +- + task: "Freeze ritual" + startedOn: "2023-08-09" + frequency: "Triweekly" + description: "Go through the process of freezing the `main` branch to prepare for the next release." + moreInfoUrl: "https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md#patch-releases" + dri: "lukeheath" +- + task: "Release ritual" + startedOn: "2023-08-09" + frequency: "Triweekly" + description: "Go through the process of releasing the next iteration of Fleet." + moreInfoUrl: "https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md" + dri: "lukeheath" +- + task: "Create patch release branch" + startedOn: "2023-08-09" + frequency: "Every patch release" + description: "Go through the process of creating a patch release branch, cherry-picking commits, and pushing the branch to github.com/fleetdm/fleet."
+ moreInfoUrl: "https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Releasing-Fleet.md#patch-releases" + dri: "lukeheath" +- + task: "Bug review" + startedOn: "2023-08-09" + frequency: "Weekly" + description: "Review bugs that are in QA's inbox." + moreInfoUrl: + dri: "xpkoala" +- + task: "QA report" + startedOn: "2023-08-09" + frequency: "Triweekly" + description: "Every release cycle, on the Monday of release week, the DRI for the release ritual is updated on status of testing." + moreInfoUrl: + dri: "xpkoala" +- + task: "Release QA" + startedOn: "2023-08-09" + frequency: "Triweekly" + description: "Every release cycle, by end of day Friday of release week, all issues move to Ready for release on the #g-mdm and #g-cx sprint boards." + moreInfoUrl: + dri: "xpkoala" +- + + + + + + + + + + +