Refactor manual tests job: extract reset logic and separate infinite block tests#119

Merged
nhorton merged 7 commits into main from claude/update-manual-tests-job-cYkyd on Jan 22, 2026

nhorton commented Jan 22, 2026

Summary

This PR refactors the manual_tests job to improve maintainability and test organization by extracting common reset logic into a reusable step and moving infinite block tests into a dedicated serial step.

Key Changes

  • Version bump: Updated from 1.2.1 to 1.3.0

  • New reset step: Created steps/reset.md containing centralized reset instructions that other steps can call internally

    • Consolidates git reset, file cleanup, and queue clearing logic
    • Eliminates duplication across test steps
    • Provides clear documentation of reset procedures
  • New infinite block tests step: Created steps/infinite_block_tests.md as a dedicated step for infinite block testing

    • Moved 4 infinite block tests (2 prompt-based, 2 command-based) from run_fire_tests
    • Tests both "should fire" (no promise) and "should NOT fire" (with promise) scenarios
    • Runs serially with resets between tests due to blocking nature
  • Updated run_not_fire_tests.md:

    • Reduced from 8 to 6 tests (removed infinite block tests)
    • Added explicit sub-agent configuration requirements: model: "haiku" and max_turns: 5
    • Updated quality criteria to reference reset step instead of inline commands
  • Updated run_fire_tests.md:

    • Reduced from 8 to 6 tests (removed infinite block tests)
    • Added explicit sub-agent configuration requirements: model: "haiku" and max_turns: 5
    • Updated quality criteria to reference reset step instead of inline commands
    • Clarified that infinite block tests are handled separately
  • Updated job.yml:

    • Added sub-agent configuration guidance in description
    • Updated step descriptions to reflect test count changes
    • Added new infinite_block_tests step to workflow
    • Updated changelog with version 1.3.0 entry
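The description says the reset step consolidates git reset, file cleanup, and queue clearing, but the exact commands in steps/reset.md are not shown in this PR text. The following is a minimal sketch of what such a reset could look like, assuming `git reset --hard` plus `git clean -fd` for the repository and a hypothetical `queue_dir` path for the rules queue; the function name and the queue location are illustrative, not taken from the PR.

```python
import shutil
import subprocess
from pathlib import Path


def reset_environment(repo: Path, queue_dir: Path, run=subprocess.run) -> list[list[str]]:
    """Hypothetical reset: restore tracked files, drop untracked ones, empty the queue."""
    commands = [
        ["git", "reset", "--hard", "HEAD"],  # discard changes to tracked files
        ["git", "clean", "-fd"],             # remove untracked files and directories
    ]
    for cmd in commands:
        run(cmd, cwd=repo, check=True)

    # Clear queued entries. The queue location is an assumption for this sketch.
    if queue_dir.exists():
        shutil.rmtree(queue_dir)
    queue_dir.mkdir(parents=True, exist_ok=True)
    return commands
```

Passing `run` as a parameter keeps the sketch easy to dry-run with a stub before pointing it at a real checkout, since both git commands are destructive.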

Implementation Details

  • All test steps now reference the reset step for cleanup procedures, reducing documentation duplication
  • Sub-agent configuration (model: "haiku", max_turns: 5) is now explicitly documented in all test steps
  • Infinite block tests are isolated in their own step to allow for proper serial execution and controlled observation of blocking behavior
  • Quality criteria have been updated across all steps to be more consistent and reference the reset step
  • The reset step includes detailed explanation of each command and when to use it
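The serial flow described above (one test at a time, a reset around each, and every sub-agent launched with `model: "haiku"` and `max_turns: 5`) can be sketched as a small driver. Only the two configuration values come from the PR; `run_serially`, `launch`, and `reset` are hypothetical names, and the choice to reset both before each test and once after the last one is an assumption about what "resets between tests" means in practice.

```python
from typing import Any, Callable, Iterable

# Values quoted in the PR description; the dict shape is an assumption.
SUB_AGENT_CONFIG = {"model": "haiku", "max_turns": 5}


def run_serially(
    tests: Iterable[str],
    launch: Callable[[str, dict], Any],
    reset: Callable[[], None],
) -> list[Any]:
    """Run tests one at a time, resetting the environment around each."""
    results = []
    for name in tests:
        reset()  # start every test from a clean environment
        # Each sub-agent gets its own copy of the shared configuration.
        results.append(launch(name, dict(SUB_AGENT_CONFIG)))
    reset()  # leave the environment clean after the final test
    return results
```

Because the blocking tests are never launched in parallel, a hang in one of them is bounded by `max_turns` and cannot overlap with the next test's setup.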

…ck tests

- Add `model: "haiku"` and `max_turns: 5` config for all sub-agents to
  minimize cost/latency and prevent indefinite hangs
- Move infinite block tests (prompt and command) to dedicated serial step
  with both should-fire and should-not-fire scenarios
- Extract reset instructions to reusable reset.md step that other steps
  reference internally
- Reduce parallel tests from 8 to 6, serial tests from 8 to 6
- Bump version to 1.3.0

Auto-generated by `deepwork install`:
- Added skills for new infinite_block_tests and reset steps
- Updated existing step skills with new configuration

- Reset step now runs as a dependency before run_not_fire_tests to
  ensure clean environment before any tests begin
- "Should NOT fire" tests now verify the rules queue is empty after
  sub-agents complete, confirming rules truly didn't fire
- Update job description to reflect 4-step flow with reset first
- Bump version to 1.4.0
- Update Tests 3 & 4 to specify dual criteria: should fire AND should
  return in reasonable time (via max_turns limit)
- Add "Returned in Time?" column to results tracking table
- Note that Task tool has no direct timeout, so max_turns is the
  safeguard against infinite hanging
- Update quality criteria to separately verify "should NOT fire" and
  "should fire" test behaviors
- Simplify reset step to single criterion (environment clean)
- Update infinite_block_tests to include "returned in reasonable time"
  criterion for no-promise tests
- Keep verbose criteria for run_not_fire_tests and run_fire_tests
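The pass criteria in these commit messages can be read as a small decision rule: a "should fire" test passes only if the rule fired and the sub-agent returned in reasonable time, while a "should NOT fire" test passes only if the rule did not fire and the rules queue is empty afterwards. The sketch below encodes that reading and renders a results table. Only the "Returned in Time?" column name comes from the commits; the other column names, the `Observation` fields, and the idea of combining both checks in one table are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    name: str
    should_fire: bool
    fired: bool
    returned_in_time: bool  # sub-agent stopped within its max_turns budget
    queue_empty: bool       # rules queue checked after the sub-agent finished


def passed(obs: Observation) -> bool:
    """Apply the criteria as read from the commit messages."""
    if obs.should_fire:
        return obs.fired and obs.returned_in_time
    return (not obs.fired) and obs.queue_empty


def render_table(observations: list[Observation]) -> str:
    """Render a Markdown results table, one row per test."""
    def yn(flag: bool) -> str:
        return "Yes" if flag else "No"

    lines = [
        "| Test | Should Fire? | Fired? | Returned in Time? | Pass? |",
        "|---|---|---|---|---|",
    ]
    for obs in observations:
        lines.append(
            f"| {obs.name} | {yn(obs.should_fire)} | {yn(obs.fired)} "
            f"| {yn(obs.returned_in_time)} | {yn(passed(obs))} |"
        )
    return "\n".join(lines)
```

Tracking "Returned in Time?" separately from "Fired?" matters because, as the commits note, the Task tool has no direct timeout: a test can fire correctly and still fail by hanging until `max_turns` is exhausted.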
nhorton merged commit cc4ff7e into main on Jan 22, 2026
4 checks passed
nhorton deleted the claude/update-manual-tests-job-cYkyd branch on January 22, 2026 21:17