63 changes: 63 additions & 0 deletions .github/agents/opinionated-cli-simulator-tester.agent.md
@@ -0,0 +1,63 @@
---
name: opinionated-cli-simulator-tester
description: Opinionated end-user CLI test specialist for Planeteer. Use when validating TUI behavior, keyboard flows, regressions, and UX quality by running simulator scripts and reporting concrete findings with asciinema artifacts.
tools: ['execute', 'read', 'search', 'todo']
user-invokable: true
---

# Opinionated CLI Simulator Tester

You are an opinionated, detail-oriented user who tests this CLI like a frustrated power user. Run real commands and actively hunt for edge cases.

Be direct and critical, but always back claims with reproducible evidence.

IMPORTANT: Use the simulator mode of the `execute` tool to run scripted CLI sessions. Use the `asciinema-terminal-recorder` skill to record terminal evidence, and focus on UX quality, not just functional correctness.

## Test workflow

1. Build first:
   ```bash
   npm run build
   ```
2. Run the simulator-focused regression tests:
   ```bash
   npm test -- src/screens/cli.integration.test.tsx
   ```
   If that command fails because npm misparses the extra arguments, run Vitest directly:
   ```bash
   npx vitest run src/screens/cli.integration.test.tsx
   ```
3. Run scripted simulator sessions for the exact flow under test:
   ```bash
   node dist/index.js simulate /tmp/sim-script.json > /tmp/sim-output.txt
   ```
4. Inspect the frame output (frames are separated by `---FRAME---`) for UX problems:
   - broken navigation flow
   - confusing or missing status hints
   - clipped/truncated text
   - unexpected screen transitions
5. Capture evidence for findings using terminal-native artifacts:
   - Save frame extracts to a Markdown/text artifact and cite exact frame snippets.
   - Use `skills/asciinema-terminal-recorder/scripts/record_ui_session.sh` to generate a `.cast` recording for each reproduced issue.
   - Replay recordings with `asciinema play` before reporting to verify that each artifact matches the claim.
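
Parts of step 4 can be scripted. The minimal sketch below relies only on what step 4 states: the simulator prints frames separated by a `---FRAME---` line. It also assumes the frames are plain text with no ANSI escape codes. `splitFrames`, `findClippedLines`, and the 80-column width are hypothetical. A flagged line is a lead to inspect by hand, not a finding on its own.

```typescript
// Sketch: split `simulate` output into frames and flag lines that look clipped.
// Assumes plain-text frames separated by a line that is exactly "---FRAME---".
const FRAME_SEPARATOR = "---FRAME---";

interface FrameIssue {
  frame: number; // 0-based index into the returned frames
  line: number; // 0-based line index within that frame
  text: string;
}

// Split raw output into frames. Whitespace-only frames (e.g. before the
// first separator) are dropped, so indices count non-empty frames only.
function splitFrames(output: string): string[][] {
  const frames: string[][] = [[]];
  for (const line of output.split(/\r?\n/)) {
    if (line.trim() === FRAME_SEPARATOR) {
      frames.push([]); // a separator starts a new frame
    } else {
      frames[frames.length - 1].push(line);
    }
  }
  return frames.filter((lines) => lines.some((l) => l.trim() !== ""));
}

// Heuristic: a line ending in "…" or filling the full terminal width
// is a candidate for clipped/truncated text.
function findClippedLines(frames: string[][], width = 80): FrameIssue[] {
  const issues: FrameIssue[] = [];
  frames.forEach((lines, frame) => {
    lines.forEach((raw, line) => {
      const text = raw.trimEnd();
      if (text.endsWith("…") || Array.from(text).length >= width) {
        issues.push({ frame, line, text });
      }
    });
  });
  return issues;
}

// Example input; in practice, read /tmp/sim-output.txt instead.
const sample = ["Planeteer", "---FRAME---", "Task: Build a web app", "---FRAME---", "Status: Generating a long desc…"].join("\n");
const frames = splitFrames(sample);
console.log(frames.length); // 3
console.log(findClippedLines(frames));
```
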

## Persona requirements

- Behave like a skeptical user who expects polished UX.
- Call out awkward interactions, not just hard failures.
- Do not soften findings with vague wording.
- Never mark behavior as passing without evidence from simulator output.

## Output format

Return findings in this format:

1. **Overall verdict**: pass/fail with one-sentence rationale.
2. **Findings table** with columns:
   - Severity (`critical`, `major`, `minor`, `nit`)
   - Screen/flow
   - Reproduction input
   - Expected vs actual
   - Evidence (frame artifact path and/or terminal recording path)
3. **Recommended fixes**: concrete, prioritized actions.
4. **Confidence**: high/medium/low and why.
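
If findings are collected programmatically before reporting, the findings table can be generated instead of written by hand. This is a sketch under stated assumptions. The `Finding` type, `renderFindingsTable`, and the `<br>` separator inside cells are illustrative choices, not part of Planeteer. Rows are sorted most severe first.

```typescript
// Hypothetical row type for the findings table described above.
type Severity = "critical" | "major" | "minor" | "nit";

interface Finding {
  severity: Severity;
  screen: string; // screen or flow
  reproduction: string; // exact keys/input used to reproduce
  expected: string;
  actual: string;
  evidence: string[]; // frame artifact and/or .cast recording paths
}

const SEVERITY_ORDER: Severity[] = ["critical", "major", "minor", "nit"];

// Escape pipes and newlines so cell text cannot break the Markdown table.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, "<br>");
}

// Render findings as a Markdown table, most severe first.
function renderFindingsTable(findings: Finding[]): string {
  const header = "| Severity | Screen/flow | Reproduction input | Expected vs actual | Evidence |";
  const divider = "| --- | --- | --- | --- | --- |";
  const rows = [...findings]
    .sort((a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity))
    .map(
      (f) =>
        `| ${f.severity} | ${cell(f.screen)} | ${cell(f.reproduction)} | ` +
        `Expected: ${cell(f.expected)}<br>Actual: ${cell(f.actual)} | ${f.evidence.map(cell).join("<br>")} |`,
    );
  return [header, divider, ...rows].join("\n");
}
```
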
1 change: 1 addition & 0 deletions .github/skills/not-a-skill.txt
@@ -0,0 +1 @@
ignore me
1 change: 1 addition & 0 deletions .github/skills/skill1.yaml
@@ -0,0 +1 @@
name: skill1
1 change: 1 addition & 0 deletions .github/skills/skill2.yml
@@ -0,0 +1 @@
name: skill2
140 changes: 139 additions & 1 deletion README.md
@@ -53,10 +53,123 @@ planeteer list
| `↑` `↓` | Navigate task list |
| `⏎` | Submit input / proceed to next screen |
| `Esc` | Go back |
| `⇥` | Cycle view (Tree / Batches / Skills) |
| `Space` | Toggle skill on/off (Skills view) |
| `/` | Command mode (refine screen) |
| `s` | Save plan (refine screen) |
| `x` | Start execution (refine/execute screen) |
| `q` | Quit |
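
The `⇥` behavior can be sketched as a wrap-around cycle. This assumes the views advance in the order the table lists them. `VIEWS` and `nextView` are hypothetical names, not Planeteer's source.

```typescript
// Hypothetical view identifiers, in the order the table lists them.
const VIEWS = ["tree", "batches", "skills"] as const;
type View = (typeof VIEWS)[number];

// Each ⇥ press advances one view, wrapping from the last back to the first.
function nextView(current: View): View {
  return VIEWS[(VIEWS.indexOf(current) + 1) % VIEWS.length];
}

console.log(nextView("tree")); // batches
console.log(nextView("skills")); // tree
```
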

## Custom Copilot Skills

Planeteer supports custom Copilot skills for domain-specific planning. Skills help Copilot generate better work breakdowns by providing context about specific project types.

### Using Skills

Skills are automatically loaded from the `.github/skills/` directory. On first run, this directory is created with example skills. To use skills:

1. View active skills in the **Refine** screen by pressing `⇥` to cycle to the Skills view
2. Use `↑`/`↓` to navigate and `Space` to toggle skills on/off
3. Skills are applied during work breakdown generation and refinement
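
The files this change adds under `.github/skills/` suggest that only `.yaml`/`.yml` files are treated as skills: `skill1.yaml`, `skill2.yml`, and `not-a-skill.txt`, whose content is `ignore me`. The sketch below is a minimal version of that filtering rule. `isSkillFile` and `listSkillFiles` are hypothetical names, and case-insensitive matching is an assumption.

```typescript
// Sketch: decide which directory entries are skill files.
// Assumption: skills are recognized by extension alone (case-insensitive).
const SKILL_EXTENSIONS = [".yaml", ".yml"];

function isSkillFile(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return SKILL_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Filter a directory listing (e.g. from fs.readdirSync) and sort it for stable ordering.
function listSkillFiles(entries: string[]): string[] {
  return entries.filter(isSkillFile).sort();
}

console.log(listSkillFiles(["not-a-skill.txt", "skill2.yml", "skill1.yaml"]));
// [ 'skill1.yaml', 'skill2.yml' ]
```
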

### Creating Skills

Create a new YAML file in `.github/skills/` with this structure:

```yaml
name: my-custom-skill
description: Brief description of what this skill helps with

instructions: |
  When planning this type of project, follow these guidelines:

  1. **Category 1**: Guidelines for this aspect
     - Specific point 1
     - Specific point 2

  2. **Category 2**: More guidelines
     - Another point
     - Another point

  General advice about task structure, dependencies, etc.

examples:
  - input: "Example project description"
    tasks:
      - Task 1 that would be generated
      - Task 2 that would be generated
      - Task 3 that would be generated
```
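
The fields in this template map naturally onto a TypeScript type. The sketch below validates an already-parsed skill object, such as the output of a YAML parser. The `SkillDefinition` type and the `validateSkill` and `isSkillDefinition` functions are illustrative assumptions. Which fields are required is also a guess; here `examples` is treated as optional.

```typescript
// Hypothetical types mirroring the fields in the YAML template above.
interface SkillExample {
  input: string;
  tasks: string[];
}

interface SkillDefinition {
  name: string;
  description: string;
  instructions: string;
  examples?: SkillExample[]; // assumption: examples are optional
}

const isNonEmptyString = (v: unknown): v is string => typeof v === "string" && v.trim() !== "";
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((s) => typeof s === "string");

// Check an already-parsed YAML document; an empty result means no problems were found.
function validateSkill(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return ["skill must be a mapping"];
  }
  const obj = raw as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ["name", "description", "instructions"]) {
    if (!isNonEmptyString(obj[key])) errors.push(`missing or empty "${key}"`);
  }
  const examples = obj["examples"];
  if (examples !== undefined) {
    if (!Array.isArray(examples)) {
      errors.push(`"examples" must be a list`);
    } else {
      examples.forEach((ex, i) => {
        const e = (typeof ex === "object" && ex !== null ? ex : {}) as Record<string, unknown>;
        if (!isNonEmptyString(e["input"]) || !isStringArray(e["tasks"])) {
          errors.push(`examples[${i}] needs "input" and a "tasks" list`);
        }
      });
    }
  }
  return errors;
}

// Type guard built on the validator.
function isSkillDefinition(raw: unknown): raw is SkillDefinition {
  return validateSkill(raw).length === 0;
}
```
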

### Skill Examples

**Example 1: Web Application Skill**

```yaml
name: web-app
description: Expert in web application development

instructions: |
  Break down web projects into frontend, backend, database, and deployment:

  1. **Frontend**: Component structure, routing, state management
  2. **Backend**: API design, business logic, authentication
  3. **Database**: Schema design, migrations, seed data
  4. **Infrastructure**: CI/CD, containerization, cloud deployment

  Maximize parallelism between frontend and backend work.

examples:
  - input: "Build a task management web app"
    tasks:
      - Setup React frontend with TypeScript
      - Design REST API for task CRUD
      - Implement PostgreSQL schema
      - Add JWT authentication
      - Deploy to cloud platform
```

**Example 2: Data Pipeline Skill**

```yaml
name: data-pipeline
description: Expert in ETL and data processing workflows

instructions: |
  Structure data pipelines with these phases:

  1. **Extraction**: Data sources, connectors, scheduling
  2. **Transformation**: Cleaning, validation, enrichment
  3. **Loading**: Destination setup, batch vs streaming
  4. **Monitoring**: Logging, alerts, data quality checks

  Consider idempotency, error handling, and reprocessing.

examples:
  - input: "Build ETL pipeline from API to data warehouse"
    tasks:
      - Implement API data extractor
      - Create transformation functions
      - Setup data warehouse schema
      - Add error handling and retries
      - Configure monitoring and alerts
```

### Skill Best Practices

- **One skill per domain**: Create focused skills (e.g., `mobile-app`, `ml-pipeline`) rather than generic ones
- **Clear instructions**: Be specific about task breakdown patterns and dependencies
- **Provide examples**: Include 2-3 representative examples with typical task structures
- **Enable selectively**: Toggle skills on/off based on your current project type
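
The toggle-and-apply flow can be sketched as follows. The sketch assumes enabled skills are tracked by `name` and that their instructions are concatenated into the planning prompt. `toggleSkill`, `buildSkillContext`, and the `## Skill:` heading format are hypothetical, and Planeteer's actual prompt layout in `planner.ts` may differ.

```typescript
// Hypothetical in-memory shape of a loaded skill.
interface Skill {
  name: string;
  description: string;
  instructions: string;
}

// Flip one skill on or off, returning a new set (an immutable update, as UI state usually wants).
function toggleSkill(enabled: ReadonlySet<string>, name: string): Set<string> {
  const next = new Set(enabled);
  if (next.has(name)) {
    next.delete(name);
  } else {
    next.add(name);
  }
  return next;
}

// Concatenate the instructions of enabled skills into one block of planning context.
function buildSkillContext(skills: Skill[], enabled: ReadonlySet<string>): string {
  return skills
    .filter((s) => enabled.has(s.name))
    .map((s) => `## Skill: ${s.name}\n${s.description}\n\n${s.instructions.trim()}`)
    .join("\n\n");
}
```
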

### Built-in Examples

Two example skills are included in the repository to help you get started:
- **example-web-app-skill.yaml** - Web application development best practices
- **example-data-pipeline-skill.yaml** - ETL and data processing workflow patterns

These files are automatically available in `.github/skills/` and can be used as templates for creating your own custom skills.

## Development

### Build & Run
@@ -157,6 +270,30 @@ Plans are saved to `.planeteer/` in the current working directory:
- `<plan-id>.json` — Machine-readable plan (used by the app)
- `<plan-id>.md` — Human-readable Markdown export

#### Session Persistence and Recovery

Planeteer persists Copilot session state so that it can detect and recover from interrupted executions:

**Automatic Session Tracking**
- Each task execution creates a Copilot SDK session
- Session IDs are stored in the plan JSON and saved incrementally after each task completes or fails
- If the app crashes or is interrupted (Ctrl+C), sessions remain active in the Copilot CLI

**Orphaned Session Detection**
- When loading a plan, Planeteer detects tasks that were interrupted (status: `in_progress` with session IDs)
- It queries the Copilot SDK to find any sessions still active for those tasks
- If orphaned sessions are found, you'll see a recovery prompt with these options:
  1. **Mark as interrupted and continue** — Keeps sessions alive for debugging
  2. **Mark as interrupted and cleanup sessions** (recommended) — Cleans up orphaned sessions
  3. **Cleanup sessions and go back** — Cleans up and returns to the refine screen
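
The detection rule and the three recovery options can be expressed as plain functions. This minimal sketch assumes the set of active session IDs has already been fetched from the Copilot SDK; that call is deliberately omitted rather than guessed. `TaskRecord`, `findOrphanedTasks`, and `planRecovery` are hypothetical. The README does not say how option 3 affects task status, so the sketch leaves statuses unchanged for that option.

```typescript
// Hypothetical minimal shapes; the real types live in src/models/plan.ts.
type TaskStatus = "pending" | "in_progress" | "done" | "failed" | "interrupted";

interface TaskRecord {
  id: string;
  status: TaskStatus;
  sessionId?: string;
}

// Orphaned: left in_progress with a recorded session that is still active.
function findOrphanedTasks(tasks: TaskRecord[], activeSessionIds: ReadonlySet<string>): TaskRecord[] {
  return tasks.filter(
    (t) => t.status === "in_progress" && t.sessionId !== undefined && activeSessionIds.has(t.sessionId),
  );
}

type RecoveryChoice = "interrupt-keep-sessions" | "interrupt-cleanup" | "cleanup-go-back";

interface RecoveryActions {
  markInterrupted: string[]; // task IDs whose status becomes "interrupted"
  cleanupSessions: string[]; // session IDs to clean up
  returnToRefine: boolean;
}

// Map each prompt option (1, 2, 3) to concrete actions.
function planRecovery(orphans: TaskRecord[], choice: RecoveryChoice): RecoveryActions {
  const taskIds = orphans.map((t) => t.id);
  const sessionIds = orphans
    .map((t) => t.sessionId)
    .filter((id): id is string => id !== undefined);
  switch (choice) {
    case "interrupt-keep-sessions": // option 1: keep sessions for debugging
      return { markInterrupted: taskIds, cleanupSessions: [], returnToRefine: false };
    case "interrupt-cleanup": // option 2 (recommended)
      return { markInterrupted: taskIds, cleanupSessions: sessionIds, returnToRefine: false };
    case "cleanup-go-back": // option 3: status change undocumented, left unchanged here
      return { markInterrupted: [], cleanupSessions: sessionIds, returnToRefine: true };
  }
}
```
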

**Task Statuses**
- `pending` — Not yet started
- `in_progress` — Currently executing
- `done` — Completed successfully
- `failed` — Execution failed (can be retried with `r`)
- `interrupted` — Was in progress when execution was interrupted
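
These statuses imply a small state machine. The sketch below uses transitions inferred from the descriptions. Only retrying a `failed` task is documented (via `r`), so this version does not allow restarting `interrupted` tasks; that is an assumption. `ALLOWED`, `transition`, and `markInterrupted` are hypothetical names.

```typescript
type TaskStatus = "pending" | "in_progress" | "done" | "failed" | "interrupted";

interface Task {
  id: string;
  status: TaskStatus;
}

// Allowed transitions, inferred from the status descriptions.
// Assumption: only `failed` tasks can be retried (the documented `r` key).
const ALLOWED: Record<TaskStatus, readonly TaskStatus[]> = {
  pending: ["in_progress"],
  in_progress: ["done", "failed", "interrupted"],
  failed: ["in_progress"],
  done: [],
  interrupted: [],
};

function transition(from: TaskStatus, to: TaskStatus): TaskStatus {
  if (!ALLOWED[from].includes(to)) {
    throw new Error(`invalid transition: ${from} -> ${to}`);
  }
  return to;
}

// Tasks still marked in_progress after an interruption become interrupted,
// as recovery options 1 and 2 do.
function markInterrupted(tasks: Task[]): Task[] {
  return tasks.map((t) =>
    t.status === "in_progress" ? { ...t, status: transition(t.status, "interrupted") } : t,
  );
}
```
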

## Project Structure

```
@@ -176,7 +313,8 @@ src/
│ ├── copilot.ts # Copilot SDK wrapper (single point of contact)
│ ├── planner.ts # Prompt engineering for planning
│ ├── executor.ts # DAG-aware parallel task dispatch
-│ └── persistence.ts # JSON/Markdown save & load
+│ ├── persistence.ts # JSON/Markdown save & load
+│ └── session-recovery.ts # Orphaned session detection & cleanup
├── models/
│ └── plan.ts # Types: Plan, Task, ChatMessage
└── utils/