colindembovsky · Copilot · Feb 18, 2026
diff --git a/.github/agents/opinionated-cli-simulator-tester.agent.md b/.github/agents/opinionated-cli-simulator-tester.agent.md
@@ -0,0 +1,63 @@
+---
+name: opinionated-cli-simulator-tester
+description: Opinionated end-user CLI test specialist for Planeteer. Use when validating TUI behavior, keyboard flows, regressions, and UX quality by running simulator scripts and reporting concrete findings with asciinema artifacts.
+tools: ['execute', 'read', 'search', 'todo']
+user-invokable: true
+---
+
+# Opinionated CLI Simulator Tester
+
+You are an opinionated, detail-oriented user who tests this CLI like a real frustrated power user. Use real commands and look for edge cases.
+
+Be direct and critical, but always back claims with reproducible evidence.
+
+IMPORTANT: Use simulator mode of the execute tool to run scripted CLI sessions. Use the `asciinema-terminal-recorder` skill for terminal recording evidence, and focus on UX quality, not just functional correctness.
+
+## Test workflow
+
+1. Build first:
+   ```bash
+   npm run build
+   ```
+2. Run simulator-focused regression tests:
+   ```bash
+   npm test -- src/screens/cli.integration.test.tsx
+   ```
+   If that command fails because of how npm parses the extra arguments, run:
+   ```bash
+   npx vitest run src/screens/cli.integration.test.tsx
+   ```
+3. Run scripted simulator sessions for the exact flow under test:
+   ```bash
+   node dist/index.js simulate /tmp/sim-script.json > /tmp/sim-output.txt
+   ```
+4. Inspect frame output (`---FRAME---` separators) for UX problems:
+   - broken navigation flow
+   - confusing or missing status hints
+   - clipped/truncated text
+   - unexpected screen transitions
+5. Capture evidence for findings using terminal-native artifacts:
+   - Save frame extracts to a markdown/text artifact and cite exact frame snippets.
+   - Use `skills/asciinema-terminal-recorder/scripts/record_ui_session.sh` to generate `.cast` recordings for each reproduced issue.
+   - Replay recordings with `asciinema play` before reporting to verify the artifact matches the claim.
+
+## Persona requirements
+
+- Behave like a skeptical user who expects polished UX.
+- Call out awkward interactions, not just hard failures.
+- Do not soften findings with vague wording.
+- Never mark behavior as passing without evidence from simulator output.
+
+## Output format
+
+Return findings in this format:
+
+1. **Overall verdict**: pass/fail with one-sentence rationale.
+2. **Findings table** with columns:
+   - Severity (`critical`, `major`, `minor`, `nit`)
+   - Screen/flow
+   - Reproduction input
+   - Expected vs actual
+   - Evidence (frame artifact path and/or terminal recording path)
+3. **Recommended fixes**: concrete, prioritized actions.
+4. **Confidence**: high/medium/low and why.
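Step 4 of the agent's test workflow inspects simulator output delimited by `---FRAME---` separators. A minimal sketch of how that output could be split and scanned for overly wide lines follows. It assumes the separator appears as a literal string between frames and that the terminal is 80 columns wide; `splitFrames` and `findWideLines` are hypothetical helpers, not part of Planeteer.

```typescript
// Hypothetical helpers for inspecting simulator output; not part of Planeteer.
const FRAME_SEPARATOR = "---FRAME---";

// Split raw simulator output into frames. Only leading and trailing newlines
// are stripped so that in-frame indentation is preserved.
function splitFrames(raw: string): string[] {
  return raw
    .split(FRAME_SEPARATOR)
    .map((frame) => frame.replace(/^\n+|\n+$/g, ""))
    .filter((frame) => frame.length > 0);
}

interface WideLine {
  frame: number; // zero-based frame index
  line: number; // zero-based line index within the frame
  width: number; // length in UTF-16 code units, not display columns
}

// Report lines longer than the assumed terminal width. This is a rough
// clipping heuristic: it does not account for ANSI escapes or wide glyphs.
function findWideLines(frames: string[], maxWidth = 80): WideLine[] {
  const hits: WideLine[] = [];
  frames.forEach((frame, f) => {
    frame.split("\n").forEach((text, l) => {
      if (text.length > maxWidth) {
        hits.push({ frame: f, line: l, width: text.length });
      }
    });
  });
  return hits;
}
```

Each hit identifies a frame and line that can be cited in the findings table as a frame artifact.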
diff --git a/.github/skills/not-a-skill.txt b/.github/skills/not-a-skill.txt
@@ -0,0 +1 @@
+ignore me
diff --git a/.github/skills/skill1.yaml b/.github/skills/skill1.yaml
@@ -0,0 +1 @@
+name: skill1
diff --git a/.github/skills/skill2.yml b/.github/skills/skill2.yml
@@ -0,0 +1 @@
+name: skill2
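The three files added under `.github/skills/` (`not-a-skill.txt`, `skill1.yaml`, `skill2.yml`) look like fixtures for a loader that accepts only YAML files. The sketch below shows one way such filtering might work; it is an assumption based on the file names, and `isSkillFile` and `selectSkillFiles` are hypothetical names, not functions from this PR.

```typescript
// Hypothetical filter; assumes skills are recognized purely by file extension.
function isSkillFile(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return lower.endsWith(".yaml") || lower.endsWith(".yml");
}

// Keep only skill files, sorted so the load order is deterministic.
function selectSkillFiles(fileNames: string[]): string[] {
  return fileNames.filter(isSkillFile).sort();
}

// With the fixture names from this diff, not-a-skill.txt is filtered out.
const selected = selectSkillFiles(["skill2.yml", "not-a-skill.txt", "skill1.yaml"]);
console.log(selected);
```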
diff --git a/README.md b/README.md
@@ -53,10 +53,123 @@ planeteer list
 | `↑` `↓` | Navigate task list |
 | `⏎` | Submit input / proceed to next screen |
 | `Esc` | Go back |
+| `⇥` | Toggle view (Tree / Batches / Skills) |
+| `Space` | Toggle skill on/off (Skills view) |
+| `/` | Command mode (refine screen) |
 | `s` | Save plan (refine screen) |
 | `x` | Start execution (refine/execute screen) |
 | `q` | Quit |
 
+## Custom Copilot Skills
+
+Planeteer supports custom Copilot skills for domain-specific planning. Skills help Copilot generate better work breakdowns by providing context about specific project types.
+
+### Using Skills
+
+Skills are automatically loaded from the `.github/skills/` directory. On first run, this directory is created with example skills. To use skills:
+
+1. View active skills in the **Refine** screen by pressing `⇥` to cycle to the Skills view
+2. Use `↑`/`↓` to navigate and `Space` to toggle skills on/off
+3. Skills are applied during work breakdown generation and refinement
+
+### Creating Skills
+
+Create a new YAML file in `.github/skills/` with this structure:
+
+```yaml
+name: my-custom-skill
+description: Brief description of what this skill helps with
+
+instructions: |
+  When planning this type of project, follow these guidelines:
+
+  1. **Category 1**: Guidelines for this aspect
+     - Specific point 1
+     - Specific point 2
+
+  2. **Category 2**: More guidelines
+     - Another point
+     - Another point
+
+  General advice about task structure, dependencies, etc.
+
+examples:
+  - input: "Example project description"
+    tasks:
+      - Task 1 that would be generated
+      - Task 2 that would be generated
+      - Task 3 that would be generated
+```
+
+### Skill Examples
+
+**Example 1: Web Application Skill**
+
+```yaml
+name: web-app
+description: Expert in web application development
+
+instructions: |
+  Break down web projects into frontend, backend, database, and deployment:
+
+  1. **Frontend**: Component structure, routing, state management
+  2. **Backend**: API design, business logic, authentication
+  3. **Database**: Schema design, migrations, seed data
+  4. **Infrastructure**: CI/CD, containerization, cloud deployment
+
+  Maximize parallelism between frontend and backend work.
+
+examples:
+  - input: "Build a task management web app"
+    tasks:
+      - Setup React frontend with TypeScript
+      - Design REST API for task CRUD
+      - Implement PostgreSQL schema
+      - Add JWT authentication
+      - Deploy to cloud platform
+```
+
+**Example 2: Data Pipeline Skill**
+
+```yaml
+name: data-pipeline
+description: Expert in ETL and data processing workflows
+
+instructions: |
+  Structure data pipelines with these phases:
+
+  1. **Extraction**: Data sources, connectors, scheduling
+  2. **Transformation**: Cleaning, validation, enrichment
+  3. **Loading**: Destination setup, batch vs streaming
+  4. **Monitoring**: Logging, alerts, data quality checks
+
+  Consider idempotency, error handling, and reprocessing.
+
+examples:
+  - input: "Build ETL pipeline from API to data warehouse"
+    tasks:
+      - Implement API data extractor
+      - Create transformation functions
+      - Setup data warehouse schema
+      - Add error handling and retries
+      - Configure monitoring and alerts
+```
+
+### Skill Best Practices
+
+- **One skill per domain**: Create focused skills (e.g., `mobile-app`, `ml-pipeline`) rather than generic ones
+- **Clear instructions**: Be specific about task breakdown patterns and dependencies
+- **Provide examples**: Include 2-3 representative examples with typical task structures
+- **Enable selectively**: Toggle skills on/off based on your current project type
+
+### Built-in Examples
+
+Two example skills are included in the repository to help you get started:
+- **example-web-app-skill.yaml** - Web application development best practices
+- **example-data-pipeline-skill.yaml** - ETL and data processing workflow patterns
+
+These files are automatically available in `.github/skills/` and can be used as templates for creating your own custom skills.
+
 ## Development
 
 ### Build & Run
@@ -157,6 +270,30 @@ Plans are saved to `.planeteer/` in the current working directory:
 - `<plan-id>.json` — Machine-readable plan (used by the app)
 - `<plan-id>.md` — Human-readable Markdown export
 
+#### Session Persistence and Recovery
+
+Planeteer persists session state so that interrupted executions can be detected and recovered:
+
+**Automatic Session Tracking**
+- Each task execution creates a Copilot SDK session
+- Session IDs are stored in the plan JSON and saved incrementally after each task completes or fails
+- If the app crashes or is interrupted (Ctrl+C), sessions remain active in the Copilot CLI
+
+**Orphaned Session Detection**
+- When loading a plan, Planeteer detects tasks that were interrupted (status: `in_progress` with session IDs)
+- It queries the Copilot SDK to find any sessions still active for those tasks
+- If orphaned sessions are found, you'll see a recovery prompt with options:
+  1. **Mark as interrupted and continue** — Keeps sessions alive for debugging
+  2. **Mark as interrupted and cleanup sessions** (recommended) — Cleans up orphaned sessions
+  3. **Cleanup sessions and go back** — Cleans up and returns to the refine screen
+
+**Task Statuses**
+- `pending` — Not yet started
+- `in_progress` — Currently executing
+- `done` — Completed successfully
+- `failed` — Execution failed (can be retried with `r`)
+- `interrupted` — Was in progress when execution was interrupted
+
 ## Project Structure
 
 ```
@@ -176,7 +313,8 @@ src/
 │   ├── copilot.ts         # Copilot SDK wrapper (single point of contact)
 │   ├── planner.ts         # Prompt engineering for planning
 │   ├── executor.ts        # DAG-aware parallel task dispatch
-│   └── persistence.ts     # JSON/Markdown save & load
+│   ├── persistence.ts     # JSON/Markdown save & load
+│   └── session-recovery.ts # Orphaned session detection & cleanup
 ├── models/
 │   └── plan.ts            # Types: Plan, Task, ChatMessage
 └── utils/
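The README's "Session Persistence and Recovery" section describes detecting tasks left `in_progress` with a session ID and checking which of those sessions are still active. A minimal sketch of that logic follows. The `Task` shape and the `sessionId` field name are assumptions, and the set of active session IDs is passed in as a parameter rather than fetched, because the diff does not show the Copilot SDK call that `session-recovery.ts` uses.

```typescript
// The five statuses listed in the README.
type TaskStatus = "pending" | "in_progress" | "done" | "failed" | "interrupted";

// Assumed task shape; the real type lives in src/models/plan.ts and may differ.
interface Task {
  id: string;
  status: TaskStatus;
  sessionId?: string;
}

// Tasks that were mid-execution and whose session is still active.
function findOrphanedTasks(tasks: Task[], activeSessionIds: Set<string>): Task[] {
  return tasks.filter(
    (t) =>
      t.status === "in_progress" &&
      t.sessionId !== undefined &&
      activeSessionIds.has(t.sessionId),
  );
}

// Return a copy of the plan's tasks with every in-progress task marked
// as interrupted; other tasks are returned unchanged.
function markInterrupted(tasks: Task[]): Task[] {
  return tasks.map((t): Task =>
    t.status === "in_progress" ? { ...t, status: "interrupted" } : t,
  );
}
```

Keeping both functions pure, with the SDK query done by the caller, would make the recovery prompt's three options easy to unit-test without a live Copilot session.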