Luke/codex readiness skills #48

lukeqin-oai · 2026-01-21T22:00:12Z

Overview

This PR adds two Codex readiness skills. One evaluates the quality of repository
guidance; the other evaluates end-to-end agentic execution. The unit-test skill
combines deterministic checks with in-session LLM evaluation of AGENTS.md/PLANS.md
quality, while the integration-test skill runs a full agentic loop and scores the
real code changes and build/test outcomes it produces.

1) codex-readiness-unit-test (LLM Codex Readiness Unit Test)

Goal

Validate that AGENTS.md and PLANS.md provide sufficient, usable guidance, using
deterministic checks plus in-session LLM evaluation, and generate a scored JSON
and HTML report.

How it works

This skill builds its report from two pipelines: deterministic filesystem checks
and in-session LLM evaluation of the AGENTS.md/PLANS.md guidance. It writes a
timestamped run directory containing the evidence, the LLM results, and a scored
JSON/HTML report. In the optional execute mode, it also runs a user-approved plan
and includes the execution logs in scoring. All JSON outputs are strictly
validated; invalid output goes through a retry + json-fix loop.
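The retry + json-fix loop could look roughly like the following minimal Python sketch. It assumes the LLM call and the repair prompt are exposed as two plain callables (`generate` and `fix`, both hypothetical) and that "valid" only means "parses as a JSON object with the required keys"; the skill's actual schema checks are not described in this PR, and `get_validated_json`, `parse_and_validate`, and `JsonValidationError` are illustrative names.

```python
import json
from typing import Callable, Iterable


class JsonValidationError(ValueError):
    """Raised when output still fails validation after every attempt."""


def parse_and_validate(raw: str, required_keys: Iterable[str]) -> dict:
    """Parse raw text as a JSON object and check that the required keys exist."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def get_validated_json(
    generate: Callable[[], str],      # hypothetical: asks the LLM for JSON output
    fix: Callable[[str, str], str],   # hypothetical: asks the LLM to repair (raw, error)
    required_keys: Iterable[str],
    max_attempts: int = 3,
) -> dict:
    required = list(required_keys)
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            return parse_and_validate(raw, required)
        except ValueError as err:
            last_error = f"attempt {attempt}: {err}"
        # json-fix pass: give the model its own output plus the validation error
        repaired = fix(raw, last_error)
        try:
            return parse_and_validate(repaired, required)
        except ValueError as err:
            last_error = f"attempt {attempt} (after fix): {err}"
    raise JsonValidationError(last_error)
```

Running the fix pass before regenerating keeps a nearly correct answer (for example, one missing a closing brace) instead of discarding it; a fresh generation is requested only when the repair also fails validation.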

2) codex-readiness-integration-test (LLM Codex Readiness Integration Test)

Goal

Validate real agentic execution quality by running the Codex CLI against the
repo, executing an approved change prompt, and scoring the results with evidence
plus LLM evaluation.

How it works

This skill runs an end-to-end agentic execution against the repo using the Codex
CLI, then executes a build/test plan and scores the run from evidence plus LLM
evaluation. It starts the Codex session by launching the CLI as a subprocess with
HOME and XDG_CACHE_HOME pointed at the repo-local .codex-home, driven by the
approved prompt.json (change prompt plus agentic_loop settings), so that the CLI
reads AGENTS.md and operates inside the repo. The skill requires a repo-local
login, always runs in execute mode, and writes its results to a timestamped run
directory containing the agentic logs, the LLM results, and a summary.
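A minimal Python sketch of how the session launch could be wired up, assuming the CLI is started with `subprocess.run` and that HOME and XDG_CACHE_HOME both point directly at `<repo>/.codex-home` (the exact subpaths, the log file names, and the helper names `codex_env`, `make_run_dir`, and `run_agent` are all hypothetical). The command is passed in by the caller, because this description does not say which CLI subcommand or flags the skill uses, or how the `agentic_loop` settings in prompt.json map onto them.

```python
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def codex_env(repo_root: Path) -> dict:
    """Copy the current environment, redirecting HOME and XDG_CACHE_HOME
    to the repo-local .codex-home (assumption: both point at the same dir)."""
    codex_home = repo_root / ".codex-home"
    env = dict(os.environ)  # keep PATH etc. so the CLI can still be found
    env["HOME"] = str(codex_home)
    env["XDG_CACHE_HOME"] = str(codex_home)
    return env


def make_run_dir(base: Path) -> Path:
    """Create a new UTC-timestamped run directory under base."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S_%fZ")
    run_dir = base / stamp
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def run_agent(cmd: list, repo_root: Path, run_dir: Path, timeout: int = 1800) -> int:
    """Run cmd inside the repo with the redirected environment and save its
    stdout/stderr into run_dir. Returns the process exit code."""
    if not (repo_root / ".codex-home").is_dir():
        # The skill requires a repo-local login; fail fast if it is missing.
        raise FileNotFoundError(f"no repo-local login at {repo_root / '.codex-home'}")
    result = subprocess.run(
        cmd,
        cwd=repo_root,            # the CLI operates in (and reads AGENTS.md from) the repo
        env=codex_env(repo_root),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    (run_dir / "agentic_stdout.log").write_text(result.stdout)
    (run_dir / "agentic_stderr.log").write_text(result.stderr)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command: prints the HOME the child process sees.
    repo = Path.cwd()
    run_dir = make_run_dir(repo / "runs")
    code = run_agent([sys.executable, "-c", "import os; print(os.environ['HOME'])"], repo, run_dir)
    print(code, run_dir)
```

For a real run, `cmd` would be a non-interactive Codex CLI invocation built from the change prompt in prompt.json; check the subcommand and flags against the installed CLI's help output rather than relying on this sketch. Redirecting HOME, rather than copying credentials around, is presumably what lets the subprocess pick up the repo-local login while leaving the user's global configuration untouched.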

lukeqin-oai added 2 commits January 21, 2026 10:31

add codex doctor skills

c242b2b

rename skill

eb21206

lukeqin-oai requested a review from a team January 21, 2026 22:00

lukeqin-oai added 2 commits January 21, 2026 15:57

follow skills conventions around directory structure

7023661

modify collect evidence to use different git evaluator

177386d

lukeqin-oai requested review from gverma-openai and xl-openai January 22, 2026 03:44
