Remove double LLM call in --simple mode to reduce cost and latency #120

@calchiwo

Description

The current --simple mode flow (sketched below):

  1. Generate a full explanation via the LLM

  2. Build a new prompt from that output

  3. Call the LLM again to summarize it
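
In rough pseudocode, that path looks something like this (the helper names are illustrative, not the project's actual functions):

```python
# Illustrative sketch of the current --simple path; all helper names are hypothetical.
def run_simple_mode(repo) -> str:
    full_prompt = build_full_prompt(repo)               # repo metadata, README, file tree, etc.
    explanation = call_llm(full_prompt)                 # first LLM call: detailed explanation
    summary_prompt = build_summary_prompt(explanation)  # feed the whole output back in
    return call_llm(summary_prompt)                     # second LLM call: compress that output
```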

This creates two problems:

  1. Doubles latency

  2. Doubles token usage and cost

As repository size grows, the first output may become large. Feeding that entire output back into the model increases:

  • Token consumption

  • Risk of hitting context limits

  • Hallucination amplification

This is unnecessary for --simple mode.

--simple should not depend on generating the full explanation first.

Proposed change:

When --simple is set, build a simplified prompt directly from:

  • Repo metadata

  • README

  • Optional tree summary

Then call the LLM once.
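
A minimal sketch of that single-call path, assuming a Python codebase; the helper names, fields, and prompt wording below are hypothetical, not the tool's real API:

```python
# Illustrative sketch of the proposed single-call --simple path; names are hypothetical.
def build_simple_prompt(metadata: dict, readme: str, tree_summary: str | None = None) -> str:
    parts = [
        "Give a short, high-level explanation of this repository.",
        f"Repository metadata:\n{metadata}",
        f"README:\n{readme}",
    ]
    if tree_summary:
        parts.append(f"File tree summary:\n{tree_summary}")
    return "\n\n".join(parts)

def run_simple_mode(repo) -> str:
    prompt = build_simple_prompt(repo.metadata, repo.readme, getattr(repo, "tree_summary", None))
    return call_llm(prompt)  # single LLM call; no second summarization pass
```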

Benefits:

  • Single LLM call

  • Lower cost

  • Lower latency

  • Reduced token overflow risk

  • Cleaner execution path

This keeps --simple lightweight, which aligns with user expectations.

Right now it behaves like “generate detailed, then compress,” which is inefficient by design.
