Remove double LLM call in --simple mode to reduce cost and latency #120

@calchiwo

Description

The current --simple mode flow (sketched below):

  1. Generate a full explanation via the LLM

  2. Build a new prompt from that output

  3. Call the LLM again to summarize it
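
In rough pseudocode, that path looks something like this (the helper names are illustrative, not the project's actual functions):

```python
# Illustrative sketch of the current --simple path; all helper names are hypothetical.
def run_simple_mode(repo) -> str:
    full_prompt = build_full_prompt(repo)               # repo metadata, README, file tree, etc.
    explanation = call_llm(full_prompt)                 # first LLM call: detailed explanation
    summary_prompt = build_summary_prompt(explanation)  # feed the whole output back in
    return call_llm(summary_prompt)                     # second LLM call: compress that output
```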

This creates two problems:

  1. Doubles latency

  2. Doubles token usage and cost

As repository size grows, the first output may become large. Feeding that entire output back into the model increases:

  • Token consumption

  • Risk of hitting context limits

  • Hallucination amplification

This is unnecessary for --simple mode.

--simple should not depend on generating the full explanation first.

Proposed change:

When --simple is set, build a simplified prompt directly from:

  • Repo metadata

  • README

  • Optional tree summary

Then call the LLM once.
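
A minimal sketch of that single-call path, assuming a Python codebase; the helper names, fields, and prompt wording below are hypothetical, not the tool's real API:

```python
# Illustrative sketch of the proposed single-call --simple path; names are hypothetical.
def build_simple_prompt(metadata: dict, readme: str, tree_summary: str | None = None) -> str:
    parts = [
        "Give a short, high-level explanation of this repository.",
        f"Repository metadata:\n{metadata}",
        f"README:\n{readme}",
    ]
    if tree_summary:
        parts.append(f"File tree summary:\n{tree_summary}")
    return "\n\n".join(parts)

def run_simple_mode(repo) -> str:
    prompt = build_simple_prompt(repo.metadata, repo.readme, getattr(repo, "tree_summary", None))
    return call_llm(prompt)  # single LLM call; no second summarization pass
```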

Benefits:

  • Single LLM call

  • Lower cost

  • Lower latency

  • Reduced token overflow risk

  • Cleaner execution path

This keeps --simple lightweight, which aligns with user expectations.

Right now it behaves like “generate detailed, then compress,” which is inefficient by design.
