Conversation

MkDev11 (Contributor) commented Jan 23, 2026

What this does

This adds performance metrics so you can see how long queries take and how many tokens they use. It's especially helpful for understanding performance when running local models with Ollama.

The problem

Right now when you ask Dexter a question, you have no idea:

  • How long it took to think
  • How many tokens were used
  • How fast the model is responding

This makes it hard to optimize your setup, especially with local models.

The solution

After each answer, you'll now see a line like this:

✻ 2s · 1,297 tokens (718.6 tok/s)

This shows:

  • 2s - Total time from question to answer
  • 1,297 tokens - How many tokens were used (input + output)
  • 718.6 tok/s - Throughput (tokens per second)
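
One note on how the two numbers relate (inferred from the example output, not stated in the PR): the tok/s figure appears to be computed from the unrounded elapsed time, while the duration shown is rounded to whole seconds; otherwise 1,297 tokens over 2s would read as roughly 648 tok/s.

// Inferred relationship between the displayed numbers (an assumption, not from the diff)
const totalTokens = 1_297;
const elapsedSeconds = 1.805;                           // displayed rounded as "2s"
const tokensPerSecond = totalTokens / elapsedSeconds;   // ≈ 718.6 tok/s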

Why this matters

  • Local model users can see if their setup is fast enough
  • Cost tracking - Know how many tokens you're burning through
  • Performance tuning - Spot slow queries and optimize
  • Model comparison - Compare different models side-by-side

Technical details

  • Tracks timing from start to finish
  • Accumulates tokens across all LLM calls (including tool summaries)
  • Extracts usage from LangChain responses (works with OpenAI, Anthropic, etc.)
  • Only shows stats when token data is available from the provider
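
A rough sketch of the shapes this implies, pieced together from the commit notes below and the review discussion (field and event names may differ from the actual code):

// Token usage accumulated across all LLM calls for one query (sketch)
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// DoneEvent extended with the new performance fields (sketch; existing fields omitted)
interface DoneEvent {
  totalTime: number;          // elapsed milliseconds from question to answer
  tokenUsage?: TokenUsage;    // only set when the provider reports usage
  tokensPerSecond?: number;   // derived throughput
}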

Testing

Tested with a real query and got accurate metrics:

❯ What is 2 + 2?

⏺  4

✻ 2s · 1,297 tokens (718.6 tok/s)

Closes #72

- Add TokenUsage interface and extend DoneEvent with totalTime, tokenUsage, tokensPerSecond
- Modify callLlm to return LlmResult with usage metadata extracted from LangChain
- Track start time and accumulate token usage across all LLM calls in agent
- Display performance stats in UI after completion (duration, token count, tok/s)
- Update all callLlm call sites to handle new LlmResult return type

Closes virattt#72
MkDev11 (Contributor) commented Jan 23, 2026

@virattt Please have a look at the implementation and let me know your feedback. Thanks.

virattt (Owner) left a comment


This is a great idea and thank you for proposing the change.

Can we make the progress view identical to what Claude Code does? The seconds and tokens are shown while CC is working. This would be a great enhancement for Dexter, as well.

We currently only show the seconds taken while Dexter is in the "Answering" state, but it would be nice to have this for the "Thinking" state as well

Addresses PR review feedback to abstract token counting logic out of agent.ts into a dedicated class.
- Extract token counting into dedicated TokenCounter class (per review)
- Add real-time elapsed timer during processing state (like Claude Code)
- Show progress indicator while agent is thinking/working
MkDev11 (Contributor) commented Jan 27, 2026

This is a great idea and thank you for proposing the change.

Can we make the progress view identical to what Claude Code does? The seconds and tokens are shown while CC is working. This would be a great enhancement for Dexter, as well.

We currently only show the seconds taken while Dexter is in the "Answering" state, but it would be nice to have this for the "Thinking" state as well

Great! Added a real-time elapsed timer that shows during the Thinking/Tool states - updates every 100ms so users can see progress as it happens. Also extracted token counting into a dedicated TokenCounter class per your other feedback. Let me know if you'd like any adjustments to the display style.
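
For reference, a minimal sketch of what such a TokenCounter could look like (hypothetical; the class in the PR may be shaped differently):

// Accumulates usage reported by individual LLM calls (sketch; reuses the TokenUsage shape above)
class TokenCounter {
  private inputTokens = 0;
  private outputTokens = 0;

  add(usage?: TokenUsage): void {
    if (!usage) return;                  // some providers report no usage at all
    this.inputTokens += usage.inputTokens;
    this.outputTokens += usage.outputTokens;
  }

  total(): TokenUsage | undefined {
    const totalTokens = this.inputTokens + this.outputTokens;
    if (totalTokens === 0) return undefined;   // nothing meaningful to display
    return { inputTokens: this.inputTokens, outputTokens: this.outputTokens, totalTokens };
  }
}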

MkDev11 requested a review from virattt January 27, 2026 15:48
Resolve conflict in agent.ts: keep skill deduplication AND tokenCounter
- Keep TokenCounter for performance metrics
- Add ToolLimitEvent and limit checking from upstream
- Use improved thinking check (skip whitespace-only)
MkDev11 (Contributor) commented Feb 5, 2026

@virattt Sorry for tagging you; could you please review the changes once more?

…rch.ts

- Keep TokenCounter for performance metrics tracking
- Integrate upstream's progress channel and context management
- Preserve tokenUsage/tokensPerSecond in done events
MkDev11 (Contributor) commented Feb 5, 2026

@virattt I wanted to follow up on my previous message.

virattt (Owner) left a comment


Nice work on this overall!

A few things I noticed:

Bug: financial-metrics.ts and read-filings.ts will break

The callLlm return type changed to { response, usage } but these two files still do await callLlm(...) as AIMessage. That cast won't catch the issue at compile time, but at runtime .tool_calls will be undefined since you're reading it off the wrapper object instead of the actual AIMessage. Both tools will silently fail every time.

Should be a quick fix — just destructure like you already did in financial-search.ts:

const { response } = await callLlm(input.query, { ... });
const aiMessage = response as AIMessage;

tokenCounter in executeToolCalls seems unused

It gets passed in but those methods just invoke tools — they never call callLlm directly. Also worth noting that any LLM calls happening inside tools (like financial-search has its own callLlm call) won't get counted, so the totals will be lower than actual usage. Might be worth a comment or just removing the parameter for now.

Dep bumps

The @langchain/exa jump from 0.1 to 1.0 is a major version bump — any reason to include it here? Might be cleaner as a separate PR so if something breaks it's easy to bisect.

Minor UX things

The old code hid the duration line for fast queries (< 15s). Now it always shows, which means even a quick "what is 2+2" gets a stats line. Also formatDuration rounds to whole seconds, so anything sub-second shows as "0s" which looks a bit odd. Maybe only show it when there's actual token data from the provider, or add a small threshold?
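
For the sub-second case, one possible shape for that tweak (a sketch only, assuming the duration is tracked in milliseconds; the project's actual formatDuration may differ):

// Sketch: fall back to milliseconds below one second instead of rounding to "0s"
function formatDuration(durationMs: number): string {
  if (durationMs < 1000) return `${Math.round(durationMs)}ms`;   // e.g. "500ms"
  return `${Math.round(durationMs / 1000)}s`;                    // e.g. "2s"
}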

Type casts

There are quite a few response as AIMessage casts — since the type is AIMessage | string, a typeof check would be safer than assuming it's always an AIMessage. Not a big deal but worth cleaning up.
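
A small sketch of the safer pattern (illustrative only; assumes the same imports and callLlm signature as the surrounding files):

const { response } = await callLlm(input.query, { /* ... */ });
if (typeof response === 'string') {
  // plain text answer; there are no tool calls to read
} else {
  // here response is an AIMessage, so tool_calls is safe to access
  const toolCalls = response.tool_calls ?? [];
}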


Overall this is a great addition, just needs the two broken files fixed before merging. Looking forward to seeing this land!

virattt (Owner) left a comment


Added some nits

virattt (Owner) commented Feb 5, 2026

Additionally, the overall counter is right below the user query, which is weird:

❯ Walk me through PLTR's earnings 
  ⏺  6s

Let me pull up Palantir's latest earnings data and recent news.

⏺ Financial Metric("Palantir PLTR income statement last 8 quarters and 
                  annual...")
  ⎿  ⠴ Searching...

⠏ Pondering... (esc to interrupt)

Can we do exactly what Claude Code does instead?

Critical:
- financial-metrics.ts: destructure callLlm to get AIMessage
- read-filings.ts: destructure callLlm in both step1/step2

Medium:
- agent.ts: remove unused tokenCounter from executeToolCalls/executeToolCall

Low/UI:
- HistoryItemView.tsx: show ms for sub-second durations
- HistoryItemView.tsx: remove real-time timer below query (not Claude Code style)
MkDev11 (Contributor) commented Feb 5, 2026

Great @virattt - removed it. Now just shows final stats after the answer. Also fixed the callLlm destructuring in both files and cleaned up the unused tokenCounter param. Can you please review the changes again?

MkDev11 requested a review from virattt February 5, 2026 21:39
virattt added the run-ci (Runs CI) label Feb 6, 2026
MkDev11 (Contributor) commented Feb 6, 2026

@virattt I'm not sure why you're ignoring me; I noticed you've merged other PRs but not mine. Please let me know the reason.

MkDev11 (Contributor) commented Feb 6, 2026

https://gittensor.io/miners/details?githubId=94194147
I know a few other people have submitted PRs on this repo. You can review my Gittensor profile at the link above, and theirs as well.

virattt (Owner) commented Feb 6, 2026

Thanks for the updates. A few more optional suggestions, mostly around type safety and keeping things tidy.

src/model/llm.ts -- type LlmResult.response more narrowly

response: unknown forces as AIMessage casts at every call site (seven of them in agent.ts alone). Since callLlm already knows the shape, we can tighten the return type.

export interface LlmResult {
  response: AIMessage | string;
  usage?: TokenUsage;
}

Then at the bottom of the function:

  if (!outputSchema && !tools && result && typeof result === 'object' && 'content' in result) {
    return { response: (result as { content: string }).content, usage };
  }
  return { response: result as AIMessage, usage };

Why better: the single as AIMessage lives in one place (where we actually know the type), and every downstream consumer gets proper types without casting.

src/model/llm.ts -- safer extractUsage

The current implementation casts nested objects without checking their runtime types, which could silently produce NaN if a provider returns an unexpected shape.

function extractUsage(result: unknown): TokenUsage | undefined {
  if (!result || typeof result !== 'object') return undefined;
  const msg = result as Record<string, unknown>;

  const usageMetadata = msg.usage_metadata;
  if (usageMetadata && typeof usageMetadata === 'object') {
    const u = usageMetadata as Record<string, unknown>;
    const input = typeof u.input_tokens === 'number' ? u.input_tokens : 0;
    const output = typeof u.output_tokens === 'number' ? u.output_tokens : 0;
    const total = typeof u.total_tokens === 'number' ? u.total_tokens : input + output;
    return { inputTokens: input, outputTokens: output, totalTokens: total };
  }

  const responseMetadata = msg.response_metadata;
  if (responseMetadata && typeof responseMetadata === 'object') {
    const rm = responseMetadata as Record<string, unknown>;
    if (rm.usage && typeof rm.usage === 'object') {
      const u = rm.usage as Record<string, unknown>;
      const input = typeof u.prompt_tokens === 'number' ? u.prompt_tokens : 0;
      const output = typeof u.completion_tokens === 'number' ? u.completion_tokens : 0;
      const total = typeof u.total_tokens === 'number' ? u.total_tokens : input + output;
      return { inputTokens: input, outputTokens: output, totalTokens: total };
    }
  }

  return undefined;
}

Why better: guards against NaN propagation when a provider omits a field or returns a string instead of a number. Defensive parsing here saves confusing UI output downstream.

src/components/HistoryItemView.tsx -- consider gating the stats line

Nit. Since duration is now always set via doneEvent.totalTime, the condition item.duration !== undefined || item.tokenUsage is true for every completed query. For a quick "What is 2+2?" with no token data, showing "500ms" on its own is a bit noisy. One option:

{item.status === 'complete' && item.tokenUsage && (
  <Box marginTop={1}>
    <Text color={colors.muted}>
      {'✻ '}
      {item.duration !== undefined && formatDuration(item.duration)}
      {item.duration !== undefined && ' · '}
      {`${item.tokenUsage.totalTokens.toLocaleString()} tokens`}
      {item.tokensPerSecond !== undefined && ` (${item.tokensPerSecond.toFixed(1)} tok/s)`}
    </Text>
  </Box>
)}

Why better: the stats line only appears when there is meaningful token data to show, which is the interesting part. Duration alone is less useful without the token context.

src/components/HistoryItemView.tsx -- optional: extract a helper for the stats string

Nit. The inline conditionals for building the stats text are a little dense. A small helper keeps the JSX focused on layout.

function formatPerformanceStats(
  duration?: number,
  tokenUsage?: TokenUsage,
  tokensPerSecond?: number
): string {
  const parts: string[] = [];
  if (duration !== undefined) parts.push(formatDuration(duration));
  if (tokenUsage) parts.push(`${tokenUsage.totalTokens.toLocaleString()} tokens`);
  if (tokensPerSecond !== undefined) parts.push(`(${tokensPerSecond.toFixed(1)} tok/s)`);
  return parts.join(' · ');
}

Then in the JSX:

<Text color={colors.muted}>{formatPerformanceStats(item.duration, item.tokenUsage, item.tokensPerSecond)}</Text>

Why better: easier to read and test independently. Pure formatting logic stays out of the component tree.

Summary

  • LlmResult.response type: use AIMessage | string instead of unknown. Impact: eliminates 7 as AIMessage casts.
  • extractUsage: add runtime type checks on nested fields. Impact: prevents silent NaN from unexpected shapes.
  • Stats display gate: only show the line when tokenUsage is present. Impact: avoids a noisy "500ms" line on trivial queries.
  • Stats helper: extract formatPerformanceStats(). Impact: readability and testability.

- Keep TokenUsage tracking for performance metrics
- Add Anthropic cache_control for ~90% input token savings
- Type LlmResult.response as AIMessage | string (eliminates 7 casts)
- Add runtime type checks in extractUsage (prevents NaN from unexpected shapes)
- Gate stats display on tokenUsage presence (avoids noisy duration-only line)
- Extract formatPerformanceStats helper for readability
MkDev11 (Contributor) commented Feb 6, 2026

@virattt Thanks for the suggestions! I've applied all of them.

virattt (Owner) commented Feb 6, 2026

Thanks!

virattt merged commit b3142ae into virattt:main Feb 6, 2026
2 checks passed
MkDev11 (Contributor) commented Feb 6, 2026

Thanks!

Appreciate it!
