feat: add performance metrics display (timer, tokens, tok/s) #75
Conversation
- Add TokenUsage interface and extend DoneEvent with totalTime, tokenUsage, tokensPerSecond
- Modify callLlm to return LlmResult with usage metadata extracted from LangChain
- Track start time and accumulate token usage across all LLM calls in agent
- Display performance stats in UI after completion (duration, token count, tok/s)
- Update all callLlm call sites to handle new LlmResult return type

Closes virattt#72
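For orientation, a minimal sketch of the types described in the bullets above, assuming straightforward field names; the exact definitions in the PR may differ:

```typescript
import { AIMessage } from '@langchain/core/messages';

// Sketch only: token totals accumulated across all LLM calls in one query.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// callLlm now returns the model response plus whatever usage metadata
// LangChain reports for that call.
interface LlmResult {
  response: AIMessage | string;
  usage?: TokenUsage;
}

// DoneEvent gains the performance fields shown in the UI (other fields omitted).
interface DoneEvent {
  type: 'done';
  totalTime: number;        // elapsed milliseconds for the whole query
  tokenUsage?: TokenUsage;  // accumulated across all LLM calls
  tokensPerSecond?: number; // tokenUsage.totalTokens / (totalTime / 1000)
}
```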
@virattt Please have a look at the implementation and let me know your feedback. Thanks.
virattt left a comment
This is a great idea and thank you for proposing the change.
Can we make the progress view identical to what Claude Code does? The seconds and tokens are shown while CC is working. This would be a great enhancement for Dexter, as well.
We currently only show the seconds taken while Dexter is in the "Answering" state, but it would be nice to have this for the "Thinking" state as well.
Addresses PR review feedback to abstract token counting logic out of agent.ts into a dedicated class.
- Extract token counting into dedicated TokenCounter class (per review)
- Add real-time elapsed timer during processing state (like Claude Code)
- Show progress indicator while agent is thinking/working
Great! Added a real-time elapsed timer that shows during the Thinking/Tool states and updates every 100ms so users can see progress as it happens. Also extracted token counting into a dedicated TokenCounter class (sketched below).
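For reference, a hypothetical sketch of what such a TokenCounter might look like; this is an assumption based on the commit message above, not the PR's actual code:

```typescript
// Hypothetical sketch of the dedicated TokenCounter class extracted from
// agent.ts; the real implementation may differ.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

class TokenCounter {
  private input = 0;
  private output = 0;
  private total = 0;

  // Accumulate the usage metadata reported for a single LLM call.
  add(usage?: TokenUsage): void {
    if (!usage) return;
    this.input += usage.inputTokens;
    this.output += usage.outputTokens;
    this.total += usage.totalTokens;
  }

  // Derive tokens/second from the elapsed wall-clock time in milliseconds.
  tokensPerSecond(elapsedMs: number): number | undefined {
    if (this.total === 0 || elapsedMs <= 0) return undefined;
    return this.total / (elapsedMs / 1000);
  }

  get totals(): TokenUsage {
    return {
      inputTokens: this.input,
      outputTokens: this.output,
      totalTokens: this.total,
    };
  }
}
```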
Resolve conflict in agent.ts: keep skill deduplication AND tokenCounter
- Keep TokenCounter for performance metrics
- Add ToolLimitEvent and limit checking from upstream
- Use improved thinking check (skip whitespace-only)
@virattt Sorry for tagging you; could you please review the changes once more?
…rch.ts
- Keep TokenCounter for performance metrics tracking
- Integrate upstream's progress channel and context management
- Preserve tokenUsage/tokensPerSecond in done events
@virattt I wanted to follow up on my previous message.
Nice work on this overall!
A few things I noticed:
Bug: financial-metrics.ts and read-filings.ts will break
The callLlm return type changed to { response, usage } but these two files still do await callLlm(...) as AIMessage. That cast won't catch the issue at compile time, but at runtime .tool_calls will be undefined since you're reading it off the wrapper object instead of the actual AIMessage. Both tools will silently fail every time.
Should be a quick fix — just destructure like you already did in financial-search.ts:
```ts
const { response } = await callLlm(input.query, { ... });
const aiMessage = response as AIMessage;
```
tokenCounter in executeToolCalls seems unused
It gets passed in but those methods just invoke tools — they never call callLlm directly. Also worth noting that any LLM calls happening inside tools (like financial-search has its own callLlm call) won't get counted, so the totals will be lower than actual usage. Might be worth a comment or just removing the parameter for now.
Dep bumps
The @langchain/exa jump from 0.1 to 1.0 is a major version bump — any reason to include it here? Might be cleaner as a separate PR so if something breaks it's easy to bisect.
Minor UX things
The old code hid the duration line for fast queries (< 15s). Now it always shows, which means even a quick "what is 2+2" gets a stats line. Also formatDuration rounds to whole seconds, so anything sub-second shows as "0s" which looks a bit odd. Maybe only show it when there's actual token data from the provider, or add a small threshold?
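To illustrate that last point, a sub-second branch in the formatter could look like this; the formatDuration name comes from the comment above, but the body is only a sketch:

```typescript
// Sketch: show milliseconds for sub-second durations instead of rounding to "0s".
function formatDuration(ms: number): string {
  if (ms < 1000) return `${Math.round(ms)}ms`; // e.g. "500ms"
  return `${(ms / 1000).toFixed(1)}s`;         // e.g. "12.3s"
}
```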
Type casts
There are quite a few response as AIMessage casts — since the type is AIMessage | string, a typeof check would be safer than assuming it's always an AIMessage. Not a big deal but worth cleaning up.
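For example, a typeof narrowing like the following (a sketch, not code from the PR; prompt and options are placeholders) would avoid the casts entirely:

```typescript
const { response } = await callLlm(prompt, options);

// response is AIMessage | string, so narrow before reading AIMessage fields.
if (typeof response === 'string') {
  // handle the plain-text case
} else {
  const toolCalls = response.tool_calls ?? [];
  // ... proceed with the structured AIMessage
}
```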
Overall this is a great addition, just needs the two broken files fixed before merging. Looking forward to seeing this land!
virattt left a comment
Added some nits
Additionally, the overall counter is right below the user query, which is weird. Can we do exactly what Claude Code does instead?
Critical:
- financial-metrics.ts: destructure callLlm to get AIMessage
- read-filings.ts: destructure callLlm in both step1/step2

Medium:
- agent.ts: remove unused tokenCounter from executeToolCalls/executeToolCall

Low/UI:
- HistoryItemView.tsx: show ms for sub-second durations
- HistoryItemView.tsx: remove real-time timer below query (not Claude Code style)
Great @virattt, removed it. The UI now just shows the final stats after the answer. Also fixed the callLlm destructuring in both files and cleaned up the unused tokenCounter param. Can you please review the changes again?
@virattt I am not sure why you are ignoring me; I see you have merged other PRs but not mine. Please let me know the reason.
https://gittensor.io/miners/details?githubId=94194147
Thanks for the updates. A few more optional suggestions, mostly around type safety and keeping things tidy.
| Area | Suggestion | Impact |
|---|---|---|
| LlmResult.response type | `AIMessage \| string` instead of `unknown` | Eliminates 7 `as AIMessage` casts |
| extractUsage | Runtime type checks on nested fields | Prevents silent NaN from unexpected shapes |
| Stats display gate | Only show when tokenUsage is present | Avoids noisy "500ms" line on trivial queries |
| Stats helper | Extract formatPerformanceStats() | Readability, testability |
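A sketch of how the first two rows might look; this assumes LangChain's usage_metadata field on AIMessage and simple field names, and is not the PR's actual implementation:

```typescript
import { AIMessage } from '@langchain/core/messages';

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Typing response as AIMessage | string removes the need for casts at call sites.
interface LlmResult {
  response: AIMessage | string;
  usage?: TokenUsage;
}

// Runtime checks so an unexpected provider shape yields undefined, not NaN.
function extractUsage(message: AIMessage): TokenUsage | undefined {
  const meta = message.usage_metadata;
  if (!meta) return undefined;

  const { input_tokens, output_tokens, total_tokens } = meta;
  if (
    typeof input_tokens !== 'number' ||
    typeof output_tokens !== 'number' ||
    typeof total_tokens !== 'number'
  ) {
    return undefined;
  }
  return {
    inputTokens: input_tokens,
    outputTokens: output_tokens,
    totalTokens: total_tokens,
  };
}
```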
- Keep TokenUsage tracking for performance metrics
- Add Anthropic cache_control for ~90% input token savings
- Type LlmResult.response as AIMessage | string (eliminates 7 casts)
- Add runtime type checks in extractUsage (prevents NaN from unexpected shapes)
- Gate stats display on tokenUsage presence (avoids noisy duration-only line)
- Extract formatPerformanceStats helper for readability
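The last two bullets could look roughly like this; the helper name comes from the commit message, but the body and output formatting are assumptions:

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Returns undefined when the provider reported no usage, so trivial queries
// don't get a duration-only stats line (the display gate from the bullets above).
function formatPerformanceStats(
  totalTimeMs: number,
  tokenUsage?: TokenUsage,
): string | undefined {
  if (!tokenUsage) return undefined;

  const seconds = totalTimeMs / 1000;
  const duration =
    seconds < 1 ? `${Math.round(totalTimeMs)}ms` : `${seconds.toFixed(1)}s`;
  const tokPerSec = seconds > 0 ? Math.round(tokenUsage.totalTokens / seconds) : 0;

  return `${duration} · ${tokenUsage.totalTokens} tokens · ${tokPerSec} tok/s`;
}
```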
@virattt Thanks for the suggestions! Applied all of them.
Thanks! |
Appreciate it! |
What this does
This adds performance metrics so you can see how long queries take and how many tokens they use. Really helpful when using local models with Ollama to understand performance.
The problem
Right now when you ask Dexter a question, you have no idea:
- how long the query took
- how many tokens it used

This makes it hard to optimize your setup, especially with local models.
The solution
After each answer, you'll now see a line like this:
This shows:
- total time taken for the query
- total tokens used
- tokens per second (tok/s)
Why this matters
Technical details
Testing
Tested with a real query and got accurate metrics:
Closes #72