docs: Update README.md #63

Draft · wants to merge 1 commit into main
README.md: 26 changes (13 additions, 13 deletions)

@@ -216,24 +216,24 @@ curl localhost:8002/metrics
vLLM stats are reported by the metrics endpoint in fields that are prefixed with
`vllm:`. Triton currently supports reporting of the following metrics from vLLM.
```diff
-# Number of prefill tokens processed.
-counter_prompt_tokens
-# Number of generation tokens processed.
-counter_generation_tokens
+# Counter of prefill tokens processed.
+vllm:prompt_tokens_total
```
@rmccorm4 (Contributor) commented on Sep 6, 2024:

Do we need both this section and the one right below it? Why not just use the below section instead?

> VLLM stats are reported by the metrics endpoint in fields that are prefixed with
> `vllm:`. For example, the metrics reported by Triton will look similar to the following:
>
> ```
> # HELP vllm:prompt_tokens_total Number of prefill tokens processed.
> # TYPE vllm:prompt_tokens_total counter
> vllm:prompt_tokens_total{model="vllm_model",version="1"} 10
> # HELP vllm:generation_tokens_total Number of generation tokens processed.
> # TYPE vllm:generation_tokens_total counter
> vllm:generation_tokens_total{model="vllm_model",version="1"} 16
> ...
> ```
Contributor Author:

I'd rather have another section listing all supported metrics. The sample metrics output is just hard to follow, IMO.
Contributor Author:

And, for example, changing `counter_prompt_tokens` to `vllm:prompt_tokens_total` allows the reader to easily locate the corresponding output in the section below.
Contributor:

@oandreeva-nv do you mind driving this PR if you have time?
Contributor:

How about we create a dedicated doc for metrics, so that our front-facing README stays clean and uncluttered?
```diff
+# Counter of generation tokens processed.
+vllm:generation_tokens_total
 # Histogram of time to first token in seconds.
-histogram_time_to_first_token
+vllm:time_to_first_token_seconds
 # Histogram of time per output token in seconds.
-histogram_time_per_output_token
+vllm:time_per_output_token_seconds
 # Histogram of end to end request latency in seconds.
-histogram_e2e_time_request
-# Number of prefill tokens processed.
-histogram_num_prompt_tokens_request
-# Number of generation tokens processed.
-histogram_num_generation_tokens_request
+vllm:e2e_request_latency_seconds
+# Histogram of prefill tokens processed.
+vllm:request_prompt_tokens
+# Histogram of generation tokens processed.
+vllm:request_generation_tokens
 # Histogram of the best_of request parameter.
-histogram_best_of_request
+vllm:request_params_best_of
 # Histogram of the n request parameter.
-histogram_n_request
+vllm:request_params_n
```
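After this change, the renamed metrics can be checked directly against the endpoint shown in the hunk header above. A minimal sketch, assuming a Triton server is running with metrics enabled on the default port 8002 and serving a model named `vllm_model` (as in the sample output quoted in the review thread):

```bash
# List all vLLM metric samples currently exposed by Triton
# (Prometheus text exposition format; HELP/TYPE lines start with '#').
curl -s localhost:8002/metrics | grep '^vllm:'

# Pull a single value, e.g. the prompt-token counter for one model.
curl -s localhost:8002/metrics \
  | grep '^vllm:prompt_tokens_total{model="vllm_model"' \
  | awk '{print $NF}'
```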
Your output for these fields should look similar to the following:
```bash
...
```
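Since the raw output interleaves `# HELP` and `# TYPE` lines with the samples (as in the snippet quoted in the review thread above), a name-to-type summary can be extracted directly. A sketch, under the same assumptions as the previous example:

```bash
# Print each vLLM metric name with its Prometheus type (counter or histogram).
curl -s localhost:8002/metrics \
  | awk '$1 == "#" && $2 == "TYPE" && $3 ~ /^vllm:/ {print $3, $4}'
```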