
add usage statistics for inference API #894

Open · wants to merge 2 commits into main

Conversation

@dineshyv (Contributor)

What does this PR do?

Adds a new usage field to inference APIs to indicate token counts for prompt and completion.
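For illustration, a minimal sketch of what such a field could look like; the class and field names (UsageInfo, prompt_tokens, completion_tokens, total_tokens) are assumptions, not necessarily the exact schema this PR introduces:

```python
# Hypothetical sketch -- names are assumptions, not the PR's exact schema.
from typing import Optional

from pydantic import BaseModel, Field


class UsageInfo(BaseModel):
    """Token accounting for a single inference call."""

    prompt_tokens: int = Field(..., description="Tokens consumed by the prompt")
    completion_tokens: int = Field(..., description="Tokens generated in the completion")
    total_tokens: int = Field(..., description="prompt_tokens + completion_tokens")


class ChatCompletionResponse(BaseModel):
    """Inference response augmented with optional usage statistics."""

    completion_message: str
    usage: Optional[UsageInfo] = None  # populated when the provider reports token counts
```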

@dineshyv changed the title from "add usage statistics for chat completion non streaming" to "add usage statistics for inference API" on Jan 28, 2025
@vladimirivic (Contributor)

@dineshyv I think you also want to regenerate the OpenAPI spec so this gets updated:
https://github.com/meta-llama/llama-stack/blob/main/docs/resources/llama-stack-spec.yaml#L1965-L1988

I think you just need to run docs/openapi_generator/run_openapi_generator.sh

@vladimirivic (Contributor) left a review:

LGTM, approving to unblock, see my comment about the spec update.

@ashwinb (Contributor) commented on Jan 29, 2025:

@raghotham any comments on generality / future-facing stuff?

@dineshyv hold on for a couple hours before merging.

@ashwinb (Contributor) left a review:

This needs more discussion. Here is an alternate idea:

  • we think of this as an extension of our telemetry API -- or rather telemetry needs to just work for this too.
  • from that standpoint, we can make sure these stats get logged as "metrics" into our telemetry API and therefore can be queried
  • this of course works only if the distro has a telemetry provider

However, this is not sufficient, because it makes for a terrible dev-ex while debugging during development. When I want to understand usage, I want each API call's usage to also be available. So how about all Llama Stack API calls are augmented with metrics?
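To make the proposal concrete, a rough sketch of the direction (the `telemetry.log_metric` helper and the wrapper function here are hypothetical, not an existing Llama Stack API):

```python
# Rough sketch of the proposal: every inference call also emits its token counts
# through the telemetry provider, when one is configured. The telemetry helper
# (log_metric) is an assumed interface, not an existing Llama Stack API.

async def chat_completion_with_metrics(request, inference, telemetry=None):
    response = await inference.chat_completion(request)

    if telemetry is not None and response.usage is not None:
        # Record token counts as queryable metrics, tagged with the model id.
        await telemetry.log_metric(
            "prompt_tokens", response.usage.prompt_tokens, attributes={"model": request.model}
        )
        await telemetry.log_metric(
            "completion_tokens", response.usage.completion_tokens, attributes={"model": request.model}
        )

    # The per-call usage stays on the response, so the local dev experience is unchanged.
    return response
```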

@dineshyv (Contributor, Author)

@ashwinb Agree that we should send these metrics to telemetry as well. But retrieving/querying them would make us dependent on a specific telemetry sink that supports metrics retrieval.
Since we depend on OTEL to export the telemetry data, we can stay agnostic of the actual sink and give users the flexibility to configure different sinks, as long as they can process OTEL data.
Since OTEL does not support retrieval, we would have to depend on a specific sink.
For traces and spans, we maintain an independent sqlite store to support retrieval.
For metrics, I see two options:

  1. Have an explicit dependency on a third-party sink, like Jaeger, which supports metrics retrieval.
  2. Implement our own metrics retrieval in the sqlite sink.

(1) is not great since it gives us an explicit dependency on a third-party sink, while (2) sounds like reimplementing a whole metrics engine.
My preference would be to go with (1). But if we go with (1), since we are already dependent on the third-party sink and will need to use its API, I don't see the value in us supporting an API that just calls the sink's API, since developers can call into the sink's API directly.

Given this context, I think we should do the following:

  1. Only support metrics export through OTEL (see the sketch after this list). We give the developer the flexibility to choose the sink, and they can use the sink-specific API to query. I will work on this in a subsequent PR.
  2. I do see value in returning the usage statistics as part of the response, and for that purpose we keep this PR as is. We will not depend on the telemetry API. I think that's not such a bad thing, since the inference provider is the source of truth for this and we just return it as part of the response.
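For illustration, a minimal sketch of what OTEL-only metrics export could look like using the standard OpenTelemetry Python SDK; the endpoint, metric names, and `record_usage` helper are assumptions, not code from this PR:

```python
# Sketch of option (1): export per-request token counts as OTEL metrics to any
# OTLP-compatible sink; aggregation and querying then happen in the sink, not
# in Llama Stack. Endpoint and metric names are illustrative.
from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://localhost:4318/v1/metrics")
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("llama_stack.inference")
prompt_tokens = meter.create_counter("prompt_tokens", unit="token")
completion_tokens = meter.create_counter("completion_tokens", unit="token")


def record_usage(model: str, usage) -> None:
    """Record one request's usage; any configured OTLP sink can query it later."""
    prompt_tokens.add(usage.prompt_tokens, attributes={"model": model})
    completion_tokens.add(usage.completion_tokens, attributes={"model": model})
```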

Thoughts? Do you think there is anything I am missing?
