

JulienDelavande
Member

What does this PR do?

This PR adds energy-consumption measurement to every request served by the router.
A new field, energy_mj (in millijoules), is added to the details section of the response, which is returned when the request sets "details": true.

Example usage:

curl 127.0.0.1:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20, "details":true}}' \
    -H 'Content-Type: application/json'

Response (excerpt):

{
  "generated_text": "Deep learning is a subset of machine learning...",
  "details": {
    "finish_reason": "length",
    "generated_tokens": 20,
    "tokens": [...],
    "energy_mj": 108836
  }
}
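A client can read the value directly from the JSON body. The sketch below is a hypothetical helper and is not part of TGI or this PR. It uses only the Python standard library to send the same request as the curl example and then converts energy_mj to joules and watt-hours. It assumes the server is reachable at 127.0.0.1:3000 and that details is set to true, because otherwise the details section is absent. The names generate_with_details and energy_from_response are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple

MJ_PER_J = 1_000   # 1 J = 1,000 mJ
J_PER_WH = 3_600   # 1 Wh = 3,600 J


def generate_with_details(prompt: str,
                          url: str = "http://127.0.0.1:3000/generate",
                          max_new_tokens: int = 20) -> Dict[str, Any]:
    """Hypothetical helper: POST a /generate request with details enabled (mirrors the curl example)."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "details": True},
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def energy_from_response(body: Dict[str, Any]) -> Tuple[float, float]:
    """Return (joules, watt_hours) computed from details.energy_mj."""
    energy_mj = body["details"]["energy_mj"]
    joules = energy_mj / MJ_PER_J
    return joules, joules / J_PER_WH


# Offline check against the response excerpt shown above (no server needed).
sample = {"details": {"finish_reason": "length", "generated_tokens": 20, "energy_mj": 108836}}
joules, wh = energy_from_response(sample)
print(f"{joules:.3f} J = {wh:.5f} Wh")  # prints 108.836 J = 0.03023 Wh
```

Against a running server, energy_from_response(generate_with_details("What is Deep Learning?")) would return the same (joules, watt-hours) pair for that particular request.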

Motivation

Limitations

  • Energy is measured at the GPU level over the entire time the router spends processing the request.
  • When multiple requests are batched, energy is not split between users: each value approximates the total GPU energy consumed while the request was being processed, so it can include energy spent on other requests in the same batch.
  • Despite these limitations, the metric gives users a rough idea of the energy footprint of their queries without requiring them to deploy TGI themselves.
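A small model makes the batching limitation above concrete. This is an illustrative sketch and not the PR's implementation. It assumes each request's energy is the difference of a cumulative GPU energy counter in millijoules, read at the start and at the end of the request. NVML exposes such a counter through nvmlDeviceGetTotalEnergyConsumption on supported GPUs. Here the real counter is replaced by a hypothetical simulated GPU drawing a constant 300 W, and the helper names are illustrative.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a cumulative GPU energy counter (mJ since an arbitrary origin).
# A constant 300 W draw is 300 J/s, i.e. 300 mJ per millisecond.
POWER_MJ_PER_MS = 300


def simulated_counter(t_ms: int) -> int:
    """Cumulative energy in mJ at time t_ms for the simulated GPU."""
    return POWER_MJ_PER_MS * t_ms


def request_energy_mj(counter: Callable[[int], int], start_ms: int, end_ms: int) -> int:
    """Energy reported for one request: the counter delta over its whole lifetime."""
    return counter(end_ms) - counter(start_ms)


# Two batched requests: A runs from 0 to 400 ms, B joins the batch at 100 ms.
requests: Dict[str, Tuple[int, int]] = {"A": (0, 400), "B": (100, 400)}
per_request = {
    name: request_energy_mj(simulated_counter, start, end)
    for name, (start, end) in requests.items()
}

# Energy the GPU actually consumed over the shared 0-400 ms window.
gpu_total = request_energy_mj(simulated_counter, 0, 400)

print(per_request)                           # {'A': 120000, 'B': 90000}
print(sum(per_request.values()), gpu_total)  # 210000 120000
```

In this model each reported value also contains energy spent on the other request, so summing them double-counts the overlap (210,000 mJ reported versus 120,000 mJ consumed). This is why the values are best read as approximations rather than an exact per-user split.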



Who can review?

@regisss @Narsil

regisss previously approved these changes Aug 28, 2025
Collaborator

regisss left a comment


LGTM!
