Hi everyone, I am building a product that uses the OpenAI API. To show usage and consumption information to the user, I need to retrieve the input and output token counts from the OpenAI API response. Currently I have a microservice built in Node.js where I read this information from the `usage` field of the OpenAI API response (per the documentation). I now want to implement guardrails in a separate service that my Node.js service will call just for the completion, but I still need this usage info. Is there any way I can retrieve it from guardrails?
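For context, this is the kind of per-call accounting I am doing today. A minimal sketch with the official `openai` Python client (my actual service is Node.js, but the `usage` fields are the same in both SDKs; the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# The `usage` object carries the per-call token counts.
usage = response.usage
print("input (prompt) tokens:", usage.prompt_tokens)
print("output (completion) tokens:", usage.completion_tokens)
print("total tokens:", usage.total_tokens)
```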
Hi @JesusMF23,

We have LLM stats for each call; see here how they are used:

NeMo-Guardrails/nemoguardrails/rails/llm/llmrails.py
Line 368 in cb07be6

At the moment, we reset the stats at the beginning of each `generate_async` call:

NeMo-Guardrails/nemoguardrails/rails/llm/llmrails.py
Line 336 in cb07be6

But you can easily get the stats after each call and monitor the total usage for an app. Is this what you are looking for?
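A minimal sketch of reading the per-call stats right after a generation, using `rails.explain()`. The field names on each LLM call entry (`prompt_tokens`, `completion_tokens`, `total_tokens`) are my reading of the current codebase and may differ across versions; the `./config` path is a hypothetical config directory:

```python
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # hypothetical config directory
rails = LLMRails(config)

response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])

# Since the stats are reset at the start of each generate_async call,
# read them immediately after the call you want to account for.
info = rails.explain()
for call in info.llm_calls:
    print("prompt tokens:", call.prompt_tokens)
    print("completion tokens:", call.completion_tokens)
    print("total tokens:", call.total_tokens)
```

If your guardrails service wraps this, it could return these counts alongside the completion in its response payload, so the Node.js service can keep billing users the same way it does against the raw OpenAI API.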