@lukemarsden
Contributor

No description provided.

@github-actions
github-actions bot commented May 6, 2025

Helix Test Summary

| Test Name | Result | Reason | Model | Inference Time | Evaluation Time | Session Link | Debug Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| check usd to gbp rate | PASS | The response includes the USD to GBP exchange rate... | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 45.365s | 715ms | Session | Debug |
| usdgbp | FAIL | The response mentions the exchange rate for GBP to... | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 10.056s | 834ms | Session | Debug |

Total execution time: 46.08s
Overall result: FAIL

@github-actions
github-actions bot commented May 6, 2025

Helix Test Summary

| Test Name | Result | Reason | Model | Inference Time | Evaluation Time | Session Link | Debug Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| jokes must be funny | FAIL | The response does not contain a joke, let alone a ... | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 3.102s | 666ms | Session | Debug |
| tells jokes | FAIL | The response does not contain a joke as requested. | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 2.565s | 679ms | Session | Debug |

Total execution time: 3.768s
Overall result: FAIL

2 participants