Open
* code updates for january update
* update gitignore
* remove rows with nan values in show_livebench_result
* misc script improvements
* add eval func for temporal questions
* small changes, add llama-3.3
* misc
* improve utils for rerunning questions
* log error stack traces
* properly retrieve aws usage
* misc
* misc
* add script to check coding questions
* move r1 to together
* update with new livecodebench eval code
* improve coding extraction script
* misc
* misc
* remove pyext dependency
* fix slight bugs
* update to 2025-04-02
* allow judgments without model provided
* add correct release date value
* add deepseek-v3 to together and qwq-32b
* implement web of lies v3
* add resume mode for generate ground truth
* fix mistral large and add deepseek llama distill
* update error check to properly format results
* add qwen 2.5 coder 32b
* allow specifying bench names for error checking
* update changelog with new updates
* don't gen judgments if there are no questions
* add models and fix max tokens for r1
* fix perplexity and cohere implementations
* filter by model names when error checking
* add coding_2 to default benchmarks
* remove temporal
* add support for displaying token usage info
* fix model display name issues

* code updates for january update
* update gitignore
* remove rows with nan values in show_livebench_result
* misc script improvements
* add eval func for temporal questions
* small changes, add llama-3.3
* misc
* improve utils for rerunning questions
* log error stack traces
* properly retrieve aws usage
* misc
* misc
* add script to check coding questions
* move r1 to together
* update with new livecodebench eval code
* improve coding extraction script
* misc
* misc
* remove pyext dependency
* fix slight bugs
* update to 2025-04-02
* allow judgments without model provided
* add correct release date value
* add deepseek-v3 to together and qwq-32b
* implement web of lies v3
* add resume mode for generate ground truth
* fix mistral large and add deepseek llama distill
* update error check to properly format results
* add qwen 2.5 coder 32b
* allow specifying bench names for error checking
* update changelog with new updates
* don't gen judgments if there are no questions
* add models and fix max tokens for r1
* fix perplexity and cohere implementations
* filter by model names when error checking
* add coding_2 to default benchmarks
* remove temporal
* add support for displaying token usage info
* rework model configurations to be more flexible and operate independently of provider
* add model provider override
* update scripts for new model config system
* finish updating to new model config system
* rework provider url handling to simplify
* update readme
…s that were previously incorrect
* switch to using mini-swe-agent for agentic coding
* misc fixes
* fixes - tracking tokens, supporting responses api
* mostly finish replacing swe-agent with mini-swe-agent
* more updates to improve functionality
* update some models and remove agent configs
- Clean up merge conflict markers in completions.py
- Add chat_completion_giga function with proper error handling
- Add get_api_function with GigaChat support
- Ensure all GigaChat functionality is preserved

- Clean up merge conflict markers in model_adapter.py
- Add GigaChat special handling in get_model_adapter function
- Ensure GigaChat models are properly recognized

- Add GigaChat model configuration file
- Ensure all GigaChat components are properly integrated
- Support for GigaChat models in the new architecture
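
For readers without the diff open, the commit notes above mention a `get_api_function` lookup and a completion function "with proper error handling". The following is only a rough sketch of that general pattern, not the code in this PR: the registry dict, the `with_retries` wrapper, the `API_ERROR_OUTPUT` sentinel, and the fake backend are all hypothetical names chosen for illustration, and no real GigaChat client call is shown.

```python
import logging
import time
from typing import Callable, Dict, List

Message = Dict[str, str]
ApiFunction = Callable[[str, List[Message]], str]

# Hypothetical sentinel returned when every attempt fails.
API_ERROR_OUTPUT = "$ERROR$"


def with_retries(send: ApiFunction, max_retries: int = 3,
                 delay_seconds: float = 0.0) -> ApiFunction:
    """Wrap a completion function so failures are logged and retried."""
    def wrapped(model: str, messages: List[Message]) -> str:
        for attempt in range(1, max_retries + 1):
            try:
                return send(model, messages)
            except Exception as exc:  # sketch only: real code would narrow this
                logging.warning("attempt %d failed: %r", attempt, exc)
                time.sleep(delay_seconds)
        return API_ERROR_OUTPUT
    return wrapped


def get_api_function(provider: str,
                     registry: Dict[str, ApiFunction]) -> ApiFunction:
    """Look up the completion function registered for a provider name."""
    try:
        return registry[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


def make_flaky_backend(failures: int) -> ApiFunction:
    """Stand-in backend that fails a fixed number of times, then echoes."""
    state = {"left": failures}

    def send(model: str, messages: List[Message]) -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")
        return f"{model}: {messages[-1]['content']}"
    return send
```

A call site would then register one wrapped function per provider, for example `registry = {"gigachat": with_retries(make_flaky_backend(2))}`, and fetch it with `get_api_function("gigachat", registry)`. Returning a sentinel instead of raising is one possible design choice that lets a long benchmark run continue past a single failed request.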
No description provided.