Giga rebased final #1

Open
ArtiomNosov wants to merge 128 commits into Pupy101:giga from IT-Continue:giga-rebased-final


No description provided.

gnguralnick and others added 30 commits February 4, 2025 21:05
* code updates for january update

* update gitignore

* remove rows with nan values in show_livebench_result

* misc script improvements

* add eval func for temporal questions

* small changes, add llama-3.3

* misc

* improve utils for rerunning questions

* log error stack traces

* properly retrieve aws usage

* misc

* misc

* add script to check coding questions

* move r1 to together

* update with new livecodebench eval code

* improve coding extraction script

* misc

* misc

* remove pyext dependency

* fix slight bugs

* update to 2025-04-02

* allow judgments without model provided

* add correct release date value

* add deepseek-v3 to together and qwq-32b

* implement web of lies v3

* add resume mode for generate ground truth

* fix mistral large and add deepseek llama distill

* update error check to properly format results

* add qwen 2.5 coder 32b

* allow specifying bench names for error checking

* update changelog with new updates

* don't gen judgments if there are no questions

* add models and fix max tokens for r1

* fix perplexity and cohere implementations

* filter by model names when error checking

* add coding_2 to default benchmarks

* remove temporal

* add support for displaying token usage info

* fix model display name issues

* rework model configurations to be more flexible and operate independently of provider

* add model provider override

* update scripts for new model config system

* finish updating to new model config system

* rework provider url handling to simplify

* update readme

arvindsun and others added 30 commits August 17, 2025 08:11
* switch to using mini-swe-agent for agentic coding

* misc fixes

* fixes - tracking tokens, supporting responses api

* mostly finish replacing swe-agent with mini-swe-agent

* more updates to improve functionality

* update some models and remove agent configs

- Clean up merge conflict markers in completions.py
- Add chat_completion_giga function with proper error handling
- Add get_api_function with GigaChat support
- Ensure all GigaChat functionality is preserved

- Clean up merge conflict markers in model_adapter.py
- Add GigaChat special handling in get_model_adapter function
- Ensure GigaChat models are properly recognized

- Add GigaChat model configuration file
- Ensure all GigaChat components are properly integrated
- Support for GigaChat models in the new architecture
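
For readers without the diff open, here is a rough sketch of the shape the GigaChat commits above describe: a completion function with error handling, and a `get_api_function` that routes GigaChat models to it. Only the names `chat_completion_giga` and `get_api_function` come from the commit messages. The signatures, the retry wrapper, the `API_ERROR_OUTPUT` sentinel, the name-prefix check, and the stub senders are all assumptions made for illustration and may differ from the code in this PR.

```python
import time
from typing import Callable, Dict, List

Message = Dict[str, str]
ApiFunction = Callable[[str, List[Message]], str]

# Hypothetical sentinel returned when every attempt fails.
API_ERROR_OUTPUT = "$ERROR$"


def with_retries(send: ApiFunction, max_attempts: int = 3,
                 delay_s: float = 0.0) -> ApiFunction:
    """Wrap a request function so failed calls are retried, then reported."""
    def wrapped(model: str, messages: List[Message]) -> str:
        for attempt in range(1, max_attempts + 1):
            try:
                return send(model, messages)
            except Exception as exc:  # a real client would catch narrower errors
                print(f"attempt {attempt}/{max_attempts} failed: {exc!r}")
                if attempt < max_attempts:
                    time.sleep(delay_s * attempt)  # linear backoff
        return API_ERROR_OUTPUT
    return wrapped


def _send_giga_stub(model: str, messages: List[Message]) -> str:
    """Stand-in for the real GigaChat request; echoes the last message."""
    return f"[giga:{model}] {messages[-1]['content']}"


def _send_default_stub(model: str, messages: List[Message]) -> str:
    """Stand-in for whatever the other providers use."""
    return f"[default:{model}] {messages[-1]['content']}"


chat_completion_giga = with_retries(_send_giga_stub)
chat_completion_default = with_retries(_send_default_stub)


def get_api_function(model_name: str) -> ApiFunction:
    """Pick a completion function; GigaChat models get special handling.

    Matching on a name prefix is an assumption, not necessarily what the PR does.
    """
    if model_name.lower().startswith("gigachat"):
        return chat_completion_giga
    return chat_completion_default
```

Wrapping the sender rather than writing the retry loop inside `chat_completion_giga` keeps the error handling reusable across providers, which is one plausible reading of "proper error handling" here.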
7 participants