Giga rebased final by ArtiomNosov · Pull Request #1 · Pupy101/LiveBench

ArtiomNosov · 2025-10-10T15:43:19Z

No description provided.

* code updates for january update
* update gitignore
* remove rows with nan values in show_livebench_result
* misc script improvements
* add eval func for temporal questions
* small changes, add llama-3.3
* misc
* improve utils for rerunning questions
* log error stack traces
* properly retrieve aws usage
* misc
* misc
* add script to check coding questions
* move r1 to together
* update with new livecodebench eval code
* improve coding extraction script
* misc
* misc
* remove pyext dependency
* fix slight bugs
* update to 2025-04-02
* allow judgments without model provided
* add correct release date value
* add deepseek-v3 to together and qwq-32b
* implement web of lies v3
* add resume mode for generate ground truth
* fix mistral large and add deepseek llama distill
* update error check to properly format results
* add qwen 2.5 coder 32b
* allow specifying bench names for error checking
* update changelog with new updates
* don't gen judgments if there are no questions
* add models and fix max tokens for r1
* fix perplexity and cohere implementations
* filter by model names when error checking
* add coding_2 to default benchmarks
* remove temporal
* add support for displaying token usage info
* fix model display name issues

* code updates for january update
* update gitignore
* remove rows with nan values in show_livebench_result
* misc script improvements
* add eval func for temporal questions
* small changes, add llama-3.3
* misc
* improve utils for rerunning questions
* log error stack traces
* properly retrieve aws usage
* misc
* misc
* add script to check coding questions
* move r1 to together
* update with new livecodebench eval code
* improve coding extraction script
* misc
* misc
* remove pyext dependency
* fix slight bugs
* update to 2025-04-02
* allow judgments without model provided
* add correct release date value
* add deepseek-v3 to together and qwq-32b
* implement web of lies v3
* add resume mode for generate ground truth
* fix mistral large and add deepseek llama distill
* update error check to properly format results
* add qwen 2.5 coder 32b
* allow specifying bench names for error checking
* update changelog with new updates
* don't gen judgments if there are no questions
* add models and fix max tokens for r1
* fix perplexity and cohere implementations
* filter by model names when error checking
* add coding_2 to default benchmarks
* remove temporal
* add support for displaying token usage info
* rework model configurations to be more flexible and operate independently of provider
* add model provider override
* update scripts for new model config system
* finish updating to new model config system
* rework provider url handling to simplify
* update readme

* switch to using mini-swe-agent for agentic coding
* misc fixes
* fixes - tracking tokens, supporting responses api
* mostly finish replacing swe-agent with mini-swe-agent
* more updates to improve functionality
* update some models and remove agent configs

gnguralnick and others added 30 commits February 4, 2025 21:05

qwen2.5-max support

e29270e

add new geminis and fix olympiad parsing

f047f3a

use max_tokens for non-openai api models; properly remove <think> for IF

f3501c7
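
To illustrate the `<think>` handling this commit title refers to, here is a minimal sketch. It assumes reasoning models wrap their chain of thought in `<think>...</think>` tags and that only the remaining text should be scored for instruction following (IF). The helper name `strip_think` is hypothetical and is not taken from the LiveBench source.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Drop reasoning blocks so only the final answer is evaluated."""
    cleaned = _THINK_BLOCK.sub("", text)
    # Some models emit only a closing tag (the opening tag lives in the
    # prompt template); in that case keep what follows the last closing tag.
    if "</think>" in cleaned:
        cleaned = cleaned.rsplit("</think>", 1)[1]
    return cleaned.strip()
```

Stripping before scoring matters for instruction-following checks such as word limits, where leftover reasoning text would otherwise count against the answer.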

Update README.md

a485c34

claude 3.7 sonnet support

9d0147c

convert bash run scripts to unified python script

7521e88

add more script options

19ef418

only insert ``` when parsing csv if necessary

a3b6943

refactor to extract params to object

065f2d3

gemini-2.0-flash-lite support

c56700e

pass through debug options

573445f

update readme to explain parallelization better

596810c

gpt 4.5 support

baff165

misc

b85d9a6

misc improvements

3708b03

default use venv

7a68ce4

gemma-3-27b support

b54c562

mistral small and gemma support

f01d4e5

Add perplexity sonar models (LiveBench#169)

fb45525

improve scripts

a27511a

Fix names

7e90e6b

o1 pro support

a8f61ff

Llama4

7173eee

fix readme link

ec9c074

grok 3 support

7429e7d

gpt 4.1 models

5a1081b

rename gemini-2.5-pro-exp to preview and reimplement google inference

8c34c03

add llama 4 config

35203f5

arvindsun and others added 30 commits August 17, 2025 08:11

Minimal as well

0351576

deepseek-v3.1 config

f56979a

add supports function calling

4d37ac8

deepseek-v3.1-thinking config

d0e8e2f

add --only-incorrect option to only regenerate judgments for questions that were previously incorrect

4b54b1f
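
The behaviour described by the `--only-incorrect` commit can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function names, the `previous_scores` mapping, and the rule that a score below 1 counts as incorrect are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Regenerate judgments (sketch)")
    parser.add_argument(
        "--only-incorrect",
        action="store_true",
        help="re-judge only questions whose previous score was below 1",
    )
    return parser


def select_questions(question_ids, previous_scores, only_incorrect):
    """Return the question ids to re-judge.

    previous_scores maps question_id -> score from an earlier run
    (hypothetical layout); questions without a score are always kept.
    """
    if not only_incorrect:
        return list(question_ids)
    return [
        qid for qid in question_ids
        if previous_scores.get(qid, 0.0) < 1.0
    ]
```

With the flag set, a question that previously scored 1.0 is skipped, while unscored and failed questions are judged again.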

add grok-code-fast-1

662bb50

fix manual api key specification

4cc08d4

another manual api key override fix

6c7a1f7

fix part 3

4651699

add qwen 3 next

7341509

add gemini-2.5-flash-lite

7bf6143

swap default provider for qwen3 next thinking

bcac363

grok-4-fast configs

6902227

misc

7e880fb

use function calling

c31591e

deepseek-v3.1-terminus configs

af07aab

add gpt-5-codex config

717e0b5

qwen3 max config

5e9824d

claude sonnet 4.5 configs

1c4c655

misc fixes

728fdd0

add gpt-5-pro and glm-4.6

563f435

update glm-4.6 to use openrouter

1091b6e

use deepinfra for glm-4.6

7907c29

add exclude question id param

5af0e22

Add giga dependency

1553a62

Add giga

59cc0aa

Fix GigaChat integration after rebase

e428f70

- Clean up merge conflict markers in completions.py
- Add chat_completion_giga function with proper error handling
- Add get_api_function with GigaChat support
- Ensure all GigaChat functionality is preserved
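
The commit body mentions a completion function "with proper error handling" and a provider lookup via `get_api_function`. The PR page does not show either implementation, so the following is only a sketch of one common shape for them: a retry wrapper around an injected sender callable, plus a registry lookup. The `Sender` type, the `registry` argument, and the retry and backoff parameters are assumptions made for illustration. No real GigaChat client call is made.

```python
import time
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
# A "sender" takes (model, messages) and returns the reply text.
Sender = Callable[[str, List[Message]], str]


def chat_completion_with_retries(
    send: Sender,
    model: str,
    messages: List[Message],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
) -> Optional[str]:
    """Call `send`, retrying on any exception; return None if all attempts fail."""
    for attempt in range(1, max_retries + 1):
        try:
            return send(model, messages)
        except Exception as exc:  # broad on purpose: this is a sketch
            print(f"attempt {attempt}/{max_retries} failed: {exc!r}")
            if attempt < max_retries:
                time.sleep(backoff_seconds * attempt)
    return None


def get_api_function(provider: str, registry: Dict[str, Sender]) -> Sender:
    """Look up the sender registered for a provider name (case-insensitive)."""
    try:
        return registry[provider.lower()]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider}") from None
```

Passing the sender in as a callable keeps the retry logic independent of any one provider SDK, so a GigaChat-specific sender could be registered under a key such as `"gigachat"` without touching the wrapper.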

Fix GigaChat model adapter support

5087b28

- Clean up merge conflict markers in model_adapter.py
- Add GigaChat special handling in get_model_adapter function
- Ensure GigaChat models are properly recognized
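
How "special handling" in an adapter lookup might work is sketched below. It assumes a registry of adapters that each decide whether they match a model name, with a catch-all adapter last. The class names, the substring rule, and the example model names are hypothetical, and the real `get_model_adapter` in model_adapter.py may be organised differently.

```python
class BaseModelAdapter:
    """Fallback adapter used when no specific adapter matches."""

    def match(self, model_path: str) -> bool:
        return True


class GigaChatAdapter(BaseModelAdapter):
    """Hypothetical adapter claiming any model whose name contains 'gigachat'."""

    def match(self, model_path: str) -> bool:
        return "gigachat" in model_path.lower()


# Order matters: specific adapters first, the catch-all last.
ADAPTERS = [GigaChatAdapter(), BaseModelAdapter()]


def get_model_adapter(model_path: str) -> BaseModelAdapter:
    """Return the first registered adapter that matches the model name."""
    for adapter in ADAPTERS:
        if adapter.match(model_path):
            return adapter
    raise ValueError(f"no adapter for {model_path}")
```

Placing the specific adapter ahead of the fallback is what ensures GigaChat-style names are recognised instead of falling through to the default.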

Complete GigaChat integration

3a308b9

- Add GigaChat model configuration file
- Ensure all GigaChat components are properly integrated
- Support for GigaChat models in the new architecture

ArtiomNosov wants to merge 128 commits into Pupy101:giga from IT-Continue:giga-rebased-final

7 participants