Merge pull request #324 from symflower/v0.6.1-run
v0.6 results
bauersimon authored Sep 9, 2024
2 parents 197ece8 + 3a71848 commit bbf6ab9
Showing 20 changed files with 8,124 additions and 201 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -24,4 +24,4 @@ go.work
.envrc

# Ignore auto-generated evaluation folders.
-evaluation-*
+evaluation-*/
2 changes: 1 addition & 1 deletion docs/kubernetes/volume.yml
@@ -9,4 +9,4 @@ spec:
- ReadWriteMany # Ensure that the access mode is "ReadWriteMany".
resources:
requests:
-storage: 5Gi
+storage: 50Gi
3 changes: 3 additions & 0 deletions docs/reports/v0.6/README.md
@@ -0,0 +1,3 @@
This evaluation ran over several days and is actually assembled from several sub-evaluations. Hence, we only provide the `csv` results for now because the logs are distributed all over.

The benchmark consisted of `5x` runs, **except** for the `code-repair` task, which contains only one usable run because of [a bug that is already fixed for later versions](https://github.com/symflower/eval-dev-quality/commit/19011da21c3196e346d363148bf83989dc1f2a88).
68 changes: 68 additions & 0 deletions docs/reports/v0.6/evaluation-by-language-score.csv
@@ -0,0 +1,68 @@
model_id,score,golang_score,java_score,ruby_score
openrouter/01-ai/yi-34b-chat,273.0,9.0,264.0,0.0
openrouter/ai21/jamba-instruct,15606.0,15.0,7856.0,7735.0
openrouter/alpindale/goliath-120b,2429.0,1265.0,13.0,1151.0
openrouter/alpindale/magnum-72b,18583.0,780.0,9915.0,7888.0
openrouter/anthropic/claude-3-haiku,34021.0,5137.0,16092.0,12792.0
openrouter/anthropic/claude-3-opus,38485.0,7432.0,17939.0,13114.0
openrouter/anthropic/claude-3-sonnet,29425.0,6419.0,12187.0,10819.0
openrouter/anthropic/claude-3.5-sonnet,39540.0,7590.0,17948.0,14002.0
openrouter/austism/chronos-hermes-13b,13583.0,3963.0,5421.0,4199.0
openrouter/cognitivecomputations/dolphin-llama-3-70b,25969.0,4535.0,11786.0,9648.0
openrouter/cognitivecomputations/dolphin-mixtral-8x22b,23843.0,4712.0,12113.0,7018.0
openrouter/cognitivecomputations/dolphin-mixtral-8x7b,11602.0,1184.0,6698.0,3720.0
openrouter/cohere/command-r-03-2024,14938.0,2139.0,8462.0,4337.0
openrouter/cohere/command-r-08-2024,21549.0,6374.0,11442.0,3733.0
openrouter/cohere/command-r-plus-04-2024,26797.0,3450.0,14993.0,8354.0
openrouter/cohere/command-r-plus-08-2024,26312.0,3378.0,15137.0,7797.0
openrouter/databricks/dbrx-instruct,14863.0,5819.0,9044.0,0.0
openrouter/deepseek/deepseek-chat,38715.0,7302.0,17929.0,13484.0
openrouter/deepseek/deepseek-coder,38263.0,7157.0,17166.0,13940.0
openrouter/google/gemini-flash-1.5,25434.0,7245.0,4846.0,13343.0
openrouter/google/gemini-pro-1.5,20955.0,7591.0,7161.0,6203.0
openrouter/google/gemma-2-27b-it,24166.0,6514.0,13982.0,3670.0
openrouter/google/gemma-2-9b-it,10019.0,1662.0,8346.0,11.0
openrouter/google/palm-2-chat-bison,10545.0,4044.0,2364.0,4137.0
openrouter/google/palm-2-codechat-bison,9463.0,4185.0,15.0,5263.0
openrouter/meta-llama/llama-3-70b-instruct,32865.0,6641.0,14310.0,11914.0
openrouter/meta-llama/llama-3-8b-instruct,9861.0,14.0,6225.0,3622.0
openrouter/meta-llama/llama-3.1-405b-instruct,37535.0,7705.0,17080.0,12750.0
openrouter/meta-llama/llama-3.1-70b-instruct,34328.0,6454.0,14852.0,13022.0
openrouter/meta-llama/llama-3.1-8b-instruct,24111.0,4329.0,11618.0,8164.0
openrouter/microsoft/phi-3-medium-128k-instruct,11145.0,1270.0,2009.0,7866.0
openrouter/microsoft/phi-3-medium-4k-instruct,14576.0,4370.0,1527.0,8679.0
openrouter/microsoft/phi-3-mini-128k-instruct,16317.0,1093.0,9579.0,5645.0
openrouter/microsoft/wizardlm-2-7b,14293.0,929.0,9735.0,3629.0
openrouter/microsoft/wizardlm-2-8x22b,27433.0,4909.0,16421.0,6103.0
openrouter/mistralai/codestral-mamba,24507.0,6287.0,8979.0,9241.0
openrouter/mistralai/mistral-7b-instruct-v0.3,13879.0,2789.0,8137.0,2953.0
openrouter/mistralai/mistral-large,36056.0,6899.0,17477.0,11680.0
openrouter/mistralai/mistral-medium,30464.0,5287.0,15110.0,10067.0
openrouter/mistralai/mistral-nemo,19162.0,1423.0,8615.0,9124.0
openrouter/mistralai/mistral-small,20218.0,4505.0,15698.0,15.0
openrouter/mistralai/mistral-tiny,8643.0,2407.0,6226.0,10.0
openrouter/mistralai/mixtral-8x22b-instruct,29308.0,5941.0,15600.0,7767.0
openrouter/mistralai/mixtral-8x7b-instruct,28497.0,4446.0,15072.0,8979.0
openrouter/neversleep/llama-3-lumimaid-70b,18596.0,4050.0,8705.0,5841.0
openrouter/neversleep/llama-3-lumimaid-8b,10618.0,13.0,6499.0,4106.0
openrouter/nousresearch/hermes-2-pro-llama-3-8b,16984.0,14.0,9064.0,7906.0
openrouter/nousresearch/hermes-2-theta-llama-3-8b,17210.0,11.0,10235.0,6964.0
openrouter/nousresearch/nous-hermes-2-mixtral-8x7b-dpo,19990.0,2183.0,11647.0,6160.0
openrouter/nousresearch/nous-hermes-yi-34b,8206.0,3274.0,4919.0,13.0
openrouter/openai/gpt-4-turbo,38422.0,7213.0,18170.0,13039.0
openrouter/openai/gpt-4o,39005.0,6656.0,17900.0,14449.0
openrouter/openai/gpt-4o-mini,39441.0,7182.0,17859.0,14400.0
openrouter/openchat/openchat-8b,5376.0,3198.0,2167.0,11.0
openrouter/perplexity/llama-3-sonar-large-32k-chat,35051.0,6853.0,15929.0,12269.0
openrouter/perplexity/llama-3-sonar-small-32k-chat,7338.0,977.0,2167.0,4194.0
openrouter/qwen/qwen-110b-chat,10980.0,4450.0,12.0,6518.0
openrouter/qwen/qwen-2-72b-instruct,25290.0,15.0,14465.0,10810.0
openrouter/qwen/qwen-2-7b-instruct,10750.0,15.0,6941.0,3794.0
openrouter/qwen/qwen-72b-chat,27717.0,4033.0,13615.0,10069.0
openrouter/recursal/eagle-7b,0.0,0.0,0.0,0.0
openrouter/recursal/rwkv-5-3b-ai-town,0.0,0.0,0.0,0.0
openrouter/rwkv/rwkv-5-world-3b,20.0,10.0,10.0,0.0
openrouter/teknium/openhermes-2.5-mistral-7b,15881.0,15.0,10830.0,5036.0
openrouter/togethercomputer/stripedhyena-nous-7b,12001.0,3487.0,5439.0,3075.0
openrouter/undi95/toppy-m-7b,3026.0,2035.0,11.0,980.0
openrouter/xwin-lm/xwin-lm-70b,5596.0,2589.0,2998.0,9.0