Hi @mmaaz60,
Thanks for your great work and for open-sourcing it!
I am trying to evaluate PALO-7B (loaded from transformers) on multilingual-llava-in-the-wild, but the performance I get is much lower than the reported numbers (the generation setup I used is sketched below). Here are the results I got:

Here are the generated content files:
PALO-7B_English_content.json
PALO-7B_Chinese_content.json

Here are the evaluation files with scores:
PALO-7B_English.json
PALO-7B_Chinese.json

The summaries were produced by palo/eval/summarize_gpt_review.py.

Is there a significant discrepancy between the content I generated and yours, or are there issues in my evaluation? Do you have any idea about this, or could you share your generated result files with me?
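For reference, this is roughly how I load the model and run generation. It is a minimal sketch assuming PALO keeps the LLaVA-style API; the palo.* import paths, the MBZUAI/PALO-7B checkpoint id, and the v1 conversation template are my assumptions, not confirmed details of the repo:

```python
# Minimal generation sketch, assuming PALO mirrors the LLaVA codebase.
# The palo.* module paths and the "MBZUAI/PALO-7B" checkpoint id are
# assumptions; swap in the actual paths from the PALO repo if they differ.
import torch
from PIL import Image

from palo.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from palo.conversation import conv_templates
from palo.mm_utils import tokenizer_image_token
from palo.model.builder import load_pretrained_model

tokenizer, model, image_processor, _ = load_pretrained_model(
    "MBZUAI/PALO-7B", model_base=None, model_name="PALO-7B"
)

def answer(image_path: str, question: str) -> str:
    # Build a v1-style prompt with the image placeholder prepended.
    conv = conv_templates["v1"].copy()
    conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\n" + question)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

    image = Image.open(image_path).convert("RGB")
    image_tensor = image_processor.preprocess(image, return_tensors="pt")["pixel_values"]
    input_ids = tokenizer_image_token(
        prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
    ).unsqueeze(0).to(model.device)

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensor.half().to(model.device),
            do_sample=True,
            temperature=0.2,  # the sampling settings I used for the eval run
            max_new_tokens=1024,
            use_cache=True,
        )
    # LLaVA-1.5-era generate returns only the newly generated tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

If my decoding settings (temperature, conversation template) differ from yours, that might explain a few points, though probably not a gap this large.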
By the way, I noticed that the paths for some languages in the evaluation scripts contain "corrected". Is there an updated version of the multilingual-llava-in-the-wild benchmark?
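In case it helps pin down where the numbers diverge, here is the relative-score computation I assume palo/eval/summarize_gpt_review.py performs, following the LLaVA review format where each JSON-lines record carries a "tuple" field with the [reference_score, model_score] pair (the field names and file layout are assumptions on my part):

```python
# Sketch of the LLaVA-style score aggregation I assume
# summarize_gpt_review.py performs; the "tuple" and "category"
# field names are assumptions.
import json
from collections import defaultdict

def summarize(review_file: str) -> None:
    scores = defaultdict(list)
    with open(review_file) as f:
        for line in f:
            review = json.loads(line)
            scores["all"].append(review["tuple"])
            if "category" in review:
                scores[review["category"]].append(review["tuple"])

    for name, pairs in scores.items():
        ref_avg = sum(p[0] for p in pairs) / len(pairs)
        model_avg = sum(p[1] for p in pairs) / len(pairs)
        # Relative score: model average as a percentage of the GPT reference.
        print(f"{name}: relative {100 * model_avg / ref_avg:.1f}, "
              f"model {model_avg:.2f}, reference {ref_avg:.2f}")

summarize("PALO-7B_English.json")
```

If your script orders the pair the other way around ([model_score, reference_score]), the relative score would be inverted, which is one place a silent mismatch could creep in.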