fix random answer bugs #2

Open
wants to merge 3 commits into
base: main
DragonFive
This PR fixes two bugs:

  1. The index2ans code has a bug: every row shares the last row's index2ans. As a result, neither the response nor the correct answer can be matched to the right choice, and the prediction falls back to a random answer, as this code in parse_multi_choice_response shows:
      if len(candidates) == 0:  # still not get answer, randomly choose one.
          if default_answer is None:
              pred_index = random.choice(all_choices)
  2. The answer-parsing code has a bug: it strips the period at the end of the correct answer, so the answer ends up random because the following code never finds a match:
      for index, ans in index2ans.items():
          print(f"{index}: {ans.lower()}, {response.lower()}")
          if ans.lower() in response.lower():
              candidates.append(index)
              index_ans = False
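To make the failure mode concrete, here is a minimal sketch of the matching step together with its random fallback. `pick_choice` and its `(choice, used_fallback)` return value are hypothetical simplifications for illustration, not the project's actual parse_multi_choice_response; the option texts are taken from the rows quoted in this PR.

```python
import random


def pick_choice(response, all_choices, index2ans, default_answer=None):
    """Simplified stand-in for the matching step (hypothetical helper)."""
    # Collect every option letter whose text appears inside the response.
    candidates = [
        index for index, ans in index2ans.items()
        if ans.lower() in response.lower()
    ]
    if candidates:
        # Simplification: take the first match; the real code may do more.
        return candidates[0], False
    # Nothing matched, so the result is either the default or a random letter.
    if default_answer is None:
        return random.choice(all_choices), True
    return default_answer, True


all_choices = ["A", "B", "C", "D"]
response = "The man in jeans is playing a crossword puzzle."

# index2ans that belongs to a different row (the situation described as bug 1).
stale_index2ans = {
    "A": "Yes, both feature the same music, although the audio is a comedic rendition.",
    "B": "No, the audio is a different composition entirely.",
    "C": "Yes, but the audio is from a different section of the piece.",
    "D": "No, the audio is a similar but distinctly different Beethoven symphony.",
}

# index2ans that belongs to this row.
own_index2ans = {
    "A": "The man in jeans is taking notes from the newspaper.",
    "B": "The man in purple is reading the newspaper.",
    "C": "The man in jeans is playing a crossword puzzle.",
    "D": "The man on the table is doing a crossword puzzle.",
}

stale_choice, stale_fallback = pick_choice(response, all_choices, stale_index2ans)
own_choice, own_fallback = pick_choice(response, all_choices, own_index2ans)
```

With the stale mapping no option text can appear in the response, so the fallback branch is taken; with the row's own mapping the match is found directly.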

For example, when my first row is:

args_list[0]:({'question': 'What are the men doing?',
'audio_path': './OmniBench/mm_data/audio/2_009_four_people.mp3',
'image_path': './OmniBench/mm_data/image/2_009_four_people.png',
'index': 0,
'answer': 'The man in jeans is playing a crossword puzzle.',
'options': ['The man in jeans is taking notes from the newspaper.', 'The man in purple is reading the newspaper.', 'The man in jeans is playing a crossword puzzle.', 'The man on the table is doing a crossword puzzle.']}, 0, input_file='./OmniBench/dataset/batch-5_1142_20240817.jsonl',

),
{'A': 'The man in jeans is taking notes from the newspaper.', 'B': 'The man in purple is reading the newspaper.', 'C': 'The man in jeans is playing a crossword puzzle.', 'D': 'The man on the table is doing a crossword puzzle.'},
['A', 'B', 'C', 'D'])

Because of bug 1, the index2ans for every row will be:

index2ans:{'A': 'Yes, both feature the same music, although the audio is a comedic rendition.', 'B': 'No, the audio is a different composition entirely.', 'C': 'Yes, but the audio is from a different section of the piece.', 'D': 'No, the audio is a similar but distinctly different Beethoven symphony.'}

After fixing bug 1, the index2ans for this row will be:

index2ans:{'A': 'The man in jeans is taking notes from the newspaper.', 'B': 'The man in purple is reading the newspaper.', 'C': 'The man in jeans is playing a crossword puzzle.', 'D': 'The man on the table is doing a crossword puzzle.'}
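One way to guarantee that each row gets its own mapping is to build a fresh dict from that row's options. The sketch below assumes each row is a dict with an `options` list, as in the example above; `build_index2ans` is a hypothetical helper name, and this is not necessarily how the commits in this PR implement the fix.

```python
def build_index2ans(options):
    """Return (index2ans, all_choices) for a single row's options."""
    # Letters A, B, C, ... one per option, in order.
    all_choices = [chr(ord("A") + i) for i in range(len(options))]
    # A new dict is created on every call, so rows never share state.
    index2ans = dict(zip(all_choices, options))
    return index2ans, all_choices


rows = [
    {
        "question": "What are the men doing?",
        "options": [
            "The man in jeans is taking notes from the newspaper.",
            "The man in purple is reading the newspaper.",
            "The man in jeans is playing a crossword puzzle.",
            "The man on the table is doing a crossword puzzle.",
        ],
    },
    {
        # Question text omitted; only the options are quoted in this PR.
        "options": [
            "Yes, both feature the same music, although the audio is a comedic rendition.",
            "No, the audio is a different composition entirely.",
            "Yes, but the audio is from a different section of the piece.",
            "No, the audio is a similar but distinctly different Beethoven symphony.",
        ],
    },
]

# Build the mapping per row instead of reusing one object across rows.
per_row = [build_index2ans(row["options"]) for row in rows]
first_index2ans, first_choices = per_row[0]
second_index2ans, _ = per_row[1]
```

Because the mapping is created inside the function, a later row cannot overwrite what an earlier row sees.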

Because of bug 2, the correct answer becomes 'The man in jeans is playing a crossword puzzle', which cannot be matched because it has no period at the end; the real answer is 'The man in jeans is playing a crossword puzzle.'
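The mismatch can be reproduced in a few lines. The sketch below uses `rstrip(".")` only as an assumed stand-in for whatever step removes the period, and the `normalize` helper is a hypothetical illustration of one possible remedy (comparing both sides in the same normalized form), not necessarily the change made in this PR.

```python
option = "The man in jeans is playing a crossword puzzle."

# Assumed stand-in for the step that drops the trailing period.
stripped_answer = option.rstrip(".")

# The option text still ends with ".", so it is not a substring of the
# stripped answer and the matching loop finds no candidate.
raw_match = option.lower() in stripped_answer.lower()


def normalize(text):
    """Hypothetical helper: trim whitespace and a trailing period, lowercase."""
    return text.strip().rstrip(".").lower()


# Comparing both sides in the same normalized form restores the match.
normalized_match = normalize(option) in normalize(stripped_answer)
```

Keeping the period on the correct answer, so that it equals the option text exactly, avoids the problem without any normalization.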

the index2ans is same for different row, this commit fix tie
fix correct_answer random