We conducted our survey on the Amazon Mechanical Turk sandbox.
In the first part of the survey, we ask the annotator to select an interpretation for the indirect answer, given a Circa context and conversation (in the relaxed setting). This part serves as quality control.
You will be shown short dialogues between two friends/colleagues X and Y. X and Y are in a certain context. In all the dialogues, X asks a simple ‘Yes/No’ question, and Y answers the question indirectly with a short sentence or phrase. For example, given the conversation:
Context: X wants to know about Y’s food preferences.
Question (X): “Do you eat red meat?”
Answer (Y): “I am a vegetarian.”
We can interpret and explain Y's answer:
Interpretation: No
Explanation: Vegetarians don't eat meat.
For this task we will ask you to help us interpret the indirect answers given by Y.
Read the dialogue and tell us how you think X will interpret Y's answer. Your options are:
X thinks that Y means:
Yes
Yes, subject to some conditions
No
In the middle, neither yes nor no
Other
Examples:
CONTEXT: X wants to know about Y's movie preferences.
X: Do you like movies with sad endings?
Y: I often watch them.
-> X will think that Y means (1) Yes
CONTEXT: X wants to know about Y's movie preferences.
X: Are you up for a movie?
Y: Only to a comedy.
-> X will think that Y means (2) Yes, subject to some conditions
CONTEXT: Y has just told X that he/she is considering switching his/her job.
X: Will you have a long commute?
Y: I'll be living very close to the job.
-> X will think that Y means (3) No
CONTEXT: Y has just told X that he/she is considering switching his/her job.
X: Are you excited to start a new job?
Y: I have mixed feelings.
-> X will think that Y means (4) In the middle, neither yes nor no
We then ask the annotators to rate the explanations on a scale of 1 to 5, focusing on how meaningful the explanations are and how well they support the given interpretation.
Annotators are shown the original Circa context and conversation, together with an interpretation and an explanation generated by our model. When our model makes a wrong prediction, the shown interpretation is either the gold-standard label or the model prediction; we do not disclose to the annotator whether this interpretation is correct, so that they focus on rating the quality of the explanation and whether it supports the given interpretation.
This setup also allows us to study both the faithfulness (whether the explanation supports the model prediction) and the plausibility (whether the explanation supports the gold-standard label) of our model's explanations.
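As a rough illustration of how these two quantities could be computed from the collected ratings (the file name and the column names shown_interpretation, model_prediction, gold_label, and rating are hypothetical placeholders, not the actual schema):

import pandas as pd

# Hypothetical ratings export; names are placeholders for illustration only.
ratings = pd.read_csv("mturk_ratings.csv")

# Faithfulness: mean rating when the shown interpretation is the model's own prediction.
faithfulness = ratings.loc[
    ratings["shown_interpretation"] == ratings["model_prediction"], "rating"
].mean()

# Plausibility: mean rating when the shown interpretation is the gold-standard label.
plausibility = ratings.loc[
    ratings["shown_interpretation"] == ratings["gold_label"], "rating"
].mean()

print(f"faithfulness={faithfulness:.2f}  plausibility={plausibility:.2f}")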
You will be shown short dialogues between two friends/colleagues X and Y. X and Y are in a certain context. In all the dialogues, X asks a simple ‘Yes/No’ question, and Y answers the question indirectly with a short sentence or phrase. For example:
Context: X wants to know about Y’s food preferences.
Question (X): “Do you eat red meat?”
Answer (Y): “I am a vegetarian.”
An interpretation of Y's answer is given, and it belongs to one of these categories:
Yes
Yes, subject to some conditions
No
In the middle, neither yes nor no
Other
For this task we provide explanations for the interpretations, and we need your help to rate the quality of the explanations on a scale of 1 to 5. Below is an example of a good explanation that supports the interpretation:
Interpretation: No
Explanation: Vegetarians don't eat meat.
Here are some important criteria to keep in mind:
1 is the worst, which means the explanation either contradicts the answer choice or is meaningless. 5 is the best, which means the explanation explains the answer choice very well with meaningful content.
Explanations in the following cases should be rated low:
Contradict the answer choice, or support a different answer choice;
Meaningless or irrelevant, e.g., "this is the only/best choice";
Only repeat the question;
Only repeat the answer choice without any other content;
Internally contradictory, e.g., "choice A is right because choice B is right".
Here are some examples of a GOOD explanation:
CONTEXT: Y has just told X that he/she is thinking of buying a flat in New York.
X: Is a basement flat okay?
Y: I need to be above ground.
INTERPRETATION: No
EXPLANATION: A basement is not above ground.
CONTEXT: X wants to know about Y's food preferences.
X: Are you allergic to seafood?
Y: Avoiding seafood is best for my health.
INTERPRETATION: In the middle, neither yes nor no
EXPLANATION: Avoiding seafood does not imply being allergic to seafood.
Here are some examples of a BAD explanation:
Example explanation that is nonsensical:
CONTEXT: X and Y are colleagues who are leaving work on a Friday at the same time.
X: Got any plans?
Y: My schedule is open.
INTERPRETATION: No
EXPLANATION: If a schedule is open, then it follows that plans got plans.
Example explanations that lack common sense:
CONTEXT: X wants to know what activities Y likes to do during weekends.
X: Are you a fan of bars?
Y: I'm in AA.
INTERPRETATION: Yes
EXPLANATION: AA is a bar.
CONTEXT: Y has just told X that he/she is considering switching his/her job.
X: Are you in a support position?
Y: I'm a supervisor.
INTERPRETATION: Yes
EXPLANATION: A supervisor is a support position.
Example explanations that only repeat the question or the answer:
CONTEXT: X wants to know what activities Y likes to do during weekends.
X: Is disk golf fun to you?
Y: I've never done it.
INTERPRETATION: No
EXPLANATION: Disk golf is not fun.
CONTEXT: Y has just moved into a neighbourhood and meets his/her new neighbour X.
X: Are you extroverted?
Y: I don't know
INTERPRETATION: No
EXPLANATION: I don't know if I am extroverted.
CONTEXT: X and Y are childhood neighbours who unexpectedly run into each other at a cafe.
X: Do you still talk to our friend Katie?
Y: Remind me who she is
INTERPRETATION: No
EXPLANATION: Katie is a friend.
Conversations used as questions in the first part of the survey, or as examples in the survey instructions, are excluded from the main survey. The sampling is implemented in scripts/generate_mturk_questions.py.
We generated two batches in total; a sketch of the sampling logic is given after the commands below.
- For the first batch, we uniformly sampled 15 examples from each of the four categories (leaked/non-leaked) x (correct/incorrect prediction). One of these questions had a formatting error, so we discarded it.
python generate_mturk_questions.py --input_csv ~/Downloads/LAS_nli_relaxed_unmatched13_test_data_circa_NLI_test.csv --num_sample 15 --output_csv mturk_explain_unmatched.csv
- For the second batch, we randomly sampled 41 examples from the remaining part of the test set.
python generate_mturk_questions.py --input_csv ~/Downloads/LAS_nli_relaxed_unmatched13_test_data_circa_NLI_test.csv --num_samples 41 --output_csv mturk_explain_unmatched2.csv --exclude_samples mturk_explain_unmatched.csv
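A minimal sketch of the kind of sampling the script performs, under assumptions about the CSV schema (the columns question, leaked, and correct are placeholders, not the script's actual interface):

import pandas as pd

def sample_questions(input_csv, num_samples, output_csv,
                     exclude_csv=None, per_category=False, seed=0):
    # Sample survey questions from the model's test-set predictions.
    data = pd.read_csv(input_csv)
    if exclude_csv is not None:
        # Drop conversations that already appeared in an earlier batch
        # (or in the survey instructions).
        used = pd.read_csv(exclude_csv)
        data = data[~data["question"].isin(used["question"])]
    if per_category:
        # Batch 1: num_samples examples from each
        # (leaked/non-leaked) x (correct/incorrect prediction) cell.
        sampled = data.groupby(["leaked", "correct"], group_keys=False).sample(
            n=num_samples, random_state=seed)
    else:
        # Batch 2: plain random sample from the remaining test set.
        sampled = data.sample(n=num_samples, random_state=seed)
    sampled.to_csv(output_csv, index=False)

# e.g. sample_questions("circa_test_predictions.csv", 15,
#                       "mturk_explain_unmatched.csv", per_category=True)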
- Go to the worker sandbox page and sign in with your Amazon account.
- Finish the registration process and create a worker account.
- Now you should be on the HITs page. Search for "Frederik Nolte", and you should see two HIT groups.
- Find the first survey, titled "01 - Interpret Indirect Answers", and click "Accept & Work" (on the right).
- Then complete the second one titled "02 - Explain Indirect Answers".
- There are multiple questions in this survey. To make the process smoother, check the "Auto-accept next HIT" box at the top left before you submit your first answer.
To incentivize annotators, we provided Amazon coupons (25, 15, and 10 EUR) to those who annotated the most examples.