Conversation

@sundy1994 (Collaborator):

fix #363

@sundy1994 requested a review from xehu on December 17, 2025 at 19:34.
@xehu (Collaborator) left a comment:

See the error in the logic flow in the inline comments! I think we might need to make a small adjustment.

            counted_sentences.add(sent.start)
            break
    return yesno_count, wh_count
    # sentences = [str(sent) for sent in doc.sents if '?' in str(sent)]
@xehu (Collaborator):
This seems like the old logic commented out, which we can remove once we establish the new logic works!
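
For context, the commented-out line only ever kept sentences containing a literal '?', so unpunctuated questions were invisible to it. A minimal illustration of the gap (plain strings instead of spaCy sentence spans, purely for brevity):

```python
# The old filter kept only sentences containing a literal '?'.
sentences = [
    "can you tell me what is your name",  # a question, but no '?'
    "what time is it?",
]
questions_old = [s for s in sentences if '?' in s]
print(questions_old)  # ['what time is it?'] -- the unpunctuated question is missed
```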

" thank ",
" thanks ",
" thank you ",
#" thank you ",
@xehu (Collaborator):

Is it correct that we're commenting this out because otherwise we'd double-count "thank you very much"?

@sundy1994 (Collaborator, Author):

Yes. Otherwise both "thank" and "thank you" would be counted. We discussed this issue with Burint and he made the changes.
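
To make the double count concrete, here is a minimal sketch, assuming the lexicon entries are applied as space-padded substring matches (which the padded entries above suggest); the variable names are illustrative only:

```python
# Space-padded patterns approximate word boundaries in the lexicon.
patterns = [" thank ", " thanks ", " thank you "]

# Pad the text the same way before matching.
text = " " + "thank you very much" + " "
hits = [p for p in patterns if p in text]
print(hits)  # [' thank ', ' thank you '] -> the phrase would be counted twice
```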

'is', 'are', 'was', 'were', 'am'}
# Pronouns that often follow auxiliaries in Yes/No questions
pronoun_followers = {'i', 'you', 'we', 'he', 'she', 'they', 'it'}
# filler_words = {'ok', 'so', 'well', 'like', 'you know', 'i mean', 'actually', 'basically', 'right', 'just', 'uh', 'um', 'oh', 'hmm', 'like'}
@xehu (Collaborator):

Are we no longer ignoring filler words like 'ok so'?

@sundy1994 (Collaborator, Author):

Yes. The filler-word list was my own solution to this test case: "so Which part should we do first?". Burint updated his code later, so I adapted his approach instead.
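
For the record, the commented-out list above points at a simple pre-filter: skip leading fillers before checking whether the sentence opens with a question word. A rough sketch of that idea (illustrative names, not the code that was merged):

```python
FILLER_WORDS = {'ok', 'so', 'well', 'like', 'actually', 'basically',
                'right', 'just', 'uh', 'um', 'oh', 'hmm'}
WH_WORDS = {'what', 'who', 'where', 'when', 'why', 'how', 'which'}

def first_content_word(words):
    """Skip past leading filler words and return the next word, lowercased."""
    for w in words:
        if w.lower() not in FILLER_WORDS:
            return w.lower()
    return None

# 'so' no longer hides the WH word at the start of the sentence:
print(first_content_word("so Which part should we do first?".split()) in WH_WORDS)  # True
```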

sent_tokens = list(sent)
if not sent_tokens:
    continue
# Method 1: Find question sentences by checking for '?' at end
@xehu (Collaborator):

So there's a bug with this logic: if the sentence doesn't end with a question mark, the fallback method is very error-prone. For example:

Question(nlp("can you tell me what is your name?")) yields YesNo = 1, WH = 0 (correct)
Question(nlp("can you tell me what is your name")) yields YesNo = 0, WH = 1 (incorrect)

Why can't we use logic that looks for the question words, but then applies the search_tags logic, which helps to get around some of these problems?
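
One way to realize that suggestion is to give the sentence-initial frame precedence: if the sentence opens with an auxiliary followed by a pronoun, classify it as Yes/No before ever scanning for a WH word, so an embedded clause like "what is your name" cannot flip the label. A hedged sketch of that precedence rule (all names illustrative; this is not the merged implementation):

```python
AUXILIARIES = {'do', 'does', 'did', 'can', 'could', 'will', 'would', 'shall',
               'should', 'may', 'might', 'must', 'have', 'has', 'had',
               'is', 'are', 'was', 'were', 'am'}
PRONOUNS = {'i', 'you', 'we', 'he', 'she', 'they', 'it'}
WH_WORDS = {'what', 'who', 'where', 'when', 'why', 'how', 'which'}

def classify_question(sentence):
    words = sentence.lower().rstrip('?').split()
    if not words:
        return None
    if words[0] in WH_WORDS:
        return 'WH'
    # Auxiliary + pronoun at the start wins, even if a WH word appears later.
    if words[0] in AUXILIARIES and len(words) > 1 and words[1] in PRONOUNS:
        return 'YesNo'
    return None

print(classify_question("can you tell me what is your name"))  # YesNo, with or without '?'
print(classify_question("what is your name"))                  # WH
```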

@xehu (Collaborator) commented on Jan 10, 2026:

Attachment: politeness_test.ipynb

I wrote a much more robust set of tests as I iterated on edge cases for the question function. Basically, I kept coming up with test cases, fixing them, and eventually reworked a good chunk of the function in order to catch all of the errors.

I think this is probably the best version of the question detector we have so far, and perhaps we can (1) update the test cases with these expected values and (2) make a PR or comment to the other repository with a note on the suggested fixes. Ultimately, I think it's better that we have a more accurate version of the question extraction than to follow the source material if it is error-prone.

@xehu (Collaborator) left a comment:

Our version of the politeness questions is more robust than ever, and we also have updated tests to confirm that we get the edge cases right. I say we merge this in!

wh_words = {'what', 'who', 'where', 'when', 'why', 'how', 'which'}
wh_followers = {
    'what': {'are', 'is', 'do', 'does', 'can', 'should', 'might'},
    'who': {'am', 'is', 'are', 'was', 'can', 'should'},
@xehu (Collaborator):

@sundy1994 I found a bug when testing this. It turns out that if you do not add 'am' to wh_followers, things like "who am I" or "what am I supposed to do" are not detected as WH questions. But there are still some bugs with this... I'll follow up on Slack.

@xehu (Collaborator):

UPDATE: I realized that it doesn't actually make much sense to separate the wh_followers from the other auxiliaries, so I refactored this so that we only use one consistent set of auxiliaries.
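
A sketch of what that refactor looks like in practice: a single shared auxiliary set serves both the Yes/No check and the "WH word followed by an auxiliary" check, so cases like "who am I" come for free (illustrative names, not the merged code):

```python
AUXILIARIES = {'do', 'does', 'did', 'can', 'could', 'will', 'would', 'shall',
               'should', 'may', 'might', 'must',
               'is', 'are', 'was', 'were', 'am'}
WH_WORDS = {'what', 'who', 'where', 'when', 'why', 'how', 'which'}

def is_wh_question(words):
    """True if any WH word is immediately followed by an auxiliary."""
    lowered = [w.lower() for w in words]
    return any(a in WH_WORDS and b in AUXILIARIES
               for a, b in zip(lowered, lowered[1:]))

print(is_wh_question("who am I".split()))                  # True: 'am' is covered
print(is_wh_question("what am I supposed to do".split()))  # True
```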

1,B,I am not sure. Where is the rest of our team?,WH_Questions_receptiveness_yeomans,1
1,B,I am not sure. Where is the rest of our team?,First_Person_Single_receptiveness_yeomans,1
1,B,"Well please help me figure this out, I really want to do well on this please okay",factuality_politeness_convokit,1
1B,A,is where different from why?,YesNo_Questions_receptiveness_yeomans,1
@xehu (Collaborator):

@sundy1994 I added in this more robust set of question tests to test all the edge cases that the code (hopefully) now catches!
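
For anyone reproducing these checks, rows in this format (conversation id, speaker, message, feature name, expected value) lend themselves to a simple data-driven test. A hedged sketch, assuming a hypothetical extract_features(message) that returns a dict of feature counts (the real suite lives in the repo's test files):

```python
import csv
import io

# Two of the rows above, kept verbatim for illustration.
TEST_ROWS = """\
1,B,I am not sure. Where is the rest of our team?,WH_Questions_receptiveness_yeomans,1
1B,A,is where different from why?,YesNo_Questions_receptiveness_yeomans,1
"""

def run_question_tests(extract_features):
    """Check each row's expected feature count against the extractor's output."""
    for conv_id, speaker, message, feature, expected in csv.reader(io.StringIO(TEST_ROWS)):
        actual = extract_features(message).get(feature, 0)
        assert actual == int(expected), (conv_id, message, feature, actual)

# Usage: run_question_tests(extract_features) with your own extractor.
```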

@xehu merged commit 4b2b0ed into dev on Jan 12, 2026. 1 check passed.