Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add issue comment TFIDF similarity metrics and issue comment Ja… #1242

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bifenglin
Copy link
Contributor

This PR adds the metric about actors' comment TFIDF similarity and actors' Jaccard similarity.

@xgdyp
Copy link
Contributor

xgdyp commented Mar 29, 2023

hi @bifenglin , if this pr has not completed, you can convert it to draft PR

…ccard similarity

Signed-off-by: bifenglin <515186469@qq.com>
@@ -15,6 +16,10 @@ def getRelatedUsers(config):
return related_users.getRelatedUsers(config)
def getAttention(config):
return attention.getAttention(config)
def getIssueCommentTFIDFSimilarity(config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that config has never been used

"""
TODO: get Selected Acotrs
"""
sql = 'SELECT DISTINCT(actor_id) FROM opensource.gh_events')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add a limit clause such as limit 10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function aims to find all users if have no selecting choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DISTINCT selection from the whole table may lead to a OOM exception since the ClickHouse only has 128GB memory, we should definitely avoid this kind of SQL.

@xgdyp
Copy link
Contributor

xgdyp commented May 17, 2023

hi, I think this pr can be converted into a case in notebook. I'll rewrite it.

@l1tok
Copy link
Contributor

l1tok commented Oct 22, 2023

hi @bifenglin. I'm adding this to the notebook recently. I think there is something to be confirmed below :

  • Both the code and the pr description are about the similarity of actors, so the pr title should be changed accordingly.
  • getRecent100Comment in getIssueCommentTFIDFSimilarity is not defined .
  • In config, do we want to pass the actors we want to query into the function?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants