Global word frequency calculation #121

@ClaudiaShu

Hi, I have a question about computing the replacement S score.

In your paper, the score is defined as $S(w) = \mathrm{freq}(w)\,\mathrm{IDF}(w)$. In the code, however, it is computed by summing the TF-IDF score of a term over every document, as shown below. But $\mathrm{freq}(w)$ over the corpus is not the same as the sum of the word's per-document frequencies. Moreover, the IDF of a term should be the same throughout the corpus, since both the number of documents containing $w$ and the total number of documents are fixed.

# Compute TF-IDF
tf_idf = {}
for i in range(len(examples)):
  cur_word_dict = {}
  cur_sent = copy.deepcopy(examples[i].word_list_a)
  if examples[i].text_b:
    cur_sent += examples[i].word_list_b
  for word in cur_sent:
    if word not in tf_idf:
      tf_idf[word] = 0
    tf_idf[word] += 1. / len(cur_sent) * idf[word]
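
To make the difference concrete, here is a minimal sketch (not the repository's code) that compares the two readings on a toy corpus. The names `idf_table`, `corpus_score`, `summed_tf_idf`, and `docs` are hypothetical, and two things are assumptions on my side: that $\mathrm{IDF}(w) = \log(N / \mathrm{df}(w))$, and that $\mathrm{freq}(w)$ in the paper means the raw count of $w$ over the whole corpus.

```python
import math
from collections import Counter


def idf_table(docs):
    # Assumed definition: IDF(w) = log(N / df(w)), constant across the corpus.
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def corpus_score(docs):
    # Reading of the paper's formula: S(w) = freq(w) * IDF(w),
    # with freq(w) assumed to be the raw count of w in the whole corpus.
    idf = idf_table(docs)
    freq = Counter(w for doc in docs for w in doc)
    return {w: freq[w] * idf[w] for w in freq}


def summed_tf_idf(docs):
    # Mirrors the quoted loop: each occurrence adds (1 / document length) * IDF(w),
    # so the result is the sum of per-document TF-IDF scores.
    idf = idf_table(docs)
    scores = {}
    for doc in docs:
        for w in doc:
            scores[w] = scores.get(w, 0.0) + 1.0 / len(doc) * idf[w]
    return scores


# Toy corpus (hypothetical): "cat" occurs twice in a long document,
# "rare" occurs once in a short one.
docs = [
    ["the", "rare"],
    ["cat", "cat", "the", "the", "the", "the", "the", "the"],
    ["the", "dog"],
]

global_scores = corpus_score(docs)
summed_scores = summed_tf_idf(docs)

for w in ("cat", "rare"):
    print(w, round(global_scores[w], 3), round(summed_scores[w], 3))
```

On this corpus, "cat" scores higher than "rare" under the corpus-level formula, but the order flips under the summed per-document version, because the per-document term frequency is normalized by document length.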
