Why do we need this line to check corpus_id != query_id?
For a query with ID id_q, the corpus document with the same ID id_q is not necessarily a positive for it. So why do we need to skip results where corpus_id == query_id?
for query_itr in range(len(query_embeddings)):
    query_id = query_ids[query_itr]
    for sub_corpus_id, score in zip(cos_scores_top_k_idx[query_itr], cos_scores_top_k_values[query_itr]):
        corpus_id = corpus_ids[corpus_start_idx+sub_corpus_id]
        if corpus_id != query_id:
            if len(result_heaps[query_id]) < top_k:
                # Push item on the heap
                heapq.heappush(result_heaps[query_id], (score, corpus_id))
            else:
                # If item is larger than the smallest in the heap, push it on the heap then pop the smallest element
                heapq.heappushpop(result_heaps[query_id], (score, corpus_id))

for qid in result_heaps:
    for score, corpus_id in result_heaps[qid]:
        self.results[qid][corpus_id] = score

return self.results
We need this line for two datasets, ArguAna and Quora, where corpus_ids and query_ids overlap, i.e., the query is also present within the corpus.
The check avoids the edge case of self-retrieval: the query's own copy in the corpus would otherwise be retrieved at the top-1 position, and since that copy is not a labeled positive, it pushes the relevant documents down one rank and reduces the nDCG@10 score for ArguAna and Quora.
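To make the effect concrete, here is a minimal, self-contained sketch of the same top-k heap logic with and without the self-retrieval check. It is an illustration only, not the library's code: the helper names `top_k_hits` and `ndcg_at_k`, the `skip_self` flag, the document IDs, and the scores are all hypothetical, and the nDCG here assumes binary relevance.

```python
import heapq
import math


def top_k_hits(query_id, scored_docs, top_k, skip_self=True):
    """Keep the top_k highest-scoring (score, doc_id) pairs using a min-heap."""
    heap = []
    for doc_id, score in scored_docs:
        if skip_self and doc_id == query_id:
            continue  # the query's own copy in the corpus is not a real hit
        if len(heap) < top_k:
            heapq.heappush(heap, (score, doc_id))
        else:
            # Push the new item, then pop the smallest one
            heapq.heappushpop(heap, (score, doc_id))
    return sorted(heap, reverse=True)


def ndcg_at_k(ranked_doc_ids, relevant_ids, k):
    """Binary-relevance nDCG@k (an assumption made for this sketch)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_doc_ids[:k])
        if doc_id in relevant_ids
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0


# Hypothetical toy data: query "q1" also exists in the corpus under the same ID.
scored_docs = [("q1", 1.00), ("d7", 0.83), ("d2", 0.61), ("d9", 0.40)]
relevant = {"d7"}  # the judged positive for q1

with_filter = [d for _, d in top_k_hits("q1", scored_docs, top_k=3)]
without_filter = [d for _, d in top_k_hits("q1", scored_docs, top_k=3, skip_self=False)]

print(with_filter, ndcg_at_k(with_filter, relevant, k=10))
print(without_filter, ndcg_at_k(without_filter, relevant, k=10))
```

With the check, the judged positive `d7` sits at rank 1. Without it, the query's own copy takes rank 1, `d7` drops to rank 2, and the last candidate falls out of the top-k entirely, so the score decreases even though the retriever behaved identically.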