Experiments
Aleksandr Perevalov edited this page Aug 25, 2020
Model: bert-base-cased
Test set: dbpedia_df.sample(4400, random_state=42)
Evaluation results:
-------------------
Category prediction (based on 4380 questions)
Accuracy: 0.969
Type ranking (based on 3573 questions)
NDCG@5: 0.533
NDCG@10: 0.499
Note: the predicted type order was wrong in this run, which likely explains the lower NDCG scores.
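All runs below are scored on the same held-out sample: with a fixed `random_state=42`, `DataFrame.sample` returns the same 4400 rows every time, so the models are compared on identical questions. A minimal sketch of this behavior, with a toy frame standing in for `dbpedia_df`:

```python
import pandas as pd

# Toy stand-in for dbpedia_df; the real frame holds the DBpedia questions.
df = pd.DataFrame({"id": range(10), "question": [f"q{i}" for i in range(10)]})

# Same seed -> same rows every time, so results stay comparable across models.
sample_a = df.sample(5, random_state=42)
sample_b = df.sample(5, random_state=42)
assert sample_a.equals(sample_b)
```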

Model: bert-base-multilingual-cased
Test set: dbpedia_df.sample(4400, random_state=42)
Evaluation results:
-------------------
Category prediction (based on 4380 questions)
Accuracy: 0.969
Type ranking (based on 3577 questions)
NDCG@5: 0.674
NDCG@10: 0.632
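NDCG@k rewards rankings that place the gold answer types near the top of the predicted list. A minimal sketch of the standard formulation (assuming binary relevance per candidate type; the actual evaluation script may weight types differently):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the predicted ranking divided by the ideal DCG."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant type among the candidates
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

A ranking with the gold type first scores 1.0; pushing it lower discounts the score logarithmically, which is why a wrong type order hurts NDCG even when the correct type is still retrieved.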

Model: bert-base-multilingual-cased
Test set: dbpedia_df.sample(4400, random_state=42)
Evaluation results:
-------------------
Category prediction (based on 4380 questions)
Accuracy: 0.968
Type ranking (based on 3576 questions)
NDCG@5: 0.704
NDCG@10: 0.661

Model: bert-base-multilingual-cased
Test set: dbpedia_df.sample(4400, random_state=42)
Prediction mode: make predictions for each language, then choose one using majority vote
Evaluation results:
-------------------
Category prediction (based on 4380 questions)
Accuracy: 0.962
Type ranking (based on 3547 questions)
NDCG@5: 0.708
NDCG@10: 0.665
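The majority-vote prediction mode above can be sketched as follows (the label names are hypothetical; ties fall back to the label seen first, i.e. the first language's prediction):

```python
from collections import Counter

def majority_vote(predictions):
    """Pick the label predicted by the most per-language models.

    Counter.most_common breaks ties by insertion order, so on a tie the
    first language's prediction wins.
    """
    return Counter(predictions).most_common(1)[0][0]

# One prediction per language for the same question (hypothetical labels).
per_language = ["dbo:Person", "dbo:Person", "dbo:Place"]
print(majority_vote(per_language))  # -> dbo:Person
```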

Model: bert-base-cased
Test set: dbpedia_df.sample(4400, random_state=42)
Evaluation results:
-------------------
Category prediction (based on 4380 questions)
Accuracy: 0.357
Type ranking (based on 1045 questions)
NDCG@5: 0.165
NDCG@10: 0.140

Model: bert-base-cased
Test set: dbpedia_df.sample(4400, random_state=42) (annotated as test)
Evaluation results:
-------------------
Category prediction (based on 1902 questions)
Accuracy: 0.964
Type ranking (based on 1482 questions)
NDCG@5: 0.749
NDCG@10: 0.714

Model: bert-base-cased
Test set: (original questions matching the annotated sample)
# The stray ".format(dbpedia_path)" calls were no-ops (the paths contain no "{}" placeholders) and have been dropped.
dbpedia_df = pd.read_csv("/kaggle/input/test-06366/dbpedia-annotated.csv", sep="|")
# Recover the original question id from the annotated id ("<original_id>-<suffix>").
dbpedia_df['original_id'] = dbpedia_df.id.apply(lambda x: x.split('-')[0])
dbpedia_faketest_df = dbpedia_df.sample(4400, random_state=42)
dbpedia_orig_df = pd.read_csv("/kaggle/input/test-06366/dbpedia.csv", sep="|")
# Keep only the original questions whose ids appear in the annotated sample.
dbpedia_faketest_df = dbpedia_orig_df[dbpedia_orig_df.id.isin(dbpedia_faketest_df.original_id.values)]
Evaluation results:
-------------------
Category prediction (based on 1902 questions)
Accuracy: 0.973
Type ranking (based on 1499 questions)
NDCG@5: 0.740
NDCG@10: 0.706