add zpop
yurenpang committed Jun 24, 2020
1 parent ceea8f0 commit 464ee13
Showing 147 changed files with 485 additions and 3 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -111,4 +111,3 @@ venv.bak/

# Ignore experiment folder
experiments/
-data/
11 changes: 9 additions & 2 deletions bin/run-joint.sh
@@ -19,7 +19,9 @@ do
     exp_dir=$(prepare_experiment_dir $topic ${exp_id})
     # Write down parameters that'll be on the html file
     write_experiment_params ${exp_dir} weight 1
-
+    python -m cartograph.draw.zpop_creator \
+        --experiment ${exp_dir} \
+        --popularity_score data/${topic}/popularity_score.csv
     # Step 2: run UMAP
     python -m cartograph.xy_embed.umap_embed \
         --map_directory ${exp_dir} \
@@ -47,13 +49,18 @@ do
         --cluster_groups /key_phrases_cluster_groups.csv \
         --output_file /key_phrases_top_labels.csv \
         --label_source key_phrases \
-        --num_top_labels ${labels_num} # number of top keyphrases labels
+        --num_top_labels ${labels_num} # number of top keyphrases labels

     #Step 5: Fetch hierarchical categories from key phrases
     python -m cartograph.h_cat_fetcher \
         --experiment ${exp_dir} \
         --isSumInKeyPhrase ${isSumInKeyPhrase}
+
+    #Step 5.5 generate zpop
+    python -m cartograph.draw.zpop_creator \
+        --experiment ${exp_dir} \
+        --popularity_score data/${topic}/popularity_score.csv

     # Step 6
     python -m cartograph.user_study_label \
         --experiment ${exp_dir} \
47 changes: 47 additions & 0 deletions cartograph/draw/zpop_creator.py
@@ -0,0 +1,47 @@
# Transform popularity score from wikipedia to a score that looks good on cartograph
# Author: Rock Pang,
# Reference: https://github.com/shilad/cartograph/blob/develop-simple/cartograph/CalculateZPop.py

import pandas as pd
import numpy as np
import argparse


def log4(x):
    # Base-4 logarithm via change of base
    return np.log2(x) / np.log2(4)


def main(experiment_directory, popularity_score_df, new_xy_embeddings_df):
    assert popularity_score_df.shape[0] != 0, "popularity score dataframe is empty"
    new_rows = []
    pop_dic = {}
    for row in popularity_score_df.itertuples():
        pop_dic[row.article_id] = row.popularity_score

    # Articles without a popularity score default to 0
    for row in new_xy_embeddings_df.itertuples():
        val = 0
        if row.article_id in pop_dic:
            val = pop_dic[row.article_id]
        new_rows.append({"article_id": row.article_id, "popularity_score": val})

    df = pd.DataFrame(new_rows)

    sorted_score = df.sort_values(by='popularity_score', ascending=False)

    # zpop depends only on rank: the most popular article gets log4(1) = 0
    sorted_score['zpop'] = log4(np.arange(sorted_score.shape[0]) / 2.0 + 1.0)
    sorted_score = sorted_score.drop("popularity_score", axis=1)
    sorted_score.to_csv(experiment_directory + "/zpop_score.csv", index=False)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--experiment', required=True)
    parser.add_argument('--popularity_score', required=True)
    args = parser.parse_args()

    experiment_directory = args.experiment
    popularity_score = args.popularity_score
    new_xy_embedding = experiment_directory + "/new_xy_embeddings.csv"

    main(experiment_directory, pd.read_csv(popularity_score), pd.read_csv(new_xy_embedding))
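
The transform in zpop_creator.py can be checked on a small in-memory table. The sketch below is illustrative only: it reuses the log4 helper and the rank formula from the file above, but the function name zpop_from_scores and the toy article ids and scores are hypothetical and do not come from the repository. It skips the CSV reading and writing and assumes every article already has a score.

```python
import numpy as np
import pandas as pd


def log4(x):
    # Change of base: log4(x) = log2(x) / log2(4)
    return np.log2(x) / np.log2(4)


def zpop_from_scores(scores):
    """Sort by popularity (descending) and map the 0-based rank r to log4(r / 2 + 1)."""
    ranked = scores.sort_values(by="popularity_score", ascending=False).copy()
    ranked["zpop"] = log4(np.arange(len(ranked)) / 2.0 + 1.0)
    return ranked.drop(columns="popularity_score")


# Hypothetical toy data, not taken from the repository.
toy = pd.DataFrame({
    "article_id": [10, 11, 12, 13],
    "popularity_score": [5.0, 120.0, 0.0, 42.0],
})
zpop = zpop_from_scores(toy)
print(zpop)
```

In this sketch the most popular article (id 11) receives a zpop of 0 and the third-ranked article (id 10) receives log4(2) = 0.5, so smaller values correspond to more popular articles. The result depends only on the rank order, not on how far apart the raw popularity scores are.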
3 changes: 3 additions & 0 deletions data/article_vectors.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_hierarchical_categories.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_keyphrases.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_keywords.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_labels_combined.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_lda_labels.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_links.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_text.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_text_gloss.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_topic_distribution.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/article_vectors.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/combined_label_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/domain_concept.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/hierarchical_category_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/keyphrases_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/keyword_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/labels/LDA_labels/LDA_labels.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/lda_label_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/link_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/popularity_score.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/topic_model
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/topic_model.expElogbeta.npy
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/topic_model.id2word
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/education/topic_model.state
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_categories.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_hierarchical_categories.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_keyphrases.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_keywords.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_labels_combined.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_lda_labels.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_links.csv
Git LFS file not shown
Empty file added data/food/article_text.csv
Empty file.
3 changes: 3 additions & 0 deletions data/food/article_text_gloss.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/article_vectors.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/category_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/combined_label_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/domain_concept.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/hierarchical_category_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/keyphrases_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/keyword_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/lda_label_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/link_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/popularity_score.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/topic/article_topic_distribution.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/topic/topic_label_distribution.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/food/vanilla_vectors.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_hierarchical_categories.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_keyphrases.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_keywords.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_lda_labels.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_links.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_text
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_text_gloss.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_topic_distribution.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/article_vectors.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/domain_concept.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/hierarchical_category_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/keyphrases_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/keyword_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/labels/LDA_labels/LDA_labels.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/lda_label_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/link_names.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/popularity_score.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/topic_model
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/topic_model.expElogbeta.npy
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/topic_model.id2word
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/georgraphy/topic_model.state
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/h_cat_from_top_labels.pkl
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/h_cat_from_top_labels_one_level.pkl
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_categories.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_hierarchical_categories.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_keyphrases.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_keywords.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_labels_combined.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_lda_labels.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_links.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_text.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_text_summary_don_forget_delete.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/article_vectors.csv
Git LFS file not shown
3 changes: 3 additions & 0 deletions data/internet/category_names.csv
Git LFS file not shown