forked from thunlp/CANE

I adapt the CANE (Context-Aware Network Embedding) model for compatibility with modern TensorFlow versions. In parallel, I investigate keyword extraction techniques as a means to significantly reduce computational costs while preserving performance comparable to the original approach.

GeorgeM2000/CANE
CANE

Source code and datasets of the ACL 2017 paper "CANE: Context-Aware Network Embedding for Relation Modeling".

Datasets

The "datasets" folder contains the three datasets used in CANE: Cora, HepTh, and Zhihu. Each dataset consists of two files, "data.txt" and "graph.txt".

  • data.txt: Each line contains the text information of one vertex.
  • graph.txt: The edge list of the network, one edge per line.

In addition, Cora provides an extra file, "group.txt".

  • group.txt: Each vertex in Cora is annotated with a label; this file can be used for vertex classification.
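Given the file layout described above, a dataset can be loaded with a few lines of Python. This is a minimal sketch, not code from the repository; the exact delimiters and vertex indexing (0- vs. 1-based) should be checked against each dataset.

```python
def load_dataset(data_path, graph_path):
    """Load a CANE-style dataset: vertex texts and an edge list."""
    # data.txt: one vertex's text per line
    with open(data_path, encoding="utf-8") as f:
        texts = [line.strip() for line in f]
    # graph.txt: one "src dst" pair per line
    edges = []
    with open(graph_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                edges.append((int(parts[0]), int(parts[1])))
    return texts, edges
```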

Run

Run the following command to train CANE:

python3 run.py --dataset [cora,HepTh,zhihu] --gpu gpu_id --ratio [0.15,0.25,...] --rho rho_value

For example:

python3 run.py --dataset zhihu --gpu 0 --ratio 0.55 --rho 1.0,0.3,0.3

Experimental Results

The experimental results below were generated with the newest version of the code (score at each training-edge ratio):

          0.15   0.25   0.35   0.45   0.55   0.65   0.75   0.85   0.95
cora      85.2   90.5   92.2   93.5   93.4   93.6   94.4   95.0   92.5
HepTh     85.0   89.7   91.7   95.0   94.4   94.2   95.1   95.8   93.1
zhihu     64.5   67.1   69.2   69.9   72.0   72.2   72.5   72.8   73.3

Computation time for keyword/keyphrase extraction methods given the entire graph:

Method     T: 5 (sec)    T: 10 (sec)
TFIDF       2.622457      2.669563
YAKE       24.542249     62.481739
PRank      41.415699     37.410711
TxtRank    40.290922     35.84592
ToRank     34.672437     32.635109
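As a rough illustration of why TFIDF is by far the cheapest method in the table above, here is a minimal pure-Python sketch of selecting the top-T TF-IDF keywords per vertex text. The tokenisation and scoring details are illustrative assumptions, not the repository's implementation.

```python
import math
from collections import Counter

def top_tfidf_keywords(texts, T=5):
    """Return the T highest-scoring TF-IDF terms for each text."""
    docs = [t.lower().split() for t in texts]   # naive whitespace tokenisation
    n = len(docs)
    df = Counter()                              # document frequency per term
    for doc in docs:
        df.update(set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        # tf-idf score: term frequency times log inverse document frequency
        scores = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        keywords.append(sorted(scores, key=scores.get, reverse=True)[:T])
    return keywords
```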

The experimental results for the Cora citation graph with keyword/keyphrase extraction techniques:

T: 5       0.15   0.45   0.75
Abst       84.6   92.9   94.1
TFIDF      64.6   77.2   88.2
YAKE       71.5   80.6   86.2
PRank      76.8   90.0   93.9
KBERT3     74.4   87.4   90.5
KBERT3G    77.3   88.0   93.0

T: 10      0.15   0.45   0.75
Abst       84.6   92.9   94.1
TFIDF      66.8   81.8   89.8
YAKE       70.2   86.6   91.7
PRank      85.3   92.2   94.6
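For the graph-based extractors compared above (PRank, TxtRank), a minimal TextRank-style sketch looks like the following: words are ranked by PageRank power iteration over a sliding-window co-occurrence graph. The window size, damping factor, and iteration count are illustrative choices, not the settings used in these experiments.

```python
from collections import defaultdict

def textrank_keywords(text, T=5, window=2, damping=0.85, iters=50):
    """Rank words of `text` by PageRank over a co-occurrence graph."""
    words = text.lower().split()
    neigh = defaultdict(set)
    # link each word to the words that co-occur within the window
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neigh[w].add(words[j])
                neigh[words[j]].add(w)
    nodes = list(neigh)
    if not nodes:
        return []
    rank = {w: 1.0 / len(nodes) for w in nodes}
    # plain PageRank power iteration
    for _ in range(iters):
        rank = {w: (1 - damping) / len(nodes)
                   + damping * sum(rank[u] / len(neigh[u]) for u in neigh[w])
                for w in nodes}
    return sorted(rank, key=rank.get, reverse=True)[:T]
```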

Time in seconds:

T: 5       0.15              0.45              0.75
Abst       42.1              113.4             190.0
TFIDF      23.4 + 0.852434   36.4 + 1.83892    59.6 + 2.249102
YAKE       13.3 + 10.146231  35.6 + 18.81623   59.6 + 21.892501
PRank      17.0 + 16.420615  37.7 + 29.095505  63.6 + 33.532091
KBERT3     13.4              34.5              59.4
KBERT3G    12.7              35.6              57.0

T: 10      0.15              0.45              0.75
Abst       42.1              113.4             190.0
TFIDF      20.8 + 0.833934   34.8 + 1.96583    59.8 + 2.243566
YAKE       13.5 + 25.919926  38.0 + 48.326221  59.7 + 56.408093
PRank      17.0 + 15.118391  47.4 + 28.553312  80.2 + 33.022757

Dependencies

  • TensorFlow == 1.11.0
  • SciPy == 1.1.0
  • NumPy == 1.16.2

Cite

If you use the code, please cite this paper:

Cunchao Tu, Han Liu, Zhiyuan Liu, Maosong Sun. CANE: Context-Aware Network Embedding for Relation Modeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).

For more related works on network representation learning, please refer to my homepage.
