CogDL now supports the following datasets for different tasks:
- Network embedding (unsupervised node classification): PPI, BlogCatalog, Wikipedia, Youtube, DBLP, Flickr
- Semi-supervised/unsupervised node classification: Cora, Citeseer, PubMed, Reddit, PPI, PPI-large, Yelp, Flickr, Amazon
- Heterogeneous node classification: DBLP, ACM, IMDB
- Link prediction: PPI, Wikipedia, BlogCatalog
- Multiplex link prediction: Amazon, YouTube, Twitter
- Knowledge graph link prediction: FB13, FB15k, FB15k-237, WN18, WN18RR
- Graph classification: MUTAG, IMDB-B, IMDB-M, PROTEINS, COLLAB, NCI1, NCI109, REDDIT-BINARY
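The same display name can map to different CogDL identifiers depending on the task: in the tables below, PPI is `ppi` for node classification but `ppi-ne` for network embedding, and Flickr is `flickr` vs `flickr-ne`. The sketch below is a hypothetical, hand-built lookup copied from a few rows of the "Name in CogDL" columns; the task keys are made-up labels, and none of this is part of CogDL's API.

```python
# Hypothetical lookup: (task, display name) -> identifier from the
# "Name in CogDL" column. Only a few rows are copied for illustration.
COGDL_NAMES = {
    ("node_classification", "PPI"): "ppi",
    ("node_classification", "Flickr"): "flickr",
    ("network_embedding", "PPI"): "ppi-ne",
    ("network_embedding", "Flickr"): "flickr-ne",
    ("network_embedding", "DBLP"): "dblp-ne",
    ("heterogeneous_node_classification", "DBLP"): "gtn-dblp",
    ("multiplex_link_prediction", "Twitter"): "twitter",
}


def cogdl_name(task: str, dataset: str) -> str:
    """Return the CogDL identifier recorded for `dataset` under `task`."""
    try:
        return COGDL_NAMES[(task, dataset)]
    except KeyError:
        raise KeyError(f"no entry for {dataset!r} under task {task!r}") from None


def tasks_for(dataset: str) -> list:
    """List (sorted) every task in the lookup that uses this display name."""
    return sorted(task for task, name in COGDL_NAMES if name == dataset)


print(cogdl_name("network_embedding", "PPI"))  # ppi-ne
print(tasks_for("PPI"))  # ['network_embedding', 'node_classification']
```

Keying on the (task, name) pair rather than on the name alone avoids silently picking the wrong variant of a dataset that appears under several tasks.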
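Node Classification

Transductive

| Dataset | #Nodes | #Edges | #Features | #Classes | #Train/Val/Test | Degree | Name in CogDL |
|---|---|---|---|---|---|---|---|
| Cora | 2,708 | 5,429 | 1,433 | 7(s) | 140 / 500 / 1000 | 2 | cora |
| Citeseer | 3,327 | 4,732 | 3,703 | 6(s) | 120 / 500 / 1000 | 1 | citeseer |
| PubMed | 19,717 | 44,338 | 500 | 3(s) | 60 / 500 / 1999 | 2 | pubmed |
| Chameleon | 2,277 | 36,101 | 2,325 | 5(s) | 0.48 / 0.32 / 0.20 | 16 | chameleon |
| Cornell | 183 | 298 | 1,703 | 5(s) | 0.48 / 0.32 / 0.20 | 1.6 | cornell |
| Film | 7,600 | 30,019 | 932 | 5(s) | 0.48 / 0.32 / 0.20 | 4 | film |
| Squirrel | 5,201 | 217,073 | 2,089 | 5(s) | 0.48 / 0.32 / 0.20 | 41.7 | squirrel |
| Texas | 182 | 325 | 1,703 | 5(s) | 0.48 / 0.32 / 0.20 | 1.8 | texas |
| Wisconsin | 251 | 515 | 1,703 | 5(s) | 0.48 / 0.32 / 0.20 | 2 | wisconsin |

Inductive

| Dataset | #Nodes | #Edges | #Features | #Classes | #Train/Val/Test | Degree | Name in CogDL |
|---|---|---|---|---|---|---|---|
| PPI | 14,755 | 225,270 | 50 | 121(m) | 0.66 / 0.12 / 0.22 | 15 | ppi |
| PPI-large | 56,944 | 818,736 | 50 | 121(m) | 0.79 / 0.11 / 0.10 | 14 | ppi-large |
| Reddit | 232,965 | 11,606,919 | 602 | 41(s) | 0.66 / 0.10 / 0.24 | 50 | reddit |
| Flickr | 89,250 | 899,756 | 500 | 7(s) | 0.50 / 0.25 / 0.25 | 10 | flickr |
| Yelp | 716,847 | 6,977,410 | 300 | 100(m) | 0.75 / 0.10 / 0.15 | 10 | yelp |
| Amazon-SAINT | 1,598,960 | 132,169,734 | 200 | 107(m) | 0.85 / 0.05 / 0.10 | 83 | amazon-s |

(s): single-label classification; (m): multi-label classification. #Train/Val/Test gives node counts for Cora, Citeseer and PubMed, and fractions of all nodes for the other datasets.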
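For every row of the node classification table, the Degree column agrees with #Edges / #Nodes to within rounding. If #Edges counts undirected edges, the mean node degree would be twice this ratio, so the column is best read as an edges-per-node ratio. A quick check, with the numbers copied from the table (the helper names are illustrative only):

```python
# (name, #nodes, #edges, listed Degree) copied from the node classification table.
ROWS = [
    ("Cora", 2708, 5429, 2),
    ("Citeseer", 3327, 4732, 1),
    ("PubMed", 19717, 44338, 2),
    ("Chameleon", 2277, 36101, 16),
    ("Cornell", 183, 298, 1.6),
    ("Film", 7600, 30019, 4),
    ("Squirrel", 5201, 217073, 41.7),
    ("Texas", 182, 325, 1.8),
    ("Wisconsin", 251, 515, 2),
    ("PPI", 14755, 225270, 15),
    ("PPI-large", 56944, 818736, 14),
    ("Reddit", 232965, 11606919, 50),
    ("Flickr", 89250, 899756, 10),
    ("Yelp", 716847, 6977410, 10),
    ("Amazon-SAINT", 1598960, 132169734, 83),
]


def edges_per_node(nodes: int, edges: int) -> float:
    """#Edges / #Nodes (for undirected edges the mean degree is 2x this)."""
    return edges / nodes


def mismatches(rows, tol=0.5):
    """Rows whose listed Degree is more than `tol` away from #Edges / #Nodes."""
    return [
        (name, round(edges_per_node(n, e), 2), listed)
        for name, n, e, listed in rows
        if abs(edges_per_node(n, e) - listed) > tol
    ]


for name, n, e, listed in ROWS:
    print(f"{name:<13} {edges_per_node(n, e):8.2f}  (listed: {listed})")
print("mismatches:", mismatches(ROWS))  # mismatches: []
```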
Network Embedding (Unsupervised Node Classification)

| Dataset | #Nodes | #Edges | #Classes | Degree | Name in CogDL |
|---|---|---|---|---|---|
| PPI | 3,890 | 76,584 | 50(m) | 20 | ppi-ne |
| BlogCatalog | 10,312 | 333,983 | 40(m) | 32 | blogcatalog |
| Wikipedia | 4,777 | 184,812 | 39(m) | 39 | wikipedia |
| Flickr | 80,513 | 5,899,882 | 195(m) | 73 | flickr-ne |
| DBLP | 51,264 | 2,990,443 | 60(m) | 2 | dblp-ne |
| Youtube | 1,138,499 | 2,990,443 | 47(m) | 3 | youtube-ne |
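All of the network embedding datasets are marked (m), i.e. multi-label: a single node can carry several of the classes at once. A common way to represent such targets is a multi-hot vector of length #Classes per node. The sketch below uses hypothetical toy labels and plain Python; it is not taken from CogDL's loaders.

```python
def multi_hot(label_sets, num_classes):
    """Encode each node's set of class ids as a 0/1 list of length num_classes."""
    vectors = []
    for labels in label_sets:
        vec = [0] * num_classes
        for c in labels:
            if not 0 <= c < num_classes:
                raise ValueError(f"class id {c} out of range [0, {num_classes})")
            vec[c] = 1
        vectors.append(vec)
    return vectors


def label_cardinality(vectors):
    """Average number of active labels per node (exactly 1.0 for single-label data)."""
    return sum(sum(v) for v in vectors) / len(vectors)


# Hypothetical labels for three nodes in a 5-class multi-label task.
toy = [{0, 3}, {1}, {2, 3, 4}]
Y = multi_hot(toy, num_classes=5)
print(Y)  # [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
print(label_cardinality(Y))  # 2.0
```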
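Heterogeneous Node Classification and Multiplex Link Prediction

| Dataset | #Nodes | #Edges | #Features | #Classes | #Train/Val/Test | Degree | #Edge types | Name in CogDL |
|---|---|---|---|---|---|---|---|---|
| DBLP | 18,405 | 67,946 | 334 | 4 | 800 / 400 / 2857 | 4 | 4 | gtn-dblp (han-dblp) |
| ACM | 8,994 | 25,922 | 1,902 | 3 | 600 / 300 / 2125 | 3 | 4 | gtn-acm (han-acm) |
| IMDB | 12,772 | 37,288 | 1,256 | 3 | 300 / 300 / 2339 | 3 | 4 | gtn-imdb (han-imdb) |
| Amazon-GATNE | 10,166 | 148,863 | - | - | - | 15 | 2 | amazon |
| Youtube-GATNE | 2,000 | 1,310,617 | - | - | - | 655 | 5 | youtube |
| Twitter | 10,000 | 331,899 | - | - | - | 33 | 4 | twitter |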
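The #Edge types column counts the distinct relation types in each heterogeneous or multiplex graph. One simple way to hold such a graph is a separate edge list per type, so that neighbourhoods can be queried one relation at a time. The sketch below is a toy structure with a made-up paper/author/venue graph and hypothetical type names; it is not CogDL's internal data format.

```python
from collections import defaultdict


class TypedGraph:
    """Toy heterogeneous/multiplex graph: one directed edge list per edge type."""

    def __init__(self):
        self.edges = defaultdict(list)  # edge type -> list of (src, dst) pairs

    def add_edge(self, src, dst, edge_type):
        self.edges[edge_type].append((src, dst))

    def num_edge_types(self):
        return len(self.edges)

    def num_edges(self):
        return sum(len(pairs) for pairs in self.edges.values())

    def num_nodes(self):
        # Every node that appears as an endpoint of at least one edge.
        return len({n for pairs in self.edges.values() for pair in pairs for n in pair})

    def neighbors(self, node, edge_type):
        """Out-neighbours of `node` following edges of one type only."""
        # .get() avoids creating an empty entry for an unknown edge type.
        return [dst for src, dst in self.edges.get(edge_type, []) if src == node]


# Hypothetical paper/author/venue graph with two edge types.
g = TypedGraph()
g.add_edge("p1", "a1", "written_by")
g.add_edge("p1", "a2", "written_by")
g.add_edge("p2", "a1", "written_by")
g.add_edge("p1", "v1", "published_in")
g.add_edge("p2", "v1", "published_in")

print(g.num_edge_types(), g.num_edges(), g.num_nodes())  # 2 5 5
print(g.neighbors("p1", "written_by"))  # ['a1', 'a2']
```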
Knowledge Graph Link Prediction

| Dataset | #Nodes | #Edges | #Train/Val/Test | #Relation types | Degree | Name in CogDL |
|---|---|---|---|---|---|---|
| FB13 | 75,043 | 345,872 | 316,232 / 5,908 / 23,733 | 12 | 5 | fb13 |
| FB15k | 14,951 | 592,213 | 483,142 / 50,000 / 59,071 | 1,345 | 40 | fb15k |
| FB15k-237 | 14,541 | 310,116 | 272,115 / 17,535 / 20,466 | 237 | 21 | fb15k237 |
| WN18 | 40,943 | 151,442 | 141,442 / 5,000 / 5,000 | 18 | 4 | wn18 |
| WN18RR | 86,835 | 93,003 | 86,835 / 3,034 / 3,134 | 11 | 1 | wn18rr |
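In the knowledge graph table each edge is a (head, relation, tail) triple, so one would expect the three split sizes to add up to #Edges. A small arithmetic check with the numbers copied from the table (the helper name is illustrative only):

```python
# (#Edges, (train, val, test)) copied from the knowledge graph table.
KG_SPLITS = {
    "FB13": (345_872, (316_232, 5_908, 23_733)),
    "FB15k": (592_213, (483_142, 50_000, 59_071)),
    "FB15k-237": (310_116, (272_115, 17_535, 20_466)),
    "WN18": (151_442, (141_442, 5_000, 5_000)),
    "WN18RR": (93_003, (86_835, 3_034, 3_134)),
}


def split_discrepancies(table):
    """Map dataset -> (train + val + test) - #Edges, keeping only non-zero gaps."""
    gaps = {}
    for name, (edges, splits) in table.items():
        gap = sum(splits) - edges
        if gap:
            gaps[name] = gap
    return gaps


print(split_discrepancies(KG_SPLITS))  # {'FB13': 1}
```

With the values as listed, the splits account for #Edges exactly for every dataset except FB13, where they exceed it by one triple.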
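Graph Classification

TUDataset from https://www.chrsmrrs.com/graphkerneldatasets (BBBP and BACE are MoleculeNet datasets)

| Dataset | #Graphs | #Classes | Avg. #Nodes | Name in CogDL |
|---|---|---|---|---|
| MUTAG | 188 | 2 | 17.9 | mutag |
| IMDB-B | 1,000 | 2 | 19.8 | imdb-b |
| IMDB-M | 1,500 | 3 | 13 | imdb-m |
| PROTEINS | 1,113 | 2 | 39.1 | proteins |
| COLLAB | 5,000 | 3 | 74.5 | collab |
| NCI1 | 4,110 | 2 | 29.8 | nci1 |
| NCI109 | 4,127 | 2 | 29.7 | nci109 |
| PTC-MR | 344 | 2 | 14.3 | ptc-mr |
| REDDIT-BINARY | 2,000 | 2 | 429.7 | reddit-b |
| REDDIT-MULTI-5k | 4,999 | 5 | 508.5 | reddit-multi-5k |
| REDDIT-MULTI-12k | 11,929 | 11 | 391.5 | reddit-multi-12k |
| BBBP | 2,039 | 2 | 24 | bbbp |
| BACE | 1,513 | 2 | 34.1 | bace |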
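The #Graphs, #Classes and Avg. #Nodes columns are simple summaries of a labelled graph collection. TUDataset ships each dataset as text files in which line i of DS_graph_indicator.txt holds the graph id of node i and line i of DS_graph_labels.txt holds the class label of graph i. The sketch below computes the three statistics from those two files' lines, passed in as lists of strings so it runs without downloads; the function name and toy data are hypothetical.

```python
from collections import Counter


def tu_summary(graph_indicator_lines, graph_labels_lines):
    """Return (#graphs, #classes, avg. #nodes) from TUDataset-style text lines.

    graph_indicator_lines: line i is the graph id of node i (DS_graph_indicator.txt).
    graph_labels_lines:    line i is the class label of graph i (DS_graph_labels.txt).
    """
    graph_ids = [int(line) for line in graph_indicator_lines if line.strip()]
    labels = [int(line) for line in graph_labels_lines if line.strip()]
    nodes_per_graph = Counter(graph_ids)  # graph id -> number of nodes
    num_graphs = len(labels)
    if len(nodes_per_graph) != num_graphs:
        raise ValueError("graph_indicator and graph_labels disagree on #graphs")
    avg_nodes = sum(nodes_per_graph.values()) / num_graphs
    return num_graphs, len(set(labels)), avg_nodes


# Hypothetical toy data: 3 graphs with 2, 3 and 4 nodes; labels 1, -1, 1.
indicator = ["1", "1", "2", "2", "2", "3", "3", "3", "3"]
labels = ["1", "-1", "1"]
print(tu_summary(indicator, labels))  # (3, 2, 3.0)
```

Assuming the standard archive layout (e.g. MUTAG/MUTAG_graph_indicator.txt), the real files can be passed in with `open(path).read().splitlines()`.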