Code for the paper "Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model". The data is available at https://pan.baidu.com/s/1b5zAe7qqUBmuHz6nTU95UA (extraction code: 6xdw).
The segmented article text (segged_content) can be extracted from the json file. The original document content needs to be matched to its comments by url.
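As a rough illustration of matching documents to comments by url, here is a minimal sketch. The field names "url", "content" and "comment" and the function name are assumptions made for this example; check the actual keys in the released json file before using it.

```python
from collections import defaultdict


def map_comments_to_documents(documents, comments):
    """Attach to each document the comments that share its url.

    Both arguments are lists of dicts. The keys "url", "content" and
    "comment" are hypothetical; adapt them to the real data files.
    """
    comments_by_url = defaultdict(list)
    for record in comments:
        comments_by_url[record["url"]].append(record["comment"])

    merged = []
    for doc in documents:
        merged.append({
            "url": doc["url"],
            "content": doc["content"],
            # Documents without any comment get an empty list.
            "comments": comments_by_url.get(doc["url"], []),
        })
    return merged
```

Grouping the comments in a dictionary first keeps the matching linear in the number of records, instead of scanning all comments once per document.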
The graph directory contains the code for building the topic interaction graph. The main entry point for building the graph is "my_feature_extractor.py", which takes as input a csv file generated by "write_csv.py". The columns of the csv file are as follows: url, gold comment, topic words extracted from the article, and the original title of the article. Most of the work in "my_feature_extractor.py" is done by "ccig.py". Note that the method for extracting topic words is not included in this repository; you can use your own method to extract topic words from articles.
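To make the four-column layout concrete, here is a minimal sketch that writes and reads such a csv with the Python standard library. It is not the code from "write_csv.py". The dictionary keys, the absence of a header row, and the choice to join topic words with spaces are all assumptions for this example.

```python
import csv


def write_rows(rows, handle):
    """Write one csv line per article: url, gold comment, topic words, title."""
    writer = csv.writer(handle)
    for row in rows:
        writer.writerow([
            row["url"],
            row["comment"],
            " ".join(row["topic_words"]),  # assumed: space-separated topic words
            row["title"],
        ])


def read_rows(handle):
    """Read the four columns back into dictionaries (hypothetical key names)."""
    rows = []
    for url, comment, topic_words, title in csv.reader(handle):
        rows.append({
            "url": url,
            "comment": comment,
            "topic_words": topic_words.split(),
            "title": title,
        })
    return rows
```

When writing to a real file, open it with newline="" and an explicit encoding such as utf-8, as the csv module documentation recommends, so that Chinese text and embedded line breaks are preserved.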
The models directory contains the code for our model in "graph2seq.py", along with the baseline models we adopt.
"Data.py" contains the code to load the data. The Vocabulary class builds the vocabulary from the corpus. Each "Example" represents one article together with its title, comment, topic words and some other information. A "Batch" is a mini-batch of examples. The "Dataloader" loads the data from the json file produced by "my_feature_extractor.py" and builds the final adjacency matrix of the topic interaction graph.
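The two ideas above, building a vocabulary from the corpus and turning graph edges into an adjacency matrix, can be sketched as follows. This is not the code in "Data.py". The names SimpleVocabulary and build_adjacency, the special tokens, and the choice of an undirected weighted graph with self-loops are assumptions for illustration only.

```python
from collections import Counter


class SimpleVocabulary:
    """Map tokens to integer ids, with padding and unknown-word entries."""

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, tokenized_texts, min_freq=1):
        counts = Counter(tok for text in tokenized_texts for tok in text)
        kept = sorted(w for w, c in counts.items() if c >= min_freq)
        self.id2word = [self.PAD, self.UNK] + kept
        self.word2id = {w: i for i, w in enumerate(self.id2word)}

    def encode(self, tokens):
        """Convert tokens to ids; unseen tokens map to the UNK id."""
        unk = self.word2id[self.UNK]
        return [self.word2id.get(t, unk) for t in tokens]


def build_adjacency(num_nodes, edges, self_loops=True):
    """Build a dense symmetric adjacency matrix from (i, j, weight) edges."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, weight in edges:
        adj[i][j] = weight
        adj[j][i] = weight  # assumed: the graph is undirected
    if self_loops:
        for i in range(num_nodes):
            adj[i][i] = 1.0
    return adj
```

For example, SimpleVocabulary([["a", "b"], ["b", "c"]]).encode(["b", "z"]) maps "b" to its id and "z" to the unknown-word id, and build_adjacency(3, [(0, 1, 2.0)]) yields a 3x3 matrix with weight 2.0 between nodes 0 and 1.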
"train.py" is the main entry point of the program, from which one can train the model or run inference.