- Convert big wordEmbedding file based on the vocab that create by dataset
- python 3
- pytorch > 0.1
- torchtext > 0.1
- numpy
-
the file of hyperparams.py contains all hyperparams that need to modify, based on yours nedds, select neural networks what you want and config the hyperparams.
-
the file of main-hyperparams.py is the main function,run the command ("python main_hyperparams.py") to execute the demo.
-
the folder of loaddata contains some file of load dataset
-
the folder of word2vec is the file of word embedding that you want to use
-
the folder of data contains the dataset file,contains train data,dev data,test data.
-
the file of handle_wordEmbedding2File.py to hadle the big word2vec
- the word embedding file saved in the folder of word2vec, but now is empty, because of it is to big,so if you want to use word embedding,you can to download word2vec or glove file, then saved in the folder of word2vec,and make the option of word_Embedding to True and modifiy the value of word_Embedding_Path in the hyperparams.py file.
-
datafile_path: the path of dataset
-
need_convert_path: the so big wordEmbedding that want to change based one the dateset
-
save_converted_word_Embedding_Path:after convert where to save
-
need_convert_dim:the dim of big wordEmbedding
-
num_threads:set the value of threads when run the demo
-
seed_num:set the num of random seed
- Only for myself to load wordEmbedding quickly compare to the big file like Glogle News.