Experiments on word embedding reliability with different sampling scenarios. Includes modified version of word2vec.

hellrich/embedding_downsampling_comparison

embedding_downsampling_comparison

Experiments on different word embedding algorithms and on how down-sampling of frequent words or nearby words affects them, especially with regard to reliability.

Probabilistic down-sampling seems to be worse than weighting in most scenarios.
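To illustrate the distinction, here is a minimal sketch of the two strategies for frequent words and for nearby words. It is not taken from this repository's code: the function names are hypothetical, the frequency formula is the one from the original word2vec paper (keep probability `sqrt(t / f)`), and the distance weight assumes a window size drawn uniformly from `1..window`, as word2vec does. Probabilistic down-sampling randomly drops tokens, so each run sees different training data; weighting keeps every token and scales its contribution by the same expected value.

```python
import math
import random


def keep_probability(rel_freq, threshold=1e-4):
    """Keep probability for a word with relative frequency `rel_freq`.

    Follows the word2vec paper: discard with probability 1 - sqrt(t / f),
    i.e. keep with probability sqrt(t / f), capped at 1 for rare words.
    """
    return min(1.0, math.sqrt(threshold / rel_freq))


def downsample(tokens, rel_freqs, threshold=1e-4, rng=None):
    """Probabilistic variant: randomly drop frequent tokens (non-deterministic)."""
    rng = rng or random.Random(0)
    return [t for t in tokens
            if rng.random() < keep_probability(rel_freqs[t], threshold)]


def weight(tokens, rel_freqs, threshold=1e-4):
    """Weighting variant: keep every token, attach its expected weight (deterministic)."""
    return [(t, keep_probability(rel_freqs[t], threshold)) for t in tokens]


def distance_weight(distance, window):
    """Expected inclusion rate of a context word at `distance` from the target
    when the effective window size is sampled uniformly from 1..window."""
    return (window - distance + 1) / window


# Hypothetical toy frequencies, for illustration only.
rel_freqs = {"the": 0.01, "embedding": 1e-5}
tokens = ["the", "embedding", "the"]
weighted = weight(tokens, rel_freqs)
sampled = downsample(tokens, rel_freqs)
```

With weighting, repeated runs over the same corpus see identical inputs, which is one plausible reason it could be more reliable than random dropping.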

Includes a modified word2vec(w) which uses weighting, although this was not beneficial here. Assumes conda for dependency management and my modified version of hyperwords.

Note that slurm was used for the long-running Wikipedia experiments, but not for the others.
