embedding_downsampling_comparison

Experiments on how down-sampling of frequent words or nearby words affects different word embedding algorithms, especially with regard to reliability.

Probabilistic down-sampling seems to be worse than weighting in most scenarios.
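
The difference between the two strategies can be illustrated with a minimal sketch. It assumes the frequent-word formula from the original word2vec paper (discard probability `1 - sqrt(t / f)`), and it assumes that "weighting" means scaling each token's contribution by its keep probability instead of randomly dropping it. The function names, the threshold value `t=1e-4`, and the toy frequencies are hypothetical and are not taken from this repository's code.

```python
import math
import random


def keep_probability(rel_freq, t=1e-4):
    """Probability of keeping a word with relative frequency rel_freq.

    Assumption: paper-style formula, P(discard) = 1 - sqrt(t / f),
    so P(keep) = sqrt(t / f), capped at 1 for rare words.
    """
    return min(1.0, math.sqrt(t / rel_freq))


def probabilistic_downsample(tokens, rel_freqs, t=1e-4, rng=None):
    """Randomly drop tokens; the result differs between runs unless seeded."""
    rng = rng or random.Random(0)
    return [w for w in tokens if rng.random() < keep_probability(rel_freqs[w], t)]


def weighted_tokens(tokens, rel_freqs, t=1e-4):
    """Keep every token, but attach a deterministic weight to it.

    Assumption: the weight equals the keep probability, i.e. the expected
    contribution of the token under probabilistic down-sampling.
    """
    return [(w, keep_probability(rel_freqs[w], t)) for w in tokens]


# Hypothetical toy corpus and relative frequencies.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
rel_freqs = {"the": 0.01, "cat": 1e-5, "sat": 1e-5, "on": 0.0025, "mat": 1e-5}

sampled = probabilistic_downsample(tokens, rel_freqs)
weighted = weighted_tokens(tokens, rel_freqs)
```

Under these assumptions the weighted variant is deterministic, whereas the sampled variant introduces randomness on every training run, which is why the two can behave differently with respect to reliability.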

Includes a modified word2vec (word2vecw) which uses weighting; this was not beneficial here. Assumes conda for dependency management and my modified version of hyperwords.

Note that Slurm was used for the long-running Wikipedia experiments, but not for the others.

results
statistical_tests
word2vecw
.gitignore
LICENSE
README.md
analyze-results.py
bootstrap.py
bootstrap.sh
coha_converter.py
environment.yml
evaluate-coha.sh
evaluate-news.sh
evaluate-variant.sh
evaluate-wiki.sh
friedman-test.py
most_frequent_words.py
newdata_repeval.txt
prepare-coha.sh
prepare-most-frequent-lists.sh
prepare-news.sh
prepare-wiki.sh
significance_test_data
significance_tests.py
start-eval.sh
start-training.sh
train-glove.sh
train-ppmi-bootstrap.sh
train-ppmi.sh
train-sgns-bootstrap.sh
train-sgns.sh
train-slurm.sh
train-wiki.sh
xml2txt.pl

hellrich/embedding_downsampling_comparison
