While transformer architectures have achieved great success in this area, the intermediate representations of neural approaches still lack mechanistic interpretability. Recent developments in alternative compositional spaces, however, offer significant promise. By mapping a set of pre-generated text embeddings to a Wasserstein-Fourier space similar to that proposed by Cazelles et al. (2020), Zadeh's fuzzy logic framework (Zadeh, 1965) can be used for computationally efficient, scalable, and interpretable embedding composition. The resulting distributions can then be compared using Earth Mover's Distances (EMDs) between their power spectra, in a manner similar to the Word Mover's Distance (Kusner et al., 2015).
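As a concrete illustration of the spectral comparison, the following sketch computes a one-dimensional EMD between the normalized power spectra of two periodic signals. The function names (`power_spectrum_distribution`, `spectral_emd`) and the choice of frequency-bin indices as the common support are illustrative assumptions, not the exact construction used here.

```python
# Sketch: EMD between normalized power spectra of two 2*pi-periodic signals.
# The use of FFT bin indices as the transport ground space is an assumption.
import numpy as np
from scipy.stats import wasserstein_distance

def power_spectrum_distribution(samples):
    """Normalized power spectrum of a real signal sampled on [0, 2*pi)."""
    power = np.abs(np.fft.rfft(samples)) ** 2
    return power / power.sum()

def spectral_emd(f_samples, g_samples):
    """1-D Wasserstein (earth mover's) distance between the two power
    spectra, treating frequency indices as the common support."""
    p = power_spectrum_distribution(f_samples)
    q = power_spectrum_distribution(g_samples)
    freqs = np.arange(len(p))
    return wasserstein_distance(freqs, freqs, u_weights=p, v_weights=q)
```

For pure tones sin(t) and sin(3t) the spectra are (numerically) point masses at bins 1 and 3, so the distance reduces to the distance between those bins.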
By synthesizing recent developments in optimal-transport-based similarity metrics with a fuzzy logic grounding framework, I propose a deterministic map to a unique Distributional-Compositional (DisCo) text embedding space. More specifically, PCA-reduced MiniLM lemma embeddings are mapped to Fourier expansions of 2\pi-periodic Gaussian kernels in L^2. Baselines consisting of fuzzified mean lemma embeddings, MiniLM sentence embeddings, and several toy composition models are evaluated on a subset of the WiC dataset (Pilehvar and Camacho-Collados, 2019). Relative to these baselines, I demonstrate the effectiveness of several similarity metrics defined in the proposed space, compared with those defined in the original MiniLM embedding space.
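One way such a map could look is sketched below: each PCA-reduced coordinate places a 2\pi-periodic (wrapped) Gaussian bump, and the representation is the truncated Fourier expansion of the resulting mixture, using the standard closed-form coefficients of a wrapped Gaussian. The sigmoid squashing of coordinates into [0, 2\pi), the width sigma, and the averaging over bumps are illustrative assumptions, not the construction proposed here.

```python
# Hedged sketch: PCA-reduced embedding -> truncated Fourier coefficients of a
# mixture of wrapped (2*pi-periodic) Gaussian kernels. Construction details
# (sigmoid centering, sigma, mixture averaging) are illustrative assumptions.
import numpy as np

def wrapped_gaussian_fourier(mu, sigma, n_harmonics):
    """Fourier coefficients c_n, n = 0..n_harmonics, of a wrapped Gaussian
    on [0, 2*pi) centered at mu: c_n = exp(-sigma^2 n^2 / 2) * exp(-i n mu)
    (up to overall normalization)."""
    n = np.arange(n_harmonics + 1)
    return np.exp(-0.5 * (sigma * n) ** 2) * np.exp(-1j * n * mu)

def embed_to_fourier(reduced_vec, sigma=0.3, n_harmonics=32):
    """Place one wrapped-Gaussian bump per PCA coordinate; centers come from
    a sigmoid map into [0, 2*pi) (an illustrative choice)."""
    x = np.asarray(reduced_vec, dtype=float)
    centers = 2 * np.pi / (1 + np.exp(-x))
    bumps = [wrapped_gaussian_fourier(m, sigma, n_harmonics) for m in centers]
    return np.mean(bumps, axis=0)
```

Since each bump's zeroth coefficient is 1, the zeroth coefficient of the mixture is always 1, which fixes the total mass of the represented distribution.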