This repository demonstrates how to generate text embeddings using OpenAI models through LangChain, and then visualize the semantic similarity between words in 3D space using PCA and Matplotlib.
- Generate embeddings for arbitrary text using OpenAI’s embedding models.
- Support for multiple models:
  - text-embedding-3-large
  - text-embedding-3-small
  - text-embedding-ada-002 (default)
- Dimensionality reduction using PCA.
- Interactive 3D scatter plot visualization of embeddings.
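The "semantic similarity" mentioned above is usually measured as the cosine similarity between embedding vectors. The sketch below illustrates the idea and is not code from this repository. The function name cosine_similarity_matrix is hypothetical, and the toy 3-dimensional vectors stand in for real embeddings, which have far more dimensions.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `vectors`."""
    # Scale every row to unit length so that dot products become cosines.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T


# Hypothetical toy vectors standing in for real embeddings.
toy = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 3))
```

Rows 0 and 1 point in nearly the same direction, so their similarity is close to 1 (about 0.994). Row 2 is orthogonal to both, so its similarity to them is 0. A 3D PCA plot tries to keep such related words close together.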
Install dependencies with:
pip install -r requirements.txt
The requirements.txt file lists:
langchain-openai
python-dotenv
numpy
matplotlib
scikit-learn
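To confirm the installation, you can run a small standalone check like the one below. It is a hypothetical helper and not part of the repository. It only verifies that Python can locate each package, not which version is installed. Two of the requirements are imported under a different name from their pip name: python-dotenv is imported as dotenv, and scikit-learn as sklearn.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "langchain-openai": "langchain_openai",
    "python-dotenv": "dotenv",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All requirements are importable.")
```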
You will also need an OpenAI API key.
- Clone this repository:
  git clone https://github.com/your-username/embedding-visualizer.git
  cd embedding-visualizer
- Create a .env file in the root directory and add your OpenAI API key:
  echo "OPENAI_API_KEY=your_api_key_here" > .env
- Choose your embedding model by editing the EMBEDDING_MODEL variable in the script:
  EMBEDDING_MODEL="text-embedding-ada-002"
Run the script to generate embeddings and plot them:
python embeddings_plot.py
This will:
- Generate embeddings for the hardcoded list of words:
texts = ["nfl", "football", "soccer", "basketball", "baseball"]
- Reduce them to 3D space using PCA.
- Save the visualization to 3d_plot_small.png.
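The PCA and plotting steps above can be sketched as follows, assuming the embeddings have already been computed as a list of equal-length float vectors. This illustrates the technique and does not reproduce embeddings_plot.py. The function name plot_embeddings_3d, the axis labels and the default dpi are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # render straight to a file; no display needed

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA


def plot_embeddings_3d(texts, vectors, out_path="3d_plot_small.png", dpi=200):
    """Project embeddings to 3D with PCA and save a labelled scatter plot."""
    X = np.asarray(vectors, dtype=float)
    # n_components must not exceed min(n_samples, n_features); 5 words -> 3D is fine.
    points = PCA(n_components=3).fit_transform(X)

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(points[:, 0], points[:, 1], points[:, 2])
    # Label every point with the word it represents.
    for label, (x, y, z) in zip(texts, points):
        ax.text(x, y, z, label)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_zlabel("PC3")
    fig.savefig(out_path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return points
```

In practice, vectors would come from the embedding model, for example embedder.embed_documents(texts). PCA keeps only the three directions of greatest variance, so distances in the plot approximate the distances between the full embeddings rather than preserving them. For an interactive window, remove the Agg line and call plt.show() instead of saving.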
Example output:
📊 A 3D scatter plot showing the relative similarity of sports terms.
Project structure:
.
├── embeddings_plot.py # Main script
├── requirements.txt # Dependencies
└── .env # API key (not committed)
- To change the words being compared, edit the texts list in embeddings_plot.py.
- To try a different embedding model, set EMBEDDING_MODEL to one of the supported model names listed above.
- To adjust the plot resolution, change the dpi parameter in:
  plt.savefig("3d_plot_small.png", dpi=1000, bbox_inches='tight')
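When choosing a dpi value, note that without bbox_inches='tight' Matplotlib saves an image of exactly figsize × dpi pixels. With 'tight', the saved area is then cropped or expanded to fit the drawn content. If the script keeps Matplotlib's default figure size of 6.4 × 4.8 inches, dpi=1000 gives an image of roughly 6400 × 4800 pixels before that adjustment. The helper below is a hypothetical way to measure the effect directly.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")

import matplotlib.image as mpimg
import matplotlib.pyplot as plt


def saved_pixel_size(figsize, dpi):
    """Save an empty figure at the given dpi and return its (width, height) in pixels."""
    fig = plt.figure(figsize=figsize)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dpi_check.png")
        # No bbox_inches="tight" here, so the size is exactly figsize * dpi.
        fig.savefig(path, dpi=dpi)
        height, width = mpimg.imread(path).shape[:2]
    plt.close(fig)
    return width, height


print(saved_pixel_size((2, 1), dpi=100))  # (200, 100)
```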