Farcaster Topic Labeling

How it works:

BERTopic extracts topics from social network posts in a multi-step pipeline:

Embedding: Convert posts into numerical representations using sentence-transformers models.
Dimensionality Reduction: Reduce data dimensionality with techniques like UMAP.
Clustering: Group similar posts using HDBSCAN, a density-based clustering method.
Bag-of-Words: Generate bag-of-words representations for each cluster.
Topic Representation: Apply a class-based variant of TF-IDF (c-TF-IDF) to highlight cluster-specific words, forming topic descriptions.
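
The five steps above map onto the sub-models that BERTopic accepts in its constructor. The following is a minimal sketch of that wiring, not the code in src/main.py: the embedding model name and every UMAP and HDBSCAN setting are illustrative assumptions, and build_topic_model is a hypothetical helper name.

```python
# Illustrative settings only; the values used by this repository are not
# stated in the README, so treat every number here as an assumption.
PIPELINE_PARAMS = {
    "embedding_model": "all-MiniLM-L6-v2",
    "umap": {"n_neighbors": 15, "n_components": 5, "min_dist": 0.0, "metric": "cosine"},
    "hdbscan": {"min_cluster_size": 15, "metric": "euclidean", "prediction_data": True},
}


def build_topic_model(params=PIPELINE_PARAMS):
    """Assemble a BERTopic model with one explicit sub-model per pipeline step."""
    # Imported lazily because these packages are heavy to load.
    from bertopic import BERTopic
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        # 1. Embedding: posts -> dense vectors
        embedding_model=SentenceTransformer(params["embedding_model"]),
        # 2. Dimensionality reduction
        umap_model=UMAP(**params["umap"]),
        # 3. Clustering (density-based)
        hdbscan_model=HDBSCAN(**params["hdbscan"]),
        # 4. Bag-of-words per cluster
        vectorizer_model=CountVectorizer(stop_words="english"),
        # 5. Topic representation via class-based TF-IDF
        ctfidf_model=ClassTfidfTransformer(),
    )
```

Given a list of post strings named posts, calling build_topic_model().fit_transform(posts) returns one topic id per post along with the assignment probabilities.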

More info: https://maartengr.github.io/BERTopic/index.html

The generated labels and groups can be visualized with BERTopic's built-in plotting methods, such as visualize_topics() and visualize_barchart().

Setup

1. Install requirements

pip install -r requirements.txt

2. Download NLTK Stopwords

In a Python interpreter:

import nltk
nltk.download("stopwords")
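
Once downloaded, the stopword list can be loaded from nltk.corpus and used to filter common words out of posts. How src/main.py actually applies the list is not shown here, so the sketch below is an assumption; load_english_stopwords and strip_stopwords are hypothetical helper names.

```python
def load_english_stopwords():
    """Return the NLTK English stopword list as a set.

    Requires that nltk.download("stopwords") has been run once beforehand.
    """
    from nltk.corpus import stopwords  # imported lazily; needs the downloaded corpus

    return set(stopwords.words("english"))


def strip_stopwords(text, stop_words):
    """Drop whitespace-separated tokens whose lowercase form is a stopword."""
    return " ".join(word for word in text.split() if word.lower() not in stop_words)
```

For example, strip_stopwords(post, load_english_stopwords()) removes words such as "the" and "is" from a single post before topic words are counted.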

3. Add the Parquet files

Place them in ./raw_data/
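
As a rough sketch of how files in ./raw_data/ could be collected and read, the snippet below lists the Parquet files and concatenates one text column with pandas. The column name "text" and both helper names are assumptions; the real schema of the Farcaster export may differ.

```python
from pathlib import Path


def find_parquet_files(data_dir="raw_data"):
    """Return all *.parquet files directly inside data_dir, sorted by name."""
    return sorted(Path(data_dir).glob("*.parquet"))


def load_posts(data_dir="raw_data", text_column="text"):
    """Read every Parquet file in data_dir and return the post texts as strings.

    text_column is a hypothetical column name; adjust it to the actual schema.
    """
    import pandas as pd  # imported lazily; reading Parquet also needs pyarrow or fastparquet

    files = find_parquet_files(data_dir)
    if not files:
        raise FileNotFoundError(f"No .parquet files found in {data_dir}")
    frames = [pd.read_parquet(path) for path in files]
    combined = pd.concat(frames, ignore_index=True)
    return combined[text_column].dropna().astype(str).tolist()
```

Checking find_parquet_files() before running the pipeline is a quick way to confirm the files landed in the expected folder.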

Run

python src/main.py

vigneshka/farcaster-embeddings
