Alternative to Static Tagging Text Classification #71

manisnesan · 2024-03-10T14:56:29Z

Treat it as an unsupervised problem.

Approach (idea inspired by the topic modelling on user prompts in the Chatbot Arena paper):

To study the prompt diversity, we build a topic modeling pipeline with BERTopic (Grootendorst, 2022). We start with transforming user prompts into representation vectors using OpenAI’s text embedding model (text-embedding-3-small). To mitigate the curse of dimensionality for data clustering, we employ UMAP (Uniform Manifold Approximation and Projection) (McInnes et al., 2020) to reduce the embedding dimension from 1,536 to 5. We then use the hierarchical density-based clustering algorithm, HDBSCAN, to identify topic clusters with minimum cluster size 32. Finally, to obtain topic labels, we sample 10 prompts from each topic cluster and feed into GPT-4-Turbo for topic summarization.
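A rough sketch of how the four stages quoted above could be wired together for our own texts. This is not the paper's code: `discover_topics`, `sample_per_cluster`, and the four callables (`embed`, `reduce`, `cluster`, `summarize`) are hypothetical names used for illustration, and each stage is passed in as a function so that the embedding model, the reducer, the clusterer, and the LLM can be swapped without touching the orchestration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def sample_per_cluster(
    texts: Sequence[str], labels: Sequence[int], k: int = 10, seed: int = 0
) -> Dict[int, List[str]]:
    """Group texts by cluster label and sample up to k texts per cluster.

    Label -1 is skipped because HDBSCAN assigns it to points it treats as noise.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        if label != -1:
            groups[label].append(text)
    rng = random.Random(seed)  # fixed seed so the sampled examples are repeatable
    return {
        label: rng.sample(members, min(k, len(members)))
        for label, members in sorted(groups.items())
    }


def discover_topics(
    texts: Sequence[str],
    embed: Callable[[Sequence[str]], List[Vector]],      # assumption: text -> vectors
    reduce: Callable[[List[Vector]], List[Vector]],      # assumption: e.g. 1,536 -> 5 dims
    cluster: Callable[[List[Vector]], Sequence[int]],    # assumption: one label per text
    summarize: Callable[[List[str]], str],               # assumption: examples -> topic name
    k: int = 10,
) -> Dict[int, str]:
    """Embed, reduce, cluster, then label each cluster from a sample of its texts."""
    vectors = embed(texts)
    reduced = reduce(vectors)
    labels = cluster(reduced)
    samples = sample_per_cluster(texts, labels, k=k)
    return {label: summarize(examples) for label, examples in samples.items()}
```

To mirror the quoted settings, `reduce` could plausibly be `umap.UMAP(n_components=5).fit_transform` and `cluster` could be `hdbscan.HDBSCAN(min_cluster_size=32).fit_predict`, assuming the `umap-learn` and `hdbscan` packages are installed. `embed` and `summarize` would wrap whichever embedding model and LLM we choose, so the resulting tags come from the data rather than from a fixed, static tag list.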

manisnesan · 2024-03-11T04:19:19Z
