You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Approach ( idea inspired from topic modelling on user prompts from Chatbot Arena paper
To study the prompt diversity, we build a topic modeling pipeline with BERTopic3 (Grootendorst, 2022). We start with transforming user prompts into representation vectors using OpenAI’s text embedding model (text-embedding-3-small). To mitigate the curse of dimensionality for data clustering, we employ UMAP (Uniform Manifold Approximation and Projection) (McInnes et al., 2020) to reduce the embedding dimension from 1,536 to 5. We then use the hierarchical density-based clustering algorithm, HDBSCAN, to identify topic clusters with minimum cluster size 32. Finally, to obtain topic labels, we sample 10 prompts from each topic cluster and feed into GPT-4-Turbo for topic summarization.
The text was updated successfully, but these errors were encountered:
Treat it as unsupervised problem.
Approach ( idea inspired from topic modelling on user prompts from Chatbot Arena paper
The text was updated successfully, but these errors were encountered: