Commit 3a373e9

Merge pull request #122 from trigaten/topicgpt
Added scripts and results for TopicGPT.
2 parents: cec236f + 9283946

File tree

11 files changed: +4,114 additions, 0 deletions

topicgpt/README.md

Lines changed: 13 additions & 0 deletions
## TopicGPT

### Setup
- Set your API key in an environment variable called `OPENAI_API_KEY`, or directly in the `script/utils.py` file.
- Install the requirements: `pip install -r requirements.txt`

### Usage
- Code to generate the topics is in `script/run.ipynb`.
- Prompts to generate the topics are in `prompt/`.

### Results
- The generated topics are in `data/master_paper_*.md`.
- (Text, generated topics) pairs are in `data/generation_1_paper.jsonl`.
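The results file above is JSONL, one JSON object per line. A minimal sketch of loading it, assuming each line is a self-contained JSON record (the field names inside each record are not specified here, so the loader stays generic):

```python
import json

def load_pairs(path):
    """Read (text, generated topics) records from a JSONL file,
    one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                records.append(json.loads(line))
    return records
```

Each returned record is a plain dict, so downstream code can inspect whatever keys the generation script actually wrote.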

topicgpt/data/generation_1_paper.jsonl

Lines changed: 1464 additions & 0 deletions

topicgpt/data/master_paper.md

Lines changed: 464 additions & 0 deletions
Lines changed: 59 additions & 0 deletions
[0] Prompting Methods
[1] Adversarial Prompting in Language Models (Count: 32): Discusses the challenges and characteristics of adversarial prompts designed to circumvent safeguards in large language models.
[1] Decomposed Prompting (Count: 15): Discusses a method of breaking down complex tasks into simpler sub-tasks for Large Language Models (LLMs) using a modular prompting approach.
[1] Probabilistic Inference in Language Models (Count: 7): Discusses a method for enhancing reasoning in LLMs through a two-stage process involving retrieval of associations and probabilistic reasoning.
[1] Language Model Integration in Machine Learning Pipelines (Count: 6): Discusses the use of large language models as components within broader machine learning methods, such as boosting algorithms.
[1] Chain of thought prompting (Count: 22): Discusses a method that generates a sequence of short sentences to describe reasoning logics step by step.
[1] Automated Prompt Engineering (Count: 54): Discusses the use of algorithms, such as evolutionary algorithms, to automate the process of crafting prompts for large language models.
[1] Weak Supervision for Language Models (Count: 7): Discusses leveraging weak supervision on unlabeled data to enhance few-shot learning performance in aspect-based sentiment analysis tasks.
[1] Knowledge Distillation in Language Models (Count: 5): Discusses methods for transferring knowledge from larger to smaller language models to improve efficiency without compromising performance.
[1] Prompt Tuning in Language Models (Count: 7): Discusses the optimization and application of language model prompts for enhancing performance on specific tasks.
[1] Structured Prompting in Language Models (Count: 5): Discusses a method for scaling in-context learning in language models by using structured prompting to handle a larger number of examples without the constraints of input length.

[0] Prompting Applications
[1] Safety and Robustness in Language Models (Count: 20): Discusses the balance between ensuring text safety and maintaining robustness in following instructions for large language models.
[1] Educational Use of Language Models (Count: 16): Discusses the application of language models in educational settings, particularly in cybersecurity training and assessment.
[1] Automated Evaluation of Dialogue Systems (Count: 19): Discusses the development of systems to automatically evaluate task-oriented dialogue systems using large language models.
[1] Language Model Adaptation for Speech Processing (Count: 11): Discusses methods for integrating speech recognition and understanding with large language models to enhance their capabilities in processing spoken language.
[1] Language Model Adaptation for Structured Information Extraction (Count: 42): Discusses the use of generative language models of code for performing structured information extraction tasks.
[1] Language Model Personalization (Count: 6): Discusses methods for aligning language model behavior with individual user preferences and characteristics, rather than demographic or ideological groups.
[1] Stereotype Detection and Mitigation in Language Models (Count: 6): Discusses methods for identifying and addressing stereotypes in language model outputs, particularly for intersectional demographic groups.
[1] Language Model Integration in Robotics (Count: 17): Discusses the use of language models to assist in robotic task planning and action sequence generation.
[1] Synthetic Data Generation for Language Models (Count: 25): Discusses the use of language models to create synthetic datasets, particularly for tasks with structured outputs.
[1] Language Model Adaptation for Machine Translation (Count: 28): Discusses the use of prompting techniques to enhance machine translation capabilities in language models, particularly for handling rare words and low-resource scenarios.
[1] Security and Vulnerability Assessment in Language Models (Count: 10): Discusses the identification and mitigation of security risks, such as Remote Code Execution vulnerabilities, in LLM-integrated frameworks and applications.
[1] Language Model Application in Biomedical Tasks (Count: 7): Discusses the use of large language models for specific applications in the biomedical domain, including classification and causal relation detection.
[1] Language Model Adaptation for Code Generation (Count: 27): Discusses the application of language models in generating semantically correct code and cross-language code clones for software development and programming tasks.
[1] Language Model Adaptation for Medical Imaging (Count: 6): Discusses the application and adaptation of visual-language pre-trained models for medical imaging tasks such as zero-shot nuclei detection.
[1] Language Model Adaptation for Robotics (Count: 6): Discusses the application of language models in programming robots and the associated challenges and strategies.
[1] Visual Prompt Engineering (Count: 16): Discusses methods and techniques for designing prompts that interact with large vision models for various visual tasks.
[1] Language Model Application in Healthcare (Count: 9): Discusses the evaluation and application of large language models in clinical and healthcare settings, focusing on their utility, safety, and the need for prompt engineering and model calibration.
[1] Bias Mitigation in Language Models (Count: 11): Discusses methods to assess and reduce biases in language model outputs, particularly in the context of generating job advertisements.
[1] Reinforcement Learning from Human Feedback in Language Models (Count: 7): Discusses the use of human preference data to train language models to improve their response quality without explicit rubrics.
[1] Explainable Natural Language Processing (Count: 5): Discusses the generation of natural language explanations by language models for data labels and the comparison of model-generated explanations with human-written ones, particularly in relation to sample hardness.
[1] Multimodal Learning in Language Models (Count: 20): Discusses methods for integrating and leveraging pre-trained unimodal models in multimodal vision-language tasks.
[1] Language Model Application in Recommendation Systems (Count: 6): Discusses the use of language models like ChatGPT to enhance recommendation systems and their ability to generalize across different recommendation scenarios.
[1] Language Model Application in Software Engineering (Count: 9): Discusses the use of large language models like ChatGPT in software engineering tasks, including log parsing and analytics.
[1] Multilingual Large Language Model Research (Count: 6): Discusses the capabilities and performance of multilingual large language models, particularly in tasks involving code-switching.
[1] Finetuning Language Models (Count: 12): Discusses the process of adapting language models to specific tasks or datasets through finetuning.
[1] Knowledge Base Construction with Language Models (Count: 5): Discusses the use of language models for building and enhancing knowledge bases.
[1] Interactive Text to Image Generation (Count: 5): Discusses the integration of language models with text-to-image diffusion models to enable interactive and natural language-driven image creation and refinement.
[1] Prompt Engineering (Count: 24): Explores techniques for effectively using prompts to guide language model behavior in various applications, contributing to the field of artificial intelligence, particularly in vision and natural language processing domains.
[1] Benchmarking Studies in Language Models (Count: 5): Discusses the need for comprehensive benchmarking of language models on complex tasks and the proposal of a taxonomy for prompt design to enable meaningful comparisons.
[1] Language Model Augmentation with External Knowledge (Count: 13): Discusses systems that enhance language models by incorporating external data or knowledge sources to improve the accuracy and reliability of their outputs.
[1] Multimodal Prompt Learning (Count: 9): Discusses the adaptation of vision-language models using prompts for both visual and textual inputs to improve task performance and representation alignment.
[1] Efficient Supervision in Language Model Training (Count: 8): Pertains to methods that aim to reduce the amount of human supervision required in training language models, such as using a small set of principles instead of extensive human annotations.
[1] Language Model Application in Mental Health (Count: 5): Discusses the use of large language models in generating empathetic responses for mental health counselling scenarios.
[1] Prompt Learning for Natural Language Understanding (Count: 5): Discusses the use of prompt learning techniques to improve tasks related to understanding and classifying relations in natural language, such as Implicit Discourse Relation Recognition (IDRR).
[1] Multimodal Large Language Model Research (Count: 5): Discusses the development and application of large language models that integrate multiple modalities, such as text and images, particularly in the context of medical image interpretation.
[1] Robustness in Language Models (Count: 6): Discusses the resilience of language models to various perturbations and the methods to test and improve their robustness.
[1] Language Model Adaptation for Multilingual Tasks (Count: 5): Covers the ability of language models to understand and generate content in multiple languages, including non-Latin scripts.
[1] Language Model Evaluation Benchmarks (Count: 12): Discusses the creation and use of benchmarks to evaluate the performance of language models, particularly in the context of few-shot learning.
[1] Language Model Efficiency and Scalability (Count: 6): Discusses the challenges and solutions related to the size and computational requirements of pre-trained language models (PLMs) for achieving both few-shot learning and fine-tuning capabilities without compromising model size.
[1] Meta Reinforcement Learning (Count: 6): Discusses the concept of learning to learn and adapting quickly to new tasks through meta-learning approaches in reinforcement learning.
[1] Retrieval Augmented Language Models (Count: 15): Discusses language models that incorporate external knowledge retrieval mechanisms to enhance performance on knowledge-intensive tasks.
[1] Code Generation in Language Models (Count: 5): Pertains to the ability of language models to generate code and the methods to improve this capability.
[1] Fact Verification with Language Models (Count: 9): Discusses the use of language models for identifying and combating false news through fact-checking methods.
[1] Medical Dialogue Summarization (Count: 5): Discusses the process of condensing medical conversations into structured summaries, often involving the use of medical terminology and the extraction of key information from symptom discussions.
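The generated topic lines above follow a regular shape: `[level] Name (Count: N): Description`, where the count and description are optional (top-level headings like `[0] Prompting Methods` have neither). A hedged sketch of a parser for this format; the helper name and output dict keys are illustrative, not part of the repo:

```python
import re

# Matches "[level] Name (Count: N): Description"; the count and
# description parts are optional.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.+?)(?:\s+\(Count:\s*(\d+)\))?(?::\s*(.*))?$")

def parse_topics(text):
    """Parse topic-list text into dicts with level, name, count, description."""
    topics = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate topic groups
        m = LINE_RE.match(line)
        if m:
            level, name, count, desc = m.groups()
            topics.append({
                "level": int(level),
                "name": name,
                "count": int(count) if count else None,
                "description": desc or "",
            })
    return topics
```

Because the bracketed level is explicit on every line, a flat list of dicts is enough; nesting can be reconstructed later by grouping each level-1 entry under the preceding level-0 heading.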
