Commit 3a373e9

Merge pull request #122 from trigaten/topicgpt
Added scripts and results for TopicGPT.
2 parents: cec236f + 9283946

File tree

11 files changed: +4,114 additions, 0 deletions

topicgpt/README.md

Lines changed: 13 additions & 0 deletions
## TopicGPT

### Setup
- Set your API key in an environment variable called `OPENAI_API_KEY`, or directly in the `script/utils.py` file.
- Install the requirements: `pip install -r requirements.txt`

### Usage
- Code to generate the topics is in `script/run.ipynb`.
- Prompts to generate the topics are in `prompt/`.

### Results
- The generated topics are in `data/master_paper_*.md`.
- (Text, generated topics) pairs are in `data/generation_1_paper.jsonl`.
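The results file above is JSONL, one JSON object per line. A minimal sketch of loading it, assuming each line is a self-contained JSON record (the field names inside each record are not specified here, so the loader stays generic):

```python
import json

def load_pairs(path):
    """Read (text, generated topics) records from a JSONL file,
    one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                records.append(json.loads(line))
    return records
```

Each returned record is a plain dict, so downstream code can inspect whatever keys the generation script actually wrote.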

topicgpt/data/generation_1_paper.jsonl

Lines changed: 1464 additions & 0 deletions

topicgpt/data/master_paper.md

Lines changed: 464 additions & 0 deletions
Lines changed: 59 additions & 0 deletions
[0] Prompting Methods
[1] Adversarial Prompting in Language Models (Count: 32): Discusses the challenges and characteristics of adversarial prompts designed to circumvent safeguards in large language models.
[1] Decomposed Prompting (Count: 15): Discusses a method of breaking down complex tasks into simpler sub-tasks for Large Language Models (LLMs) using a modular prompting approach.
[1] Probabilistic Inference in Language Models (Count: 7): Discusses a method for enhancing reasoning in LLMs through a two-stage process involving retrieval of associations and probabilistic reasoning.
[1] Language Model Integration in Machine Learning Pipelines (Count: 6): Discusses the use of large language models as components within broader machine learning methods, such as boosting algorithms.
[1] Chain of thought prompting (Count: 22): Discusses a method that generates a sequence of short sentences to describe reasoning logics step by step.
[1] Automated Prompt Engineering (Count: 54): Discusses the use of algorithms, such as evolutionary algorithms, to automate the process of crafting prompts for large language models.
[1] Weak Supervision for Language Models (Count: 7): Discusses leveraging weak supervision on unlabeled data to enhance few-shot learning performance in aspect-based sentiment analysis tasks.
[1] Knowledge Distillation in Language Models (Count: 5): Discusses methods for transferring knowledge from larger to smaller language models to improve efficiency without compromising performance.
[1] Prompt Tuning in Language Models (Count: 7): Discusses the optimization and application of language model prompts for enhancing performance on specific tasks.
[1] Structured Prompting in Language Models (Count: 5): Discusses a method for scaling in-context learning in language models by using structured prompting to handle a larger number of examples without the constraints of input length.

[0] Prompting Applications
[1] Safety and Robustness in Language Models (Count: 20): Discusses the balance between ensuring text safety and maintaining robustness in following instructions for large language models.
[1] Educational Use of Language Models (Count: 16): Discusses the application of language models in educational settings, particularly in cybersecurity training and assessment.
[1] Automated Evaluation of Dialogue Systems (Count: 19): Discusses the development of systems to automatically evaluate task-oriented dialogue systems using large language models.
[1] Language Model Adaptation for Speech Processing (Count: 11): Discusses methods for integrating speech recognition and understanding with large language models to enhance their capabilities in processing spoken language.
[1] Language Model Adaptation for Structured Information Extraction (Count: 42): Discusses the use of generative language models of code for performing structured information extraction tasks.
[1] Language Model Personalization (Count: 6): Discusses methods for aligning language model behavior with individual user preferences and characteristics, rather than demographic or ideological groups.
[1] Stereotype Detection and Mitigation in Language Models (Count: 6): Discusses methods for identifying and addressing stereotypes in language model outputs, particularly for intersectional demographic groups.
[1] Language Model Integration in Robotics (Count: 17): Discusses the use of language models to assist in robotic task planning and action sequence generation.
[1] Synthetic Data Generation for Language Models (Count: 25): Discusses the use of language models to create synthetic datasets, particularly for tasks with structured outputs.
[1] Language Model Adaptation for Machine Translation (Count: 28): Discusses the use of prompting techniques to enhance machine translation capabilities in language models, particularly for handling rare words and low-resource scenarios.
[1] Security and Vulnerability Assessment in Language Models (Count: 10): Discusses the identification and mitigation of security risks, such as Remote Code Execution vulnerabilities, in LLM-integrated frameworks and applications.
[1] Language Model Application in Biomedical Tasks (Count: 7): Discusses the use of large language models for specific applications in the biomedical domain, including classification and causal relation detection.
[1] Language Model Adaptation for Code Generation (Count: 27): Discusses the application of language models in generating semantically correct code and cross-language code clones for software development and programming tasks.
[1] Language Model Adaptation for Medical Imaging (Count: 6): Discusses the application and adaptation of visual-language pre-trained models for medical imaging tasks such as zero-shot nuclei detection.
[1] Language Model Adaptation for Robotics (Count: 6): Discusses the application of language models in programming robots and the associated challenges and strategies.
[1] Visual Prompt Engineering (Count: 16): Discusses methods and techniques for designing prompts that interact with large vision models for various visual tasks.
[1] Language Model Application in Healthcare (Count: 9): Discusses the evaluation and application of large language models in clinical and healthcare settings, focusing on their utility, safety, and the need for prompt engineering and model calibration.
[1] Bias Mitigation in Language Models (Count: 11): Discusses methods to assess and reduce biases in language model outputs, particularly in the context of generating job advertisements.
[1] Reinforcement Learning from Human Feedback in Language Models (Count: 7): Discusses the use of human preference data to train language models to improve their response quality without explicit rubrics.
[1] Explainable Natural Language Processing (Count: 5): Discusses the generation of natural language explanations by language models for data labels and the comparison of model-generated explanations with human-written ones, particularly in relation to sample hardness.
[1] Multimodal Learning in Language Models (Count: 20): Discusses methods for integrating and leveraging pre-trained unimodal models in multimodal vision-language tasks.
[1] Language Model Application in Recommendation Systems (Count: 6): Discusses the use of language models like ChatGPT to enhance recommendation systems and their ability to generalize across different recommendation scenarios.
[1] Language Model Application in Software Engineering (Count: 9): Discusses the use of large language models like ChatGPT in software engineering tasks, including log parsing and analytics.
[1] Multilingual Large Language Model Research (Count: 6): Discusses the capabilities and performance of multilingual large language models, particularly in tasks involving code-switching.
[1] Finetuning Language Models (Count: 12): Discusses the process of adapting language models to specific tasks or datasets through finetuning.
[1] Knowledge Base Construction with Language Models (Count: 5): Discusses the use of language models for building and enhancing knowledge bases.
[1] Interactive Text to Image Generation (Count: 5): Discusses the integration of language models with text-to-image diffusion models to enable interactive and natural language-driven image creation and refinement.
[1] Prompt Engineering (Count: 24): Explores techniques for effectively using prompts to guide language model behavior in various applications, contributing to the field of artificial intelligence, particularly in vision and natural language processing domains.
[1] Benchmarking Studies in Language Models (Count: 5): Discusses the need for comprehensive benchmarking of language models on complex tasks and the proposal of a taxonomy for prompt design to enable meaningful comparisons.
[1] Language Model Augmentation with External Knowledge (Count: 13): Discusses systems that enhance language models by incorporating external data or knowledge sources to improve the accuracy and reliability of their outputs.
[1] Multimodal Prompt Learning (Count: 9): Discusses the adaptation of vision-language models using prompts for both visual and textual inputs to improve task performance and representation alignment.
[1] Efficient Supervision in Language Model Training (Count: 8): Pertains to methods that aim to reduce the amount of human supervision required in training language models, such as using a small set of principles instead of extensive human annotations.
[1] Language Model Application in Mental Health (Count: 5): Discusses the use of large language models in generating empathetic responses for mental health counselling scenarios.
[1] Prompt Learning for Natural Language Understanding (Count: 5): Discusses the use of prompt learning techniques to improve tasks related to understanding and classifying relations in natural language, such as Implicit Discourse Relation Recognition (IDRR).
[1] Multimodal Large Language Model Research (Count: 5): Discusses the development and application of large language models that integrate multiple modalities, such as text and images, particularly in the context of medical image interpretation.
[1] Robustness in Language Models (Count: 6): Discusses the resilience of language models to various perturbations and the methods to test and improve their robustness.
[1] Language Model Adaptation for Multilingual Tasks (Count: 5): Covers the ability of language models to understand and generate content in multiple languages, including non-Latin scripts.
[1] Language Model Evaluation Benchmarks (Count: 12): Discusses the creation and use of benchmarks to evaluate the performance of language models, particularly in the context of few-shot learning.
[1] Language Model Efficiency and Scalability (Count: 6): Discusses the challenges and solutions related to the size and computational requirements of pre-trained language models (PLMs) for achieving both few-shot learning and fine-tuning capabilities without compromising model size.
[1] Meta Reinforcement Learning (Count: 6): Discusses the concept of learning to learn and adapting quickly to new tasks through meta-learning approaches in reinforcement learning.
[1] Retrieval Augmented Language Models (Count: 15): Discusses language models that incorporate external knowledge retrieval mechanisms to enhance performance on knowledge-intensive tasks.
[1] Code Generation in Language Models (Count: 5): Pertains to the ability of language models to generate code and the methods to improve this capability.
[1] Fact Verification with Language Models (Count: 9): Discusses the use of language models for identifying and combating false news through fact-checking methods.
[1] Medical Dialogue Summarization (Count: 5): Discusses the process of condensing medical conversations into structured summaries, often involving the use of medical terminology and the extraction of key information from symptom discussions.
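The generated topic lines above follow a regular shape: `[level] Name (Count: N): Description`, where the count and description are optional (top-level headings like `[0] Prompting Methods` have neither). A hedged sketch of a parser for this format; the helper name and output dict keys are illustrative, not part of the repo:

```python
import re

# Matches "[level] Name (Count: N): Description"; the count and
# description parts are optional.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.+?)(?:\s+\(Count:\s*(\d+)\))?(?::\s*(.*))?$")

def parse_topics(text):
    """Parse topic-list text into dicts with level, name, count, description."""
    topics = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate topic groups
        m = LINE_RE.match(line)
        if m:
            level, name, count, desc = m.groups()
            topics.append({
                "level": int(level),
                "name": name,
                "count": int(count) if count else None,
                "description": desc or "",
            })
    return topics
```

Because the bracketed level is explicit on every line, a flat list of dicts is enough; nesting can be reconstructed later by grouping each level-1 entry under the preceding level-0 heading.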
