Add cli demo (#221)
* use zeno in a batch

* add gc.collect()

* add error control

* merge test files for trainer

* add tests for t5

* add zeno's new release

* add multi-generation and debug

* [set base parameter]

* add requests_per_minute and responses_per_requests

* set max_length to 500

* delete wrong max_new_tokens in executor

* distinguish high-quality and low-quality examples for generator

* add and test construct_input_output_map

* add and test dataset tools

* add multi_vote_input_output_map

* add convert_generated_examples_to_dataset

* add data member for generator

* add test_convert_generated_examples_to_generated_dataset

* add test_convert_generated_examples_to_generated_dataset when filter_duplicated_examples=False

* add test_compute_batch_size

* strict constraints on default values of input_output_map, generated_dataset, generated_examples

* finish tests for filter_duplicated_examples=False

* rename MockCompletion

* add cache files

* add cache method

* add two tests for cache method

* add mock_batch_openai_response_with_different_completions

* add mock_batch_openai_response_with_different_completions [mypy failed]

* ban mypy

* ban mypy

* fix mock_batch_openai_response_with_different_completions

* finish unit test for filter

* use default dict for input_output_map

* rewrite docs for construct_input_output_map

* rewrite docs for use_multi_vote

* rename to apply_multi_vote_to_construct_generated_dataset

* add test for extract response

* add test for extract response

* split generator tests

* split generator tests and pass tests

* finish test_generator_with_filter

* fix import error

* fix load cache

* assert log info for cache

* test load cache and continue generation

* change example to Example

* add demonstration for meta prompts

* use gradio 3.38

* use gradio 3.38

* use fast api 100

* use IGNORE_INDEX instead of magic number -100

* fix typo of #174

* add new documents

* add new documents for generator

* add new documents for mock OpenAI

* refactor before five filter tests

* rewrite docs for filter test

* rewrite all the docstrings

* merge main. Fix readme

* add new member variables

* add input_output_map and multi-vote

* add convert_generated_examples_to_generated_dataset and compute_batch_size

* rewrite generate_dataset_split and extract_responses

* Add mock_batch_openai_response_with_different_completions

* Add mock_batch_openai_response_with_different_completions

* test error handler of filter generator

* test load cache of filter

* test the generator with filter

* decouple

* add new cache

* add new cache

* add new cache for generated dataset and examples

* fix typo

* use generalized class of transformers

* filter empty examples in processor and generator

* move assertion

* fix typo

* fix typo

* add instruction

* add instruction

* increase sequence length

* add batch size

* use mutable dataclass

* use temperature increase

* use clipped_temperature

* delete comments

* add max_temperature

* fix 161

* fix processor

* fix wrong unit tests

* fix wrong unit tests

* change logger

* change logger comments

* add cli demo

* add pyfiglet

* add termcolor

* add result

* add num_epochs

* fix epoch

* [CUDA]

* add batch size

* add batch size

* finish cli demo

* change the labeling logic

* add truncation warning for executor

* use exception in dataset generator

* use exception in dataset generator

* merge main

* merge main

* update docstring

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix review from graham

* Update tests/dataset_generator_with_filter_test.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* use non-stateful function

* use non-stateful function for extract_responses

* add new comments

* fix grammar

* Update test_helpers/dataset_tools.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update test_helpers/dataset_tools.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix lint

* fix wrong comments

* Update prompt2model/dataset_generator/openai_gpt.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* use non-stateful methods

* fix lint

* fix lint

* fix review

* fix type check and use class

* fix conflicts

* remove result

* fix conflict

* fix review

* fix review

* fix review

* add component document

* new main readme

* new main readme [need video]

* fix processor

* fix alignment of different features

* fix processor

* add instruction for prompt

* fix review

---------

Co-authored-by: zhaochen20 <zhaochenyang20@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
3 people authored Aug 23, 2023
1 parent 99af45a commit 0bea36a
Showing 33 changed files with 855 additions and 822 deletions.
90 changes: 60 additions & 30 deletions README.md
pip install .

## Configuration

Before using `prompt2model`, there are a
few configuration steps you need to complete:

- Sign up on the OpenAI website and obtain an
OpenAI API key.

- Provide the OpenAI API key in the
initialization function of the classes
that require calling OpenAI models, as
shown in the sketch after this list.

- Alternatively, you can set
the environment variable
`OPENAI_API_KEY` to your API key by running
the following command in your terminal:

```bash
export OPENAI_API_KEY=<your key>
```

- After setting the environment
variable `OPENAI_API_KEY`, load it
in your Python code:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```
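
If you would rather pass the key explicitly, a minimal sketch is below. The import path and the `api_key` keyword argument are assumptions for illustration; check the docstring of each class (e.g., `OpenAIInstructionParser`) for the exact constructor signature.

```python
# A minimal sketch, assuming the constructor accepts an `api_key`
# argument; the import path and signature here are assumptions, so
# check the class docstring for the exact interface.
import os

from prompt2model.prompt_parser import OpenAIInstructionParser

api_key = os.environ["OPENAI_API_KEY"]  # or paste your key directly
prompt_parser = OpenAIInstructionParser(api_key=api_key)
```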

To enable the model retriever, we need to untar the model_info.tgz file:

```bash
cd huggingface_models
tar -xvf model_info.tgz
```

## Components

The `prompt2model` package is composed
of several components. Each component's
documentation provides detailed usage
instructions on maximizing the
functionality and benefits of each
component within the package.

## Usage

The `prompt2model` pipeline is a versatile
pipeline for task solving using a language
model. It covers stages including dataset retrieval,
generation, processing, model retrieval,
training, execution, evaluation, and
interface creation. The full pipeline is
driven by `./cli_demo.py`: by directly
running `python cli_demo.py`, users can
efficiently leverage language models for
various tasks by customizing the components
according to their specific requirements.

## How to Write a Good Prompt

A good prompt makes the generated dataset
follow the format of your demonstrations exactly.
It contains an instruction and few-shot examples.

The instruction should contain the following:

1. An exact description of the input
and output format, e.g., a string or a dictionary.
2. The exact contents of each part of the
input and their relationship, described as precisely as you can.
3. The range of possible inputs, e.g.,
"The question can range from math, culture,
society, geometry, biology, history, sports, technology,
science, and so on."

The few-shot examples should contain the following:

1. Use `=` rather than other ambiguous symbols like `:`.
2. Avoid unnecessary line breaks at the beginning.
For example, `input=""` is better than breaking
the line after `=`.
3. Use `input` rather than `Input`; likewise,
`output` is preferable to `Output`.
4. Wrap the `input` and `output` values in
double quotation marks (`""`).

Though the examples are optional, we strongly
suggest including them to guide the format and
content for the generator.

Also, we recommend providing several precise examples
in the specified format and asking ChatGPT
about the format and scope of your examples.
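
Putting these guidelines together, a hypothetical prompt might look like the following; the task and the examples are illustrative, not taken from the package:

```txt
You are given a question, and you should answer it concisely.
The input is a question string, and the output is a short answer
string. The question can range from math, culture, society,
geometry, biology, history, sports, technology, science, and so on.

Here are some examples:

input="What is the capital of France?"
output="Paris"

input="Who wrote Hamlet?"
output="William Shakespeare"
```

Note how the examples use `=`, lowercase `input`/`output`, quoted values, and no leading line breaks, matching the guidelines above.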

## Customization

If you want to customize a specific component,
see the relevant doc page and class docstrings.

## Contribution

There is more information for developers in the
[CONTRIBUTING.md](CONTRIBUTING.md) file.
