Add cli demo (#221)
* use zeno in a batch

* add gc.collect()

* add error control

* merge test files for trainer

* add tests for t5

* add zeno's new release

* add multi-generation and debug

* [set base parameter]

* add requests_per_minute and responses_per_requests

* set max_length to 500

* delete wrong max_new_tokens in executor

* distinguish high-quality and low-quality examples for generator

* add and test construct_input_output_map

* add and test dataset tools

* add multi_vote_input_output_map

* add convert_generated_examples_to_dataset

* add data member for generator

* add test_convert_generated_examples_to_generated_dataset

* add test_convert_generated_examples_to_generated_dataset when filter_duplicated_examples=False

* add test_compute_batch_size

* strict constraints on default values of input_output_map, generated_dataset, generated_examples

* finish tests for filter_duplicated_examples=False

* rename MockCompletion

* add cache files

* add cache method

* add two tests for cache method

* add mock_batch_openai_response_with_different_completions

* add mock_batch_openai_response_with_different_completions [mypy failed]

* ban mypy

* ban mypy

* fix mock_batch_openai_response_with_different_completions

* finish unit test for filter

* use default dict for input_output_map

* rewrite docs for construct_input_output_map

* rewrite docs for use_multi_vote

* rename to apply_multi_vote_to_construct_generated_dataset

* add test for extract response

* add test for extract response

* split generator tests

* split generator tests and pass tests

* finish test_generator_with_filter

* fix import error

* fix load cache

* assert log info for cache

* test load cache and continue generation

* change example to Example

* add demonstration for meta prompts

* use gradio 3.38

* use gradio 3.38

* use fast api 100

* use IGNORE_INDEX instead of magic number -100

* fix typo of #174

* add new documents

* add new documents for generator

* add new documents for mock OpenAI

* refactor before five filter tests

* rewrite docs for filter test

* rewrite all the docstrings

* merge main. Fix readme

* add new member variables

* add input_output_map and multi-vote

* add convert_generated_examples_to_generated_dataset and compute_batch_size

* rewrite generate_dataset_split and extract_responses

* Add mock_batch_openai_response_with_different_completions

* Add mock_batch_openai_response_with_different_completions

* test error handler of filter generator

* test load cache of filter

* test the generator with filter

* decouple

* add new cache

* add new cache

* add new cache for generated dataset and examples

* fix typo

* use generalized class of transformers

* filter empty examples in processor and generator

* move assertion

* fix typo

* fix typo

* add instruction

* add instruction

* increase sequence length

* add batch size

* use mutable dataclass

* use temperature increase

* use clipped_temperature

* delete comments

* add max_temperature

* fix 161

* fix processor

* fix wrong unit tests

* fix wrong unit tests

* change logger

* change logger comments

* add cli demo

* add pyfiglet

* add termcolor

* add result

* add num_epochs

* fix epoch

* [CUDA]

* add batch size

* add batch size

* finish cli demo

* change the labeling logic

* add truncation warning for executor

* use exception in dataset generator

* use exception in dataset generator

* merge main

* merge main

* update docstring

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix review from graham

* Update tests/dataset_generator_with_filter_test.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* use non-stateful function

* use non-stateful function for extract_responses

* add new comments

* fix grammar

* Update test_helpers/dataset_tools.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update test_helpers/dataset_tools.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix lint

* fix wrong comments

* Update prompt2model/dataset_generator/openai_gpt.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* use non-stateful methods

* fix lint

* fix lint

* fix review

* fix type check and use class

* fix conflicts

* remove result

* fix conflict

* fix review

* fix review

* fix review

* add component document

* new main readme

* new main readme [need video]

* fix processor

* fix alignment of different features

* fix processor

* add instruction for prompt

* fix review

---------

Co-authored-by: zhaochen20 <zhaochenyang20@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
3 people authored Aug 23, 2023
1 parent 99af45a commit 0bea36a
Showing 33 changed files with 855 additions and 822 deletions.
90 changes: 60 additions & 30 deletions README.md
pip install .

## Configuration

Before using `prompt2model`, there are a
few configuration steps you need to complete:

- Sign up on the OpenAI website and obtain an
OpenAI API key.

- Provide the OpenAI API key in the
initialization function of the classes
that require calling OpenAI models, as
shown in the sketch after this list.

- Alternatively, you can set
the environment variable
`OPENAI_API_KEY` to your API key by running
the following command in your terminal:

```bash
export OPENAI_API_KEY=<your key>
```

- After setting the environment
variable `OPENAI_API_KEY`, load it
in your Python code:

```python
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
```
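
If you would rather pass the key explicitly, a minimal sketch is below. The import path and the `api_key` keyword argument are assumptions for illustration; check the docstring of each class (e.g., `OpenAIInstructionParser`) for the exact constructor signature.

```python
# A minimal sketch, assuming the constructor accepts an `api_key`
# argument; the import path and signature here are assumptions, so
# check the class docstring for the exact interface.
import os

from prompt2model.prompt_parser import OpenAIInstructionParser

api_key = os.environ["OPENAI_API_KEY"]  # or paste your key directly
prompt_parser = OpenAIInstructionParser(api_key=api_key)
```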

To enable the model retriever, we need to untar the model_info.tgz file:

```bash
cd huggingface_models
tar -xvf model_info.tgz
```

## Components

The `prompt2model` package is composed
of several components. Each component's
documentation provides detailed usage
instructions on maximizing the
functionality and benefits of each
component within the package.

## Usage

The `prompt2model` pipeline is a versatile
pipeline for task solving using a language
model. It covers stages including dataset retrieval,
generation, processing, model retrieval,
training, execution, evaluation, and
interface creation. The full pipeline is
driven by `./cli_demo.py`: by directly
running `python cli_demo.py`, users can
efficiently leverage language models for
various tasks by customizing the components
according to their specific requirements.

## How to Write a Good Prompt

A good prompt makes the generated dataset
follow the format of your demonstrations exactly.
It contains an instruction and few-shot examples.

The instruction should contain the following:

1. An exact description of the input
and output format, e.g., a string or a dictionary.
2. The exact contents of each part of the
input and their relationship, described as precisely as you can.
3. The range of possible inputs, e.g.,
"The question can range from math, culture,
society, geometry, biology, history, sports, technology,
science, and so on."

The few-shot examples should contain the following:

1. Use `=` rather than other ambiguous symbols like `:`.
2. Avoid unnecessary line breaks at the beginning.
For example, `input=""` is better than breaking
the line after `=`.
3. Use `input` rather than `Input`; likewise,
`output` is preferable to `Output`.
4. Wrap the `input` and `output` values in
double quotation marks (`""`).

Though the examples are optional, we strongly
suggest including them to guide the format and
content for the generator.

Also, we recommend providing several precise examples
in the specified format and asking ChatGPT
about the format and scope of your examples.
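
Putting these guidelines together, a hypothetical prompt might look like the following; the task and the examples are illustrative, not taken from the package:

```txt
You are given a question, and you should answer it concisely.
The input is a question string, and the output is a short answer
string. The question can range from math, culture, society,
geometry, biology, history, sports, technology, science, and so on.

Here are some examples:

input="What is the capital of France?"
output="Paris"

input="Who wrote Hamlet?"
output="William Shakespeare"
```

Note how the examples use `=`, lowercase `input`/`output`, quoted values, and no leading line breaks, matching the guidelines above.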

## Customization

If you want to customize a specific component,
see the relevant doc page and class docstrings.

## Contribution

There is more information for developers in the
[CONTRIBUTING.md](CONTRIBUTING.md) file.
