Improving Transform and Rerank Module #396

ritugala · 2024-04-02T03:37:47Z

Description

Major Changes:

Added support for using multiple datasets during transformation. This helps when a single dataset is too small on its own, or when too many of its transforms fail.
Changed the rerank module to select the dataset first and then the config, and to use dataset tags.
Use Minimum Bayes Risk decoding (i.e. majority vote) when choosing the dataset and config.
Added a task expansion module to better capture the requirements of our task.
Added support for parsing OpenAI responses that contain Chain of Thought followed by JSON (by searching for the rightmost JSON). This is useful in code tasks, where the CoT may also contain JSON.
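
For readers skimming the list above, here is a minimal sketch of two of the ideas: majority voting over repeated model answers, and picking the rightmost JSON object out of a Chain-of-Thought response. The function names `majority_vote` and `extract_rightmost_json` are hypothetical and are not taken from this PR; the actual implementation in prompt2model may differ.

```python
import json
from collections import Counter


def majority_vote(candidates):
    """Return the most frequent non-None candidate, or None if there is none."""
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    # most_common(1) yields [(value, count)]; ties go to the first-seen value.
    return votes.most_common(1)[0][0]


def extract_rightmost_json(text):
    """Return the last JSON object in `text` that parses, or None."""
    decoder = json.JSONDecoder()
    last = None
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, pos)
            last = obj
            pos = text.find("{", end)  # skip past the parsed object
        except json.JSONDecodeError:
            pos = text.find("{", pos + 1)  # not valid JSON here, move on
    return last
```

Scanning left to right and keeping the last successful parse means that JSON appearing inside the reasoning is ignored whenever a later object follows it.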

Minor Changes/Fixes:

Removed the dataset_index.json and reranking_dataset_index.json files from examples; these seem to have been committed by mistake
Moved make_dataset_from_samples from prompt_based.py -> description_dataset_retriever.py
Moved most transform-related variables (e.g. num_votes, max_failed_transforms, etc.) to be attributes of the DescriptionDatasetRetriever class so that we don't need to pass them around in functions
Added exponential backoff, plus parsing of the error message from OpenAI to see how long to sleep
Loggers would sometimes give duplicate messages; fixed that
Updated reranking_dataset_index_tiny.json (used in tests) for using tags
Updated tests in tests/dataset_retriever_test.py
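
As an illustration of the backoff item above, here is a minimal sketch that prefers a wait time parsed from the error message and falls back to exponential backoff. The message shape ("try again in 20s" / "in 350ms") is an assumption about what the API returns, and the helper names are hypothetical; none of this is copied from prompt2model/utils/api_tools.py.

```python
import random
import re

# Assumed hint format, e.g. "Please try again in 20s." or "... in 350ms."
_RETRY_HINT = re.compile(r"try again in (\d+(?:\.\d+)?)\s*(ms|s)\b", re.IGNORECASE)


def parse_retry_hint(message):
    """Return the suggested wait in seconds, or None if no hint is found."""
    match = _RETRY_HINT.search(message)
    if match is None:
        return None
    value = float(match.group(1))
    return value / 1000.0 if match.group(2).lower() == "ms" else value


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.0):
    """Exponential delay for a zero-based attempt number, capped at `cap`."""
    delay = min(cap, base * (2 ** attempt))
    # Optional random jitter spreads out retries from concurrent callers.
    return delay + random.uniform(0.0, jitter)


def seconds_to_sleep(message, attempt):
    """Use the server's hint when present, else fall back to backoff."""
    hint = parse_retry_hint(message)
    return hint if hint is not None else backoff_delay(attempt)
```

A caller would pass the caught exception's message and the current retry count to `seconds_to_sleep`, then sleep for the returned duration before retrying.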

Testing

Tested with dummy example + pytest

References

I've currently commented out the dataset_transformer tests; I want to make sure that you guys are happy with the current format before I add tests for it.
We should probably add CLI support for all the arguments (and set default values for most, of course)

saum7800

PR looks good! All important changes. I have left some comments: most are minor, some might require a little bit of work. Thanks!

prompt2model/dataset_retriever/description_dataset_retriever.py

prompt2model/dataset_transformer/prompt_based.py

prompt2model/utils/api_tools.py

neubig · 2024-04-11T14:14:49Z

Hi, I'm going to trust @saum7800 and @viswavi to take a look at this. It looks exciting, but it's a bit too voluminous for me to take a careful look now.

viswavi

Reviewed some of this PR - more to come tomorrow

prompt2model/dataset_retriever/description_dataset_retriever.py

viswavi · 2024-04-15T16:26:14Z

prompt2model/dataset_retriever/description_dataset_retriever.py

+        if dataset_name is None:
+            return None, None
+
+        time.sleep(10)  # To avoid rate limiting


I also don't love this - it makes a very specific hardcoded assumption about the rate limit. What if we instead use an on-premise LLM like Llama2 with no rate limit, or we use a version of GPT with a very high rate limit? Then this will probably no longer be required
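
One possible way to address this, sketched under the assumption that the delay becomes a configurable setting that defaults to no sleeping: the class name `RequestThrottle` and its parameters are hypothetical and do not come from the PR.

```python
import time


class RequestThrottle:
    """Sleep between requests only when a positive delay is configured."""

    def __init__(self, delay_seconds=0.0, sleep_fn=time.sleep):
        if delay_seconds < 0:
            raise ValueError("delay_seconds must be non-negative")
        self.delay_seconds = delay_seconds
        # sleep_fn is injectable so tests do not have to actually wait.
        self._sleep_fn = sleep_fn

    def wait(self):
        """Pause for the configured delay; return the seconds slept."""
        if self.delay_seconds > 0:
            self._sleep_fn(self.delay_seconds)
        return self.delay_seconds
```

With a default of zero, an on-premise model or a high-rate-limit deployment pays no cost, while a rate-limited API user can opt in with something like `RequestThrottle(delay_seconds=10)`.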

prompt2model/dataset_retriever/description_dataset_retriever.py

viswavi

At a high level, these changes look good. I've requested a bunch of small (but important) changes, mostly on minor implementation details, documentation, or code quality

tests/dataset_transformer_test.py

prompt2model/dataset_retriever/reranking_prompt.py

prompt2model/dataset_transformer/prompt_based.py

prompt2model/dataset_retriever/description_dataset_retriever.py

Co-authored-by: Vijay Viswanathan <vijayv@andrew.cmu.edu>

viswavi

Looks good to me!

saum7800

Looks good to me too! Great job!

saum7800 and others added 30 commits January 18, 2024 13:18

add top_dataset_info_return

509f5d9

make reranking call from 16k context

90a4a04

add incontext and COT to dataset transformation

7193ec0

log selected columns

1ae1b0d

add peft training

71e129b

remove import

faa5b7b

two returns

c0e6ef6

create promptspec

9e40238

add logging

f4298b8

add logging

9c9fa75

add logging

665b546

pass text

e18989a

change params

5d9761e

change params

b5f094f

change params

1e169eb

change params

bf72344

change paths

758a6da

minor changes

7bfa104

remove arg

9e9f33e

clear cache

a12e062

add wandb changes and minor changes

d9dc1ab

change eval steps

63a4a75

change eval steps

931edf0

modify qlora params

a2c2cc6

make lr changes

41d6fd0

curr changes

d32dc22

delete saumya changes

5a93ae7

initial changes

921a98f

merging changes

88130ba

first pass refactoring

379a503

ritugala changed the title ~~Improving Transform and Execution Module~~ Improving Transform and Rerank Module Apr 2, 2024

ritugala and others added 2 commits April 2, 2024 19:00

fixing pytests

e0c0d0f

updated p2m_demo.py

6558279

ritugala requested a review from viswavi April 3, 2024 00:40

fix linting

15a61eb

ritugala requested a review from neubig April 5, 2024 14:01

saum7800 requested changes Apr 10, 2024

neubig removed their request for review April 11, 2024 14:14

PR changes

775da63

ritugala force-pushed the ritu-temporary-changes branch from 45704de to 775da63 Compare April 11, 2024 18:17

comment change

ad0d502

viswavi requested changes Apr 16, 2024

viswavi self-requested a review April 16, 2024 17:15

viswavi requested changes Apr 16, 2024

ritugala added 3 commits April 18, 2024 11:22

merged with main

4b6427b

PR changes and fixed test

95dcea0

fixed linting

b92dca2

ritugala marked this pull request as ready for review April 18, 2024 17:14

ritugala and others added 5 commits April 18, 2024 13:41

Update prompt2model/dataset_retriever/description_dataset_retriever.py

423510e

Co-authored-by: Vijay Viswanathan <vijayv@andrew.cmu.edu>

Update prompt2model/utils/parse_responses.py

92f127a

Co-authored-by: Vijay Viswanathan <vijayv@andrew.cmu.edu>

Update prompt2model/dataset_retriever/description_dataset_retriever.py

547d4a3

Co-authored-by: Vijay Viswanathan <vijayv@andrew.cmu.edu>

Update prompt2model/dataset_retriever/description_dataset_retriever.py

c2ba8b3

Co-authored-by: Vijay Viswanathan <vijayv@andrew.cmu.edu>

fixed minor linting

0c81d18

ritugala requested review from viswavi and saum7800 April 18, 2024 19:17

viswavi approved these changes Apr 19, 2024

saum7800 approved these changes Apr 19, 2024

ritugala merged commit 6709095 into main Apr 19, 2024
8 checks passed

ritugala deleted the ritu-temporary-changes branch April 19, 2024 17:03
