added datasets and models for text generation evaluation #291

Open

ashish3586 wants to merge 1 commit into main from feature/text_generation_leaderboard

Collaborator

ashish3586 commented Sep 17, 2021

No description provided.


added dataset and model name for text generation evaluation

c7f172e

ashish3586 requested review from tongshuangwu and kaustubhdhole

September 17, 2021 14:26

Collaborator

tongshuangwu commented Sep 17, 2021

Thanks, this is great! Can you also run these evals and add the numbers to the leaderboard readme?

kaustubhdhole requested review from mille-s, Saad-Mahamood and sebastianGehrmann

September 27, 2021 19:29

Contributor

mille-s commented Sep 28, 2021

@ashish3586 can you please provide a short description of your transformation?

Saad-Mahamood requested changes

Collaborator

Saad-Mahamood left a comment

Minor change required: please add docstrings describing the input parameters and return value of each function.
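As an illustration of the requested docstring style, here is a minimal sketch. The helper `pipeline_device` is hypothetical and not part of this PR; it only wraps the `0 if is_cuda else -1` expression from the diff so there is something small to document. The Google-style `Args`/`Returns` layout is an assumption about the project's preferred convention.

```python
def pipeline_device(is_cuda: bool) -> int:
    """Map a CUDA flag to a device index for a Hugging Face pipeline.

    Args:
        is_cuda: True if the pipeline should run on the first CUDA GPU.

    Returns:
        0 (the first GPU) when ``is_cuda`` is True, otherwise -1 (CPU).
    """
    # Hypothetical helper: mirrors `device=0 if is_cuda else -1` from the diff.
    return 0 if is_cuda else -1
```

The same layout would apply to each function in `evaluate_text_generation.py`: one summary line, then one entry per parameter and one for the return value.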

aadesh11 reviewed

evaluation/evaluate_text_generation.py

+                      dataset = KeyValueDataset.from_huggingface(
+                          hf_dataset, TaskType.TEXT_TO_TEXT_GENERATION, ["text", "summary"]
+                      )

Collaborator

aadesh11 Oct 29, 2021

Missing return statement for "billsum".

aadesh11 reviewed

evaluation/evaluate_text_generation.py

+                      dataset = KeyValueDataset.from_huggingface(
+                          hf_dataset, TaskType.TEXT_TO_TEXT_GENERATION, ["text", "summary"]
+                      )

Collaborator

aadesh11 Oct 29, 2021

Also, I would suggest adding an `else` block that raises an exception with a descriptive message.
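The two comments on the `billsum` branch (the missing return and the missing `else`) could be addressed together along these lines. This is a minimal sketch, not the code in this PR: `process_data` is a hypothetical stand-in for `_process_data`, the `xsum` branch is an illustrative assumption, and the function returns the selected field names instead of calling `KeyValueDataset.from_huggingface` so that it runs without the project's dependencies.

```python
def process_data(dataset_name: str, split: str):
    """Select the text and summary fields for a supported dataset.

    Args:
        dataset_name: Name of the dataset, e.g. "billsum".
        split: Split specification, e.g. "test[:1%]".

    Returns:
        A (dataset_name, split, fields) tuple. The real function would
        build and return a KeyValueDataset here instead.

    Raises:
        ValueError: If ``dataset_name`` is not supported.
    """
    if dataset_name == "xsum":  # assumed branch, for illustration only
        fields = ["document", "summary"]
    elif dataset_name == "billsum":
        fields = ["text", "summary"]  # field names taken from the diff
    else:
        # Fail loudly instead of silently returning None.
        raise ValueError(
            f"Unsupported dataset '{dataset_name}' for text generation evaluation"
        )
    # A single return after the branches means no branch can forget it.
    return dataset_name, split, fields
```

Placing one `return` after the `if`/`elif`/`else` chain avoids the situation flagged above, where a single branch builds the dataset but never returns it.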

aadesh11 reviewed

evaluation/evaluate_text_generation.py

-                      "summarization", model=model_name, tokenizer=model_name
+                      "summarization", model=model_name, tokenizer=model_name, device=0 if is_cuda else -1)
+                  #percent = f"[{split.split('[')[-1]}" if "[" in split else ""
+                  #if dataset_name == "wikihow": split = "all[:1%]"  # f"all{percent}"

Collaborator

aadesh11 Oct 29, 2021

I think we can remove this commented code.

aadesh11 reviewed

evaluation/evaluate_text_generation.py

+                  #if dataset_name == "wikihow": split = "all[:1%]"  # f"all{percent}"
+                  dataset = _process_data(dataset_name, split)
+                  print(

Collaborator

aadesh11 Oct 29, 2021

Duplicate print statement.

aadesh11 reviewed

evaluation/evaluate_text_generation.py

                   references = []
                   raw_hypotheses = []
                   print(f"Length of Evaluation dataset is {len(dataset)}")
-                  for example in dataset:
+                  for i,example in enumerate(dataset):
+                      print(i)

Collaborator

aadesh11 Oct 29, 2021

Do we need this print statement?
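If some progress output is still wanted, one possible alternative to printing every index is to log at a fixed interval. The sketch below is an assumption about what would suit this script, not something proposed in the PR: `iterate_with_progress` and its `every` parameter are hypothetical names, and it uses only the standard `logging` module.

```python
import logging

logger = logging.getLogger(__name__)


def iterate_with_progress(dataset, every=100):
    """Yield examples from ``dataset``, logging progress periodically.

    Args:
        dataset: A sized iterable of examples.
        every: Log one message per this many examples.

    Yields:
        Each example of ``dataset``, unchanged and in order.
    """
    total = len(dataset)
    for i, example in enumerate(dataset):
        if i % every == 0:
            # One line per `every` examples instead of one per example.
            logger.info("Processed %d of %d examples", i, total)
        yield example
```

The evaluation loop would then read `for example in iterate_with_progress(dataset):`, and the output can be silenced or redirected through the logging configuration rather than by editing the loop.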

Reviewers

aadesh11 left review comments

Saad-Mahamood requested changes

Awaiting requested review from tongshuangwu

Awaiting requested review from kaustubhdhole

Awaiting requested review from mille-s

Awaiting requested review from sebastianGehrmann

Requested changes must be addressed to merge this pull request.

Labels

None yet