pipeline for integrating crowdsource prompts #6

Open
wants to merge 34 commits into
base: main
Changes from 7 commits
Commits
34 commits
01b8112
ignore .DS_Store
sbmaruf Apr 20, 2023
fe32003
data stat generator
sbmaruf Apr 20, 2023
76a150b
download data from google sheet.
sbmaruf Apr 20, 2023
8c691a7
update official masakhane/masakhanews
sbmaruf Apr 21, 2023
76a50d4
update jinja prompt loader
sbmaruf Apr 22, 2023
2f1f316
update data source
sbmaruf Apr 22, 2023
bbcb487
sanity check of scsqa structure
sbmaruf Apr 22, 2023
3f01f47
adding more datasets and output formatting
sbmaruf Apr 30, 2023
bf8f2c0
refactoring
sbmaruf Apr 30, 2023
e6488dd
doc string
sbmaruf Apr 30, 2023
7b9f1eb
add metadata
sbmaruf Apr 30, 2023
48244c0
prompt checker pipeline
sbmaruf May 9, 2023
0f159f9
type
sbmaruf May 9, 2023
2ef6ba1
code formatting & doc string added
sbmaruf May 9, 2023
a75519a
Add all dataset info
sbmaruf May 22, 2023
d778277
update naming
sbmaruf May 25, 2023
a9f210c
add split language
sbmaruf May 25, 2023
023b257
Automatic script running
sbmaruf May 25, 2023
57c0f32
gitignore updated
sbmaruf May 25, 2023
415bb29
formatting issue.
sbmaruf May 25, 2023
0529a48
update readme
sbmaruf May 25, 2023
29202a0
update --num-proc arg.
sbmaruf May 25, 2023
70355f3
ignore dump folder
sbmaruf May 29, 2023
e9fad7e
update hf-subset info
sbmaruf May 29, 2023
3bb89fe
black; 3 letter lang, len(data) condition added
sbmaruf May 29, 2023
5e8b4f3
one to one mapping between iso639-2 vs iso639-3
sbmaruf May 29, 2023
4c7cc11
runner
sbmaruf May 29, 2023
916ba1b
lang dicts added
sbmaruf May 29, 2023
4a8b323
script for creating audit data
sbmaruf May 29, 2023
31d7e5e
code re-factor
sbmaruf Jun 3, 2023
440b3f1
add help info
sbmaruf Jun 3, 2023
427d9a0
update truncation issue
sbmaruf Jun 3, 2023
f907524
cleaning
sbmaruf Jun 29, 2023
baf5326
handle exception
sbmaruf Jun 30, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -129,3 +129,4 @@ dmypy.json

# Pyre type checker
.pyre/
.DS_Store
58 changes: 58 additions & 0 deletions data/check_prompts.py
@@ -0,0 +1,58 @@
import os
import csv
import json
import argparse
import subprocess
from promptsource.templates import Template
from .data_stat import SERIES_A_DATASET_NAME_DICT

def check(
    json_example,
    template_name,
    jinja_template,
    template_reference=None,
    answer_choices=None
):
Member


docstring please :))

Contributor Author


Done. But still WIP.

    json_example = json.loads(json_example)
    template = Template(
        template_name,
        jinja_template,
        template_reference,
        answer_choices=answer_choices
    )
    lm_io = template.apply(json_example, highlight_variables=False)
    return lm_io

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--form_path",
        type=str,
        default=None,
        help="Path of the google sheet."
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Overwrite existing prompt file prompts.csv."
    )
    parser.add_argument(
        "--prompt-dir",
        type=str,
        default="data/",
        help="Directory where prompts.csv is stored."
    )
    args = parser.parse_args()
    prompt_file_path = os.path.join(args.prompt_dir, "prompts.csv")
    # If the file exists, it may be from a previous run/download; keep it as a backup.
    if os.path.exists(prompt_file_path) and args.overwrite:
        subprocess.check_output(["mv", prompt_file_path, f"{prompt_file_path}.old"])
    # Download the sheet as CSV into the same path that is read below.
    # Passing an argument list avoids shell interpretation of "?" in the URL.
    sheet_url = "https://docs.google.com/spreadsheets/d/10bCwOhM8zKNkqKi54gIvdwrR44YlWQFV9fpGm7acHv8/export?format=csv"
    subprocess.check_output(["curl", "-L", sheet_url, "-o", prompt_file_path])

    with open(prompt_file_path, "r", newline="") as csvfile:
        csvreader = csv.reader(csvfile)
        next(csvreader, None)  # skip the header row
        for row in csvreader:
            print(row)


if __name__ == "__main__":
    main()
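
Possible follow-up for the backup and CSV-reading steps in main(), offered as a minimal sketch rather than the PR's actual code. It assumes the backup can be done with os.replace instead of a shell `mv`, and that the first CSV row is a header to skip. The helper names `backup_existing` and `iter_prompt_rows` are hypothetical and do not exist in the repository.

```python
import csv
import os
import tempfile


def backup_existing(path):
    """Rename `path` to `path + '.old'` if it exists.

    Returns the backup path, or None when there was nothing to back up.
    (Hypothetical helper, not part of the PR.)
    """
    if not os.path.exists(path):
        return None
    backup_path = f"{path}.old"
    # os.replace overwrites an older backup and needs no shell quoting.
    os.replace(path, backup_path)
    return backup_path


def iter_prompt_rows(path):
    """Yield the data rows of a CSV file, skipping the header row.

    (Hypothetical helper, not part of the PR.)
    """
    with open(path, "r", newline="", encoding="utf-8") as csvfile:
        reader = csv.reader(csvfile)
        # The default argument avoids StopIteration on an empty file.
        next(reader, None)
        yield from reader


if __name__ == "__main__":
    # Demo on a throwaway file; the column names here are made up.
    with tempfile.TemporaryDirectory() as tmp_dir:
        prompt_path = os.path.join(tmp_dir, "prompts.csv")
        with open(prompt_path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([["name", "template"], ["demo", "{{ text }}"]])
        print(list(iter_prompt_rows(prompt_path)))
        print(backup_existing(prompt_path))
```

Splitting these steps out of main() would also make them testable without hitting the Google Sheet.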