pipeline for integrating crowdsource prompts #6

Open
wants to merge 34 commits into
base: main
34 commits
01b8112
ignore .DS_Store
sbmaruf Apr 20, 2023
fe32003
data stat generator
sbmaruf Apr 20, 2023
76a150b
download data from google sheet.
sbmaruf Apr 20, 2023
8c691a7
update official masakhane/masakhanews
sbmaruf Apr 21, 2023
76a50d4
update jinja prompt loader
sbmaruf Apr 22, 2023
2f1f316
update data source
sbmaruf Apr 22, 2023
bbcb487
sanity check of scsqa structure
sbmaruf Apr 22, 2023
3f01f47
adding more datasets and output formatting
sbmaruf Apr 30, 2023
bf8f2c0
refactoring
sbmaruf Apr 30, 2023
e6488dd
doc string
sbmaruf Apr 30, 2023
7b9f1eb
add metadata
sbmaruf Apr 30, 2023
48244c0
prompt checker pipeline
sbmaruf May 9, 2023
0f159f9
type
sbmaruf May 9, 2023
2ef6ba1
code formatting & doc string added
sbmaruf May 9, 2023
a75519a
Add all dataset info
sbmaruf May 22, 2023
d778277
update naming
sbmaruf May 25, 2023
a9f210c
add split language
sbmaruf May 25, 2023
023b257
Automatic script running
sbmaruf May 25, 2023
57c0f32
gitignore updated
sbmaruf May 25, 2023
415bb29
formatting issue.
sbmaruf May 25, 2023
0529a48
update readme
sbmaruf May 25, 2023
29202a0
update --num-proc arg.
sbmaruf May 25, 2023
70355f3
ignore dump folder
sbmaruf May 29, 2023
e9fad7e
update hf-subset info
sbmaruf May 29, 2023
3bb89fe
black; 3 letter lang, len(data) condition added
sbmaruf May 29, 2023
5e8b4f3
one to one mapping between iso639-2 vs iso639-3
sbmaruf May 29, 2023
4c7cc11
runner
sbmaruf May 29, 2023
916ba1b
lang dicts added
sbmaruf May 29, 2023
4a8b323
script for creating audit data
sbmaruf May 29, 2023
31d7e5e
code re-factor
sbmaruf Jun 3, 2023
440b3f1
add help info
sbmaruf Jun 3, 2023
427d9a0
update truncation issue
sbmaruf Jun 3, 2023
f907524
cleaning
sbmaruf Jun 29, 2023
baf5326
handle exception
sbmaruf Jun 30, 2023
4 changes: 4 additions & 0 deletions .gitignore
@@ -129,3 +129,7 @@ dmypy.json

# Pyre type checker
.pyre/
.DS_Store

dumped*
.vscode/
9 changes: 9 additions & 0 deletions README.md
@@ -39,6 +39,7 @@ This script translate the samsum dataset using the inference server
python main.py
```


## Translate

```shell
@@ -47,3 +48,11 @@ python -m instructmultilingual.translate \
--source_language="English" \
--target_language="Egyptian Arabic"
```


### Automatic Data Generation

Run the following script.
```
bash scripts/validate_and_generate.sh
```