Add Alpaca Persian Dataset by pourmand1376 · Pull Request #3633 · LAION-AI/Open-Assistant

pourmand1376 · 2023-08-04T09:45:25Z

Hi,
Over the last two days, I have been translating Alpaca into Persian (Farsi), and this is the result. I have reviewed the translations, and in my opinion they are pretty good.

Also, the dataset is still being translated on Kaggle, and the translation will be finished in a couple of days. I will update the datasets accordingly when it is complete.

I have added two datasets. One is instruction-based and the other is an Orca-style dataset. I knew how to add the first one, but I don't know how to add the Orca-style dataset to your datasets.

Thank you for your attention.

stefangrotz · 2023-08-04T10:31:24Z

Hey, great work! I always wanted to translate this dataset to German or Esperanto. The main problem here is that the license of Alpaca isn't usable for open-source LLMs, because ChatGPT does not allow its output to be used to train other models. Because of that, it cannot be used for Open Assistant or for any commercial project.

However, having this dataset is surely useful for training experimental systems and for science projects.

BTW, do you know about the Alpaca Data Cleaned project? It fixed a lot of the errors in the dataset, like wrong calculations: https://github.com/gururise/AlpacaDataCleaned

pourmand1376 · 2023-08-04T10:59:47Z

Hi, thanks for your comment.

Yes, I have used the cleaned version.

Sadly, I didn't know about the license restrictions. The dataset itself (Alpaca) is published under Apache 2.0. I have also published my dataset under Apache 2.0.

Isn't that good enough?

stefangrotz · 2023-08-04T11:52:40Z

Unfortunately not, see https://github.com/gururise/AlpacaDataCleaned#license
This is one of the main reasons why OA started to build up a crowdsourced conversational dataset.

Maybe you can translate the English and Spanish parts of the Open Assistant dataset instead? Both are quite big.
https://huggingface.co/datasets/OpenAssistant/oasst1
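
As a rough illustration of that suggestion, here is a minimal sketch of picking out the English and Spanish messages before translating them. It assumes each row is a dict with `text`, `role`, and `lang` fields, as in oasst1; the helper name `select_languages` and the sample rows are hypothetical and only stand in for the real data.

```python
from typing import Iterable


def select_languages(rows: Iterable[dict], langs: set[str]) -> list[dict]:
    """Keep only the rows whose `lang` field is one of `langs`."""
    return [row for row in rows if row.get("lang") in langs]


# Hypothetical sample rows that mimic the `text`, `role`, and `lang`
# columns of oasst1; they are not taken from the real dataset.
SAMPLE_ROWS = [
    {"text": "What is a llama?", "role": "prompter", "lang": "en"},
    {"text": "¿Qué es una alpaca?", "role": "prompter", "lang": "es"},
    {"text": "Was ist ein Lama?", "role": "prompter", "lang": "de"},
]

# Rows that would be handed to the translation step.
to_translate = select_languages(SAMPLE_ROWS, {"en", "es"})
print(len(to_translate), "messages selected for translation")
```

With the Hugging Face `datasets` library installed, the rows could presumably come from `load_dataset("OpenAssistant/oasst1", split="train")` instead of the sample list, since iterating that split yields one dict per message.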

pourmand1376 added 3 commits August 4, 2023 09:34

add alpaca

ef46dbf

add alpaca

3d269e4

add alpaca multi

c2166e4

pourmand1376 requested review from Vechtomov, andreaskoepf, bitplane, dvruette, huu4ontocord, jordiclive, olliestanley, sanagno, sedthh, shahules786, theblackcat102 and yk as code owners August 4, 2023 09:45

andreaskoepf added the data label Aug 5, 2023

pourmand1376 wants to merge 3 commits into LAION-AI:main from pourmand1376:alpaca-fa