Format conversion tools for post tuning datasets #514

HYLcool · 2024-12-18T13:51:20Z

Select Query-Response format as the intermediate format for Data-Juicer.

…ormat conversion tools

yxdyc

The prototype implementation LGTM. Later we may need to discuss some terminologies and improve the clarity of the docs.

HYLcool added 3 commits December 18, 2024 20:29

+ add sharegpt <--> dj format conversion tools

5f1ab59

- move multimodal into fmt_conversion

d878eae

+ add basic docs for format conversion tools and post tuning dialog f…

d21833d

…ormat conversion tools

HYLcool added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 18, 2024

HYLcool requested review from BeachWang, Cathy0908 and yxdyc December 18, 2024 13:51

HYLcool self-assigned this Dec 18, 2024

HYLcool requested a deployment to Testing December 18, 2024 13:51 — with GitHub Actions Waiting

yxdyc reviewed Dec 19, 2024

View reviewed changes

yxdyc mentioned this pull request Dec 19, 2024

sharegpt format support #488

Open

3 tasks

yxdyc linked an issue Dec 19, 2024 that may be closed by this pull request

sharegpt format support #488

Open

3 tasks

HYLcool added 5 commits December 20, 2024 10:50

* rename tools

cb9f56f

+ add messages <--> dj conversion tools

5d7cd04

+ add messages <--> dj conversion tools

e0bd573

- reorganize the directory

6936f23

* rename functions

38b5619

HYLcool requested a deployment to Testing December 20, 2024 08:50 — with GitHub Actions Waiting

HYLcool added 2 commits December 23, 2024 10:41

Merge branch 'main' into feat/ft_format_conv_tools

d6148dd

+ add conversion tools for ModelScope-Swift ShareGPT format

128f1ab

HYLcool requested a deployment to Testing December 23, 2024 12:46 — with GitHub Actions Waiting

+ add conversion tools for Alpaca format

7519a65

HYLcool requested a deployment to Testing December 23, 2024 13:03 — with GitHub Actions Waiting

* fix typos in doc strings

6a51521

HYLcool temporarily deployed to Testing December 24, 2024 03:18 — with GitHub Actions Inactive

HYLcool marked this pull request as ready for review December 24, 2024 03:18

HYLcool changed the title ~~[WIP] format conversion tools for post tuning datasets~~ Format conversion tools for post tuning datasets Dec 24, 2024

HYLcool added the dj:tools issues/PRs about specific tools label Dec 24, 2024

Provide feedback