What's Changed
- move deprecated kwargs from trainer to trainingargs by @winglian in #2028
- add axolotlai docker hub org to publish list by @winglian in #2031
- update actions version for node16 deprecation by @winglian in #2037
- replace references to personal docker hub to org docker hub by @winglian in #2036
- feat: add metharme chat_template by @NanoCode012 in #2033
- change deprecated Stub to App by @winglian in #2038
- fix: handle sharegpt dataset missing by @NanoCode012 in #2035
- add P2P env when multi-gpu but not the full node by @winglian in #2041
- invert the string in string check for p2p device check by @winglian in #2044
- feat: print out dataset length even if not preprocess by @NanoCode012 in #2034
- Add example YAML file for training Mistral using DPO by @olivermolenschot in #2029
- fix: inference not using chat_template by @NanoCode012 in #2019
- feat: cancel ongoing tests if new CI is triggered by @NanoCode012 in #2046
- feat: upgrade to liger 0.4.1 by @NanoCode012 in #2045
- run pypi release action on tag create w version by @winglian in #2047
- make sure to tag images in docker for tagged releases by @winglian in #2051
- retry flaky test_packing_stream_dataset test that times out on read by @winglian in #2052
- install default torch version if not already, new xformers wheels for torch 2.5.x by @winglian in #2049
- fix push to main and tag semver build for docker ci by @winglian in #2054
- Update unsloth for torch.cuda.amp deprecation by @bursteratom in #2042
- don't cancel the tests on main automatically for concurrency by @winglian in #2055
- ADOPT optimizer integration by @bursteratom in #2032
- Grokfast support by @winglian in #1917
- upgrade to flash-attn 2.7.0 by @winglian in #2048
- make sure to add tags for versioned tag on cloud docker images by @winglian in #2060
- fix duplicate base build by @winglian in #2061
- fix env var extraction by @winglian in #2043
- gradient accumulation tests, embeddings w pad_token fix, smaller models by @winglian in #2059
- upgrade datasets==3.1.0 and add upstream check by @winglian in #2067
- update to be deprecated evaluation_strategy by @winglian in #1682
- remove the bos token from dpo outputs by @winglian in #1733
- support passing trust_remote_code to dataset loading by @winglian in #2050
- support for schedule free and e2e ci smoke test by @winglian in #2066
- Fsdp grad accum monkeypatch by @winglian in #2064
- fix: loading locally downloaded dataset by @NanoCode012 in #2056
- Update get_unpad_data patching for multipack by @chiragjn in #2013
- increase worker count to 8 for basic pytests by @winglian in #2075
- upgrade autoawq==0.2.7.post2 for transformers fix by @winglian in #2070
- optim e2e tests to run a bit faster by @winglian in #2069
- don't build bdist by @winglian in #2076
- static assets, readme, and badges update v1 by @winglian in #2077
- Readme updates v2 by @winglian in #2078
- bump transformers for fsdp-grad-accum fix, remove patch by @winglian in #2079
- Feat: Drop long samples and shuffle rl samples by @NanoCode012 in #2040
- add optimizer step to prevent warning in tests by @winglian in #1502
- fix brackets on docker ci builds, add option to skip e2e builds by @winglian in #2080
- remove deprecated extra metadata kwarg from pydantic Field by @winglian in #2081
- release version 0.5.1 by @winglian in #2082
- make sure action has permission to create release by @winglian in #2083
- set manifest and fix for source dist by @winglian in #2084
- add missing dunder-init for monkeypatches and add tests for install from sdist by @winglian in #2085
New Contributors
- @olivermolenschot made their first contribution in #2029
Full Changelog: v0.5.0...v0.5.2