Releases: ELS-RD/kernl
v0.2.2
v0.2.1
v0.2.0
New features
- Whisper support
- New debugger
- Improve warmup speed
- Improve CUDA graph memory allocation
What's Changed
- fix: python version by @pommedeterresautee in #124
- fix: change python version by @pommedeterresautee in #126
- feat: layernorm rms replacement for T5 by @gaetansnl in #107
- docs: update install steps by @Thytu in #128
- fix: pin Pytorch version to 1.12.1 by @pommedeterresautee in #130
- feat: update notebook T5 by @pommedeterresautee in #127
- fix: pinpoint pytorch in Docker image by @pommedeterresautee in #132
- feat: add ort benchmark script in xp folder by @pommedeterresautee in #139
- fix: remove fx profiler by @pommedeterresautee in #144
- feat: update to PyTorch 1.14 by @pommedeterresautee in #134
- fix: rename all close by @gaetansnl in #147
- feat: improve debugger api by @ayoub-louati in #116
- docs: add static documentation website by @white-gorilla in #112
- fix: removes the noindex meta. by @white-gorilla in #157
- feat: add stale workflow by @pommedeterresautee in #155
- fix: attention bug with large values by @gaetansnl in #158
- feat: only close issues marked as questions by @pommedeterresautee in #162
- feat: monthly triton update by @pommedeterresautee in #160
- feat: simplify tests by @pommedeterresautee in #159
- feat: check stability of output by @pommedeterresautee in #163
- docs: install instructions simplification by @pommedeterresautee in #150
- feat: better precision when the mask is big (covers lots of tokens) by @pommedeterresautee in #168
- feat: remove nvfuser by @pommedeterresautee in #175
- fix: bug where some specific shapes make inference on cuda graphs crash by @pommedeterresautee in #177
- feat: add autotune to optimize attention speed by @pommedeterresautee in #172
- feat: implements highlighted features in the static site. by @white-gorilla in #171
- feat: add google analytics (ga4) and cookie consent. by @white-gorilla in #186
- fix: layernorm stride by @gaetansnl in #190
- fix: support stride for layernorm when dim is more than 3 by @gaetansnl in #192
- feat: add performance dynamic chart. by @white-gorilla in #191
- docs: add some code reference by @jonathlela in #199
- feat: attention for skinny Q tensor by @pommedeterresautee in #198
- fix: site deployment workflow misses mkdocstrings python handler by @jonathlela in #209
- feat: openai whisper support by @gaetansnl in #122
- fix: disabled search field on landing page. by @white-gorilla in #210
- feat: update triton by @pommedeterresautee in #211
- fix: missing dependency by @gaetansnl in #218
- feat: update torch and modify cuda graphs by @pommedeterresautee in #220
- fix: fix flake8 config by @jonathlela in #231
- feat: reduce cg memory footprint by @pommedeterresautee in #225
- fix: pin protobuf dependency to bert tutorial by @jonathlela in #221
- docs: contribution guide. by @white-gorilla in #222
- fix: stop regenerating graph module by @pommedeterresautee in #235
- fix: test crashing on cuda graph by @pommedeterresautee in #243
- fix: update README with GPU requirements by @jonathlela in #251
- feat: add navigation footer. by @white-gorilla in #250
- fix: attention refactor and base doc by @gaetansnl in #216
- fix: update pytorch nightly to fix memory leak by @pommedeterresautee in #255
- docs: how to support a new model tutorial by @jonathlela in #181
- feat: replace whisper script by notebook + test fix by @pommedeterresautee in #257
- fix: update dep + fix by @pommedeterresautee in #265
- feat: reduce memory overhead in CUDA Graph by @pommedeterresautee in #268
- feat: add links to tutorials on README.md by @pommedeterresautee in #269
- feat: new debugger by @gaetansnl in #196
- fix: remove old debugger doc by @jonathlela in #271
New Contributors
- @Thytu made their first contribution in #128
- @jonathlela made their first contribution in #199
Full Changelog: v0.1.0...v0.2.0
first release
- covers Bert (and derivatives), and partially T5
- 3 kernels: Flash Attention, LayerNorm, LinearLayer
- debug mode
- tutorials