Releases · ELS-RD/kernl

New features

Whisper support
New debugger
Improve warmup speed
Improve CUDA graph memory allocation

What's Changed

fix: python version by @pommedeterresautee in #124
fix: change python version by @pommedeterresautee in #126
feat: layernorm rms replacement for T5 by @gaetansnl in #107
docs: update install steps by @Thytu in #128
fix: pin PyTorch version to 1.12.1 by @pommedeterresautee in #130
feat: update notebook T5 by @pommedeterresautee in #127
fix: pinpoint pytorch in Docker image by @pommedeterresautee in #132
feat: add ort benchmark script in xp folder by @pommedeterresautee in #139
fix: remove fx profiler by @pommedeterresautee in #144
feat: update to PyTorch 1.14 by @pommedeterresautee in #134
fix: rename all close by @gaetansnl in #147
feat: improve debugger api by @ayoub-louati in #116
docs: add static documentation website by @white-gorilla in #112
fix: removes the noindex meta. by @white-gorilla in #157
feat: add stale workflow by @pommedeterresautee in #155
fix: attention bug with large values by @gaetansnl in #158
feat: only close issues marked as questions by @pommedeterresautee in #162
feat: monthly triton update by @pommedeterresautee in #160
feat: simplify tests by @pommedeterresautee in #159
feat: check stability of output by @pommedeterresautee in #163
docs: install instructions simplification by @pommedeterresautee in #150
feat: better precision when the mask is big (covers lots of tokens) by @pommedeterresautee in #168
feat: remove nvfuser by @pommedeterresautee in #175
fix: bug where some specific shapes make inference on cuda graphs crash by @pommedeterresautee in #177
feat: add autotune to optimize attention speed by @pommedeterresautee in #172
feat: implements highlighted features in the static site. by @white-gorilla in #171
feat: add google analytics (ga4) and cookie consent. by @white-gorilla in #186
fix: layernorm stride by @gaetansnl in #190
fix: support stride for layernorm when dim is more than 3 by @gaetansnl in #192
feat: add performance dynamic chart. by @white-gorilla in #191
docs: add some code reference by @jonathlela in #199
feat: attention for skinny Q tensor by @pommedeterresautee in #198
fix: site deployment workflow misses mkdocstrings python handler by @jonathlela in #209
feat: openai whisper support by @gaetansnl in #122
fix: disabled search field on landing page. by @white-gorilla in #210
feat: update triton by @pommedeterresautee in #211
fix: missing dependency by @gaetansnl in #218
feat: update torch and modify cuda graphs by @pommedeterresautee in #220
fix: fix flake8 config by @jonathlela in #231
feat: reduce cg memory footprint by @pommedeterresautee in #225
fix: pin protobuf dependency to bert tutorial by @jonathlela in #221
docs: contribution guide. by @white-gorilla in #222
fix: stop regenerating graph module by @pommedeterresautee in #235
fix: test crashing on cuda graph by @pommedeterresautee in #243
fix: update README with GPU requirements by @jonathlela in #251
feat: add navigation footer. by @white-gorilla in #250
fix: attention refactor and base doc by @gaetansnl in #216
fix: update pytorch nightly to fix memory leak by @pommedeterresautee in #255
docs: how to support a new model tutorial by @jonathlela in #181
feat: replace whisper script by notebook + test fix by @pommedeterresautee in #257
fix: update dep + fix by @pommedeterresautee in #265
feat: reduce memory overhead in CUDA Graph by @pommedeterresautee in #268
feat: add links to tutorials on README.md by @pommedeterresautee in #269
feat: new debugger by @gaetansnl in #196
fix: remove old debugger doc by @jonathlela in #271